Text Mining for the Rest of Us
2.3. Mining the Communal Wisdom
Communal Knowledge is the knowledge and learning shared by the members of some community or population. This might be a country, a culture, an academic or professional speciality, etc. The sharing of knowledge within a community is part of the social fabric that helps the group progress, advance, and improve.
Historically, the recording, transmission, and sharing of knowledge have taken place through "the literature." This means not fiction or recreational reading, but the published record of information.
In the past, this record involved channels such as scientific, scholarly, academic, legal, or governmental publication. Those are the kinds of venues that make up the public record. But in the new Internet and Web environment, it has changed into the online record of information, of discovery, of facts and data.
This isn't the classic environment of academic publication, with review and comment by qualified specialists, and the opportunity for later investigators to check the accuracy of the experiments or research. However, the search engines most of us depend upon to quickly find information on the Web contribute a rough equivalent of the old process of review.
Search engines are really quality checkers - A search engine such as Google uses algorithms or formulas to calculate and assign the importance or relevance of a Web page. Google uses factors like the words and phrases in the text, the use of metatags, page titles and headlines, etc. But it also cleverly uses the record of how many other Web sites link to the information on a page. This is a rough equivalent to peer or colleague review of the content's value and quality. In the ranking of a Google search report, a page about topic xyz that has links from 100 other Web sites gets a much higher display position than another page on the same topic that has, say, only 3 links from other sites.
Thus, the first page may appear on page 1 or 2 of the Google results list, while the second falls down to page 7. This is really a "quality check," based on the fact that a great number of other Web sites consider the information on the first page worth linking to, so that their readers are directed to it. In other words, a link is roughly the equivalent of a reference or citation in an academic paper.
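The link-counting idea can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not how Google actually computes its rankings: it ignores the on-page factors (words, titles, metatags) and simply orders pages by the number of distinct sites linking to them. The function name rank_by_inbound_links, the site names, and the page names are all made up for the example.

```python
from collections import defaultdict


def rank_by_inbound_links(links):
    """Order pages by how many distinct sites link to them.

    `links` is an iterable of (source_site, target_page) pairs.
    This is a simplification for illustration only; a real search
    engine combines many more signals than a raw link count.
    """
    linking_sites = defaultdict(set)
    for source_site, target_page in links:
        # A set ensures repeated links from one site count only once.
        linking_sites[target_page].add(source_site)

    counts = {page: len(sites) for page, sites in linking_sites.items()}
    # Most-linked pages first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: 100 sites link to page_a, only 3 link to page_b.
links = [(f"site{i}.example", "page_a") for i in range(100)]
links += [(f"site{i}.example", "page_b") for i in range(3)]

ranking = rank_by_inbound_links(links)
print(ranking)  # [('page_a', 100), ('page_b', 3)]
```

Here page_a, with 100 linking sites, sorts ahead of page_b, with 3, which mirrors the page 1 versus page 7 contrast described above.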