Types of Data Mining

Ngrams

Historian Ben Schmidt created an Ngram on Google but, like most historians, found it difficult to extract much information from it. Ngrams present data as a line graph, so the relationship between the two axes can be interpreted visually. Obvious correlations can be seen in the rises and falls of the lines, but further interpretation is difficult to deduce.[1] More than one term can be analysed at a time, and the lines marking different words can be compared directly on the same graph. Even so, Ngrams are not entirely useful for historians. The horizontal axis always displays the year, so a search can only show how often a term appears over time. Take a specific question, for example: when was there a depression in America? Searching ‘Depression’ against ‘America’ shows the words as separate entities, counting how many times each was mentioned rather than the significance the words have together.

Matthew Hurst, on the other hand, analysed Ngrams from a language perspective, gathering data on the words themselves rather than correlating them visually. He compared words across two varieties of English, American English and British English, to see how they had changed over the years, and found hardly any change at all. He then compared the same word written with an initial capital letter and written in lower case. According to his results, the capitalised forms were those used at the start of a sentence, and they were recorded more often than the lower-case forms.[2] Hurst seemed to enjoy drawing these kinds of conclusions and appeared quite excited about the topic.
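The two counting behaviours described above can be illustrated with a minimal sketch. It is not the Google Ngram implementation: the corpus, the function name count_terms and its case_sensitive parameter are all hypothetical, invented here only to show that each term is tallied on its own per year and that capitalised and lower-case forms are treated as different words unless case is folded.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs invented for illustration only.
CORPUS = [
    (1929, "Depression spread across America. The depression deepened."),
    (1930, "America faced hardship. Depression was widespread in America."),
]


def count_terms(corpus, terms, case_sensitive=True):
    """Count each term independently for every year in the corpus."""
    counts = {}
    for year, text in corpus:
        # Split the text into alphabetic word tokens.
        tokens = re.findall(r"[A-Za-z]+", text)
        if not case_sensitive:
            tokens = [token.lower() for token in tokens]
        tally = Counter(tokens)
        # Each term is looked up on its own; nothing links the terms together.
        counts[year] = {term: tally[term] for term in terms}
    return counts


# Case-sensitive: 'Depression' and 'depression' are separate entries.
sensitive = count_terms(CORPUS, ["Depression", "depression", "America"])

# Case-folded: both spellings are merged into one count.
folded = count_terms(CORPUS, ["depression", "america"], case_sensitive=False)

for year in sorted(sensitive):
    print(year, sensitive[year], folded[year])
```

The per-year tallies say nothing about whether ‘Depression’ and ‘America’ occur in the same sentence, which is the limitation noted above for historical questions. Answering such a question would need a different measure, such as co-occurrence within a sentence.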

[1] Sapping Attention, http://sappingattention.blogspot.co.uk/; consulted 18 April 2012

[2] Data Mining: Text Mining, Visualization and Social Media, http://datamining.typepad.com/data_mining/2010/12/more-thoughts-on-google-books-ngrams.html; consulted 13 April 2012