Conclusions
To conclude, data mining is a mixture of classification, clustering and regression. Classification organises the data into categories; it can be done through a number of methods, but for text mining in particular a decision tree or a Naïve Bayes classifier would be appropriate. When the data is passed on to clustering, it is grouped by algorithms such as k-means, and, lastly, regression uses mathematical methods to find correlations in the data that can support future predictions. Ngram may be considered a step towards opening new questions, yet most scholars would agree that the information that can be extracted from it is limited. In comparison, topic modelling has been credited as a well-known form of classification. It has the potential to change the way we perceive data and form new correlations within it. However, this is not the case for all historians; military historians would argue that ‘this ontology does not represent their knowledge.’[1] This suggests that categorised data cannot replace the deeper meaning that historians draw out when they analyse documents rather than merely draw parallels between them.
[1] Fabio Ciravegna, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin and Ravish Bhagdev, haystack, p. 72.