Regression

On the right there is a RDF Wordal, using a text based on it. On the left is a mathematical formular expressed in the form of a flow chart to indicate an algorithm.

Lastly, regression is the last part of data mining that usually uses mathematical formulas; these are algorithms to determine the prediction or outcome of the results and to establish connections.[1] This is used for quant-numerical data, yet data is depicted and effectively used for conclusions and interpretations through the semantic web. Within the semantic web are web agents, one of these tools is called Armadillo which exhibit their findings on a Resource Description Framework (RDF.) Despite this useful tool, humanities resources are equipped to function without it. This does not mean that they are equipped to deal with the ‘black box’ problem in data mining.[2] Unfortunately, this means that some output data does not correspond with the input data presenting unsatisfactory results. In some ways it is similar to the Bayesian system from the previous step of clustering, as the output data does not match the input data.

[1] Introduction to data mining, http://www.youtube.com/watch?v=_QH4oIOd9nc; consulted 13 April 2012

[2] Fabio Ciravegna, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie Mc Laughlin and Ravish Bhagdev, haystack, pp.67-78