Data Mining

Almost all workplaces, including business, industry, government and medical institutes, collect data. But what does this mass of data tell us? This course introduces techniques for extracting valid and useful information from data, with an emphasis on their statistical heritage and properties. The strengths and weaknesses of widely used data mining techniques will be examined through illustrative examples.

Objectives
1. To provide students with an understanding of where and why data mining is used.
2. To provide the theoretical and practical tools required to analyse data with a view to mining information for decision making.
3. To develop skills in learning data mining methods and concepts through reading, discussion and practical computation.

Content
Topics include:
Data organisation, selection, cleaning and quality.
Differences between experimental and observational data.
Data reduction techniques:
a) principal components
b) factor analysis
c) canonical correlation analysis
Data mining tools:
a) classification type - using logistic regression, decision trees and neural networks.
b) segmentation type - using cluster analysis, k-means and related methods.
c) dependency modelling - using association rules, regression and graphical models.
Assessment of findings:
a) validation through new data and cross-validation.
b) updating information with new data.
c) combining results from different methods.

Assumed Knowledge
Introductory statistics and introductory regression.
