Data Mining (STAT3110)

Not available in 2013

Previously offered in 2007, 2005, 2004

Almost all workplaces, including business, industry, government and medical institutes, collect data. But what does this mass of data tell us? This course introduces techniques for extracting valid and useful information from data, with an emphasis on their statistical heritage and properties. The strengths and weaknesses of widely used data mining techniques will be examined through illustrative examples.

Enrolment for Distance Learning requires permission from the Head of the Discipline of Statistics.

Objectives 1. To provide students with an understanding of where and why data mining is used.
2. To provide the theoretical and practical tools required to analyse data with a view to mining information for decision making.
3. To develop skills in learning data mining methods and concepts through reading, discussion and practical computation.
Content Topics include:
Data organisation, selection, cleaning and quality.
Differences between experimental and observational data.
Data reduction techniques:
a) principal components
b) factor analysis
c) canonical correlation analysis
Data mining tools:
a) classification type - using logistic regression, decision trees and neural networks.
b) segmentation type - using cluster analysis, k-means and related methods.
c) dependency modelling - using association rules, regression and graphical models.
Assessment of findings:
a) validation through new data and cross-validation.
b) updating information with new data.
c) combining results from different methods.
Replacing Course(s) n/a
Transition n/a
Industrial Experience 0
Assumed Knowledge Introductory statistics and introductory regression.
Modes of Delivery Distance Learning : IT Based
Internal Mode
Teaching Methods Lecture
Computer Lab
Assessment Items
Essays / Written Assignments
Examination: Formal
Projects
Contact Hours Computer Lab: 2 hours per week for 13 weeks
Lecture: 2 hours per week for 13 weeks