|Course code STAT3110||Units 10||Level 3000||Faculty of Science and Information Technology, School of Mathematical and Physical Sciences|
Almost all workplaces, including business, industry, government and medical institutes, collect data. But what does this mass of data tell us? This course introduces techniques for extracting valid and useful information from data, with an emphasis on their statistical heritage and properties. The strengths and weaknesses of widely used data mining techniques will be examined through illustrative examples.
Enrolment for Distance Learning requires permission from the Head of the Discipline of Statistics.
Not available in 2015
|Objectives||1. To provide students with an understanding of where and why data mining is used.|
2. To provide the theoretical and practical tools required to analyse data with a view to mining information for decision making.
3. To develop skills in learning data mining methods and concepts by reading, discussion and practical computations.
Data organisation, selection, cleaning and quality.
Differences between experimental and observational data.
Data reduction techniques:
a) principal components
b) factor analysis
c) canonical correlation analysis
Data mining tools:
a) classification type - using logistic regression, decision trees and neural networks.
b) segmentation type - using cluster analysis, k-means and related methods.
c) dependency modelling - using association rules, regression and graphical models.
Assessment of findings:
a) validation through new data and cross-validation.
b) updating information with new data.
c) combining results from different methods.
|Assumed Knowledge||Introductory statistics and introductory regression.|
|Modes of Delivery||Distance Learning: IT Based|
|Contact Hours||Computer Lab: 2 hours per week for 13 weeks|
Lecture: 2 hours per week for 13 weeks