COT5230

Data mining

R Redpath

6 points - One 2-hour lecture and one 2-hour tutorial per week - Second semester - Caulfield - Prerequisites: COT4230 and COT4300 or COT4330 or equivalent level of knowledge (While not a formal prerequisite, CSC3200 or CSE3320 would provide a useful theoretical background.)

Objectives To develop students' knowledge of the techniques and methods for data exploration in large databases, both those currently in use and those still being researched, and to familiarise students with the currently available techniques for extracting information from large databases.

Synopsis This subject studies the application of database and semantics, information filtering and pattern recognition techniques to the exploration of data in databases. Tools such as Kohonen filtering, minimum message length classification and genetic algorithms are examined, along with statistical methods such as moment measures, multiple regression, significance testing and harmonic analysis. Data quality is also considered. The expression of rules in a logic representation language is coupled with a study of basic linguistic semantics in order to quantify user research dialogues. The overall thrust of the subject is practical, drawing on current theory, and extensive practical work using the above techniques is undertaken. The study concludes with visualisation of database relationships and of retrieved information.

Assessment Five assignments (each 20%): 100%

Recommended texts

Fayyad U M and others (eds) Advances in knowledge discovery and data mining MIT Press, 1996
Adriaans P and Zantinge D Data mining Addison-Wesley, 1996
Bigus J P Data mining with neural networks: Solving business problems, from application development to decision support McGraw-Hill, 1996
