FIT5212 - Data analysis for semi-structured data - 2019

6 points, SCA Band 2, 0.125 EFTSL

Postgraduate - Unit

Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.

Information Technology

Chief examiner(s)

Professor Wray Buntine

Not offered in 2019

Semi-structured data is one of the fastest-growing kinds of data in both the public and private sectors, for instance in health. An email collection, with its sender-recipient graph, metadata and text content, is one example. This unit will explore basic forms of semi-structured data: text, time-sequence data, graphs and multiple relations in a database. Basic machine learning algorithms for these kinds of data will be analysed and applied. Characteristic industry problems involving semi-structured data, such as cohort analysis and market-basket analysis, will also be investigated.

At the completion of this unit, students should be able to:

  1. appraise what kinds of semi-structured data exist and the problems they present for analysis;
  2. analyse algorithms suited to different kinds of semi-structured data;
  3. develop and modify some standard algorithms for semi-structured data;
  4. examine some characteristic industry problems involving semi-structured data, and analyse the suitability of different algorithms.

Examination (2 hours, plus 30 minutes reading and noting time): 50%; in-semester assessment: 50%

Workload requirements

The minimum total expected workload is 12 hours per week, comprising:

  • Two hours/week lectures
  • Two hours/week laboratories
  • A minimum of 8 hours/week of personal study (22 hours/week for Monash Online students) for completing lab/tutorial activities, assignments, private study and revision, and, for online students, participating in discussions.
