FIT5212 - Data analysis for semi-structured data - 2019

6 points, SCA Band 2, 0.125 EFTSL

Postgraduate - Unit

Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.

Faculty

Information Technology

Chief examiner(s)

Professor Wray Buntine

Not offered in 2019

Prerequisites

FIT5197

Synopsis

Semi-structured data is one of the fastest growing kinds of data in both the public and private sectors, for instance in health. An email collection with sender-recipient graphs, metadata and text content is one example. This unit will explore basic forms of semi-structured data: text, time-sequence data, graphs and multiple relations in a database. Basic machine learning algorithms for these kinds of data will be analysed and applied. Some characteristic industry problems involving semi-structured data, such as cohort analysis and market-basket analysis, will also be investigated.

Outcomes

At the completion of this unit, students should be able to:

  1. appraise what kinds of semi-structured data exist and the problems they present for analysis;
  2. analyse different kinds of algorithms for different kinds of semi-structured data;
  3. develop and modify some standard algorithms for semi-structured data;
  4. examine some characteristic industry problems involving semi-structured data, and analyse the suitability of different algorithms.

Assessment

Examination (2 hours, plus 30 minutes reading and noting time): 50%; in-semester assessment: 50%

Workload requirements

Minimum total expected workload equals 12 hours per week comprising:

  • Two hours/week lectures
  • Two hours/week laboratories

In addition, a minimum of 8 hours per week of personal study (22 hours per week for Monash Online students) is expected for completing lab/tutorial activities, assignments, private study and revision and, for online students, participating in discussions.

See also Unit timetable information