Faculty of Information Technology

Postgraduate - Unit

This unit entry is for students who completed this unit in 2015 only. For students planning to study the unit, please refer to the unit indexes in the current edition of the Handbook. If you have any queries, contact the managing faculty for your course or area of study.

6 points, SCA Band 2, 0.125 EFTSL

Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.

Faculty: Faculty of Information Technology
Offered: Not offered in 2015


Monash Online offerings are only available to students enrolled in the Graduate Diploma in Data Science (http://online.monash.edu/course/graduate-diploma-data-science/?Access_Code=MON-GDDS-SEO2&utm_source=seo2&utm_medium=referral&utm_campaign=MON-GDDS-SEO2) via Monash Online.


This unit introduces tools and techniques for data wrangling. It will cover different data types and formats, data cleansing and pre-processing for analytics (for instance, the handling of bad data), data structures used for efficient analytics, and efficient data storage. It will also cover in-memory and distributed SQL and NoSQL, text mining, web analytics and basic NLP. Python will be used for implementation, and case studies will be drawn from industry.


On successful completion of this unit, it is expected that students should be able to:

  1. describe different data-types, formats and structures;
  2. demonstrate pre-processing (cleansing) data for advanced analytics;
  3. program the use of distributed and in-memory SQL and NoSQL, and streaming;
  4. demonstrate how to handle multiple, disparate data types and combine them for analysis;
  5. demonstrate how to pre-process text and web analytics data and produce linguistic artifacts for analysis;
  6. integrate a variety of tasks into a data wrangling pipeline.


In-semester assessment: 100%

Workload requirements

Minimum total expected workload equals 144 hours per semester comprising:

(a.) Contact hours for on-campus students:

  • Two hours/week lectures
  • Two hours/week laboratories

(b.) Contact hours for Monash Online students:

  • Two hours/week online group sessions
  • Online students generally do not attend lecture, tutorial and laboratory sessions; however, they should plan to spend equivalent time working through resources and participating in discussions.

(c.) Additional requirements (all students):

  • A minimum of 8 hours per week of personal study (22 hours per week for Monash Online students) for completing lab/tutorial activities, assignments, private study and revision, and, for online students, participating in discussions.

See also Unit timetable information


(FIT5131 or FIT9131) and (FIT5132 or FIT9132)
Students should have a basic knowledge of Python.