Faculty of Information Technology


This unit entry is for students who completed this unit in 2016 only. For students planning to study the unit, please refer to the unit indexes in the current edition of the Handbook. If you have any queries, contact the managing faculty for your course or area of study.

Monash University

6 points, SCA Band 2, 0.125 EFTSL

Postgraduate - Unit

Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.


Information Technology



  • Second semester 2016 (Day)

Monash Online

  • Teaching Period 3 2016 (Online)
  • Teaching Period 6 2016 (Online)


Monash Online offerings are only available to students enrolled in the Graduate Diploma in Data Science (http://online.monash.edu/course/graduate-diploma-data-science/?Access_Code=MON-GDDS-SEO2&utm_source=seo2&utm_medium=referral&utm_campaign=MON-GDDS-SEO2) via Monash Online.


This unit introduces tools and techniques for data wrangling. It will cover the problems that prevent raw data from being effectively used in analysis and the data cleansing and pre-processing tasks that prepare it for analytics. These include, for example, the handling of bad and missing data, data integration and initial feature selection. It will also introduce text mining and web analytics. Python and the Pandas environment will be used for implementation.
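As a rough illustration of the kind of task the synopsis describes, the sketch below handles bad and missing values and then integrates a second table using Pandas. It is not taken from the unit materials: the column names, sample records, the rule that negative ages are invalid, and the choice of median imputation are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw records; names and values are invented for illustration.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "age": ["34", "n/a", "n/a", "-5", "51"],
    "city": ["Melbourne", "Sydney", "Sydney", None, "Perth"],
})

# Hypothetical lookup table used to demonstrate data integration.
regions = pd.DataFrame({
    "city": ["Melbourne", "Sydney", "Perth"],
    "state": ["VIC", "NSW", "WA"],
})


def clean(df):
    """Remove duplicate rows, then repair bad and missing values."""
    out = df.drop_duplicates().copy()
    # Non-numeric strings such as "n/a" become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Assumption: a negative age is a data-entry error, so treat it as missing.
    out.loc[out["age"] < 0, "age"] = float("nan")
    # Assumption: impute missing ages with the median of the valid ones.
    out["age"] = out["age"].fillna(out["age"].median())
    out["city"] = out["city"].fillna("unknown")
    return out.reset_index(drop=True)


cleaned = clean(raw)
# A left join keeps every cleaned record, even when no region matches.
merged = cleaned.merge(regions, on="city", how="left")
```

Whether to impute, drop or flag missing values depends on the analysis that follows; the median is used here only to keep the example short.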


At the completion of this unit, students should be able to:

  1. describe different data-types, formats and structures and demonstrate the methods for their handling;
  2. perform data pre-processing tasks such as data cleansing and initial feature engineering;
  3. perform exploratory analysis and calculate descriptive statistics, identify and analyse data quality issues;
  4. demonstrate familiarity with a variety of data sources and integration methods thereof;
  5. process natural language text to produce linguistic artifacts for analysis;
  6. program a variety of data pre-processing tasks and integrate them into a data wrangling pipeline.
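Outcomes 5 and 6 can be pictured with a small sketch that chains text pre-processing steps into a single pipeline. This is an illustrative example only, using the Python standard library; the step functions, the `pipeline` helper and the tiny stop-word list are hypothetical and are not drawn from the unit's teaching materials.

```python
import re
from collections import Counter
from functools import reduce

# Tiny illustrative stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "and", "of", "is"}


def lowercase(text):
    return text.lower()


def tokenize(text):
    # Keep runs of lowercase letters, digits and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text)


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def count_terms(tokens):
    return Counter(tokens)


def pipeline(*steps):
    """Compose steps left to right into one callable."""
    def run(data):
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


wrangle_text = pipeline(lowercase, tokenize, remove_stopwords, count_terms)
counts = wrangle_text("The data is raw, and the raw data is messy.")
```

Keeping each step as a small function makes it easy to reorder, replace or test the steps individually before combining them.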


In-semester assessment: 100%

Workload requirements

Minimum total expected workload equals 144 hours per semester comprising:

(a.) Contact hours for on-campus students:

  • Two hours/week lectures
  • Two hours/week laboratories

(b.) Contact hours for Monash Online students:

  • Two hours/week online group sessions
  • Online students generally do not attend lecture, tutorial and laboratory sessions, but should plan to spend equivalent time working through resources and participating in discussions.

(c.) Additional requirements (all students):

  • A minimum of 8 hours per week of personal study (22 hours per week for Monash Online students) for completing lab/tutorial activities, assignments, private study and revision and, for online students, participating in discussions.

See also Unit timetable information

Chief examiner(s)


(FIT5131 or FIT9131) and (FIT5132 or FIT9132)

Additional information on this unit is available from the faculty at: