6 points, SCA Band 2, 0.125 EFTSL
Undergraduate - Unit
Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.
In recent years the world has seen an explosion in the quantity and variety of data routinely recorded and analysed by research and industry, prompting some social commentators to refer to this phenomenon as the rise of "big data" and to the analysts and practitioners who investigate the data as "data scientists".
The data may come from a variety of sources, including scientific experiments and measurements, or may be recorded from human interactions such as browsing data or social networks on the Internet, mobile phone usage or financial transactions. Many companies, too, are realising the value of their data for analysing customer behaviour and preferences, recognising patterns of behaviour such as credit card usage or insurance claims to detect fraud, and more accurately evaluating risk and increasing profit.
To obtain insights from big data, practitioners require new analytical techniques. These include computationally intensive and interactive approaches such as visualisation, clustering and data mining. The management and processing of large data sets require the development of enhanced computational resources and new algorithms that work across distributed computers.
This unit will introduce students to the analysis and management of big data using current techniques and both open-source and proprietary software tools. Data and case studies will be drawn from diverse sources, including health and informatics; life sciences; web traffic and social networking; business data such as transactions and customer traffic; and scientific research and experimental data. The general principles of analysis, investigation and reporting will be covered. Students will be encouraged to critically reflect on the data analysis process within their own domain of interest.
At the completion of this unit, students should be able to:
- demonstrate the ability to transform real-world problems into ones that can be solved using data analytics techniques;
- cleanse and prepare data for analysis;
- analyse large data sets using a range of statistical, graphical and machine-learning techniques;
- validate and critically assess the results of analysis;
- interpret the results of analysis and communicate these to a broad audience.
Examination (2 hours): 60%; In-semester assessment: 40%
Minimum total expected workload equals 12 hours per week comprising:
- Contact hours for on-campus students:
  - Two hours of lectures
  - One 2-hour laboratory
- Additional requirements (all students):
  - A minimum of 8 hours of independent study per week for completing lab and project work, private study and revision.
FIT1006, ETC1000, FIT2086 or equivalent (for example, BUS1100, ETC1010, ETC2010, ETF2211, ETW1000, ETW1010, ETW1102, ETW2111, ETX1100, ETX2111, ETX2121, MAT1097, STA1010).