FIT2086

Faculty of Information Technology

Undergraduate - Unit FIT2086 - Modelling for data analysis

This unit entry is for students who completed this unit in 2015 only. For students planning to study the unit, please refer to the unit indexes in the current edition of the Handbook. If you have any queries, contact the managing faculty for your course or area of study.

6 points, SCA Band 2, 0.125 EFTSL

Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.

Level: Undergraduate
Faculty: Faculty of Information Technology
Offered: Not offered in 2015

Synopsis

This unit explores the statistical modelling foundations that underlie the analytic aspects of Data Science. It covers:

• Data: collection and sampling, data quality.
• Analytic tasks: statistical hypothesis testing, introductory decision theory, exploratory and confirmatory analysis.
• Probability distributions: multivariate Gaussian, Poisson, Dirichlet, linear and logistic regression, random number generation and simulation of distributions, simulation of samples (bootstrap).
• Estimation: parameter and function estimation, maximum likelihood and minimum cost estimators, Monte Carlo estimators, inverse probabilities and Bayes' theorem, bias versus variance and sample size effects, cross validation.
• Information Theory: information and entropy, data coding and compression, entropy and likelihood, relative entropy and correlation, bounds and limits.
• Dependence models: Markov model, Bayesian and Markov network, log-linear model.
• Modelling: hypothesis testing, inference, and optimal decisions, predictive versus generative modelling, experts and assessing probabilities and models.

Outcomes

On successful completion of this unit, students should be able to:

• compare the general roles of exploratory, confirmatory and decision analysis as applied to data;
• explain how the source and provenance of data affect analysis;
• summarise the role of domain experts in supporting analysis and the difficulties they may have;
• implement a computational model for statistical analysis of simple problems and construct an evaluation methodology for the results;
• compute statistical factors and diagnostics, such as entropy, likelihood, correlation, and independence, on simple problems;
• interpret the challenges involved in estimation from data, and implement the methods used on simple problems;
• describe basic methods of random sampling, simulation, and hypothesis testing.
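
A minimal sketch of the kind of simple computation these outcomes refer to, assuming Python and its standard library (this entry does not prescribe a language). It computes the entropy of an empirical distribution and a percentile bootstrap interval for a sample mean. The function names `shannon_entropy` and `bootstrap_mean_interval`, and all parameter defaults, are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bootstrap_mean_interval(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `sample`.

    Resamples with replacement, records each resample mean, and returns
    the empirical alpha/2 and 1 - alpha/2 quantiles of those means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    means = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    print(shannon_entropy(["a", "b", "a", "b"]))
    print(bootstrap_mean_interval([1, 2, 3, 4, 5, 6, 7, 8]))
```

Two equally frequent symbols give an entropy of one bit, and a constant sequence gives zero. The bootstrap interval narrows as the sample size grows, which is one way to see the sample size effects listed in the synopsis.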

Assessment

Examination (3 hours): 60%; In-semester assessment: 40%

Minimum total expected workload equals 12 hours per week comprising:

1. Contact hours for on-campus students:
• Two hours of lectures
• Two hours of laboratories

2. Additional requirements (all students):
• A minimum of 8 hours of personal study per week for completing lab/tutorial activities, assignments, private study and revision.