This unit explores the statistical modelling foundations that underlie the analytic aspects of Data Science. It covers:
- Data: collection and sampling, data quality.
- Analytic tasks: statistical hypothesis testing, introductory decision theory, exploratory and confirmatory analysis.
- Probability distributions: multivariate Gaussian, Poisson, Dirichlet, linear and logistic regression, random number generation and simulation of distributions, simulation of samples (bootstrap).
- Estimation: parameter and function estimation, maximum likelihood and minimum cost estimators, Monte Carlo estimators, inverse probabilities and Bayes' theorem, bias versus variance and sample size effects, cross-validation.
- Information Theory: information and entropy, data coding and compression, entropy and likelihood, relative entropy and correlation, bounds and limits.
- Dependence models: Markov model, Bayesian and Markov network, log-linear model.
- Modelling: hypothesis testing, inference, and optimal decisions, predictive versus generative modelling, experts and assessing probabilities and models.
On successful completion of this unit, students should be able to:
- compare the general roles of exploratory, confirmatory and decision analysis as applied to data;
- explain how the source and provenance of data affect analysis;
- summarise the role of domain experts in supporting analysis and the difficulties they may have;
- implement a computational model for statistical analysis of simple problems and construct an evaluation methodology for the results;
- compute statistical factors and diagnostics, such as entropy, likelihood, correlation and independence, on simple problems;
- interpret the challenges involved in estimation from data, and implement the methods used on simple problems;
- describe basic methods of random sampling, simulation, and hypothesis testing.
Examination (3 hours): 60%; In-semester assessment: 40%
Minimum total expected workload equals 12 hours per week comprising:
- Contact hours for on-campus students:
  - Two hours of lectures
  - Two hours of laboratories
- Additional requirements (all students):
  - A minimum of 8 hours of personal study per week for completing lab/tutorial activities, assignments, private study and revision.
Prerequisites: MAT1830 and one of MAT1841, MAT2003 or MTH1030