6 points, SCA Band 2, 0.125 EFTSL
Undergraduate - Unit
Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.
(or ), and one of ( , MAT2003, or )
This unit explores the statistical modelling foundations that underlie the analytic aspects of Data Science. It covers:
- Data: collection and sampling, data quality.
- Analytic tasks: statistical hypothesis testing, exploratory and confirmatory analysis.
- Probability distributions: dependence and independence, multivariate Gaussian, Poisson, Dirichlet, random number generation and simulation of distributions, simulation of samples (bootstrap).
- Predictive models: linear and logistic regression, and Bayesian classification.
- Estimation: parameter and function estimation, maximum likelihood and minimum cost estimators, Monte Carlo estimators, inverse probabilities and Bayes' theorem, bias versus variance and sample size effects, cross-validation, estimation of model performance.
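As an illustration of one listed topic, simulation of samples (bootstrap), the following is a minimal sketch and not part of the unit's materials. It resamples a small dataset with replacement to approximate the sampling variability of the mean. The data values, the function names `bootstrap_means` and `percentile_interval`, and the choice of 2000 resamples are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return the mean of each bootstrap resample of `sample`.

    Each resample draws len(sample) values from `sample` with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Return a simple percentile interval covering `level` of `values`."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(alpha * last)], ordered[int((1.0 - alpha) * last)]


# Hypothetical measurements, invented purely for illustration.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]

means = bootstrap_means(data)
low, high = percentile_interval(means)

print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"approx. 95% bootstrap interval for the mean = ({low:.2f}, {high:.2f})")
```

The percentile interval is the simplest bootstrap interval; other variants exist and may behave better for small or skewed samples.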
On successful completion of this unit, students should be able to:
- perform exploratory data analysis with descriptive statistics on given datasets;
- construct models for inferential statistical analysis;
- produce models for predictive statistical analysis;
- perform fundamental random sampling, simulation and hypothesis testing for required scenarios;
- implement a model for data analysis through programming and scripting;
- interpret results for a variety of models.
Examination (2 hours): 50%; In-semester assessment: 50%
Minimum total expected workload is 12 hours per week, comprising:
- Contact hours for on-campus students:
  - Two hours of lectures
  - Two hours of studio
- Additional requirements (all students):
  - A minimum of 8 hours of personal study per week for completing lab and project work, private study and revision.