Mathematics of Machine Learning 2022
This is the main website for the Mathematics of Machine Learning course in the spring of 2022, as part of the bachelor of mathematics at the University of Amsterdam. Visit this page regularly for changes and updates.
Instructor: | Tim van Erven | (tim@timvanerven.nl) |
Teaching Assistants: | Jack Mayo | (j.j.mayo@uva.nl) |
 | Boris Lebedenko | (b.lebedenko@uva.nl) |
General Information
Machine learning is one of the fastest-growing areas of science, with far-reaching applications. This course gives an overview of the main techniques and algorithms. The lectures introduce the definitions and main characteristics of machine learning algorithms from a coherent mathematical perspective. In the exercise classes, the algorithms will be implemented in Python and applied to a selection of data sets.
Required Prior Knowledge
- Linear algebra, gradients, convexity
- Programming in Python
- Writing in LaTeX
Although the course mainly targets mathematics students, it should be accessible to other science students (AI, CS, physics, …) with an interest in the mathematical foundations of machine learning.
Lectures and Exercise Sessions
Lecture hours TBA
Examination Form
The course grade consists of the following components:
- Homework assignments. H = Average of homework grades.
- Two exams: midterm and final. E = Average of the two exam grades.
The final grade is computed as \(\max\big\{\mathrm{\textbf{E}}, \frac{1}{3}\mathrm{\textbf{H}} + \frac{2}{3}\mathrm{\textbf{E}}\big\}\), rounded.
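As a minimal sketch of the formula above, the unrounded grade can be computed as follows. The function name `final_grade` and its arguments are hypothetical, and the rounding step is left out because the rounding rule is not specified here.

```python
def final_grade(homework_grades, midterm, final_exam):
    """Unrounded course grade max(E, H/3 + 2E/3).

    Assumes homework_grades is a non-empty list of numbers.
    Rounding is deliberately omitted: the rounding rule is not stated.
    """
    h = sum(homework_grades) / len(homework_grades)  # H: average of homework grades
    e = (midterm + final_exam) / 2                   # E: average of the two exam grades
    return max(e, h / 3 + 2 * e / 3)
```

Because of the maximum, homework can only raise the grade: the weighted term exceeds E exactly when H > E. For example, with H = 9 and E = 6.5 the grade before rounding is 3 + 4.33… ≈ 7.33, whereas with H = 4.5 and E = 8 it stays at 8.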
Course Materials
The main book for the course is The Elements of Statistical Learning (ESL), 2nd edition, by Hastie, Tibshirani and Friedman, Springer-Verlag 2009. In addition, we will use selected parts from Ch.18 of Computer Age Statistical Inference: Algorithms, Evidence and Data Science (CASI) by Efron and Hastie, Cambridge University Press, 2016. Some supplementary material will also be provided, as listed in the Course Schedule.
Both books are freely available online, but it is recommended to buy a paper copy of the ESL book, because you will need to study many of its chapters. The standard edition of ESL is hardcover, but there is also a much cheaper soft-cover edition for €24.99. To get the cheaper offer, open this link from inside the university network.
About using Wikipedia and other online sources: trust Wikipedia as much as you would trust a fellow student who is also still learning. Some things are good, but other things are poorly explained or plain wrong, so always verify with a trusted source (a book or scientific paper). This holds doubly for any ‘data science’ blogs you might find online.
Course Schedule
This schedule is subject to change during the course. TBA = To Be Announced.
Date | Topics | Literature |
---|---|---|
Week 1 | Supervised learning intro: classification and regression (overfitting 1), linear regression for classification (overfitting 2), nearest neighbor classification (overfitting 3). Curse of dimensionality. Bias-variance decomposition (overfitting 4). | Ch. 1. Sect. 2.1, 2.2, 2.3. |
Week 2 | Statistical decision theory: expected prediction error (overfitting 5), Bayes-optimal prediction rule. Empirical Risk Minimization. Interpretations of least squares as ERM and as maximum likelihood. Cross-validation. | Sect. 2.4, 2.5. Sect. 7.10.1, 7.10.2; optionally: 7.12. |
Week 3 | Model selection for regression I: best-subset selection, shrinkage methods (ridge regression and lasso). | Sect. 3.1, 3.2 (up to 3.2.1), 3.3, 3.4 (up to 3.4.2). |
Week 4 | Model selection for regression II: comparison of best-subset/ridge/lasso. Bayesian methods in a nutshell: Bayesian marginal and predictive distribution, posterior, Laplace rule of succession, Bayes MAP interpretation of ridge regression and lasso. | Sect. 3.4.3. |
Week 5 | Plug-in estimators. Naive Bayes classifier, with application to spam filtering. Linear discriminant analysis (LDA). | Sect. 6.6.3; optionally: Wikipedia on Naive Bayes [1, 2] (see Wikipedia caveat). Sect. 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3). |
Week 6 | Surrogate losses. Logistic regression. | Sect. 4.4 (except 4.4.3). |
Week 7 | Discriminative vs. generative models: naive Bayes versus logistic regression. Q&A session. | Andrew Y. Ng, Michael Jordan: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, NeurIPS 2001. |
Week 8 | Midterm Exam. | |
Week 9 | SVMs I: Optimal separating hyperplane, support vector machine (SVM), SVM learning as regularized hinge loss fitting. | Sect. 4.5.2, 12.2, 12.3.2. |
Week 10 | SVMs II: dual formulation, kernel trick. | Sect. 12.3.1. |
Week 11 | Decision trees for classification and regression. | Sect. 9.2. |
Week 12 | Bagging and random forests. Boosting (AdaBoost), boosting as forward stagewise additive modeling. | Sect. 8.7, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (in 10.6 only the part about classification). |
Week 13 | Unsupervised learning: K-means clustering. Stochastic optimization. | Sect. 14.3 before 14.3.1; Sect. 14.3.6. NB. The book gives the wrong definition for K-means in Sect. 14.3.6, see erratum. Handout about stochastic optimization. |
Week 14 | Neural networks/deep learning I. | From Ch. 18 of the CASI book: chapter intro, Sect. 18.1, 18.4 before 'Convolve Layer'. (The remainder of Sect. 18.4 is optional, but highly recommended.) |
Week 15 | Neural networks/deep learning II: gradient descent with backpropagation. Q&A session. | From Ch. 18 of the CASI book: Sect. 18.2 (except accelerated gradient methods). |
Week 16 | Final Exam. | |
Homework Assignments
The homework assignments will be made available here.