Mathematics of Machine Learning 2022
This is the main website for the Mathematics of Machine Learning course in the spring of 2022, part of the Bachelor's programme in Mathematics at the University of Amsterdam. Visit this page regularly for changes and updates.
|Instructor:||Tim van Erven||(tim@timvanerven.nl)|
|Teaching Assistants:||Jack Mayo||(j.j.mayo@uva.nl)|
||Bharti Bharti||(b.bharti@uva.nl)|
Machine learning is one of the fastest growing areas of science, with far-reaching applications. This course gives an overview of the main techniques and algorithms. The lectures introduce the definitions and main characteristics of machine learning algorithms from a coherent mathematical perspective. In the exercise classes the algorithms will be implemented in Python and applied to a selection of data sets.
We will use Canvas for announcements, grades, and homework submission. Canvas also contains a Zoom link for those who cannot attend in person, and typed lecture notes will be posted there.
Required Prior Knowledge
- Linear algebra, gradients, convexity
- Programming in Python
- Writing in LaTeX
Although mainly targeting mathematics students, the course should be accessible to other science students (AI, CS, physics, …) with an interest in mathematical foundations of machine learning.
Lectures and Exercise Sessions
- Weekly lectures on Tuesdays (for rooms, see Course Schedule):
- weeks 6-12 from 9h00-11h00
- weeks 14-21 except week 18 from 11h00-13h00
- Weekly exercise classes on Fridays, except for the first week and
holidays (room SP G3.10):
- weeks 7-12 from 15h00-17h00
- weeks 14-21 (except April 15, May 6 and May 27) from 9h00-11h00
The course grade consists of the following components:
- Homework assignments. H = Average of homework grades.
- Two exams: midterm (M) and final (F).
The final grade is computed as 0.3H + 0.3M + 0.4F, rounded.
Exams (closed book):
- Midterm: March 29, 13h00-15h00 in room REC A1.02 (Roeterseiland)
- Final exam: June 2, 18h00-21h00 in room SP H0.08
- Resit exam: July 8, 9h00-12h00 in room SP A1.06
The midterm will cover the first half of the course; the final exam will cover only the second half. The resit exam (R) covers both halves and replaces both the midterm and the final exam, giving final grade 0.3H + 0.7R. All exams are closed book: no external resources may be used during the exam.
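As an illustration, the grading rules above can be sketched in Python. This is a hypothetical helper, not part of the course materials; the function name, the assumption of a 1-10 grade scale, and the use of standard rounding are all mine:

```python
def final_grade(H, M, F, R=None):
    """Compute the final course grade.

    H: average of homework grades, M: midterm, F: final exam,
    R: resit grade (if taken, it replaces both M and F).
    Grades are assumed to be on a 1-10 scale; the result is
    rounded to the nearest integer, as stated above.
    """
    if R is not None:
        return round(0.3 * H + 0.7 * R)
    return round(0.3 * H + 0.3 * M + 0.4 * F)

print(final_grade(8, 7, 6))  # 0.3*8 + 0.3*7 + 0.4*6 = 6.9, rounded to 7
```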
The main book for the course is The Elements of Statistical Learning (ESL), 2nd edition, by Hastie, Tibshirani and Friedman, Springer-Verlag 2009. In addition, we will use selected parts from Ch. 18 of Computer Age Statistical Inference: Algorithms, Evidence and Data Science (CASI) by Efron and Hastie, Cambridge University Press, 2016. Some supplementary material will also be provided, as listed in the Course Schedule.
Both books are freely available online, but it is recommended to buy a paper copy of the ESL book, because you will need to study many of its chapters. The standard edition of ESL is hard cover, but there also exists a much cheaper soft-cover edition for €24.99. To get the cheaper offer, open this link from inside the university network.
About using Wikipedia and other online sources: trust Wikipedia as much as you would trust a fellow student who is also still learning. Some things are good, but other things are poorly explained or plain wrong, so always verify with a trusted source (a book or scientific paper). This holds doubly for any ‘data science’ blogs you might find online.
This schedule is subject to change during the course. Literature marked ‘optional’ is recommended for background, but will not be tested on the exam. TBA=To Be Announced.
|Date & Room||Topics||Literature|
|Feb. 8, SP D1.115||Supervised learning intro: classification and regression (overfitting 1), linear regression for classification (overfitting 2), nearest-neighbor classification (overfitting 3).||Sect. 2.1, 2.2, 2.3.|
|Feb. 15, SP D1.115||Curse of dimensionality. Statistical decision theory: expected prediction error (overfitting 4), Bayes-optimal prediction rule. Empirical risk minimization (ERM); interpretation of least squares as ERM.||Sect. 2.4, 2.5; Sect. 7.10.1, 7.10.2; optionally: 7.12.|
|Feb. 22, SP G3.10||Model selection for regression I: best-subset selection, ridge regression and lasso.||Sect. 3.1, 3.2 up to 3.2.1, 3.3, 3.4 up to 3.4.2.|
|Mar. 1, SP D1.115||Model selection for regression II: comparison of best-subset/ridge/lasso, ridge and lasso as shrinkage methods, bias-variance decomposition.||Sect. 3.4.3. From lecture: derivation of formulas in Table 3.4, bias-variance decomposition. Optional: Sect. 1-3.4 about subgradients from Boyd and Vandenberghe lecture notes.|
|Mar. 8, SP D1.115||Linear discriminant analysis (LDA). Naive Bayes classifier, with application to spam filtering.||Sect. 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3); Sect. 6.6.3; optionally: Wikipedia on Naive Bayes [1, 2] (see Wikipedia caveat).|
|Mar. 15, SP D1.115||Logistic regression.||Sect. 4.4 (except 4.4.3).|
|Mar. 22, SP G3.10||Decision trees for classification and regression.||
|Mar. 29||Midterm Exam.|
|Apr. 5, SP G3.02||Bagging and random forests. Boosting (AdaBoost); boosting as forward stagewise additive modeling.||Sect. 8.7, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (in 10.6 only the part about classification).|
|Apr. 12, SP A1.04||Basis expansions: splines.||Sect. 5.1, 5.2 up to 5.2.2; optionally: 5.2.3.|
|Apr. 19, SP A1.04||SVMs I: Optimal separating hyperplane, support vector machine (SVM), SVM learning as regularized hinge loss fitting.||Sect. 4.5.2, 12.2, 12.3.2.|
|Apr. 26, SP D1.113||SVMs II: dual formulation, kernel trick.||Sect. 12.3.1. Optionally: Ch. 5 from Boyd and Vandenberghe book|
|May 3||Lecture-free Week|
|May 10, SP A1.04||Unsupervised learning: K-means clustering.||Sect. 14.3 before 14.3.1; Sect. 14.3.6 (NB: the book gives the wrong definition of K-means in Sect. 14.3.6, see erratum). Handout about stochastic optimization.|
|May 17, SP A1.04||Neural networks/deep learning I: gradient descent with backpropagation.||From Ch. 18 of the CASI book: chapter intro, Sect. 18.1, Sect. 18.2 (except accelerated gradient methods).|
|May 24, SP A1.04||Neural networks/deep learning II: convolutional layers, generalization and overfitting, double descent.||From Ch. 18 of the CASI book: Sect. 18.4. Slides about double descent.|
|Jun. 2||Final Exam.|
The homework assignments will be made available here.
|Assignment||Starter files||Available||Deadline|
|Homework 1||-||18 Feb||24 Feb, 13h00|
|Homework 2||Homework2-start.ipynb||25 Feb||3 Mar, 13h00|
|Homework 3||-||2 Mar||10 Mar, 13h00|
|Homework 4||Homework4-start.ipynb||11 Mar||17 Mar, 13h00|
|Homework 5||-||15 Mar||24 Mar, 13h00|
|Homework 6||-||7 Apr||21 Apr, 13h00|
|Homework 7||-||22 Apr||28 Apr, 13h00|
|Homework 8||-||29 Apr||12 May, 13h00|
|Homework 9||-||12 May||19 May, 13h00|
|Homework 10||-||20 May||25 May, midnight|
Here is a list of references for advanced further reading. These are all optional: there will be no questions about them on the exam.
- Machine Learning Theory: I recommend the free book by Shalev-Shwartz and Ben-David, which we also use in the MasterMath course Machine Learning Theory.
- Convex optimization: the free book by Boyd and Vandenberghe provides a very nice introduction. For a more extensive overview, see the free book by Bubeck.
- Deep learning: if you want to get up to date on the practice of deep learning, I recommend the Dive into Deep Learning interactive online book.