This is the main website for the Statistical Learning course in autumn 2014, as part of the Master Track Statistical Science for the Life and Behavioural Sciences at Leiden University. Visit this page regularly for changes and updates.
This course gives an overview of techniques to automatically learn the structure, patterns and regularities in complicated data, and to use these patterns to predict future data. Statistical learning largely overlaps with the computer-science field of machine learning: many of its methods have their origin in computer science (pattern recognition, artificial intelligence).
|Instructors:||prof. dr. Peter Grünwald (pdg at cwi dot nl)|
|dr. Tim van Erven (tim at timvanerven dot nl)|
|Contact:||For general questions, send e-mail to either Peter or Tim. For questions about the homework, send e-mail to Tim.|
- General Information
- Lectures and Exercise Sessions
- Examination Form
- Course Materials
- Course Schedule
- Homework Assignments
The course load is 4 ECTS. The studiegids (course catalogue) contains a longer course description.
Lectures take place on Mondays on the dates indicated in the Course Schedule below, in room 409 of the Snellius Building, Niels Bohrweg 1, Leiden.
The first two weeks (Nov. 3 and Nov. 10), course hours are 10h00-16h30. The rest of the weeks they are 11h15-15h30.
In order to pass the course, one must obtain a sufficient grade (6 or higher) on both of the following:
- Homework Projects. We will hand out two homework assignments. The final homework grade will be determined as an average of the grades for the two assignments.
- A written open-book examination: Wed. Jan. 7, 14-17h, room 412; resit: Fri. Feb. 13, 14-17h, room 412. Here you can find an example of examination questions.
The final grade will be determined as the average of the final homework grade and the grade for the open-book examination.
We will use various chapters of The Elements of Statistical Learning, 2nd edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer-Verlag 2009. The book can be downloaded for free at the above link. In addition, some supplementary material will be provided, as listed in the Course Schedule. Finally, it may also be useful to look at the Wikipedia page on Naive Bayes classification.
|Nov. 3: Introduction||General introduction: statistical learning, supervised learning, regression and classification, incorporating nonlinearities by extending the features, overfitting, linear classifiers, nearest-neighbor classification, expected prediction error and Bayes-optimal prediction rule||All of Chapter 1 and parts of Chapter 2 (Sections 2.1-2.4)|
|Nov. 10: Regression Part I||Linear regression: least squares. Three interpretations of least squares: as ERM, maximum likelihood and orthogonal projection. Computation of the least squares estimate. Bias-variance decomposition for squared error loss.||Section 2.5 of Chapter 2, Sections 3.1, 3.2.1 of Chapter 3|
|Nov. 17: Regression Part II||Model selection and overfitting: subset selection, cross-validation, shrinkage methods (ridge regression and lasso)||Sections 3.1, 3.2.1, 3.3, 3.4.1, 3.4.2, 3.4.3 and Sections 7.10.1, 7.10.2|
|Nov. 24: Regression Part III and Classification Part I||Mini-Bayes refresher: Bayesian marginal and predictive distribution, posterior, Laplace rule of succession. Regression: Bayes MAP interpretation of Ridge Regression and Lasso. Problems with Least Squares for classification; Linear Discriminant Analysis (LDA)||Sections 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3), and 4.4 (except 4.4.3)|
|Dec. 1: Classification Part II||Logistic Regression, Expected Prediction Error: 0/1 vs. logarithmic. Discriminative vs. generative models; Three Approaches to Modeling; special status of log loss and squared loss functions.||Sections 4.5.2, 6.6.3, 12.2, 12.3.1, 12.3.2, Additional handouts: table of the three approaches and explanatory notes|
|Dec. 8: Classification Part III||Naive Bayes classifier; Naive Bayes and Logistic Regression; Naive Bayes and spam filtering; Optimal Separating Hyperplanes; Support Vector Machines; the Kernel Trick; SVM learning as regularized hinge loss fitting.||Additional literature: Andrew Y. Ng, Michael Jordan: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, NIPS 2001|
|Dec. 15: Classification Part IV||Classification and regression trees, bagging, boosting (AdaBoost), boosting as forward stagewise additive modeling||Sections 9.2, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (only the part about classification)|
|Dec. 22: Model Assessment and Selection, Unsupervised Learning||Model assessment: AIC, BIC, Bayes Factors, In-Sample Prediction Error, Model Averaging. Clustering: K-means, EM with Gaussian Mixtures||Sections 7.4-7.7, see especially the last paragraph “BIC is asymptotically consistent”. Section 14.3 before 14.3.1; Sections 14.3.6, 14.3.7. NB. The book gives the wrong definition of K-means in Section 14.3.6; see the definition on Wikipedia instead.|
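For concreteness, the closed-form estimators from the regression lectures (Nov. 10 and Nov. 17) can be sketched in a few lines of numpy. The toy data and the penalty value `lam` below are illustrative choices, not part of the course material:

```python
import numpy as np

# Toy data: y = 2*x + 1 plus noise; the design matrix X includes an intercept column.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(30)
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: solve the normal equations X'X beta = X'y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: X'X + lam*I is always invertible for lam > 0, and the
# penalty shrinks the coefficients towards zero.
# (In practice the intercept is usually left unpenalized; see Section 3.4.1.)
lam = 0.1
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(beta_ols)    # close to the true coefficients [1, 2]
print(beta_ridge)  # shrunk towards zero relative to beta_ols
```

In a real analysis `lam` would be chosen by cross-validation, as discussed in the Nov. 17 lecture.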
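The Naive Bayes spam-filtering idea from the Dec. 8 lecture, smoothed with the Laplace rule of succession from Nov. 24, can be sketched as follows. The two-word vocabulary and the tiny document set are invented for illustration only:

```python
import numpy as np

# Word-presence matrix: columns = ["offer", "meeting"], rows = documents.
X = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = ham

def fit_bernoulli_nb(X, y):
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # Laplace rule of succession: P(word | class) = (count + 1) / (n_class + 2)
    theta = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
                      for c in classes])
    return classes, priors, theta

def predict(X, classes, priors, theta):
    # log P(c) + sum_j [ x_j log theta_cj + (1 - x_j) log(1 - theta_cj) ]
    log_post = np.log(priors) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return classes[np.argmax(log_post, axis=1)]

classes, priors, theta = fit_bernoulli_nb(X, y)
print(predict(np.array([[1, 0], [0, 1]]), classes, priors, theta))  # [1 0]
```

This is the generative counterpart of logistic regression discussed in the Ng and Jordan paper listed for Dec. 8.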
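Since the book's definition of K-means in Section 14.3.6 is wrong, here is a sketch of the standard algorithm (Lloyd's algorithm, as defined on Wikipedia): alternate assigning each point to its nearest center with recomputing each center as the mean of its cluster. The two-blob toy data is illustrative:

```python
import numpy as np

def kmeans(X, k, iters=100, init=None, seed=0):
    """Standard K-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = (X[rng.choice(len(X), size=k, replace=False)].astype(float)
               if init is None else init.astype(float).copy())
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (empty clusters keep their old center).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs; start with one point from each so convergence is clean.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, labels = kmeans(X, 2, init=X[[0, 20]])
```

Note that the objective is non-convex, so in practice one runs the algorithm from several random initializations and keeps the best result.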
The homework assignments will be made available here. You are encouraged to discuss the assignments, but everyone has to write a report individually.
|Homework 1||housing data, description||December 7|
|Homework 2||The congressional voting records data set from the UCI Machine Learning Repository||Postponed to January 21|