Statistical Learning 2016

This is the main website for the Statistical Learning course in autumn 2016, as part of the master Statistical Science for the Life and Behavioural Sciences at Leiden University. Visit this page regularly for changes and updates.

Instructor:	Tim van Erven (tim@timvanerven.nl, for general questions)
Teaching assistant:	Kevin Duisters (k.l.w.duisters@math.leidenuniv.nl, for questions about the homework)

IMPORTANT: Make sure to enroll in Blackboard for grades and course updates, and sign up for the (re-sit) exam in uSis as soon as possible, but no later than ten calendar days before the actual (re-sit) exam takes place. (Otherwise I cannot register your grade and you will not get credit.)

General Information

This course gives an overview of techniques to automatically learn the structure, patterns and regularities in complicated data, and to use these patterns to predict future data. Statistical learning is very similar to an area within computer science called machine learning, since many methods have their origin in computer science (pattern recognition, artificial intelligence). The course load is 4 ECTS. The e-prospectus contains a longer course description.

The entry requirements for this year are:

Familiarity with least squares linear regression
Ability to program in R or in Python

Lectures and Exercise Sessions

Lectures take place on Thursdays on the dates indicated in the Course Schedule below, in room 408 of the Snellius Building, Niels Bohrweg 1, Leiden.

During the first four weeks (all lectures in November), course hours are 10h00-16h30. During the last four weeks (December), they are 11h15-15h30.

Examination Form

To pass the course, you must obtain a sufficient grade (5.5 or higher) on both of the following:

Homework Projects. We will hand out two homework assignments. The final homework grade will be determined as an average of the grades for the two assignments, without any rounding.
A written open-book examination: Tue. Jan. 17, 14-17h, room TBA; resit: Thu. Feb. 23, 14-17h, room TBA. As an example of the types of questions, here you can find the exam from 2014. NB This year it will not be allowed to use a digital copy of the book during the exam.

The final grade will be determined as the average of the final homework grade and the grade for the open-book examination. It will be rounded to half points, except for grades between 5 and 6, which will be rounded to whole points.
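As an illustration of the grading arithmetic described above, here is a minimal Python sketch. It is not an official grade calculator: the function names are hypothetical, and two details that the text leaves open are assumptions, namely that ties are rounded up and that "between 5 and 6" means strictly between 5 and 6. The text also does not say which grade is recorded when one of the two components is insufficient, so the sketch only reports the rounded average together with a pass flag.

```python
import math

PASS_MARK = 5.5  # "sufficient grade (5.5 or higher)", as stated in the text


def round_half_up(value, step):
    """Round value to the nearest multiple of step; ties go up (an assumption)."""
    return math.floor(value / step + 0.5) * step


def final_grade(hw1, hw2, exam):
    """Return (rounded final grade, passed) under the rules sketched above."""
    homework = (hw1 + hw2) / 2   # average of the two assignments, no rounding
    raw = (homework + exam) / 2  # average of homework grade and exam grade
    # Assumption: "between 5 and 6" is read as the open interval (5, 6).
    step = 1.0 if 5 < raw < 6 else 0.5
    grade = round_half_up(raw, step)
    # Both components must be sufficient on their own.
    passed = homework >= PASS_MARK and exam >= PASS_MARK
    return grade, passed


print(final_grade(7, 8, 6))    # (7.0, True): raw 6.75 rounds to a half point
print(final_grade(6, 6, 5.5))  # (6.0, True): raw 5.75 rounds to a whole point
print(final_grade(8, 8, 5))    # (6.5, False): exam grade is below 5.5
```

The last call shows why both conditions matter: the average is above 5.5, but the exam grade on its own is insufficient.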

Course Materials

We will use various chapters of The Elements of Statistical Learning, 2nd edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer-Verlag 2009. In addition, some supplementary material will be provided, as listed in the Course Schedule.

As a study aid, some of the materials used during the lectures will also be made available. Studying these is optional.

NB Although the book can be downloaded for free at the above link, you will need a paper copy for the final exam, which is open book! The standard edition is hard cover, but it may be worth getting the much cheaper soft-cover edition for €24.99. To get the cheaper offer, open this link from an eduroam connection. Or, if that does not work, go here, sign in and use the search function to find the book. Then choose "View Online" and follow the link to "SpringerLink Books Complete".

About using Wikipedia: trust it as much as you would trust a fellow student who is also still learning. Some things are good, but other things are poorly explained or plain wrong, so always verify with a trusted source.

Course Schedule

Text in bold font indicates important changes made during the course.

Date	Topics	Literature
Nov. 3: Introduction	General introduction: statistical learning, supervised learning, regression and classification, incorporating nonlinearities by extending the features, overfitting, linear classifiers, nearest-neighbor classification, expected prediction error and Bayes-optimal prediction rule	All of Chapter 1 and parts of Chapter 2 (Sections 2.1-2.5)
Nov. 10: Regression Part I	Linear regression: least squares. Interpretations of least squares as ERM and as maximum likelihood. Cross-validation. Model selection and overfitting: subset selection, shrinkage methods (ridge regression and lasso)	Sections 3.1 and 3.2 up to 3.2.1. Sections 3.3, 3.4.1, 3.4.2. Sections 7.10.1, 7.10.2. Optionally: 7.12
Nov. 17: Regression Part II, Bayesian methods	Comparison of subset selection, ridge and lasso. Bayesian methods in a nutshell: Bayesian marginal and predictive distribution, posterior, Laplace rule of succession. Regression: Bayes MAP interpretation of Ridge Regression and Lasso.	Section 3.4.3
Nov. 24: Regression Part III and Classification Part I	Linear Discriminant Analysis (LDA), Naive Bayes classifier, Naive Bayes and spam filtering. Surrogate losses. Logistic regression	Sections 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3), 4.4 (except 4.4.3), and 6.6.3. Optionally: Wikipedia on Naive Bayes [1, 2].
Dec. 1: Classification Part II, Optimization, Unsupervised Learning	Discriminative vs. generative models; Naive Bayes versus Logistic Regression. Stochastic Optimization. Clustering: K-means, EM with Gaussian Mixtures	Additional literature: Andrew Y. Ng, Michael Jordan: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, NIPS 2001. Additional handout about stochastic optimization. Section 14.3 before 14.3.1; Sections 14.3.6, 14.3.7. NB The book gives the wrong definition for K-means in Section 14.3.6. Additional material: correct definition of K-means.
Dec. 8: TA Session	Discussion of homework 1 by Kevin Duisters, and working on homework 2
Dec. 15: Classification Part III	Optimal Separating Hyperplanes; Support Vector Machines; the Kernel Trick; SVM learning as regularized hinge loss fitting. Classification and regression trees	Sections 4.5.2, 12.2, 12.3.1, 12.3.2; Section 9.2.
Dec. 22: Classification Part IV	Bagging, boosting (AdaBoost), boosting as forward stagewise additive modeling; Neural networks, deep learning, gradient descent with backpropagation	Sections 8.7, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (only the part about classification), 11.3, 11.4 and 11.5

Homework Assignments

The homework assignments will be made available here. You are encouraged to discuss the assignments, but everyone has to write a report individually. NB These assignments will be a significant amount of work, so start early.

Homework	Data	Available	Deadline
Homework 1	housing data, description	Nov. 10	Nov. 29
Homework 2	car evaluation data, description	Dec. 1	Jan. 6

Optional Material Used During Lectures

As a study aid, here are some of the slides and my personal hand-written notes, which I used to prepare the lectures and which should be more or less the same as what I wrote on the board. Studying these is optional.

Nov. 3

Handwritten lecture notes 1
Slides 1
Figures used from the book: 2.1-2.5, 2.11

Nov. 10

Handwritten lecture notes 2

Nov. 17

Handwritten lecture notes 3
Slides 3
Figures used from the book: 3.11

Nov. 24

Handwritten lecture notes 4
Slides 4
Figures used from the book: 4.5

Dec. 1

Dec. 15

Handwritten lecture notes 7
Figures used from the book: 12.1, 2.5, 12.3, 9.2, 9.3

Dec. 22

Handwritten lecture notes 8
Figures used from the book: ...