Statistical Learning 2017

This is the main website for the Statistical Learning course in autumn 2017, as part of the master Statistical Science for the Life and Behavioural Sciences at Leiden University. Visit this page regularly for changes and updates.

Instructor:	Tim van Erven (tim@timvanerven.nl, for general questions)
Teaching assistant:	Dirk van der Hoeven (d.van.der.hoeven@math.leidenuniv.nl, for questions about the homework)

IMPORTANT: Make sure to enroll in Blackboard for grades and course updates, and sign up for the (resit) exam in uSis as soon as possible, but no later than ten calendar days before the (resit) exam takes place. (Otherwise I cannot register your grade and you will not get credit.)

General Information

This course gives an overview of techniques to automatically learn the structure, patterns and regularities in complicated data, and to use these patterns to predict future data. Statistical learning is very similar to an area within computer science called machine learning, since many methods have their origin in computer science (pattern recognition, artificial intelligence). The course load is 6 ECTS. The e-prospectus contains a longer course description.

The entry requirements for this year are:

Familiarity with least squares linear regression
Ability to program in R or in Python

Lectures and Exercise Sessions

Lectures take place on Thursdays on the dates indicated in the Course Schedule below, in room B03 of the Snellius Building, Niels Bohrweg 1, Leiden.

The first four weeks, course hours are 10h00-16h15. The last four weeks they are 11h00-15h15.

Examination Form

To pass the course, you must obtain a sufficient grade (5.5 or higher) on both of the following:

Homework Projects. We will hand out two homework assignments. The final homework grade will be determined as an average of the grades for the two assignments, without any rounding.
A written open-book examination: probably on Thursday Jan. 4, 14-17h, room TBA; resit: probably on Friday Feb. 2, 14-17h, room TBA. NB You are allowed to bring any information on paper to the exam, and it is recommended to bring the book. However, digital copies of the book will not be allowed.

The final grade will be determined as the average of the final homework grade and the final open-book examination. It will be rounded to half points, except for grades between 5 and 6, which will be rounded to whole points.
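As a rough illustration of the grading rules above, the arithmetic could be sketched as follows. This is not the official grading procedure. The function names are hypothetical, and the sketch assumes that ties are rounded up (so 5.5 becomes 6) and that "between 5 and 6" means strictly between; the text above does not specify either point.

```python
import math


def round_half_up(x, step):
    """Round x to the nearest multiple of step, rounding ties upward (assumption)."""
    return math.floor(x / step + 0.5) * step


def final_grade(hw1, hw2, exam):
    """Return (rounded final grade, passed) for two homework grades and an exam grade."""
    # Homework grade: plain average of the two assignments, without rounding.
    homework = (hw1 + hw2) / 2
    # Final grade before rounding: average of homework grade and exam grade.
    raw = (homework + exam) / 2
    # Grades strictly between 5 and 6 go to whole points, all others to half points.
    if 5 < raw < 6:
        rounded = round_half_up(raw, 1.0)
    else:
        rounded = round_half_up(raw, 0.5)
    # Passing requires a sufficient grade on both components separately.
    passed = homework >= 5.5 and exam >= 5.5
    return rounded, passed


print(final_grade(7, 8, 7))    # homework 7.5, unrounded final grade 7.25
print(final_grade(9, 9, 5.0))  # high homework grade, but the exam is insufficient
```

Note that a high homework grade does not compensate for an exam grade below 5.5: in the second call the average is sufficient, but the course is still not passed.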

As an example of the types of questions on the exam, here you can find the exam from 2014, and the exam from 2015. NB The questions on the exam only cover a sample of the topics covered in class. The topics on this year’s exam are therefore likely to be different from the topics on the previous exams!

Course Materials

We will use various chapters of The Elements of Statistical Learning, 2nd edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer-Verlag 2009. In addition, some supplementary material will be provided, as listed in the Course Schedule.

As a study aid, some of the materials used during the lectures will also be made available. Studying these is optional.

NB Although the book can be downloaded for free at the above link, you will need a paper copy for the final exam, which is open book! The standard edition is hardcover, but it may be worth getting the much cheaper softcover edition for €24.99. To get the cheaper offer, open this link from an eduroam connection. If that does not work, go here, sign in and use the search function to find the book. Then choose ‘View Online’ and follow the link to ‘SpringerLink Books Complete’.

About using Wikipedia: trust it as much as you would trust a fellow student who is also still learning. Some things are good, but other things are poorly explained or plain wrong, so always verify with a trusted source. (Incidentally, this also holds for any ‘data science’ blogs you might find online.)

Course Schedule

Text in bold font indicates important changes made during the course.

Date	Topics	Literature
Nov. 2: Introduction, Regression I	General introduction: statistical learning, supervised learning, regression and classification, incorporating nonlinearities by extending the features, overfitting, linear classifiers, nearest-neighbor classification, expected prediction error and Bayes-optimal prediction rule, curse of dimensionality. Interpretations of least squares as ERM and as maximum likelihood.	All of Chapter 1 and parts of Chapter 2 (Sections 2.1-2.5)
Nov. 9: Regression II: Model Selection	Model selection and overfitting: subset selection, shrinkage methods (ridge regression and lasso). Comparison of subset selection, ridge and lasso. Cross-validation.	Sections 3.1 and 3.2 up to 3.2.1. Sections 3.3 and 3.4 up to 3.4.3. Sections 7.10.1, 7.10.2. Optionally: 7.12
Nov. 16: Bayesian methods, Classification Part I	Bayesian methods in a nutshell: Bayesian marginal and predictive distribution, posterior, Laplace rule of succession. Regression: Bayes MAP interpretation of Ridge Regression and Lasso. Classification: Naive Bayes classifier, Naive Bayes and spam filtering.	Section 6.6.3. Optionally: Wikipedia on Naive Bayes [1, 2] (see Wikipedia caveat).
Nov. 23: Classification Part II	Linear Discriminant Analysis (LDA). Surrogate losses. Logistic regression. Discriminative vs. generative models: Naive Bayes versus Logistic Regression	Sections 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3), 4.4 (except 4.4.3). Additional literature: Andrew Y. Ng, Michael Jordan: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, NIPS 2001.
Nov. 30: Classification Part III, Unsupervised Learning	Optimal separating hyperplanes, support vector machines (SVMs): the kernel trick, SVM learning as regularized hinge loss fitting. Clustering: K-means, EM with Gaussian Mixtures.	Sections 4.5.2, 12.2, 12.3.1, 12.3.2. Section 14.3 before 14.3.1; Sections 14.3.6, 14.3.7. NB The book gives the wrong definition for K-means in Section 14.3.6. Additional material: correct definition of K-means.
Dec. 7: Homework	No lecture! (But work on homework 2!)
Dec. 14: Classification Part IV	Discussion of homework 1 by Dirk van der Hoeven. Classification and regression trees. Bagging, boosting (AdaBoost), boosting as forward stagewise additive modeling.	Sections 9.2, 8.7, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (in 10.6 only the part about classification)
Dec. 21: Optimization, Deep Learning	Stochastic optimization. Neural networks, deep learning, gradient descent with backpropagation.	Sections 11.3, 11.4 and 11.5. Additional handout about stochastic optimization.

Homework Assignments

The homework assignments will be made available here. You are encouraged to discuss the assignments, but everyone has to perform their own experiments and write a report individually. NB These assignments will be a significant amount of work, so start early.

Homework	Data	Available	Deadline
Homework 1	housing data, description	Nov. 9	Nov. 29
Homework 2	car evaluation data, description	Nov. 29	Dec. 20

Optional Material Used During Lectures

As a study aid, here are some of the slides and my personal hand-written notes, which I used to prepare the lectures and which should be more or less the same as what I wrote on the board. Studying these is optional.

Nov. 2

Handwritten lecture notes 1
Slides 1
Figures used from the book: 2.1-2.5, 2.11
Solution to in-class exercise: interpretation of least squares as maximum likelihood.

Nov. 9

Handwritten lecture notes 2
Figures used from the book: 3.1, 3.11

Dec. 14

Handwritten lecture notes 7
Figures used from the book: 8.9, 8.10, 8.12, 9.2, 9.3, 10.1, 10.2, 10.3
AdaBoost: Algorithm 10.1 in the book