This is the main website for the Statistical Learning course in autumn 2015, as part of the master track Statistical Science for the Life and Behavioural Sciences at Leiden University. Visit this page regularly for changes and updates.

This course gives an overview of techniques to automatically *learn the structure, patterns and regularities* in complicated data, and to use these patterns to *predict future data*. Statistical learning is very similar to an area within computer science called *machine learning*, since many methods have their origin in computer science (pattern recognition, artificial intelligence).

Instructor: Tim van Erven (tim@timvanerven.nl, for general questions)

Teaching assistant: Kevin Duisters (k.l.w.duisters@math.leidenuniv.nl, for questions about the homework)

## Quicklinks

- General Information
- Lectures and Exercise Sessions
- Examination Form
- Course Materials
- Course Schedule
- Homework Assignments
- Optional Material Used During Lectures
- Optional Further Reading

## General Information

The course load is 4 ECTS. The study guide (studiegids) contains a longer course description.

Make sure to enroll in Blackboard for grades and course updates, and sign up for the (re-sit) exam in uSis at least ten calendar days before the actual (re-sit) exam takes place.

## Lectures and Exercise Sessions

Lectures take place on Thursdays on the dates indicated in the Course Schedule below, in room 412 of the Snellius Building, Niels Bohrweg 1, Leiden.

In the first two weeks (Oct. 29 and Nov. 5), course hours are 10h00-16h30. In the remaining weeks they are 11h15-15h30.

## Examination Form

To pass the course, you must obtain a sufficient grade (5.5 or higher) on both of the following:

- Homework Projects. We will hand out two homework assignments. The final homework grade will be determined as an average of the grades for the two assignments, without any rounding.
- A written open-book examination: Wed. Jan. 6, 14-17h, room 407-409; resit: Thu. Feb. 11, 14-17h, room TBA. As an example of the types of questions, here you can find last year’s exam. NB: This year it will *not* be allowed to use a digital copy of the book during the exam.

The final grade will be determined as the average of the final homework grade and the grade for the open-book examination. It will be rounded to half points, except for grades between 5 and 6, which will be rounded to whole points.
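As an illustration of the grading rules above, here is a minimal Python sketch. The function names are hypothetical, and two details are assumptions that the rules do not spell out: ties are rounded upwards (so 5.5 becomes 6), and "between 5 and 6" is read as strictly between. It is not an official grade calculator.

```python
import math

PASS_MARK = 5.5  # minimum required on both the homework and the exam


def round_half_up(x, step):
    """Round x to the nearest multiple of step; ties go up (an assumption)."""
    return math.floor(x / step + 0.5) * step


def homework_grade(hw1, hw2):
    """Average of the two assignment grades, without any rounding."""
    return (hw1 + hw2) / 2


def final_grade(hw1, hw2, exam):
    """Return (rounded final grade, passed) under the assumptions above."""
    hw = homework_grade(hw1, hw2)
    raw = (hw + exam) / 2
    if 5 < raw < 6:
        # Grades between 5 and 6 are rounded to whole points.
        rounded = round_half_up(raw, 1.0)
    else:
        # All other grades are rounded to half points.
        rounded = round_half_up(raw, 0.5)
    # Both components must be sufficient on their own.
    passed = hw >= PASS_MARK and exam >= PASS_MARK
    return rounded, passed


if __name__ == "__main__":
    print(final_grade(7.0, 8.0, 6.0))
    print(final_grade(9.0, 9.0, 5.0))
```

Note that in this sketch a high homework grade does not compensate for an insufficient exam grade: the second call yields a rounded average of 7.0 but still does not pass, because the exam is below 5.5.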

## Course Materials

We will use various chapters of The Elements of Statistical Learning, 2nd edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer-Verlag 2009. In addition, some supplementary material will be provided, as listed in the Course Schedule.

As a study aid, some of the materials used during the lectures are also available. Studying these is optional.

NB: Although the book can be downloaded for free at the above link, you will need a paper copy for the final exam, which is open book! The standard edition is hard cover, but you may want to get the much cheaper soft-cover edition for €24.99.

About using Wikipedia: trust it as much as you would trust a fellow student who is also still learning. Some things are good, but other things are poorly explained or plain wrong, so always verify with a trusted source.

## Course Schedule

Text in **bold font** indicates changes made during the course.

Date | Topics | Literature |
---|---|---|
Oct. 29: Introduction | General introduction: statistical learning, supervised learning, regression and classification, incorporating nonlinearities by extending the features, overfitting, linear classifiers, nearest-neighbor classification, expected prediction error and Bayes-optimal prediction rule | All of Chapter 1 and parts of Chapter 2 (Sections 2.1-2.4) |
Nov. 5: Regression Part I | Linear regression: least squares. 3 interpretations of least squares: as ERM, maximum likelihood and orthogonal projection. Computation of least squares estimate. Bias-variance decomposition for squared error loss. (Already covered cross-validation.) | Section 2.5, Sections 3.1 and 3.2 up to 3.2.1 |
Nov. 12: Regression Part II | Model selection and overfitting: subset selection, cross-validation, shrinkage methods (ridge regression and lasso). HW1 available | Sections 3.3, 3.4.1, 3.4.2, 3.4.3 and Sections 7.10.1, 7.10.2. Optionally: 7.12 |
Nov. 19: Regression Part III and Classification Part I | Mini-Bayes refresher: Bayesian marginal and predictive distribution, posterior, Laplace rule of succession. Regression: Bayes MAP interpretation of Ridge Regression and Lasso. Problems with Least Squares for classification; Linear Discriminant Analysis (LDA), Naive Bayes classifier, Naive Bayes and spam filtering | Sections 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3), and 6.6.3. Optionally: Wikipedia on Naive Bayes [1, 2] |
Nov. 26: Classification Part II, Unsupervised Learning | Logistic regression; Naive Bayes versus Logistic Regression; Expected Prediction Error: 0/1 vs. logarithmic, surrogate losses. Discriminative vs. generative models; Three Approaches to Modeling; special status of log loss and squared loss functions. Clustering: K-means, EM with Gaussian Mixtures | Additional handouts: table of the three approaches and explanatory notes. Additional literature: Andrew Y. Ng, Michael Jordan: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, NIPS 2001. Section 4.4 (except 4.4.3), Section 14.3 before 14.3.1; Sections 14.3.6, 14.3.7. NB: The book gives the wrong definition for K-means in Section 14.3.6. Additional material: correct definition of K-means. Optionally: Wikipedia on K-means. |
Dec. 3: Classification Part III | Optimal Separating Hyperplanes; Support Vector Machines; the Kernel Trick; SVM learning as regularized hinge loss fitting. Classification and regression trees | Sections 4.5.2, 12.2, 12.3.1, 12.3.2; Section 9.2. |
Dec. 10: TA Session | Discussion of the homework by Kevin Duisters | |
Dec. 17: Classification Part IV | Bagging, boosting (AdaBoost), boosting as forward stagewise additive modeling; Neural networks, deep learning, gradient descent | Sections 8.7, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (only the part about classification), 11.3, 11.4 and 11.5 |

## Homework Assignments

The homework assignments will be made available here. You are encouraged to discuss the assignments, but everyone has to write a report individually. NB These assignments will be a significant amount of work, so start early.

Homework | Data | Available | Deadline |
---|---|---|---|
Homework 1 | housing data, description | November 12 | December 1 |
Homework 2 | car evaluation data, description | November 27 | January 13 |

## Optional Material Used During Lectures

Some students asked for my slides and for photos of the blackboard during the lectures. Here are some of the slides and my personal hand-written notes, which I used to prepare the lectures and which should be more or less the same as what I wrote on the board. Studying these is optional.

### Week 1

- Handwritten lecture notes 1
- Slides 1
- Figures used from the book: 2.1-2.5

### Week 2

- Handwritten lecture notes 2
- Slides 2
- Figures used from the book: 2.11

### Week 3

- Handwritten lecture notes 3
- Figures used from the book: 3.9, 3.11

### Week 4

- Handwritten lecture notes 4
- Slides 4
- Figures used from the book: 4.2, 4.3, 4.5

### Week 5

### Week 6

- Handwritten lecture notes 6
- Figures used from the book: 2.5, 9.2, 9.3, 12.1, 12.2, 12.3

### Week 8

- Handwritten lecture notes 8
- Slides 8
- Figures used from the book: 8.9, 8.10, 8.12, 10.1, 10.2, 10.3 + Algorithm 10.1

## Optional Further Reading

If you would like to know more about statistical learning, here are some links you might find interesting. These are all optional; there will be no questions about this on the exam.

- We have seen during the course that convex optimization methods are very important in statistical learning. If you want to know more, the book by Boyd and Vandenberghe is very good and freely available.
- Some blogs: Google unofficial data science blog, Larry Wasserman (stopped posting), John Langford
- Sometimes we might want to sacrifice some predictive accuracy for better interpretability of our predictive model: