This is the main website for the Statistical Learning course in autumn 2014, as part of the Master Track Statistical Science for the Life and Behavioural Sciences at Leiden University. Visit this page regularly for changes and updates.
This course gives an overview of techniques to automatically learn the structure, patterns and regularities in complicated data, and to use these patterns to predict future data. Statistical learning largely overlaps with the computer-science field of machine learning: many of its methods have their origin in computer science (pattern recognition, artificial intelligence).
|Instructors:||prof. dr. Peter Grünwald (pdg at cwi dot nl)|
|dr. Tim van Erven (tim at timvanerven dot nl)|
|Contact:||For general questions, send e-mail to either Peter or Tim. For questions about the homework, send e-mail to Tim.|
- General Information
- Lectures and Exercise Sessions
- Examination Form
- Course Materials
- Course Schedule
- Homework Assignments
The course load is 4 ECTS. The studiegids (course catalogue) contains a longer course description.
Lectures take place on Mondays on the dates indicated in the Course Schedule below, in room 409 of the Snellius Building, Niels Bohrweg 1, Leiden.
The first two weeks (Nov. 3 and Nov. 10), course hours are 10h00-16h30. The rest of the weeks they are 11h15-15h30.
In order to pass the course, one must obtain a sufficient grade (6 or higher) on both of the following:
- Homework Projects. We will hand out two homework assignments. The final homework grade will be determined as an average of the grades for the two assignments.
- A written open-book examination: Wed. Jan. 7, 14-17h, room 412; resit: Fri. Feb. 13, 14-17h, room 412. Here you can find an example of examination questions.
The final grade will be determined as the average of the final homework grade and the grade for the open-book examination.
We will use various chapters of The Elements of Statistical Learning, 2nd edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer-Verlag 2009. The book can be downloaded for free at the above link. In addition, some supplementary material will be provided, as listed in the Course Schedule. Finally, it may also be useful to look at the Wikipedia page on Naive Bayes classification.
|Nov. 3: Introduction||General introduction: statistical learning, supervised learning, regression and classification, incorporating nonlinearities by extending the features, overfitting, linear classifiers, nearest-neighbor classification, expected prediction error and Bayes-optimal prediction rule||All of Chapter 1 and parts of Chapter 2 (Sections 2.1-2.4)|
|Nov. 10: Regression Part I||Linear regression: least squares. Three interpretations of least squares: as ERM, maximum likelihood and orthogonal projection. Computation of the least squares estimate. Bias-variance decomposition for squared error loss.||Section 2.5 of Chapter 2, Sections 3.1, 3.2.1 of Chapter 3|
|Nov. 17: Regression Part II||Model selection and overfitting: subset selection, cross-validation, shrinkage methods (ridge regression and lasso)||Sections 3.1, 3.2.1, 3.3, 3.4.1, 3.4.2, 3.4.3 and Sections 7.10.1, 7.10.2|
|Nov. 24: Regression Part III and Classification Part I||Mini-Bayes refresher: Bayesian marginal and predictive distribution, posterior, Laplace rule of succession. Regression: Bayes MAP interpretation of Ridge Regression and Lasso. Problems with Least Squares for classification; Linear Discriminant Analysis (LDA)||Sections 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3), and 4.4 (except 4.4.3)|
|Dec. 1: Classification Part II||Logistic Regression, Expected Prediction Error: 0/1 vs. logarithmic. Discriminative vs. generative models; Three Approaches to Modeling; special status of log loss and squared loss functions.||Sections 4.5.2, 6.6.3, 12.2, 12.3.1, 12.3.2, Additional handouts: table of the three approaches and explanatory notes|
|Dec. 8: Classification Part III||Naive Bayes classifier; Naive Bayes and Logistic Regression; Naive Bayes and spam filtering; Optimal Separating Hyperplanes; Support Vector Machines; the Kernel Trick; SVM learning as regularized hinge loss fitting.||Additional literature: Andrew Y. Ng, Michael Jordan: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, NIPS 2001|
|Dec. 15: Classification Part IV||Classification and regression trees, bagging, boosting (AdaBoost), boosting as forward stagewise additive modeling||Sections 9.2, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (only the part about classification)|
|Dec. 22: Model Assessment and Selection, Unsupervised Learning||Model assessment: AIC, BIC, Bayes Factors, In-Sample Prediction Error, Model Averaging. Clustering: K-means, EM with Gaussian Mixtures||Sections 7.4-7.7, see especially the last paragraph “BIC is asymptotically consistent”. Section 14.3 before 14.3.1; Sections 14.3.6, 14.3.7. NB. The book gives the wrong definition of K-means in Section 14.3.6; see the definition on Wikipedia instead.|
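For concreteness, the closed-form estimators from the regression lectures (Nov. 10 and Nov. 17) can be sketched in a few lines of numpy. The toy data and the penalty value `lam` below are illustrative choices, not part of the course material:

```python
import numpy as np

# Toy data: y = 2*x + 1 plus noise; the design matrix X includes an intercept column.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(30)
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: solve the normal equations X'X beta = X'y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: X'X + lam*I is always invertible for lam > 0, and the
# penalty shrinks the coefficients towards zero.
# (In practice the intercept is usually left unpenalized; see Section 3.4.1.)
lam = 0.1
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(beta_ols)    # close to the true coefficients [1, 2]
print(beta_ridge)  # shrunk towards zero relative to beta_ols
```

In a real analysis `lam` would be chosen by cross-validation, as discussed in the Nov. 17 lecture.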
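The Naive Bayes spam-filtering idea from the Dec. 8 lecture, smoothed with the Laplace rule of succession from Nov. 24, can be sketched as follows. The two-word vocabulary and the tiny document set are invented for illustration only:

```python
import numpy as np

# Word-presence matrix: columns = ["offer", "meeting"], rows = documents.
X = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = ham

def fit_bernoulli_nb(X, y):
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # Laplace rule of succession: P(word | class) = (count + 1) / (n_class + 2)
    theta = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
                      for c in classes])
    return classes, priors, theta

def predict(X, classes, priors, theta):
    # log P(c) + sum_j [ x_j log theta_cj + (1 - x_j) log(1 - theta_cj) ]
    log_post = np.log(priors) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return classes[np.argmax(log_post, axis=1)]

classes, priors, theta = fit_bernoulli_nb(X, y)
print(predict(np.array([[1, 0], [0, 1]]), classes, priors, theta))  # [1 0]
```

This is the generative counterpart of logistic regression discussed in the Ng and Jordan paper listed for Dec. 8.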
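Since the book's definition of K-means in Section 14.3.6 is wrong, here is a sketch of the standard algorithm (Lloyd's algorithm, as defined on Wikipedia): alternate assigning each point to its nearest center with recomputing each center as the mean of its cluster. The two-blob toy data is illustrative:

```python
import numpy as np

def kmeans(X, k, iters=100, init=None, seed=0):
    """Standard K-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = (X[rng.choice(len(X), size=k, replace=False)].astype(float)
               if init is None else init.astype(float).copy())
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (empty clusters keep their old center).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs; start with one point from each so convergence is clean.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, labels = kmeans(X, 2, init=X[[0, 20]])
```

Note that the objective is non-convex, so in practice one runs the algorithm from several random initializations and keeps the best result.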
The homework assignments will be made available here. You are encouraged to discuss the assignments, but everyone has to write a report individually.
|Homework 1||housing data, description||December 7|
|Homework 2||The congressional voting records data set from the UCI Machine Learning Repository||Postponed to January 21|