MAT 5313 (MATH 6507) - Winter 2012
Foundations of statistical machine learning and neural networks. The Vapnik-Chervonenkis theory
Course information



Lecturer: Prof. Vladimir Pestov
Department of Mathematics and Statistics
585 King Edward, Office room KED 211 D
Phone: 562-5800 ext. 3523
E-mail: vpest283@uottawa.ca
web page: http://aix1.uottawa.ca/~vpest283
brief CV: http://aix1.uottawa.ca/~vpest283/varia/cv.html
List of publications: http://aix1.uottawa.ca/~vpest283/list.html
Office Hours: t.b.a., office room 211 D (middle of the corridor on the 2nd floor of Mathematics building)
Prerequisites: Some minimal background in analysis and probability is desirable, but not absolutely necessary, because I expect to be teaching everything from scratch.
Course justification: Supervised machine learning describes a process whereby, given a set X of data and a set of values of a predictor, or a classifier, Y=Y(X), at the datapoints, one is searching for an algorithmic way to extend the classifier function Y over the entire domain so that it would make accurate predictions at future datapoints. Some well-known branches of machine learning include artificial neural networks, self-organizing maps, and kernel methods. Automatic handwriting recognition was one of the early successes in this area, and currently machine learning is a major tool in data mining. The Vapnik-Chervonenkis theory provides a conceptual model explaining why it is actually possible to make accurate predictions for future data based on the existing knowledge.
Statistical learning uses a wide variety of deep and fascinating mathematical methods, and this course aims at studying some of these tools in depth.
This course has been taught in the Fall 2007 to a mixed audience of students in statistics, pure mathematics, and computer science, and it was a success. A complete set of revised lecture notes for the course will be made available.
Course topics: I hope to be able to address most of the following, though not necessarily in this order, and in varying detail:
  • classification and regression problems;
  • universally consistent learning methods, Stone's theorem on k-nearest neighbour classifier;
  • empirical risk minimization within a class, probably approximately correct (PAC) learning model;
  • geometry of the Hamming cube, the concentration of measure phenomenon;
  • VC dimension, Sauer lemma, metric entropy, fat shattering dimension;
  • uniform Glivenko-Cantelli classes, distribution-free learnability;
  • learnability under a fixed distribution;
  • Johnson-Lindenstrauss lemma for dimensionality reduction as an application;
  • artificial feed-forward binary neural networks, perceptrons;
  • support vector machines;
  • kernels in Hilbert spaces, kernel methods, the kernel trick.
Reading materials: My lecture notes will be made available on a regular basis on the downloads page.
Here are some additional sources:
  • O. Bousquet, S. Boucheron and G. Lugosi. Introduction to Statistical Learning Theory. - In: Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence 3176, 169-207. (Eds.) Bousquet, O., U. von Luxburg and G. Ratsch, Springer, Heidelberg, 2004. Available here in PDF.
  • M. Vidyasagar, Learning and Generalization With Applications to Neural Networks (Second Edition), Springer-Verlag, London, 2003. ISBN-13: 978-1852333737
  • Luc Devroye, László Györfi and Gábor Lugosi, A Probabilistic Theory of Pattern Recognition, Springer-Verlag, New York, 1996. ISBN 0-387-94618-7.
  • Vladimir N. Vapnik, Statistical learning theory, John Wiley & Sons, Inc., New York, 1998. xxvi+736 pp. ISBN: 0-471-03003-1
  • Martin Anthony and Peter Bartlett, Neural network learning: theoretical foundations, Cambridge University Press, Cambridge, 1999. xiv+389 pp. ISBN: 0-521-57353-X
  • Shahar Mendelson, A few notes on statistical learning theory, In: Advanced Lectures in Machine Learning, (S. Mendelson, A.J. Smola Eds), LNCS 2600, pp. 1-40, Springer 2003, available here in postscript.

Mandatory course requirements:

To submit solutions to homework assignments (posted weekly), sit the mid-term exam and the final exam, and to obtain at least 45 % in the final.

The assignments are worth up to 10 % of the overall course mark, the mid-term is worth up to 30 %, and the final exam is worth 60 %.

At most 2 assignments can be missed. The mid-term exam cannot be missed unless you get the dispensation from the lecturer ahead of time, or else you have an official excuse such as a doctor's note. In such a case, the mark 0 is entered for the mid-term, and the weight of this exam is shifted to the final.

Assessment: The overall final grade is based on 60 % final exam plus 40% internal (assignments + mid-term).

Mid-term exam: Wednesday February 29, 8:30-9:50, Tabaret Hall, room TBT 315. (Check the map of the U of Ottawa campus on which the building is marked TBT.)