Goals for January 3 - February 10 | Math-5800-Spring-2020

Goals for January 3 - February 10

All three groups are off to a good start on quite different projects:

Logistic Regression and Image Recognition
Nearest Neighbors and Recommender Systems
Game playing

Looking ahead to this week, here’s where you should be spending your time.

Develop your ability to work with your data in python/jupyter

All of you have found example code of various sorts illustrating your problem. By looking at those examples, you should be able to use jupyter to produce an introduction to your data and the problem. This means:

loading your data into an appropriate format
displaying examples from your data set
producing summary statistics on your data as appropriate
generally getting more comfortable working in this environment
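
As a rough illustration of the steps above, here is a minimal sketch that uses only the Python standard library. The tiny inline table and its column names (`height`, `weight`, `label`) are made up for illustration; substitute your own file and fields. In a notebook you would more likely reach for a library such as pandas, but the steps are the same: load, look at a few rows, summarize.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a data file; in practice use open("your_file.csv").
raw = io.StringIO(
    "height,weight,label\n"
    "1.62,55.0,0\n"
    "1.80,82.5,1\n"
    "1.75,70.0,1\n"
    "1.58,49.5,0\n"
)

# Load: one dict per row, with numeric fields converted from strings.
rows = [
    {
        "height": float(r["height"]),
        "weight": float(r["weight"]),
        "label": int(r["label"]),
    }
    for r in csv.DictReader(raw)
]

# Display a few examples from the data set.
for row in rows[:2]:
    print(row)

# Summary statistics for each numeric column.
summary = {}
for col in ("height", "weight"):
    values = [r[col] for r in rows]
    summary[col] = {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }
    print(col, summary[col])

# Class balance: how many examples carry each label.
label_counts = {}
for r in rows:
    label_counts[r["label"]] = label_counts.get(r["label"], 0) + 1
print("label counts:", label_counts)
```

Checking the class balance early is worthwhile because a heavily unbalanced data set changes how you should read accuracy numbers later on.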

Get a grip on the theory behind your method

The second goal for this week is to understand the underlying structure of the method you are studying.

Both of the rather advanced references for the course contain treatments of Logistic Regression and Nearest Neighbor methods. However, there are many other resources, and more elementary books on statistics may have more accessible introductions.

Particularly important for this goal is:

to clarify any underlying assumptions that your method makes about how the data is distributed;
to get a sense of practical limitations of your method (how much space, time and so on are involved in using it).
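
To make the question of practical limitations concrete for one of the methods mentioned, here is a minimal sketch of brute-force nearest-neighbor classification. The function name `knn_predict` and the toy points are invented for illustration, and real libraries use more sophisticated data structures than a plain list scan.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Brute-force k-nearest-neighbor majority vote (illustrative sketch)."""
    # Every prediction computes a distance to every stored training point:
    # roughly O(n * d) distance work per query, plus an O(n log n) sort,
    # and the whole training set (n points of dimension d) stays in memory.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: two small clusters, labelled "a" and "b".
train_points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (1.0, 1.0), (0.9, 1.1), (1.1, 0.9),
]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_origin = knn_predict(train_points, train_labels, (0.05, 0.05))
prediction_near_one = knn_predict(train_points, train_labels, (1.0, 0.95))
print(prediction_near_origin, prediction_near_one)
```

The sketch also exposes an assumption worth examining: it treats Euclidean distance between feature vectors as a meaningful measure of similarity. By contrast, a fitted logistic regression model keeps only one weight per feature plus an intercept, so its cost at prediction time does not grow with the size of the training set.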

Schedule

Each group will make a progress report on Monday 2/10, following the standard rules:

no more than 15 minutes per group,
everybody talks.

In addition, this week I plan to do some talking on Wednesday 2/5 and Friday 2/7 about some important fundamentals:

the curse of dimensionality
the bias/variance tradeoff and overfitting
evaluating a classifier (false positives and negatives, precision and recall).
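
As a preview of the last topic, here is a minimal sketch of how precision and recall are computed from true and predicted labels. The helper names and the example labels are made up for illustration, and returning 0.0 when a ratio is undefined is just one possible convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 is the positive class, 0 the negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print("tp, fp, fn, tn:", tp, fp, fn, tn)
print("precision:", precision, "recall:", recall)
```

Precision answers "of the examples flagged as positive, how many really were?", while recall answers "of the real positives, how many did the classifier find?".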