Honors Undergraduate Course on Machine Learning

For your first project, please choose one or more of the algorithms we have discussed so far:

- Linear Regression
- Naive Bayes Classification (Bernoulli or Multinomial model)
- Logistic Regression

You may also use other methods that we will discuss in the next few weeks. Combine your chosen algorithms with Gradient Descent or Newton’s Method (if appropriate), and apply these techniques to obtain information from a dataset of your choice.
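As a rough illustration of what pairing one of these algorithms with Gradient Descent can look like, here is a minimal sketch of linear regression with a single feature, fit by gradient descent on mean squared error. The function name `fit_linear_gd`, the synthetic data, the learning rate, and the step count are all assumptions made for this example and do not come from the course labs. It uses plain Python so that it runs anywhere; in a notebook you would more likely use NumPy arrays.

```python
# Minimal sketch: fit y ≈ w*x + b by gradient descent on mean squared error.
# The function name, data, learning rate, and step count are illustrative
# assumptions, not part of the course materials.

def fit_linear_gd(xs, ys, learning_rate=0.05, steps=5000):
    """Return (w, b) that approximately minimize mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current fit.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(residual^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2.0 / n) * sum(residuals)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic points lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear_gd(xs, ys)
print(f"slope = {w:.4f}, intercept = {b:.4f}")
```

On real data, the learning rate usually needs tuning, and rescaling the features often helps gradient descent converge.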

You may work individually or in groups of two or three.

Your project should be presented as a Jupyter notebook that includes a thorough explanation of your work, together with all of the code you use and the results you obtain.

If you need a place to start, you could pick one of the labs we have done and develop the ideas there in more depth or using different data.

We suggest that you meet with us to discuss your plans before you get too far along.

The project will be due March 22, 2021.

You may choose any dataset you want. Some useful sources for interesting datasets are:

- The UCI machine learning repository
- The Kaggle ML competition site
- Data.gov – the US Government’s open data site
- Connecticut’s open data portal

Some data that we have worked with or will work with in class are: