Mathematical Aspects of Machine Learning

View the Project on GitHub jeremy9959/Math-5800-Spring-2020

Goals for January 27 - February 3

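To summarize, the goals for this week are to form preliminary working groups, create GitHub sites, and begin collecting examples and references.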

There is also a reading assignment:

What is data science? Chapter 1 of Doing Data Science by O’Neil and Schutt. A UConn NetID is required; the book is available through the UConn Library.

Form some preliminary working groups

As we learned last week, people in the class have diverse interests. Reading over my notes, I can identify a few topic areas that people are interested in. These include:

As the skills and interests whiteboards show, we have a mixture of expertise: some people in the course are relative newcomers to machine learning, and while some have a lot of math background, others are relatively new to it.

The first goal for this week is to form preliminary working groups. We’ve all had some opportunity to hear from other class members about their interests. I suggest that these initial groups be organized around broad application areas – so that, for instance, several people interested in working with images can form a group, even if they have different particular applications in mind.

These groups are preliminary; you’re not committing yourself to a joint project for the semester. I recommend shooting for 3-4 people. I’m not much for social engineering, so if you’d prefer to work alone, that’s OK. It is a fact of life, however, that industry and academic work in data science and machine learning is a team sport.

The bottom line is that by Wednesday, January 28, I’d like to know who is working together to start.

Create GitHub sites

GitHub is a website (really a cloud services provider) that was created to support large open-source software development projects. Recently acquired by Microsoft, it is a key source for making and sharing software and documentation.

My GitHub profile is here. It consists of some profile information plus a bunch of repositories, each of which contains code or documents related to a project I am (or was) working on (in widely varying states of completion).

One of the repositories holds the source code for this course’s website, and I use GitHub to generate the page automatically. Making web pages with GitHub is pretty easy, but we’ll save that for later.

The backend of the GitHub site is the software tool called git. Git is a beautiful piece of software that allows you to track versions of your work, undo changes, make experiments without messing up stuff that works, and collaborate with others. It is just as useful for working on shared documents (like joint papers) as it is for software.

To elaborate a little, you would use git to manage the files in a directory on your computer, keeping track of changes and making checkpoints so you can get back to a working state if you mess up your project. With GitHub you can reproduce your directory in the cloud, share it with others, and also have a backup in case something happens to your local machine. Other cloud providers, such as GitLab, do things like GitHub, but GitHub is the biggest and best known.
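To make the idea of a checkpoint concrete, here is a minimal sketch that drives the git command line from Python’s `subprocess` module. It assumes git is installed and on your PATH. The script records a checkpoint (a commit), simulates breaking a file, and then rolls the file back to the checkpoint. The file name `notes.txt`, the helper names `git` and `checkpoint_demo`, and the placeholder identity are hypothetical and chosen only for illustration. In practice you would type the same `git init`, `git add`, `git commit`, and `git checkout --` commands directly in a terminal.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return what it prints."""
    result = subprocess.run(
        ["git", *args],
        cwd=repo,
        check=True,           # raise CalledProcessError if git reports an error
        capture_output=True,  # keep git's messages out of the terminal
        text=True,
    )
    return result.stdout


def checkpoint_demo(repo: Path) -> str:
    """Commit a working file, break it, then restore it from the commit."""
    if shutil.which("git") is None:
        raise RuntimeError("git is not installed or not on your PATH")

    git(repo, "init")            # turn the directory into a git repository
    notes = repo / "notes.txt"   # hypothetical project file
    notes.write_text("a working version\n")

    # Stage the file and record a checkpoint (a commit). The -c options give
    # a placeholder identity so the commit works even before you have set
    # user.name and user.email with `git config` on this machine.
    git(repo, "add", "notes.txt")
    git(repo,
        "-c", "user.name=Example Student",
        "-c", "user.email=student@example.com",
        "commit", "-m", "First working version")

    # Simulate messing up the project.
    notes.write_text("something broken\n")

    # Throw away the uncommitted change and go back to the checkpoint.
    git(repo, "checkout", "--", "notes.txt")
    return notes.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(checkpoint_demo(Path(tmp)), end="")  # prints "a working version"
        print(git(Path(tmp), "log", "--oneline"), end="")
```

To share the same history on GitHub, you would add `git remote add origin <your-repo-url>` followed by `git push`. Those steps need your own GitHub repository, so the sketch leaves them out.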

Knowing the basics of how to use git and GitHub is a baseline skill for any scientist – or anyone wanting to work in a technical field in industry. Therefore:

The second goal for this week is for everyone in the class to have a GitHub site and a repository associated with their project for this course.

To install Git on a Mac or Linux machine, you can use the downloads from the git site. For Windows, I recommend the Git for Windows version, which also installs a command line shell (git-bash) that works well with git.
Alternatively, you can install GitHub Desktop, which gives you a git-shell that also works well on Windows.

For some of you this may take only 15 minutes, but if you are new to Git and GitHub, here are some references:

Collect Examples and References

The final goal for this week is to begin a library of examples and references related to the general area your group is interested in, and to document that library on your GitHub site in the README file.

You can see my example of this for my (hypothetical) Twitter project in my demo GitHub repo.

This is essentially a Google/library research project. A reasonable goal is 6-10 references of at least three different types. For each one, give a brief indication of what it contains. These links are supposed to be useful to you, so don’t just blindly copy them.
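One way to keep the README organized is to store each reference with its type and a one-line description. From that list you can generate the Markdown and check it against the target of 6-10 references of at least three types. The sketch below assumes a Markdown README (README.md, which GitHub renders on the repository page). The `Reference` class, the `meets_goal` and `to_readme` helpers, and the `kind` label in the usage example are hypothetical placeholders, not the reference types intended for this course.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reference:
    title: str
    url: str
    kind: str         # hypothetical type label; use whatever types apply
    description: str  # brief indication of what the reference contains


def meets_goal(refs: list[Reference]) -> bool:
    """True if there are 6-10 references covering at least three types."""
    return 6 <= len(refs) <= 10 and len({r.kind for r in refs}) >= 3


def to_readme(refs: list[Reference]) -> str:
    """Format the references as a Markdown bullet list for README.md."""
    lines = ["## References", ""]
    for r in refs:
        lines.append(f"- [{r.title}]({r.url}) ({r.kind}): {r.description}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A single placeholder entry; replace it with your own references.
    library = [
        Reference("Placeholder title", "https://example.com", "tutorial",
                  "Placeholder note on what this reference covers."),
    ]
    print(to_readme(library))
    print("meets goal:", meets_goal(library))  # False: only one reference so far
```

Generating the list from structured entries makes it easy to re-check the count and the mix of types as the library grows. Writing the README by hand works just as well.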

The types I have in mind include:

Where should I end up?

Here is my demo GitHub repository for my hypothetical Twitter project.