- LeCun, Bottou, et. al. Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, November, 1998.
- LeCun, Cortes, Burges. The MNIST Database of handwritten digits
- Fokoue, Ernest. Model Selection for Optimal Prediction in Statistical Machine Learning,
Notices of the AMS, March, 2020.
- Elkan, C. Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training
- Logistic Regression for Classification from the Coursera Machine Learning Course.
- Sparse Logistic Regression from scikit-learn documentation.
- How the backpropagation algorithm works, Chapter 2 in
Neural Networks and Deep Learning by Michael Nielsen.
- Janssens, Martens. Reflection on modern methods: Revisiting the area under the ROC curve, International Journal of Epidemiology, January, 2020.
- Bell, R., Yehuda Koren and Chris Volinsky. Matrix Factorization Techniques for Recommender Systems, in Computer, vol 42, pp. 30-37, August 2009.
- Matrix Factorization with Tensorflow,
by Katherine Bailey.
- Goh, G. Why Momentum Really Works
- Goodfellow, I. et. al.Generative Adversarial Nets
- Zeiler, M. D. and Fergus, R. Visualizing and understanding convolutional networks