I wanted to spend a good bit of time gaining deeper knowledge and more experience with machine learning and data science, but just ran out of time this week.
Why Machine Learning?
It's the future, right now.
It's the equivalent of finding an O(log α(n)) solution for an O(n!) problem. (That α is the inverse Ackermann function, a ridiculously slow-growing function.)
Instead of spending 6 months writing an algorithm and thousands of rules to define how to solve a very hard problem, you let machine learning learn the rules from examples. It is, in a way, writing its own algorithm from the data you give it.
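To make that idea concrete, here is a minimal sketch in plain Python. Instead of hard-coding a rule such as "anything over 145 grams is an orange", the code picks the cutoff from labeled examples. The fruit weights and the names `learn_threshold` and `predict` are made up for illustration; real machine learning models learn far richer rules, but the principle is the same.

```python
def learn_threshold(examples):
    """Pick the cutoff t that misclassifies the fewest examples,
    where the learned rule is: predict True when x >= t.

    Assumes at least two distinct x values are present.
    """
    xs = sorted({x for x, _ in examples})
    # Candidate cutoffs: midpoints between neighboring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        # Count the examples the rule "x >= t" gets wrong.
        return sum((x >= t) != label for x, label in examples)

    return min(candidates, key=errors)


# Hypothetical training data: (weight in grams, is_orange)
examples = [
    (120, False), (130, False), (140, False),
    (150, True), (160, True), (170, True),
]

threshold = learn_threshold(examples)


def predict(weight):
    """Apply the learned rule to a new, unseen weight."""
    return weight >= threshold


print(threshold, predict(155), predict(135))
```

Nobody wrote the cutoff by hand. Change the examples and the rule changes with them.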
Most of the time spent doing data science is actually spent gathering and cleaning data (made easy with a little programming).
Google is using machine learning right now, and they see the promise and results already, but only a small percentage of Google engineers have experience with it.
Applications of machine learning at Google:
- How Google Is Remaking Itself As A Machine Learning First Company
- Large-Scale Deep Learning for Intelligent Computer Systems (video)
- Deep Learning and Understandability versus Software Engineering and Verification by Peter Norvig
Intro Course on Machine Learning
There are many online courses on the subject, including a Self-Driving Car Engineer Nanodegree (I want that!), but the best intro to understand the math, statistics, and process behind machine learning is Stanford's Machine Learning Course:
Concretely, Andrew Ng rocks.
I took this course many months ago, and I found it fascinating. It has some very serious math:
Sometimes I ask myself what I’ve gotten myself into. #MachineLearning #neuralNetworks pic.twitter.com/EvJ8mMSNSK
— John Washam (@StartupNextDoor) February 26, 2016
Don't let it scare you.
The math builds up slowly so you can follow along, and the first week includes a review of linear algebra, which I hadn't seen since high school.
The course uses Matlab and Octave. You get a free Matlab license to use during the course, but I used Octave, an open-source alternative with similar syntax. It can also read and write Matlab files.
Next Steps
Matlab and Octave are great, and Matlab is widely used, but it is very expensive and has a language of its own.
The two main languages (other than Matlab-compatible ones) used in data science and machine learning are Python and R. Python has packages like scikit-learn and NumPy that let you avoid implementing your own regressors and classifiers. In just a few lines of code, you can implement some very cool technology. In addition, TensorFlow, an open-source package for building neural networks, gives you a neural network in just a few lines.
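As a rough illustration of "just a few lines", here is a sketch that trains a classifier with scikit-learn. It assumes scikit-learn is installed and uses the small iris flower dataset that ships with it. The choice of a k-nearest-neighbors classifier, the 25% test split, and the random seed are my own picks for the example, not anything the course or the tutorials prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 150 flowers, 4 measurements each, 3 species as labels.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to check the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Classify a flower by looking at its 5 most similar training flowers.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Fraction of held-out flowers labeled correctly.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in a different model is usually a one-line change, since scikit-learn estimators share the same `fit` and `score` methods.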
You can get started with machine learning today, without any knowledge of it. Here is a short playlist of tutorials by Josh Gordon to get you started. You'll see how easy it can be:
More Learning
Books! How I love them.
Python Machine Learning
This book is a best-seller, and very well-reviewed. It will be the first book I tackle when I have the time.
Python Machine Learning by Sebastian Raschka
Data Science from Scratch
Another best seller, by an ex-Googler, no less.
Data Science from Scratch by Joel Grus
Introduction to Machine Learning with Python
This is a preorder, but looking at the table of contents and skimming some of the content, it looks quite promising.
Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
Resources
- Google's Cloud Machine Learning tools (video)
- TensorFlow (video)
- TensorFlow Tutorials
- Courses:
  - Stanford: Machine Learning
    - videos only
    - see videos 12-18 for a review of linear algebra (14 and 15 are duplicates)
  - Neural Networks for Machine Learning
  - Google's Deep Learning Nanodegree
  - Google/Kaggle Machine Learning Engineer Nanodegree
  - Self-Driving Car Engineer Nanodegree
  - Metis Online Course ($99 for 2 months)