Deep Learning: Things to keep in mind.
2017-08-05
If you're reading this, there's a good chance you're interested in getting started with deep learning. If so, let me point you towards some awesome resources that I personally found extremely helpful:
- Udacity's Deep Learning Foundations Nanodegree
- Michael Nielsen's Book: neuralnetworksanddeeplearning.com
- Andrew Trask's blog: iamtrask.github.io
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition
- Goodfellow, Bengio and Courville's Deep Learning Book
- Andrew Ng's new project: deeplearning.ai
The above set of resources should be more than enough for you to get started and develop a level of proficiency with deep learning that should be proportional to the time you're willing to put into the subject.
I found Michael Nielsen's book and Trask's blog posts excellent for getting started with the mathematical foundations and neural network implementations in Python.
After a couple of months of tinkering with deep neural networks, I realized that had I known a few things before I ever started working on these projects, I could have made much better use of my time. So I hope these points can benefit you if you're going to be training your own deep learning models!
Data preprocessing.
Getting the raw data into the correct form for feeding into your neural network is hands down the most significant part of training your model. Deciding how you will approach your problem, sourcing your data, labelling it, transforming it into the correct numerical form (neural networks can only work with numerical features) and cutting it up into training/testing sets and batches will be the most time-consuming and pretty much the most frustrating part of working with your models. Experience with data manipulation tools like Bash, pandas, NumPy and SciPy helps a great deal in this. Here is a great whirlwind tour to help you get up to speed with NumPy.
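To make those steps a bit more concrete, here is a minimal sketch of the pipeline in plain NumPy: scale the features, turn text labels into one-hot vectors, split into training/testing sets and cut the training set into batches. The toy data and the helper names (`one_hot`, `train_test_split`, `batches`) are made up for illustration; a real project would more likely lean on pandas or a framework's own data loaders.

```python
import numpy as np

def one_hot(labels):
    """Turn an array of string labels into one-hot rows.

    Returns the one-hot matrix and the sorted class names."""
    classes, indices = np.unique(labels, return_inverse=True)
    return np.eye(len(classes))[indices], classes

def train_test_split(features, targets, test_fraction=0.2, seed=0):
    """Shuffle the rows, then hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(len(features) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (features[train_idx], targets[train_idx],
            features[test_idx], targets[test_idx])

def batches(features, targets, batch_size):
    """Yield consecutive (features, targets) slices of at most batch_size rows."""
    for start in range(0, len(features), batch_size):
        stop = start + batch_size
        yield features[start:stop], targets[start:stop]

# Toy data (hypothetical): 10 samples with 2 numeric features and a text label.
raw_features = np.arange(20, dtype=float).reshape(10, 2)
raw_labels = np.array(["cat", "dog", "cat", "bird", "dog",
                       "cat", "bird", "dog", "cat", "dog"])

# Standardise each feature column to zero mean and unit variance.
features = (raw_features - raw_features.mean(axis=0)) / raw_features.std(axis=0)

# Networks need numbers, so the text labels become one-hot vectors.
targets, class_names = one_hot(raw_labels)

x_train, y_train, x_test, y_test = train_test_split(features, targets)
batch_shapes = [(xb.shape, yb.shape)
                for xb, yb in batches(x_train, y_train, batch_size=3)]
print(batch_shapes)
```

Note that the last batch comes out smaller than the others whenever the training set size is not a multiple of the batch size, which is worth remembering once you start debugging shapes.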
Debugging your networks.
Deep learning models are basically just a bunch of matrices stacked in order. To make your models deeper you add more (weight) matrices to the model. So a basic requirement for doing all this linear algebra is that the dimensions of neighbouring matrices line up: the output size of one layer has to match the input size of the next.
Imagine a 10-layer neural network where each layer has a different number of hidden nodes. The data is fed to these layers in batches of a certain shape. Having the right idea of the dimensions of the matrices in your network is important for understanding what you're doing. This comes with time, but initially it's gonna be the source of endless hair-pulling when you get that dreaded Incorrect Dimensions error message.
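One habit that can save some of that hair is checking the shapes before running anything. Below is a rough sketch in NumPy, assuming each weight matrix has shape (inputs, outputs) and the batch has shape (batch size, features). The helper `check_layer_shapes`, the layer sizes and the tanh activation are all just illustrative choices, not how any particular framework does it.

```python
import numpy as np

def check_layer_shapes(input_size, weights):
    """Walk through the weight matrices and confirm each one accepts
    the output of the previous layer. Returns the final output size."""
    expected = input_size
    for i, w in enumerate(weights):
        if w.shape[0] != expected:
            raise ValueError(
                f"layer {i}: expected {expected} inputs, "
                f"but the weight matrix has shape {w.shape}"
            )
        expected = w.shape[1]
    return expected

def forward(batch, weights):
    """Plain forward pass: multiply by each weight matrix, then apply tanh."""
    activation = batch
    for w in weights:
        activation = np.tanh(activation @ w)
    return activation

rng = np.random.default_rng(0)

# Hypothetical network: 4 input features, hidden layers of 8 and 5 nodes, 3 outputs.
layer_sizes = [4, 8, 5, 3]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

batch = rng.normal(size=(16, 4))  # 16 samples per batch, 4 features each

output_size = check_layer_shapes(batch.shape[1], weights)
output = forward(batch, weights)
print(output.shape)  # (16, 3): one row per sample, one column per output node
```

Printing `activation.shape` inside the loop is an even cruder version of the same idea, and it is often enough to spot where the dimensions stop matching.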
Training Feedback loop.
Deep learning models take a lot of compute power to be trained effectively. A well-performing deep neural network can take anywhere from a few hours to a couple of weeks to train. If you have a good GPU on hand, the feedback loop of implementing the network, running it to see results and then optimising hyperparameters gets much shorter, which lets you work far more efficiently.
But even if you buy a GPU, it’s not as simple as just plugging it in. You’ll need to install the associated CUDA drivers, and be using an operating system that works well with the GPU (looking at you, Mac).
It's best to just work with one of the cloud providers. Amazon, Google and Microsoft Azure all have their own GPU-powered instances that can take the pressure off your stalwart but underpowered laptop. Newer companies have also come in to fill the demand for greater compute power necessitated by the recent focus on deep learning. Paperspace and FloydHub seem to be two promising ones.
Keeping up with Research.
If you just want to use deep learning in your applications, you can plug in one of the APIs or models that come pre-configured in many projects. But that's really not much fun. Training and really playing around with a neural network is what it's all about, and the only way to get better at not just understanding but also building new and better models is to really look into the research that's happening around deep learning. Try implementing other people's work on your own. But there is a small issue here: the research is moving too fast! It seems that every month an important result is achieved, and the speed at which the field is progressing is, quite frankly, amazing. It can be an overwhelming experience just to keep track of all the work being done in the field. Secondary skills like knowing how to read papers efficiently become important here. This is a great post by Andrew Trask that helped me become comfortable with reading new papers.
Also, check out The Morning Paper by Adrian Colyer (which, by the way, is a great resource for keeping up with computer science topics in general). There’s also a great channel on YouTube: Two Minute Papers.
All of these points are good-to-knows that I feel might help others better navigate their learning experience. If you feel I should add something to the above, let me know!