Which Way To Go To Build A Career In Deep Learning?

Dec 2, 2018

Today I met two college friends. Both of them currently work as software engineers at one of the most successful technology companies in India. They have been learning about Deep Learning from popular online sources like cs231n.stanford.edu, cs224d.stanford.edu, 3Blue1Brown, and Siraj Raval for quite some time, and they regularly implement CNNs and RNNs in TensorFlow and PyTorch for practice. Fascinated by its potential and the market hype, they are now looking for an opportunity to pursue Deep Learning full-time, and they wanted to discuss the right entry point: a job at a startup or an MS degree.

Numerous people ask me this question almost every other day. Unfortunately, until today, I could never give a structured answer. Today's answer is the result of an extensive discussion of the possibilities with these two close friends of mine, who are extremely intelligent, and I am proud that together we came up with such a comprehensive answer. I invite you all, the readers of this article, to express your opinions in the comments section below so that we can refine the answer together.

Today, there are two broad streams of activity in the field of Deep Learning, much as in any other field of scientific research.

Stream-1: Implementing the state-of-the-art theory in vertical applications.
Stream-2: Pushing the theoretical state-of-the-art.

Both of these streams are equally rewarding, but they call for two distinct sets of skills/instincts. Let's call them S¹ and S². (Disclaimer: S¹ ∩ S² ≠ ∅.) Depending on whether you have S¹ or S², you fall in Category-1 or Category-2.

Next, we make non-exhaustive but sufficiently comprehensive enumerations of S¹ and S².

Let's start with S¹. We are talking about Stream-1, which is implementing the state-of-the-art theory in vertical applications. The goal is to solve the problem at hand with a state-of-the-art Deep Learning algorithm. Say, for example, the problem at hand is to build a conversational AI agent to help the millions of people who suffer from depression and related mental health issues. You do a quick literature survey and consult experts, and you locate the state-of-the-art approach to building such an AI, perhaps published at the most recent NIPS. With this paper in hand, you return to the original idea of your system and analyze the paper in light of your problem statement. You ask the following questions:

1. How well does this paper address the problem you are trying to solve?

2. What is the scale of deployment of your system? Is the solution proposed in the paper scalable enough?

3. What is the maximum allowable latency of your system? Also, if you are handling personal/private data, it's recommended that you do not ship any of that data to the cloud, and that your algorithm learns and infers on the user's device. Is the solution in the paper suitable for handling such a case? If not, can you make it suitable?

Add to this list any other deployment issues that are relevant to the domain of your application. Your job is to focus on implementing, optimizing, and adapting the NIPS algorithm for your application. For this, you need the following non-exhaustive set of skills, which we collectively call S¹.

1. Superlative programming skills and preferably, a background in software engineering.
2. A zeal for optimizing an algorithm to the core.
3. Knowledge of different hardware platforms, like CPU (x86 vs ARM), GPU, FPGA, VPU/ASIC, etc., and how to write optimal code for each of them. Also, the ability to judge the suitability of a given platform for your application.
4. Knowledge of High-Performance Computing and Big Data handling.
5. Prior experience in full-stack software development.

If you have the skills in S¹, you fall in Category-1. You may call this category "Deep Learning Engineer". If you belong to Category-1, you should consider a job at a startup, because there you will get an opportunity to solve a burning problem with the best available Deep Learning solution and all your superstar engineering skills.
But before signing up with a startup, do check the following:

1. Talk to the Deep Learning boss. If he/she believes that Deep Learning can solve any problem in the world given enough labeled data and compute, better not work for that company ;)
2. Check if the company has enough training data and/or necessary tie-ups with institutions that can provide labeled training data. If not, reject.
3. Does the company have an existing collaboration with a domain expert? (in the anti-depression AI example, has the company signed up an experienced psychologist for regular consultation?) If not, the company is useless.
4. Does the company have an existing collaboration with an AI expert from academia, or is it at least looking to forge one? If not, you have a high chance of getting stuck/disoriented midway through your development process with no one to help or correct you.

Now let's talk about S². It is the set of skills that are essential for pushing the theoretical state-of-the-art. You pick the NIPS 2017 paper that intrigues you the most, pledge to surpass it by setting the bar higher, and aim to publish at NIPS 2018 (or maybe NIPS 2028 ;)). The farther you push, the better. With this one-year (or maybe ten-year) plan in mind, you chalk out your steps. You need the following instincts/skills.

1. The sheer pleasure of studying Math.
2. The ability to read and summarize a large volume of literature in a short span of time, to keep a broad picture of the problem at hand, and to identify exactly where the solution proposed by the paper you just read fits in.
3. Sufficient proficiency in implementing algorithms in code using your favorite Deep Learning library (e.g., TensorFlow, PyTorch, Caffe).
4. Superior verbal presentation and collaborative skills. You should be able to present your ideas succinctly to experts and collaborators and have effective discussions with them.
5. Superior writing skills. Even the best ideas get rejected due to poor presentation in the paper. You should be able to write really, really good English.
6. Perseverance. Be prepared for serial rejections, heartbreaks, severe bugs, implementation failures, and fake research works.

If you have these traits, you are in Category-2. You may call this category "Deep Learning Scientist". You should go for an MS (or Ph.D.). Be aware that at the end of your tenure, you will be judged on how many NIPS, ICLR, and ICML papers you have published. So choose your research lab accordingly.

Here are a few things you must check:
1. Choose your area of research carefully. Remember that your aim is to be market-ready 3-5 years down the line, after completing your MS/Ph.D. Hence, be sure to choose a topic that will still be hot 3-5 years later, not something that is hot right now but may lose the limelight to something more promising by the time you finish your tenure.

2. Select your adviser carefully. Does he/she encourage futuristic research, external academic and industrial collaboration, internships, and free interaction with other researchers in the team? If not, look for someone else.

3. Do you have access to the requisite hardware, e.g., GPUs, Xeon Phi/Xeon Scalable processors, TPUs (on Google Cloud), etc.?

4. Do you have a good working environment? Remember that you will end up spending 20 hours a day in your lab.

5. How ambitious is your team? Is there a culture of aiming for top conferences and journals and publishing regularly? How good a track-record does your team have at solving problems?

The next question that naturally arises is whether to go abroad or stay in India. In my opinion, there is plenty of scope today for becoming a successful researcher working out of India. Most IITs, ISIs, and the IISc have state-of-the-art hardware infrastructure for Deep Learning (much of it is under construction now but will be ready in 1-2 years). There are some extremely talented professors and nascent but promising research groups invested in Deep Learning in almost all of these institutes. Industry giants like Google, Amazon, Intel, and NVIDIA, and of course government organizations like MHRD, ISRO, and DRDO, are investing heavily in these groups. So staying back in India is not a bad choice anymore. :)

However, if you are inclined to go abroad, make sure you go to one of the top-tier institutions. Working in a tier-two university/research lab is not worth all the expenses, and it's certainly a better option to stay back and work for your own country. Also, if you are moving to the US, stick close to Silicon Valley because, you know... ;)

So that's it! Make sure you appropriately describe yourself in terms of the features above and use some "deep" thought to classify yourself as Category-1 or Category-2. Then the choice is simple: choose a startup job if you belong to Category-1, or go for an MS/Ph.D. if you are in Category-2. I agree that this analysis is grossly superficial and needs a more detailed and intricate treatment. To mitigate that, please leave your valuable opinions in the comments section below. Let's have a conversation. The question addressed in this blog is crucial, and it deserves a detailed discussion. Thank you for your time! :)

Edit #1 on 06 Nov 2017
Many people (especially Arijit Patra) pointed out that nowadays, a substantial fraction of publications in non-Computer Science journals/proceedings use Deep Learning methods to solve domain-specific problems. Although these projects fall in Stream-1 according to the classification presented in this article, none of the skills enumerated for S¹ are essential for them; rather, some of the skills from S² are important. I agree, and that is why I emphasized that S¹ ∩ S² ≠ ∅ and that the enumeration of the sets was non-exhaustive. Also, it is practically impossible for a human being to have ALL the traits and skills in S¹ or S². Arijit Patra suggested that I should add a third stream for this kind of project, whose practitioners can be called "Application Scientists". However, I beg to differ and maintain my original classification. I would recommend that people interested in this line of work choose a startup job over an MS to begin with, because being part of an effort to build a real product teaches you important life lessons and also helps you identify which role in the project (e.g., reader, writer, programmer, strategist) suits you the most. You may have already noticed that single-author papers are not that fashionable in the Deep Learning community. You will most probably be working in a team even if you join an MS/Ph.D. program. It is important to figure out which role makes you the most productive in the setting of a Deep Learning project. :)

Another suggestion, from Yateen Gupta and Sanjiv Roy, was to include tips for working professionals who have none of the enumerated traits from S¹ or S² but want to switch careers towards Deep Learning. My advice to them would be to pick one of the streams (after some research and deliberation) and invest time in developing the essential (enumerated) skills. For example, Python programming is essential for success in Deep Learning. If the person is not conversant with Python, he/she can spend a weekend doing Google's Python Class. A good place to start with Deep Learning itself is cs231n.stanford.edu. Its assignments demand both paper reading and programming, so the person can easily understand which stream of work suits him/her the most and which gaps in his/her training need to be filled.

Source: HOB