Here are our 6 predictions for data science, machine learning, and AI for 2018. Some are fast-track and potentially disruptive; some take the hype off overblown claims and set realistic expectations for the coming year.
It's that time of year again when we do a look back in order to offer a look forward. What trends will speed up, what things will actually happen, and what things won't in the coming year for data science, machine learning, and AI?
We've been watching and reporting on these trends all year and we scoured the web and some of our professional contacts to find out what others are thinking. There are only a handful of trends and technologies that look to disrupt or speed ahead. These are probably the most interesting in any forecast. But it is also valuable to discuss trends we think are a tad overblown and won't accelerate as fast as some others believe. So with a little of both, here's what we concluded.
Prediction 1: Both model production and data prep will become increasingly automated. Larger data science operations will converge on a single platform (of many available). Both of these trends are in response to the groundswell movement for efficiency and effectiveness. In a nutshell, they allow fewer data scientists to do the work of many.
The core challenge is that there remains a structural shortage of data scientists. Whenever a pain point like this emerges we expect the market to respond and these two elements are its response. Both come at this from slightly different angles.
The first is that although the great majority of fresh new data scientists have learned their trade in either R or Python, having a large team freelancing directly in code is extremely difficult to manage for consistency and accuracy, much less to debug.
All the way back in their 2016 Magic Quadrant for Advanced Analytic Platforms, Gartner called this out, treating a Visual Composition Framework (drag-and-drop elements of code) as a critical requirement and declining even to rate companies that failed to provide one. Gartner is very explicit that working in code is incompatible with the large organization's need for quality, consistency, collaboration, speed, and ease of use.
Langley Eide, Chief Strategy Officer at Alteryx, offered this same prediction, that "data science will break free from code dependence. In 2018, we'll see increased adoption of common frameworks for encoding, managing and deploying Machine Learning and analytic processes. The value of data science will become less about the code itself and more about the application of techniques. We'll see the need for a common, code-agnostic platform where LOB analysts and data scientists alike can preserve existing work and build new analytics going forward."
The second element of this prediction, which I do believe is disruptive in its implications, is the very rapid evolution of Automated Machine Learning (AML). The first of these platforms appeared just over a year ago and I've written several times about the now 7 or 8 competitors in this field such as DataRobot, Xpanse Analytics, and PurePredictive. These AML platforms have achieved one-click-data-in-model-out convenience with very good accuracy. Several of these vendors have also done a creditable job of automating data prep including feature creation and selection.
Gartner says that by 2020, more than 40% of data science tasks will be automated. Hardly a month goes by without a new platform contacting me wanting to be recognized on this list. And if you look into the clients many have already acquired you will find a very impressive list of high volume data science shops in insurance, lending, telecoms, and the like.
Even large traditional platforms like SAS offer increasingly automated modules for high volume model creation and maintenance, and many of the smaller platforms like BigML have followed suit with greatly simplified if not fully automated user interfaces.
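To make the "data in, model out" idea concrete, here is a minimal sketch of the selection loop at the heart of automated machine learning: fit several candidate models, score each by cross-validation, and keep the winner. It is an illustration only and not how any of the vendors named above actually work. The two candidates, the helper names (`fit_mean`, `fit_linear`, `cv_mse`, `auto_select`), and the toy data are all assumptions made for this example.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cv_mse(fit, xs, ys, folds=5):
    """Mean squared error averaged over k held-out folds."""
    errors = []
    for k in range(folds):
        train = [i for i in range(len(xs)) if i % folds != k]
        test = [i for i in range(len(xs)) if i % folds == k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean((model(xs[i]) - ys[i]) ** 2 for i in test))
    return mean(errors)


def auto_select(xs, ys, candidates):
    """Score every candidate, then refit the best one on all the data."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best](xs, ys), scores


# Hypothetical candidate list; a real platform would search far more widely.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear}

# Toy data: a linear trend with a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

best_name, model, scores = auto_select(xs, ys, CANDIDATES)
print(best_name, scores)
```

The user supplies only the data. Everything else (which algorithm, how it is validated) is decided by the loop, which is why these platforms let fewer data scientists cover more models.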
Prediction 2: Data Science continues to develop specialties that mean the mythical 'full stack' data scientist will disappear.
This prediction may already have come true. There may be some smaller companies that haven't yet got the message but trying to find a single data scientist, regardless of degree or years of experience, who can do it all just isn't in the cards.
First there is the split between specialists in deep learning and predictive analytics. It's possible now to devote your career to just CNNs or RNNs, work in TensorFlow, and never touch or understand a classical consumer preference model.
Similarly, the needs of different industries have so diverged in their special applications of predictive analytics that industry experience is just as important as data science skill. In telecoms and insurance it's about customer preference, retention, and rates. In ecommerce it's about recommenders, web logs, and click streams. In banking and credit you can make a career in anomaly detection for fraud and abuse. Whoever hires you is looking for these specific skills and experiences.
Separately there is the long overdue spinoff of the Data Engineer from the Data Scientist. This is the identification of a separate skills path that only began to be recognized a little over a year ago. The skills the data engineer needs to set up an instance in AWS, or implement Spark Streaming, or simply to create a data lake are different from the analytical skills of the data scientist. Maybe 10 years ago there were data scientists who had these skills but that's akin to the early days of personal computers when some early computer geeks could actually assemble their own boxes. Not anymore.
Prediction 3: Non-Data Scientists will perform a greater volume of fairly sophisticated analytics than data scientists.
As recently as a few years ago the idea of the Citizen Data Scientist was regarded as either humorous or dangerous. How could someone, no matter how motivated, without several years of training and experience be trusted to create predictive analytics on which the financial success of the company relies?
There is still a note of risk here. You certainly wouldn't want to assign a sensitive analytic project to someone just starting out with no training. But the reality is that advanced analytic platforms, blending platforms, and data viz platforms have simply become easier to use, specifically in response to the demands of this group of users. And why have platform developers paid so much attention? Because Gartner says this group will grow 5X as fast as the trained data scientist group, so that's where the money is.
There will always be a knowledge and experience gap between the two groups, but if you're managing the advanced analytics group for your company you know about the drive toward 'data democratization' which is a synonym for 'self-service'. There will always be some risk here to be managed but a motivated LOB manager or experienced data analyst who has come up the learning curve can do some pretty sophisticated things on these new platforms.
Langley Eide, Chief Strategy Officer at Alteryx, suggests that we think of these users along a continuum from no-code to low-code to code-friendly. They are going to want a seat at our common analytic platforms. They will need supervision, but they will also produce a volume of good analytic work and at the very least can leverage the time and skills of your data scientists.
Prediction 4: Deep learning is complicated and hard. Not many data scientists are skilled in this area and that will hold back the application of AI until the deep learning platforms are significantly simplified and productized.
There's lots of talk about moving AI into the enterprise and certainly a lot of VC money backing AI startups. But almost exclusively these are companies looking to apply some capability of deep learning to a real world vertical or problem set, not looking to improve the tool.
Gartner says that by 2018, deep neural networks will be a standard component of 80% of data scientists' tool boxes. I say, I'll take that bet, that's way too optimistic.
The folks trying to simplify deep learning are the major cloud and DL providers, Amazon, Microsoft, Google, Intel, NVIDIA, and their friends. But as it stands today, first, good luck finding well-qualified data scientists with the skills to do this work (have you seen the salaries they have to pay to attract these folks?).
Second, the platforms remain exceedingly complex and expensive to use. Training time for a model is measured in weeks unless you rent a large number of expensive GPU nodes, and still many of these models fail to train at all. The optimization of hyperparameters is poorly understood and I expect some are not even correctly recognized as yet.
We'll all look forward to using these DL tools when they become as reasonable to use as the other algorithms in our tool kit. The first provider to deliver that level of simplicity will be richly rewarded. It won't be in 2018.
Prediction 5: Despite the hype, penetration of AI and deep learning into the broader market will be relatively narrow and slower than you think.
AI and deep learning seem to be headed everywhere at once and there is no shortage of articles on how or where to apply AI in every business. My sense is that these applications will come but much more slowly than most might expect.
First, what we understand as commercially ready deep learning driven AI is actually limited to two primary areas, text and speech processing, and image and video processing. Both these areas are sufficiently reliable to be commercially viable and are actively being adopted.
The primary appearance of AI outside of tech will continue to be NLP Chatbots, both as input and output to a variety of query systems ranging from customer service replacements to interfaces on our software and personal devices. As we wrote in our recent series on chatbots, in 2015 only 25% of companies had even heard of chatbots. By 2017, 75% had plans to build one. Voice and text are rapidly becoming the user interface of choice in all our systems and 2018 will see a rapid implementation of that trend.
However, adoption of other aspects of deep learning AI, like image and video recognition outside of facial recognition, is pretty limited. There will be some adoption of facial and gesture recognition but those aren't capabilities that are likely to delight customers at Macy's, Starbucks, or the grocery store.
There are some interesting emerging developments in using CNNs and RNNs to optimize software integration and other relatively obscure applications not likely to get much attention soon. And of course there are our self-driving cars based on reinforcement learning but I wouldn't camp out at your dealership in 2018.
Prediction 6: The public (and the government) will start to take a hard look at social and privacy implications of AI, both intended and unintended.
This hasn't been so much a tsunami as a steadily rising tide that started back with predictive analytics tracking our clicks, our locations, and even more. The EU has acted on the right to privacy and the right to be forgotten, now documented in its new GDPR regs, which are just now taking effect.
In the US the good news is that the government hasn't yet stepped in to create regulations this draconian. Yes there have been restrictions placed on the algorithms and data we can use for some lending and health models in the name of transparency. This also makes these models less efficient and therefore more prone to error.
Also, the public is rapidly realizing that AI is not currently able to identify rare events with sufficient accuracy to protect them. After touting their AI's ability to spot fake news, or to spot and delete hate speech or criminals trolling for underage children, Facebook, YouTube, Twitter, Instagram, and all the others have been rapidly fessing up that the only way to control this is with legions of human reviewers. This does need to be solved.
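The arithmetic behind the rare-event problem is worth a quick look. The sketch below applies Bayes' rule to a classifier that looks excellent on paper. All three input numbers are hypothetical, chosen only to illustrate the base-rate effect. They are not figures reported by any of the platforms mentioned above.

```python
def precision(prevalence, recall, false_positive_rate):
    """Share of flagged items that are truly harmful (Bayes' rule)."""
    true_pos = prevalence * recall
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Hypothetical scenario: 1 post in 10,000 is harmful, the classifier
# catches 99% of harmful posts and wrongly flags 1% of harmless ones.
p = precision(prevalence=0.0001, recall=0.99, false_positive_rate=0.01)
print(f"{p:.2%} of flagged posts are actually harmful")
```

Under these assumed numbers just under 1% of flagged posts are truly harmful, so roughly 99 of every 100 flags are false alarms that someone has to review. That is one way to see why human reviewers are still needed when the event being hunted is rare.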
Still, IMHO online tracking and even location tracking through our personal devices is worth the intrusion in terms of the efficiency and lower cost it creates. After all, the materials those algorithms present to you online are more tailored to your tastes and, since it reduces advertising cost, should also reduce the cost of what you buy. You can always opt out or turn off the device. However, this is small beer compared to what's coming.
Thanks largely to advances in deep learning applied to image recognition, researchers have recently published peer-reviewed and well-designed data science studies showing that they can distinguish criminals from non-criminals, and gays from straights, with remarkable levels of accuracy based only on facial recognition.
The principal issue is that while you can turn off your phone or opt out of online tracking, the proliferation of video cameras tracking and recording our faces makes it impossible to opt out of being placed in facial recognition databases. There have not yet been any widely publicized adverse impacts of these systems. But this is an unintended consequence waiting to happen. It could well happen in 2018.