Data science is a lucrative field to pursue, and there's plenty of demand for people with related skills. However, no career is without its challenges, and data science is no exception. In this article, I want to explore the real challenges of data science, based on perspectives from those in the field and those who manage them. Future data professionals, here's what you should be prepared to handle.
You'll Need To Be A Specialist, Not A Generalist
The best data scientists don't try to do everything. Instead, they narrow their professional focus to a specific area.
"I would encourage new professionals to understand that data science is a bit like medicine-it's a vast and vague term that encapsulates wildly different practices under one roof," says Tal Kedar, CTO at Optimove. "Data scientists [can have] very different engineering skill sets [and be] experienced with very different platforms and tools."
That said, when you're first learning how to become a data scientist, mastering the basics comes first. After that, you can hone in on what platforms, tools, and areas you want to dive deeper into.
Understand The Business Reasons Informing Your Choices
As a data scientist, you're not just involved in the "how," but also the "why" of making things happen. You're not just randomly sifting through data looking for connections. Instead, you're using your knowledge of various business factors to form a "mental model" which can then be validated or disproved by your data.
Scott Hoover, director of data and analytics at Snowflake, says, "Having a mental model for one's objective before touching any data is incredibly valuable. Instead of aimlessly fishing for signals in the data, thinking like a scientist by formulating hypotheses that are founded on some formal model of human behavior, economics, systems, etc. and then testing those hypotheses make for more successful data-science applications."
This can be a particular challenge for data scientists in the machine learning field. "Data scientists often run into the issue of trying to add artificial intelligence or machine learning capabilities without concrete objectives," says Greg Benson, chief scientist at SnapLogic. "This is a waste of time. Start by asking how your customer experience will improve at a high level."
As an example, says Kedar, "If you are building a self-driving machine, you need to know what makes a good driver, and be well-versed in the challenges and outcomes that accompany safe or reckless driving, and then have those reflected in the algorithms that drive the car."
Ultimately, says Kedar, "I'd encourage data scientists to always make sure they have a coherent, clear narrative linking the business problem at hand to their choice of algorithms."
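To make Hoover's "hypothesis first" advice concrete, here is a minimal Python sketch. Everything specific in it is invented for illustration: the scenario (a loyalty email expected to raise orders per customer), the numbers, and the function name `permutation_test`. A permutation test is just one reasonable way to check such a hypothesis, not a method any of the experts quoted here prescribe.

```python
import random

# Hypothesis, written down BEFORE looking at the data:
# "Customers who received the loyalty email place more orders on average."

def permutation_test(control, treatment, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value for 'treatment mean > control mean'."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable

    def mean(values):
        return sum(values) / len(values)

    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)

    at_least_as_large = 0
    for _ in range(n_resamples):
        # If the email had no effect, group labels are interchangeable,
        # so reshuffling them shows what "no effect" looks like.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if diff >= observed:
            at_least_as_large += 1

    # Add-one correction keeps the estimate from being exactly zero.
    p_value = (at_least_as_large + 1) / (n_resamples + 1)
    return observed, p_value

# Made-up orders per customer for each group.
control = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]
treatment = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]

observed, p_value = permutation_test(control, treatment)
print(f"observed lift: {observed:.2f} orders, p-value: {p_value:.3f}")
```

The point is the order of operations: the business question and the expected direction of the effect are fixed first, and the data is only asked to confirm or refute them.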
It's Best To Have Cross-Department Expertise
Transitioning from another career? That will actually be an asset to you as a data scientist. "The best data scientists are not just statisticians or machine learning experts; they are also an authority in the field or business where they are applying those skills," says Kedar.
Hoover adds, "Data scientists are arguably best utilized as the glue between technical and non-technical teams. As such, in addition to having a deep technical foundation, they must have a domain expertise in whatever department or area they are focused on, be it a product, marketing, sales, or finance."
The good news is that if you've been in another career for a decade, data science is a field you can enter with confidence. Your unique background and a blend of skills will be one of your greatest strengths.
Explain Technical Concepts To Non-Technical Audiences
For data scientists who spend their workdays around technical terminology, explaining their work in plain language can be a source of frustration. However, it's essential that the data team communicate effectively with audiences ranging from other departments to executives and stakeholders, who may not understand the complexities of your job.
"It can be exciting to share all of the technical complexities that got you to your conclusions," says Andrew Seitz, a senior data analyst at Snowflake. "But what your stakeholders need are the key findings and action items. Save the details for the appendix (or Q&A)."
Hoover agrees: "A data scientist that cannot articulate what their model does and why it's of value to business stakeholders is going to have a difficult path to success."
This is something you can practice. When you're working through a data problem, think about how you'd explain it at Thanksgiving dinner (in a way that doesn't make everyone's eyes glaze over).
You'll Spend A Lot Of Time With Raw Data
Here's the biggest challenge from a purely technical standpoint. Martin Chen, a research data scientist at Shape Security, says, "The primary challenge may be how do we use the data, including how to extract data, how to clean data, how to analyze data, how to get insights or build models from data. Data scientists should have extensive domain expertise in programming languages including SQL, Python, and R."
Hoover agrees that this is often the bulk of a data science job. "The overwhelming majority of effort a typical data scientist puts forth has to do with creating a clean data set with useful information, all before any of the compelling machine learning or statistical models can be applied," he says. "This is the part of the job that's almost considered an art or a craft. Just like any artist or craftsperson, there's untold effort that largely goes unnoticed when viewing the final product."
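For a sense of what that unglamorous effort looks like, here is a small Python sketch using only the standard library. The CSV contents, the column names, and the `clean_rows` function are hypothetical, and the cleaning rules (keep the first record per customer, reject rows with unparseable dates, treat blank spend as missing) are assumptions chosen for the example rather than universal practice.

```python
import csv
import io
from datetime import date

# Hypothetical raw export with typical problems: stray whitespace,
# a duplicate row, a missing value, a bad date, a thousands separator.
RAW = """customer_id,signup_date,monthly_spend
 101 ,2023-01-15,49.90
102,2023-02-01,
101,2023-01-15,49.90
103,not a date,12.00
104,2023-03-09,"1,200.50"
"""

def clean_rows(raw_text):
    """Return (cleaned, rejected) from messy CSV text."""
    seen_ids = set()
    cleaned, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        customer_id = row["customer_id"].strip()
        if customer_id in seen_ids:
            continue  # assumption: the first record per customer wins
        seen_ids.add(customer_id)

        try:
            signup = date.fromisoformat(row["signup_date"].strip())
        except ValueError:
            # Keep a record of what was dropped and why.
            rejected.append((customer_id, "bad signup_date"))
            continue

        spend_text = row["monthly_spend"].replace(",", "").strip()
        spend = float(spend_text) if spend_text else None  # None = missing

        cleaned.append({
            "customer_id": int(customer_id),
            "signup_date": signup,
            "monthly_spend": spend,
        })
    return cleaned, rejected

cleaned, rejected = clean_rows(RAW)
for record in cleaned:
    print(record)
print("rejected:", rejected)
```

Even in this toy case, most of the code is about deciding what counts as valid. None of it is modeling, which is exactly Hoover's point.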
Collaboration Is Key (You Won't Work In A Vacuum)
Since multiple departments usually work together on projects, it's necessary to collaborate, compromise, and set clear boundaries and expectations. "A common challenge I face in data science is facilitating cooperation between departments on how data should be collected and interpreted," says Seitz. "Predictive models and historical analyses are only as powerful as a team's agreement on the validity of the source data."
Engineering and data teams are often closely linked, so this pairing is also where misalignments commonly occur, says Sofus Macskássy, vice-president of data science at HackerRank. "There needs to be harmony between the two, so engineering teams can seamlessly access data and engineer an infrastructure that allows the data science team to accurately collect and analyze quality data."
Be Flexible And Consider Context
Don't marry yourself to a specific approach when handling certain types of problems. Having the flexibility to pivot based on unique situations is what will lead you to an optimal solution.
"Good data scientists know multiple ways to answer a question, each having its own advantages or disadvantages, so that they may apply the best approach for that particular business context," says Kedar.
As an example, he continues, "You may use a [recurrent] neural network when studying something that changes over time, like the lifetime value of a customer. But you may opt for a convolutional network when you need to extract features in an image classification task, like deciding whether a picture contains a dog or a cat. An adept data scientist will know all these approaches, and not be tied to just one, and apply the one that best suits the problem he or she is trying to solve."
Regular Maintenance And Version Control Are Essential
As Seitz notes, small mistakes in fields like machine learning can be costly because they quietly skew your results. Catching them early is crucial.
"Invest the time in refactoring your code, validating data sources and documenting changes with version control," says Seitz. "Hidden data dependencies, unstable data sources, and undocumented assumptions can lead to unexpected changes in your results when retraining models."
The challenges of data science may be intimidating, but many can be averted with enough preparation and communication. As you learn how to become a data scientist, keep them in mind and you'll have an advantage from the start.