WHAT TO KNOW NOW | Python
Data science is here to stay and modern Parseltongue pays.
Dubbed the sexiest job of the 21st century by the highly erotic Harvard Business Review, “Data Scientist” is a job title that will become increasingly common despite being redundant (don’t all scientists use data?). Data scientists have a growing range of responsibilities, but at the core of the job is the idea that the 2.5 quintillion bytes of data zipping around the planet and through low-Earth orbit can teach businesses a lot about what is going to happen and what already has. Data scientists become valuable when they figure out how to creatively merge computer science, statistics, engineering, and operations research to inform business strategy. Companies know they need that skill set, and a dearth of talent means they’re willing to pay through the nose for it.
A McKinsey study recently reported that, by 2018, the U.S. may face a demand for data scientists 60 percent greater than supply. We’re not there yet, but starting salaries are already as high as $200,000. For those in the market for a new career, experts project that people who can help Amazon optimize its search algorithm or Netflix recommend better movies will remain very valuable as big data continues to dominate. And that’s not an ill-defined skill set. There are particular programs data scientists use as the basis of that work, though many more senior practitioners of the data arts are programmers capable of creating custom products for themselves.
To learn how people should develop a skill set for a career in data science, Inverse spoke to Dr. Jungwoo Ryoo, a Professor of Information Sciences and Technology at Penn State. Ryoo hosts a data science podcast and has a grant from the National Science Foundation to research and suggest solutions for the shortage of data scientists. He has, in short, a lot of data. And he wants you to learn Python.
Why are so many of these jobs available right now?
Ten or 15 years ago it was very difficult for us to harness the power of data. The data has been around, but we haven’t been able to use it. Now there are a lot of commercial tools and services offering data analytics, and the timing is right for companies to truly take advantage of the data available to them. We are actually on the verge of a new industrial revolution, which is basically a data revolution. We are trying to exploit the data we have as an opportunity to grow our industry and economy, and that is why these jobs and opportunities are available.
What does a data scientist actually do?
A data scientist is someone who specializes in and optimizes technologies for big data solutions, and who knows how to leverage statistical packages and how to analyze, interpret, and visualize data. If you have the data but are not able to translate it for your audience, it is not useful at all, because no one understands it. In that sense the data scientist plays a very important role. But we also need people who can automate the analysis. It is almost impossible for people to look at all of these pieces of information and come to a conclusion, so automated solutions are becoming more and more necessary, and a lot of data science tasks are becoming more and more automated.
What types of companies are hiring?
The job opportunities are everywhere, and there is a shortage of people who can perform these duties. It could be a small- or medium-sized company that needs its data analyzed or a global corporation: really any company that wants to take advantage of its data to be competitive.
Let’s talk turkey. How much money can you make?
The median starting salary for data science jobs is about $60,000, depending on specialty and experience, and it can go up to $200,000, with the potential to earn much more.
Sold. What skill do I need to have to get one of those jobs?
People who know the programming language Python are the most in demand, because it is like a Swiss Army knife. It is used in many of these data science tools to automate statistical work that used to be done by hand. Python is at the forefront of the learning process, but as you progress, learning something like Hadoop or Spark, along with knowledge of online data science solutions, can be very helpful. You do need some foundational training in statistics. But these days you can learn Python on your own very easily through many online sources, like LinkedIn.com. The data science industry is fairly new, so certifications are very limited. To be truly effective you have to be very well rounded, but that doesn’t mean you are an expert in everything. You have to have insight into many different things, almost like a conductor in an orchestra: to be able to conduct, you have to know a little bit of everything.
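As a rough illustration of what “automating statistics” in Python can look like, here is a minimal sketch using only the standard library’s statistics module. The store names and sales figures are hypothetical, made up for this example; real work would typically pull data from a file or database and use larger tools.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

# Hypothetical example data: five days of sales for two stores.
sales = {
    "store_a": [120, 135, 128, 150, 142],
    "store_b": [98, 110, 105, 101, 120],
}

# The same summary runs for every store, instead of being computed by hand.
report = {store: summarize(vals) for store, vals in sales.items()}
for store, stats in report.items():
    print(store, stats)
```

Adding a third store means adding one more entry to the dictionary; the loop produces its summary with no extra work.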
What advice do you have for someone who is trying to break into the industry?
There are a few paths. It’s possible to pursue the profession on your own, but then there are certain topics you have to master by yourself. Another path is following a curriculum that is already available: data science majors are popping up at higher education institutions, so you can major in data science and get a degree in it. The third option is to build your foundational knowledge by majoring in statistics and then get a Master’s degree in data science.
What do employers want to see?
What they are looking for is someone who is familiar with the tools of the trade, so exposure to data science tools, including programming languages and software applications, matters. The next thing is someone who can teach himself or herself: a self-motivated learner who, when given a task, can solve the problem as much as possible on their own. They really have to be independent in developing their own solutions and following through until the task is complete.
This interview has been edited for brevity and clarity.