Data Scientist, Data Engineer, or Technology Manager: Which Job Is Right for You?
Hal Varian, the chief economist at Google, is famous for saying in 2010 that “the sexy job in the next 10 years will be statisticians.” Building on this idea, Thomas Davenport and D.J. Patil wrote "Data Scientist: The Sexiest Job of the 21st Century."
Should you seek training in data science because the job is “sexy”? Probably not. You want to know that the job is in demand. Data scientists are in demand and so are data engineers.
Glassdoor releases annual market surveys and analyzes jobs based on three factors: median base salary, the number of job openings across the U.S., and the overall job satisfaction of employees who hold the position. Since 2016, Glassdoor has consistently ranked data scientist as one of the best jobs, and data engineer is not far behind.
According to the 2021 Glassdoor Annual Report, the median base salary for a data scientist was $113,736. We can think of data engineers as falling within the software engineering category, with a median salary of $110,245, comparable to that of data scientists. The top recruiters for these profiles include big names across all industries, including companies like Amazon, Deloitte, Capital One, and Bayer Corporation.
Data science depends on software/data engineering. For every data scientist position (five thousand in the 2021 Glassdoor survey) there were eight positions for software/data engineers.
The Work of Data Science and Data Engineering
To remain competitive in today’s world, companies need to be informed by data and to use data in their day-to-day operations. That does not happen without data engineers: technical professionals such as software engineers, database administrators, and cloud architects. These are the people who translate research results and data science models into systems that work.
In his presentation Putting Data Science into Practice, Tom Miller, faculty director of Northwestern’s data science program, used this figure to distinguish between the roles of data scientists and data engineers:
Activities in blue in the figure are associated with data science. These are research, measurement, and modeling activities. Data scientists translate management questions into research questions, identify relevant data sources and sampling frames, define appropriate measures (sometimes called feature engineering), and build and test models of the data.
Activities in green in the figure are associated with data engineering. Data engineers have a key role to play at the beginning of every data science project. Data scientists depend on data engineers to gather and prepare data for analysis. Without data, there are no analyses, no models to build and test.
Some analytics and modeling projects end with a written report to management or a display of results in a dashboard or presentation. Research findings guide management decisions.
Analysis and modeling projects need not end with a report. Many models are put into practice. They become the way a company conducts its business. Data engineers have key roles to play in building data science applications and implementing information systems.
Putting Data Science into Practice Requires Data Engineering
Data engineers are essential to putting data science into practice. Examples of data engineering abound. Here are a few data science applications that you may encounter day to day:
Recommendation Engines
An online retailer provides product recommendations as you shop. The model that drives the recommendation engine may have been defined by a data scientist, but the system that delivers the recommendations was the work of data and software engineers.
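To make the division of labor concrete, here is a minimal sketch of the kind of logic a recommendation model might contain, assuming a simple "bought together" co-occurrence count. The baskets, item names, and the functions build_cooccurrence and recommend are hypothetical illustrations, not a description of any retailer's actual system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; a real system would read these from a data store.
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"sleeping bag", "stove"},
]


def build_cooccurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(item, counts, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    scores = {b: n for (a, b), n in counts.items() if a == item}
    # Sort by descending count, then alphabetically so ties have a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


counts = build_cooccurrence(baskets)
recommendations = recommend("tent", counts)
print(recommendations)
```

The scoring rule is the data scientist's contribution. Collecting the baskets reliably and serving recommendations to many shoppers at once is the engineering work the paragraph above describes.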
Target Marketing
You receive an email message touting a new product. The targeting method, building on information about market segments, consumer preferences, and willingness to buy, may have been devised by a data scientist. But implementing the method is the work of data and software engineers.
Dynamic Pricing
While scheduling your next trip by air, you notice how ticket prices vary greatly depending on the day and time of flight. Those prices are the work of scheduling models designed to ensure that airplane seats are filled in response to demand. The model serving an airline’s objective of profit maximization may have been developed by a data scientist specializing in operations research. The system delivering the price quotations is again the work of data and software engineers.
Fraud Detection
Your bank alerts you about a potentially fraudulent transaction, or your credit provider asks you to verify a purchase before executing a charge against your account. A data scientist was probably involved in training and testing an anomaly detection model for these financial institutions. Implementing the system and ensuring that it works in near-real-time, as it must in order to serve the needs of fraud detection, is the work of data and software engineers.
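As a rough illustration of what an anomaly detection rule can look like, the sketch below flags a transaction whose amount lies far from a customer's typical spending, measured in standard deviations. This is one simple technique chosen for brevity, not the method any particular bank uses. The transaction history, the threshold of 3.0, and the function names are assumptions.

```python
from statistics import mean, stdev


def fit_baseline(amounts):
    """Summarize past transaction amounts as (mean, standard deviation)."""
    return mean(amounts), stdev(amounts)


def is_suspicious(amount, baseline, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # All past amounts were identical, so any different amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


history = [42.0, 38.5, 51.2, 45.0, 40.3, 47.8]  # hypothetical past purchases
baseline = fit_baseline(history)
flags = [is_suspicious(amount, baseline) for amount in (44.0, 950.0)]
print(flags)
```

Production systems typically rely on richer models and many more signals. The engineering challenge is scoring each transaction quickly enough to act before the charge goes through.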
More About the Need for Data Engineers
Industry experts agree that data engineering skills are essential to addressing the data problems of business. Consider this article by Guillaume Moutier, Senior Principal Data Engineering Architect at Red Hat: The real issue behind the data science skills gap isn’t what you may think.
Data Science Specializations
Putting data science into practice requires data engineers, people who understand how to build end-to-end information systems. Northwestern recognizes the importance of data engineering by offering a Data Engineering specialization.
Seven courses in the Master of Science in Data Science (MSDS) program are closely aligned with the Data Engineering specialization:
- MSDS 431-DL Data Engineering with Go
- MSDS 432-DL Foundations of Data Engineering
- MSDS 434-DL Analytics Application Engineering
- MSDS 436-DL Analytics Systems Engineering
- MSDS 440-DL Real-Time Interactive Processing and Analytics
- MSDS 442-DL Real-Time Stream Processing and Analytics
- MSDS 459-DL Knowledge Engineering
Furthermore, Northwestern’s data science program is the first to feature the Go programming language, along with Python and R. Go is an excellent systems programming language for implementing data science applications, as described by Tom Miller in Data Science and the Go Programming Language.
The Northwestern data science program offers four other specializations. Analytics and Modeling and Artificial Intelligence are most closely aligned with the data scientist role. And students seeking technology management roles may select Analytics Management or Technology Entrepreneurship.
Choosing the Right Job for You
Data scientists are like chameleons, changing their colors to match the business context. There are data scientists who could just as easily be called marketing researchers, financial analysts, or competitive intelligence professionals, depending on the work that needs to be done.
Analytics and modeling lie at the heart of data science, and many data scientists think of themselves as applied statisticians.
Data science is an eclectic discipline, drawing on many fields of study, as shown in this figure:
To serve the needs of today’s data-driven, data-intensive world, data scientists need to be multilingual, speaking the languages of information technology and business, as well as analytics and modeling.
Students interested in data science application development and systems implementation can specialize in data engineering and seek various information technology positions. Data engineers could just as easily be called software engineers, systems engineers, cloud architects, computer scientists, machine learning engineers, AI engineers, or development and operations (DevOps) professionals. Job titles may vary from one company or one industry to the next, but the job opportunities are plentiful.
Reading job postings for data scientists and data engineers, you will find considerable overlap in the desired skillsets.
Data Scientist skills:
- Programming
- Cloud computing
- Database management
- Data visualization
- Probability and statistics
- Multivariate calculus and linear algebra
- Machine learning & deep learning
Data Engineer skills:
- Programming
- Extraction, transformation, and loading (ETL)
- System architecture
- Database design and configuration
- Interface and sensor configuration
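For readers unfamiliar with ETL, the following minimal sketch shows the three steps using only the Python standard library. The CSV content, the orders table, and the cleaning rules are hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1001, alice ,19.99
1002,BOB,5.00
1003,alice,not_available
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert amounts, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        customer = row["customer"].strip().lower()
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Once the data is loaded into a clean table, a data scientist can query it directly for analysis and modeling.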
Technology Management
Moving from the technology side to the management side of data science presents additional opportunities. Organizations need people who can understand data science and data engineering, but also speak the language of business. This figure illustrates technology management roles:
As with data science and data engineering, there are many jobs for technology managers. Someone needs to build and manage teams of data scientists and data engineers. Someone needs to make decisions about information infrastructure.
Communication skills are essential to technology management. There is great need for people who can understand business problems as well as technical solutions to problems. There is great need for people who can translate technical jargon into language that non-technologists can understand.
Whatever role is best for you—data scientist, data engineer, or technology manager—Northwestern's Master of Science in Data Science program will help you to prepare for the jobs of today and the jobs of the future. Contact an admissions adviser to learn more.