Co-authored by Saeed Aghabozorgi and Polong Lin.
Data Scientists and Data Engineers may be new job titles, but the core job roles have been around for a while. Traditionally, anyone who analyzed data would be called a “data analyst” and anyone who created backend platforms to support data analysis would be a “Business Intelligence (BI) Developer”.
With the emergence of big data, new roles began popping up in corporations and research centers — namely, Data Scientists and Data Engineers.
Here’s an overview of the roles of the Data Analyst, BI Developer, Data Scientist and Data Engineer.
Data Analysts are experienced data professionals in their organization who can query and process data, provide reports, summarize and visualize data. They have a strong understanding of how to leverage existing tools and methods to solve a problem, and help people from across the company understand specific queries with ad-hoc reports and charts.
However, they are not expected to deal with analyzing big data, nor are they typically expected to have the mathematical or research background to develop new algorithms for specific problems.
Skills and Tools: Data Analysts need to have a baseline understanding of some core skills: statistics, data munging, data visualization, exploratory data analysis, Microsoft Excel, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Microsoft Access, Tableau, SSAS.
Business Intelligence Developers
Business Intelligence Developers are data experts that interact more closely with internal stakeholders to understand the reporting needs, and then to collect requirements, design, and build BI and reporting solutions for the company. They have to design, develop and support new and existing data warehouses, ETL packages, cubes, dashboards and analytical reports.
Additionally, they work with databases, both relational and multidimensional, and should have great SQL development skills to integrate data from different resources. They use all of these skills to meet the enterprise-wide self-service needs. BI Developers are typically not expected to perform data analyses.
Skills and tools: ETL, developing reports, OLAP, cubes, web intelligence, business objects design, Tableau, dashboard tools, SQL, SSAS, SSIS.
Data Engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by Data Scientists. They are software engineers who design, build, integrate data from various resources, and manage big data. Then, they write complex queries on that, make sure it is easily accessible, works smoothly, and their goal is optimizing the performance of their company’s big data ecosystem.
They might also run some ETL (Extract, Transform and Load) on top of big datasets and create big data warehouses that can be used for reporting or analysis by data scientists. Beyond that, because Data Engineers focus more on the design and architecture, they are typically not expected to know any machine learning or analytics for big data.
Skills and tools: Hadoop, MapReduce, Hive, Pig, MySQL, MongoDB, Cassandra, Data streaming, NoSQL, SQL, programming.
A data scientist is the alchemist of the 21st century: someone who can turn raw data into purified insights. Data scientists apply statistics, machine learning and analytic approaches to solve critical business problems. Their primary function is to help organizations turn their volumes of big data into valuable and actionable insights.
Indeed, data science is not necessarily a new field per se, but it can be considered as an advanced level of data analysis that is driven and automated by machine learning and computer science. In another word, in comparison with ‘data analysts’, in addition to data analytical skills, Data Scientists are expected to have strong programming skills, an ability to design new algorithms, handle big data, with some expertise in the domain knowledge.
Moreover, Data Scientists are also expected to interpret and eloquently deliver the results of their findings, by visualization techniques, building data science apps, or narrating interesting stories about the solutions to their data (business) problems.
The problem-solving skills of a data scientist requires an understanding of traditional and new data analysis methods to build statistical models or discover patterns in data. For example, creating a recommendation engine, predicting the stock market, diagnosing patients based on their similarity, or finding the patterns of fraudulent transactions.
Data Scientists may sometimes be presented with big data without a particular business problem in mind. In this case, the curious Data Scientist is expected to explore the data, come up with the right questions, and provide interesting findings! This is tricky because, in order to analyze the data, a strong Data Scientists should have a very broad knowledge of different techniques in machine learning, data mining, statistics and big data infrastructures.
They should have experience working with different datasets of different sizes and shapes, and be able to run his algorithms on large size data effectively and efficiently, which typically means staying up-to-date with all the latest cutting-edge technologies. This is why it is essential to know computer science fundamentals and programming, including experience with languages and database (big/small) technologies.
Skills and tools: Python, R, Scala, Apache Spark, Hadoop, data mining tools and algorithms, machine learning, statistics.