Data science team roles
Describe “who” your applied project will include. Who will assume key roles and responsibilities? Who are the stakeholders? Who will be involved in change control?
- Skillset of a data scientist
- Type A stands for Analysis. This person is a statistician who makes sense of data without necessarily having strong programming knowledge. Type A data scientists perform data cleaning, forecasting, modeling, visualization, etc.
- Type B stands for Building. These people use data in production. They’re strong software engineers with some statistics background who build recommendation systems, personalization use cases, etc.
- Chief Analytics Officer/Chief Data Officer. In our whitepaper on machine learning, we broadly discussed this key leadership role. The CAO, a “business translator,” bridges the gap between data science and domain expertise, acting both as a visionary and a technical lead. You may get a better idea by looking at the visualization below.
- Preferred skills: data science and analytics, programming skills, domain expertise, leadership and visionary abilities
- Data analyst. The data analyst role implies proper data collection and interpretation activities. An analyst ensures that collected data is relevant and exhaustive while also interpreting the analytics results. Some companies, like IBM or HP, also require data analysts to have visualization skills to convert alienating numbers into tangible insights through graphics.
- Preferred skills: R, Python, JavaScript, C/C++, SQL
- Business analyst. A business analyst performs a CAO’s functions at the operational level. This means converting business expectations into data analysis. If your core data scientist lacks domain expertise, a business analyst bridges this gulf.
- Preferred skills: data visualization, business intelligence, SQL
- Data scientist (not a data science unicorn). What does a data scientist do? Assuming you aren’t hunting unicorns, a data scientist is a person who solves business tasks using machine learning and data mining techniques. If this is too fuzzy, the role can be narrowed down to data preparation and cleaning with further model training and evaluation.
- Preferred skills: R, SAS, Python, MATLAB, SQL, NoSQL, Hive, Pig, Hadoop, Spark
- To avoid confusion and make the search for a data scientist less overwhelming, their job is often divided into two roles: machine learning engineer and data journalist.
- A machine learning engineer combines software engineering and modeling skills by determining which model to use and what data should be used for each model. Probability and statistics are also their forte. Everything that goes into training, monitoring, and maintaining a model is the ML engineer’s job.
- Preferred skills: R, Python, Scala, Julia, Java
- Data journalists help make sense of data output by putting it in the right context. They’re also tasked with articulating business problems and shaping analytics results into compelling stories. Though they are required to have coding and statistics experience, they should also be able to present ideas to stakeholders and represent the data team to those unfamiliar with statistics.
- Preferred skills: SQL, Python, R, Scala, Carto, D3, QGIS, Tableau
- Data architect. This role is critical for working with large amounts of data (you guessed it, Big Data). Unless you rely solely on MLaaS cloud platforms, you need an architect to warehouse the data, define database architecture, centralize data, and ensure integrity across different sources. For large distributed systems and big datasets, the architect is also in charge of performance.
- Preferred skills: SQL, NoSQL, XML, Hive, Pig, Hadoop, Spark
- Data engineer. Engineers implement, test, and maintain infrastructural components that data architects design. Realistically, the role of an engineer and the role of an architect can be combined in one person. The set of skills is very close.
- Preferred skills: SQL, NoSQL, Hive, Pig, MATLAB, SAS, Python, Java, Ruby, C++, Perl
- Application/data visualization engineer. This role is only necessary for a specialized data science model. In other cases, software engineers come from IT units to deliver data science results in end-user-facing applications. It’s very likely that an application engineer or other developers from front-end units will oversee end-user data visualization.
- Preferred skills: programming, JavaScript (for visualization), SQL, NoSQL
Submission Requirements:
- This assignment should be at least 500 words in length.
- It should be submitted as a Microsoft Word document attachment.