Data science recruitment: methods and tools

Yuma Heymans
5 min read · Dec 9, 2021

Data scientists are highly skilled, in high demand and hard to find. This guide helps you find, assess and hire them.

Data scientists: who you’re talking to

Data scientists are increasingly sought after by companies. Tech companies in particular employ data scientists to collect the right data and make sense of it. Some companies employ data scientists to improve the product they offer, and others need them to fuel the organization’s data-driven decision making.

The role of data scientist is one where many different qualities come together. Every data scientist has to understand what the data should lead to (the desired business outcome), apply complex statistical concepts to produce valid insights, and use the right tools and code to extract, scrub and analyse data.

The core skills of a data scientist find their origins in domain knowledge, statistics and programming.

Domain knowledge

The ideal data scientist has a good understanding of the desired business outcomes. In every company the data they work with is different. A data scientist should understand the features of the data for the company in question in order to handle the data correctly and deploy the right algorithms.

Statistics

Data scientists need a good understanding of statistics. The models they deploy are based on math and specifically statistical concepts. Concepts and approaches like linear regression, the bell curve, central tendency, variability, variance and standard deviation should be a piece of cake for them.
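To make these concepts concrete, here is a minimal sketch in plain Python that computes central tendency, variance, standard deviation and a simple linear regression. The sample numbers and the helper `fit_line` are made up for illustration; in practice a data scientist would more likely reach for NumPy or scikit-learn.

```python
import statistics

# Hypothetical sample: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [40, 45, 50, 55, 60]

# Central tendency
mean_salary = statistics.mean(salary)
median_salary = statistics.median(salary)

# Variability: population variance and standard deviation
variance = statistics.pvariance(salary)
std_dev = statistics.pstdev(salary)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(years, salary)
print(mean_salary, median_salary, variance, std_dev)
print(f"salary ≈ {slope} * years + {intercept}")
```

A candidate who can explain each of these lines, and say when the population variance is the wrong choice compared to the sample variance, very likely has the statistical basics covered.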

Programming

Getting data, scrubbing it and analysing it requires the right tools, Application Programming Interfaces (APIs) and in many cases custom code. Data science code is typically written in Python or R, and often has to integrate with back-end code written in other languages. Other tech skills include:

  • Programming languages like Scala, JavaScript, SQL, C, and C++
  • Libraries like scikit-learn, pandas, NumPy, Matplotlib
  • Data tools like Excel, Tableau, Hadoop, Spark, SAS

Different types of data scientists

Data science is a broad domain and there are many things to master within the field. That is why many organisations, especially larger ones, have specialized roles within a data science team that collectively work towards a shared goal.

These are the basic types of data scientists that companies need:

Data engineer

Data engineers are focussed on getting and preparing data. The goal of the data engineer is to prepare data so it can be used for further analysis and decision making. The most important activities to achieve this are data extraction, data consolidation and data cleansing (data scrubbing).
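The three activities can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the two CSV exports, their column names and the functions `extract`, `consolidate` and `cleanse` are hypothetical, and real pipelines usually rely on tools like pandas, SQL or Spark.

```python
import csv
import io

# Hypothetical raw exports from two systems (stand-ins for real sources).
CRM_CSV = "email,name\nAnn@Example.com ,Ann\nbob@example.com,Bob\n"
SHOP_CSV = "email,name\nann@example.com,Ann\n,Unknown\ncarol@example.com,Carol\n"


def extract(raw):
    """Extraction: parse one CSV export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw)))


def consolidate(*sources):
    """Consolidation: merge the rows of all sources into one list."""
    return [row for source in sources for row in source]


def cleanse(rows):
    """Cleansing: normalise emails, drop empty and duplicate records."""
    seen = set()
    clean = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email or email in seen:
            continue
        seen.add(email)
        clean.append({"email": email, "name": row["name"].strip()})
    return clean


records = cleanse(consolidate(extract(CRM_CSV), extract(SHOP_CSV)))
for record in records:
    print(record)
```

The duplicate "Ann" record and the row without an email address are dropped, which leaves three clean records ready for analysis.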

Data researcher

Data researchers are focussed on finding patterns in data, providing insights from the data to their team or customers, and building analytics solutions so that data rookies can use those insights too.

Machine Learning expert

Machine Learning experts are specialised in learning models and algorithms. They research, build and test learning algorithms that are deployed in self-learning products or used for organisational purposes.

Besides the roles mentioned above, there can be specializations like data quality engineer, database administrator, data modeler, BI engineer or data architect.

Your sourcing mix to find the best data scientists

Data talent can be found across a variety of sources. Many recruiters start their search on LinkedIn, but there are niche platforms that match the sought-after data talent pool a lot better.

Kaggle

Kaggle is an online community for data scientists, machine learning experts and enthusiasts. On the platform, users (Kagglers) share data sets, collaborate on code and compete in data science challenges. Companies can post their own challenges to Kaggle, and users who choose to compete have the opportunity to win prize money.

With 5 million data scientists and machine learning experts, Kaggle is the go-to source for finding data talent.

Kaggler profiles are rich in relevant information about skills and activity, including the particular technologies, libraries and frameworks used.

Here’s how to source Kaggle.

Stack Overflow

Stack Overflow is a question and answer website for engineers. Users can earn reputation points and “badges” by providing valuable answers. Next to the reputation of individual engineers, you can also find a lot of information about the most recent technologies they have been working with.

Most of the information, like top technology tags, reputation, badges and scores, is based on actual activity rather than self-reported input, which makes it very reliable from a sourcing perspective.

With 14 million engineers and rich technology skills information based on Q&As, Stack Overflow is a must for sourcing data talent.

Here’s how to source Stack Overflow.

GitHub

GitHub is a code repository and version control platform, fuelled by the functionality of Git, plus additional features. GitHub accounts are free and are frequently used to host open-source projects where engineers deposit their repositories.

The benefit of sourcing on GitHub is that the information on talent is very up to date and relevant to their technical skills. If you are willing to take some time to research candidates on GitHub, you start to see which users are active and are developing and sharing relevant code.
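For recruiters who are comfortable with a bit of code, part of this research can be scripted. The sketch below only builds a search URL for GitHub's public user search endpoint (`/search/users`) using the `language`, `location` and `followers` search qualifiers. The function name `build_user_search_url` and the example values are hypothetical, and the sketch leaves out authentication, rate limits and the actual request.

```python
from urllib.parse import urlencode

# GitHub REST API endpoint for searching users.
GITHUB_USER_SEARCH = "https://api.github.com/search/users"


def build_user_search_url(language, location, min_followers=0, per_page=30):
    """Build a user search URL from a few GitHub search qualifiers."""
    qualifiers = [
        "type:user",                # skip organisation accounts
        f"language:{language}",     # users with repositories in this language
        f"location:{location}",     # location as stated on the profile
    ]
    if min_followers > 0:
        qualifiers.append(f"followers:>={min_followers}")
    query = {"q": " ".join(qualifiers), "per_page": per_page}
    return f"{GITHUB_USER_SEARCH}?{urlencode(query)}"


# Hypothetical search: Python users in Amsterdam with at least 20 followers.
url = build_user_search_url("python", "Amsterdam", min_followers=20)
print(url)
```

Keep in mind that the location is whatever users typed on their profile, so a search like this will miss people who left it empty, and a multi-word location would need to be wrapped in quotes.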

With 65 million users, GitHub is the platform with the most active engineering users, beating LinkedIn and any other platform.

Here’s how to source GitHub.

LinkedIn

LinkedIn is the most actively used professional platform in the world. Many recruiters rely on LinkedIn as their single source of candidates. Even though it has a lot of users, recruiters might not find their desired data talent here, because few data science candidates keep complete and up-to-date profiles. On top of that, competition on LinkedIn is fierce. That said, LinkedIn can still be a good source to include in your sourcing channels.

If you don’t have a LinkedIn premium account or LinkedIn Recruiter seat, you can learn here how to source LinkedIn without premium features.

Alternative platforms to source data talent

There are many other platforms where data science and machine learning…


Yuma Heymans

Co-founder of HeroHunt.ai, the talent search engine for tech companies