Discrimination remains a significant issue in the European Union. In the latest Eurobarometer survey, more than half (64%) of the respondents reported that discrimination on the grounds of ethnic origin is widespread across the EU. 1 In the same survey, 46% of the respondents reported that having equal opportunities and access to the labour market is one of the most important pillars for the EU’s economic and social development.
In this first piece of the series Reducing bias and discrimination in the labour market with open data, I will argue that the recruitment process, like the labour market, is influenced by discrimination, and that open data can help eliminate it.
Recruitment tools can be black boxes
The recruitment process consists of gathering and selecting potential job candidates by matching the job’s requirements to candidates’ profiles. The process includes administrative work, data management, and strategic decisions. All of these tasks can be automated using algorithms or AI recruitment tools, and many companies already use them. Yet algorithms and AI systems are often "black boxes": the decisions an algorithm takes are frequently not transparent, so it is difficult to trace why the system reached a certain decision 2 , which makes it hard for people to assess whether they were fairly evaluated. In several ways, algorithms and AI systems can lead to discrimination; let me explain why.
Correlations make AI systems work, and they can be discriminatory
AI systems look for correlations in data sets. For example, when a company develops an object detection tool, the developer feeds the system pictures or video content and “tells” the machine what that content represents, e.g., a house, a tree, a car. This content is the training data. On the basis of this data, the system learns what the characteristics of each object are, and uses that knowledge to label incoming, unfamiliar (new) data. The AI system thus uses historical data to discover patterns and find relationships between features.
To put it in technical terms, a target variable defines what developers are looking for, and a class label divides all possible values of the target variable into categories. Though class labels (i.e., whether an image gets labelled as a house or a tree) are typically pre-defined, defining a target variable can sometimes result in the creation of new classes.
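To make these terms concrete, here is a minimal sketch in plain Python of a target variable, its class labels, and labelled training data. Everything here is invented for illustration (the feature names, the toy nearest-example rule); real systems use far richer features and proper machine-learning libraries.

```python
# The target variable is the question the system answers:
# "what object does this image show?"
# The class labels divide its possible values into categories.
CLASS_LABELS = ["house", "tree", "car"]  # pre-defined categories

# Training data: each example pairs features with a human-supplied label.
training_data = [
    {"features": {"has_roof": 1, "has_leaves": 0, "has_wheels": 0}, "label": "house"},
    {"features": {"has_roof": 0, "has_leaves": 1, "has_wheels": 0}, "label": "tree"},
    {"features": {"has_roof": 0, "has_leaves": 0, "has_wheels": 1}, "label": "car"},
]

def predict(features):
    """Label new, unfamiliar data by its nearest training example
    (a toy nearest-neighbour rule)."""
    def distance(a, b):
        return sum(abs(a[k] - b[k]) for k in a)
    nearest = min(training_data, key=lambda ex: distance(features, ex["features"]))
    return nearest["label"]

print(predict({"has_roof": 0, "has_leaves": 0, "has_wheels": 1}))  # → car
```

The point of the sketch is that the system never "understands" houses or cars; it only reproduces the correlations between features and labels present in its training data, which is exactly why biased training data produces biased labels.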
In our recruitment example, imagine a company is looking for new employees and will use an AI recruitment tool. The AI must be given a definition of what a “good employee” is for this company in order to filter and find the right candidates. Let us say the company defines the class label “good employee” as an employee who often arrives at the office on time. If the company is located in the city centre of Amsterdam, people who live outside the city centre are perhaps less likely to be on time due to traffic jams or problems with public transport. People with a foreign background often live in the suburbs of cities 3 , potentially making them late for work more often than colleagues who live in the city centre. In this case, the class label puts employees with a foreign background at a disadvantage and risks classifying certain groups on the basis of their postal code alone.
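The mechanism above can be sketched in a few lines. All names, postal areas, and lateness figures below are invented; the sketch only shows how a seemingly neutral class label can correlate with where someone lives.

```python
# Hypothetical employee records; longer commutes from the suburbs
# translate into more late days per month.
employees = [
    {"name": "A", "postal_area": "centre", "late_days_per_month": 1},
    {"name": "B", "postal_area": "centre", "late_days_per_month": 2},
    {"name": "C", "postal_area": "suburb", "late_days_per_month": 4},
    {"name": "D", "postal_area": "suburb", "late_days_per_month": 5},
]

def is_good_employee(emp, threshold=3):
    """The chosen class label: 'good employee' = rarely late."""
    return emp["late_days_per_month"] < threshold

for emp in employees:
    print(emp["name"], emp["postal_area"], is_good_employee(emp))
# A and B come out "good"; C and D do not.
```

In this toy data the label lines up perfectly with postal area, so a model trained on it can learn “suburb → not a good employee” without ever seeing ethnicity, even though postal area correlates with background.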
But let us take another definition of a “good employee”: perhaps someone with an impressive educational background. Particularly in the U.S., the top schools are private [expensive] schools. If the AI system labels candidates on this basis, people with lower incomes are discriminated against.
Proxies (substitutes used to measure variables that can’t be measured directly) also pose a risk of discrimination. A bank might predict which loan applicants will be unable to repay a loan based on a characteristic like postal code, for example 4 . Based on historical data, the system has found that, on average, people from a certain area are less likely to repay their loans. This postal code then becomes automatically associated with not being able to repay. As postal code also correlates with ethnicity, this becomes problematic.
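A short sketch shows the proxy effect end to end. The postal codes, ethnicities, and repayment outcomes below are entirely fabricated; the point is that a model using only postal code still produces decisions that differ by ethnicity, because the two are correlated in the (invented) historical data.

```python
from collections import defaultdict

historical = [
    # (postal_code, ethnicity, repaid)
    ("1011", "majority", True), ("1011", "majority", True),
    ("1011", "minority", True), ("1103", "minority", False),
    ("1103", "minority", False), ("1103", "majority", True),
]

# "Train": compute the average repayment rate per postal code.
totals = defaultdict(lambda: [0, 0])
for code, _, repaid in historical:
    totals[code][0] += repaid
    totals[code][1] += 1
repay_rate = {code: paid / n for code, (paid, n) in totals.items()}

def approve(postal_code, cutoff=0.5):
    """Approve the loan only if the applicant's postal code has a
    good historical repayment rate -- postal code acts as a proxy."""
    return repay_rate[postal_code] > cutoff

# Approval rate per ethnicity, even though ethnicity was never an input:
by_group = defaultdict(lambda: [0, 0])
for code, ethnicity, _ in historical:
    by_group[ethnicity][0] += approve(code)
    by_group[ethnicity][1] += 1
for group, (approved, n) in by_group.items():
    print(group, f"{approved}/{n} approved")
```

Removing the sensitive attribute from the model is therefore not enough: as long as a proxy like postal code remains, the correlation carries the discrimination through.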
Now that we have established the problem, we can dive into the solutions. In the next opinion piece, I will show how to reduce this bias and train AI systems that are fair.