Reducing bias and discrimination in the labour market with AI (2/2)

Khaoula El Ahmadi

Representative training data is essential to train ‘fair’ algorithms: poor representation of minorities in a dataset can lead to biased results. This occurred, for example, in 2018, when Amazon shut down a recruiting tool because it was biased against women. 1 Amazon’s hiring tool was supposed to rate job candidates from one to five stars. However, the system was not rating candidates in a gender-neutral way, especially for “technical” jobs such as software developer. The tool had been trained to vet candidates by observing patterns in resumes submitted to the company over the previous decade. That training data consisted mainly of resumes from men. As a result, Amazon’s hiring tool taught itself that male candidates were preferable and ended up discriminating against women by penalizing resumes that contained the word “women’s”, as in “women’s chess club”. 2

Another example that shows how important representation in training data is, is Street Bump, a smartphone application that uses features such as GPS feeds to report road conditions to the city council. The Street Bump site explains: "Volunteers use the mobile app to collect road condition data while they drive. The data provides governments with real-time information to fix problems and plan long-term investments." 3 If smartphone ownership is lower among people with lower incomes than among wealthier people, reports from low-income areas are likely to be undercounted. The effect could be that faulty roads in low-income neighbourhoods are underrepresented in the dataset and therefore receive fewer repairs. This example shows that the way data is collected can inadvertently produce a biased dataset, and hence a biased AI system.
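The Street Bump effect can be illustrated with a toy simulation. The numbers below are hypothetical, assuming two areas with identical road conditions but different smartphone ownership rates; the only thing the sketch shows is that the reporting gap comes from the collection mechanism, not from the roads themselves.

```python
# Hypothetical simulation of reporting bias in crowd-sourced road data:
# both neighbourhoods have the same number of potholes, but lower
# smartphone ownership in the low-income area yields fewer reports.
import random

def simulated_reports(potholes, smartphone_rate):
    """Count a pothole as reported only if the passing resident
    happens to own a smartphone (probability = smartphone_rate)."""
    reports = 0
    for _ in range(potholes):
        if random.random() < smartphone_rate:
            reports += 1
    return reports

random.seed(0)
potholes = 200  # identical road conditions in both areas
print("wealthy area reports:   ", simulated_reports(potholes, smartphone_rate=0.9))
print("low-income area reports:", simulated_reports(potholes, smartphone_rate=0.4))
```

Feeding these raw counts to a prioritisation algorithm would make the low-income area look better maintained than it is, which is exactly the kind of collection bias the example describes.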

Using open data as a solution

So, the question remains: how can we address and prevent bias in data? Perhaps the implementation of open data could help. If the training data of these algorithms were published openly, other developers would have the opportunity to chime in and contribute, making for better representation of society’s diversity. Open datasets also make it easier to spot underrepresentation.
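As a minimal sketch of what "spotting underrepresentation" in an open dataset could look like, the check below compares each group's share of the records against a population benchmark. The column name, the sample data, and the benchmark shares are all hypothetical; real audits would use proper demographic baselines.

```python
# A minimal sketch of an underrepresentation check on an open dataset.
# Field names and benchmark shares here are hypothetical illustrations.
from collections import Counter

def representation_gaps(records, field, benchmarks, tolerance=0.05):
    """Compare each group's observed share in `records` against a
    population benchmark; return the groups whose share falls short
    by more than `tolerance` (an absolute difference in share)."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in benchmarks.items():
        observed = counts.get(group, 0) / total
        if expected - observed > tolerance:
            gaps[group] = {"expected": expected, "observed": round(observed, 3)}
    return gaps

# Hypothetical resume dataset: 80% men, 20% women.
resumes = [{"gender": "male"}] * 80 + [{"gender": "female"}] * 20
flagged = representation_gaps(resumes, "gender",
                              benchmarks={"male": 0.5, "female": 0.5})
print(flagged)  # → {'female': {'expected': 0.5, 'observed': 0.2}}
```

Because anyone can run a check like this on an open dataset, underrepresentation becomes something the public can verify rather than something only the data owner can see.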

We need more transparency

In the previous piece and here, I have shown how AI systems in recruitment can cause discrimination. In my opinion, current non-discrimination laws and data protection laws are insufficient to safeguard fundamental rights such as equality. Yet we also don’t want to overregulate AI systems, as overly intrusive rules may stand in the way of innovation. I think that we should not rely solely on regulation to protect us from harm caused by AI systems. We need more engagement and participation from citizens across different societal groups. Opening up data may be a good starting point to make processes like recruitment more transparent. This way, we shed light on the covert biases embedded in AI algorithms.

Image credit:
Michael Dziedzic via Unsplash

For questions and comments, please visit our forum on Futurium.