While bias in generative AI is a well-known phenomenon, it remains surprising what kinds of biases are still being discovered. TechCrunch recently ran a test using Meta's AI chatbot, which launched in April 2024 in more than a dozen countries, including India, and found a strange and disturbing trend.
When generating images from the prompt “Indian men”, the vast majority of results show men wearing turbans. While a significant number of Indian men do wear turbans (mainly practicing Sikhs), the 2011 census put the Sikh population of India's capital, Delhi, at roughly 3.4%, whereas the AI-generated images showed turbans on three to four out of every five men.
Unfortunately, this is not the first time that generative AI has been embroiled in controversy related to race and other sensitive topics, and this is not the worst example either.
How far does the rabbit hole go?
In August 2023, Google's SGE and Bard AI (the latter now called Gemini) were caught arguing the 'benefits' of genocide, slavery, fascism and more. They also put Hitler, Stalin and Mussolini on a list of the “greatest” leaders, and Hitler additionally appeared on their list of “most effective leaders.”
Later that year, in December 2023, there were multiple incidents involving AI, and among the most horrifying was Stanford researchers finding CSAM (child sexual abuse material) in the popular LAION-5B image dataset that many image-generation models are trained on. That study found more than 3,000 known or suspected CSAM images in the dataset. Stable Diffusion maker Stability AI, which uses that dataset, claims it filters out any harmful images, but how can anyone verify that? Those images could easily have been folded into more benign categories such as “child” or “children.”
There is also the danger of AI being used in facial recognition, including and especially by law enforcement. Countless studies have already shown a clear bias in which certain races and ethnicities are arrested at higher rates, regardless of whether a crime has been committed. Combine that with the bias baked into the human-generated data AI is trained on, and you have technology that would result in even more false and unfair arrests. It has reached the point where Microsoft doesn't want its Azure AI used by police forces.
It is quite disturbing how quickly AI has taken over the technological landscape, and how many obstacles remain before it advances enough to finally shed these problems. One could argue that these problems only arose in the first place because AI was trained on literally any data set it could access, without proper filtering of the content. If we want to address AI's massive bias problem, we must start properly sifting through its training data sets, not just for copyrighted sources, but also for actively harmful material that poisons the information well.
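That kind of sifting does not have to be exotic. One common first pass is simply checking each candidate file against a blocklist of hashes for known harmful material before it ever enters a training set. The sketch below is a minimal, hypothetical illustration of that idea in Python; the file names, blocklist format, and `filter_dataset` helper are assumptions for the example, not any vendor's actual pipeline.

```python
# Hypothetical sketch: drop images whose hash matches a blocklist of
# known harmful material before they enter a training dataset.
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's bytes so it can be compared against the blocklist."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def filter_dataset(image_dir: str, blocklist: set[str]) -> list[Path]:
    """Return only the images whose hash is NOT on the blocklist."""
    kept = []
    for image_path in Path(image_dir).glob("*.jpg"):
        if sha256_of(image_path) not in blocklist:
            kept.append(image_path)
    return kept


if __name__ == "__main__":
    # Hypothetical blocklist file: one hex digest per line.
    blocklist = set(Path("known_harmful_hashes.txt").read_text().split())
    clean_images = filter_dataset("training_images/", blocklist)
    print(f"{len(clean_images)} images passed the hash filter")
```

Exact hash matching only catches material that has already been identified somewhere, which is why it is at best a floor, not a ceiling, for the kind of dataset hygiene argued for above.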