I’m Smart In My Own Language
How do language nuances impact AI image generation?
AI GENERATIVE IMAGES | INCLUSIVE TECH | USER RESEARCH
This project examines how language subtleties shape AI image generation, exposing the difficulty of capturing cultural nuance and raising critical questions about the dominance of English in AI training datasets.
Overview
We collaborated with a diverse group of international students to translate native language descriptions into English for image creation.
The experiment underscores the need for diverse language tagging, prompts ethical reflection, and offers pertinent insights for recruiters grappling with AI's inherent constraints.
Inspiration
According to the Sapir–Whorf hypothesis, a language's structure can shape a speaker's worldview or cognition, implying that our perceptions are influenced by the language we speak. This immediately raises the question: "How can large language models understand the subtleties of local expressions?"
This question was inspired by a conversation with Mica, a fellow international student, who expressed her struggle to find English equivalents for her ideas and thoughts. It was a common problem shared by several other students, igniting the thought that being perceived as "smart" often correlated with one's ability to articulate in English.
Most AI image generators are trained on English-language image datasets, which could potentially narrow our perceptions and worldview, as suggested by the theory. This observation leads us to another question, "Can language foster more collaborative work?"
What did we do?
To explore this further, we reached out to our diverse cohort of classmates and asked them to describe different photographs in their native languages. These descriptions were then translated into English using Google Translate, since the base Stable Diffusion model takes English-language input. The resulting images, paired with the prompts used to generate them, let viewers reflect on how today's text-to-image models can enable visual communication across languages while still struggling with identifiers and contexts that are highly specific to a given culture.
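The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual script: the translator is passed in as a plain function (the project used Google Translate), and the Stable Diffusion call assumes the Hugging Face `diffusers` package and the `runwayml/stable-diffusion-v1-5` checkpoint, which requires a large download and a GPU to run.

```python
def build_prompt(native_text: str, translate) -> str:
    """Translate a native-language description into an English prompt.

    `translate` is any callable that maps a string to its English
    translation (in the project, this step was Google Translate).
    """
    return translate(native_text).strip()


def generate_image(prompt: str):
    """Generate an image from an English prompt with Stable Diffusion.

    Illustrative only: assumes the `diffusers` package is installed and
    downloads the model weights on first use.
    """
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5"
    )
    return pipe(prompt).images[0]


# Example with a stand-in translator (a hypothetical fixed translation,
# used here so the sketch runs without network access):
fake_translate = lambda s: "a quiet street food stall at dusk"
prompt = build_prompt("黄昏时安静的街边小吃摊", fake_translate)
```

Keeping translation and generation as separate steps mirrors the experiment's design: whatever nuance the translation loses is exactly what the image model never sees.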
The results were intriguing. Some images differed because of subtle cultural nuances in the descriptions. Whether it was context, sentence framing, or gendered terms, the images diverged because the model couldn't capture those language nuances, as it wasn't trained on such data.
The Future of AI
In collaboration with Mathew Olson and Dror Margalit for Hypercinema with Gabriel Barcia-Colombo
The future of data and technology can seem daunting, particularly given the rapid pace of growth. Rising AI technologies like ChatGPT offer shortcuts, but they can also propagate biased data and discourage fact-checking. This brings us to the heart of the matter: Can we train these models on better, less biased datasets by including different languages?