Happened upon a good example of the problem with AI-generated alt text. It’s a gif of the infamous scene from Star Wars Episode II where Hayden Christensen, playing Anakin Skywalker, tells Natalie Portman’s character Padmé Amidala, “I don’t like sand.” It’s the first phrase in the worst lines of dialogue in cinematic history. And here’s how the computer interpreted the gif…

Screenshot of the image upload interface of Bluesky that autocompletes alt text. As described in the post, there’s a gif from Star Wars Episode II where Anakin tells Padmé Amidala “I don’t like sand”, but the alt text says “a man and a woman are sitting next to each other and the woman is saying I don’t like sand”

The AI said “a man and a woman are sitting next to each other and the woman is saying I don’t like sand.” Not only does this miss the context that it’s from the world-famous movie Star Wars, it gets the speaker entirely wrong. It OCR’d the text fine, but failed to relay any of the context.

Context. That’s what alt text is. I’d hypothesize that context is probably one of the biggest reasons why language evolved. A caveman could yell “AHHH!H” to alert others of dangers, but I imagine after a couple rounds of those others would begin asking, “Ok, Grog, I understand you think there’s an AHHH!H out there, but what kiiiiiiind of AHHH!H?”

My ears are starting to perk up when I hear the word context. I hear all the time that “You have to give the AI more context and it will give you a better answer.” AI has a lot of vectorized text and image processing nodes, but it’s short on context I guess. Will AI one day have every frame of the expensive-to-license Star Wars franchise indexed in its brain like I do? Or are we the gatherers and keepers of context? Does our capacity for empathy (which allows us to project and inhabit other contexts we don’t necessarily have) set humans apart?

I hate to leave on such an open-ended question, but I think we can all agree… that woman doesn’t like sand.