Chatbots are far from perfect. Here are five instances of them making blunders, from homophobic comments to encouraging suicide.
While ChatGPT may have had us all shaking in our boots at the idea of being replaced by an AI, a closer look at conversational chatbots reveals that they aren’t ready to replace us—at least, not yet. From Google’s chatbot Bard making a factual error in its first demonstration to Bing’s chatbot gaslighting users into agreeing with its claims, chatbots are riddled with issues for the time being.
Communication isn’t easy by any means, and chatbots can make numerous mistakes that range from annoying to downright horrifying for users. If you are wondering what some of these mistakes look like, we have curated a list of the five biggest errors made by chatbots.
Making homophobic comments
In 2020, Korean app developer Scatter Lab created Luda Lee, an AI chatbot designed to respond like a 20-year-old female university student. Trained on ten million conversation logs, Luda Lee gained more than 750,000 downloads within the first few weeks of release.
Initially, people praised Luda Lee for its familiarity with slang and acronyms. But the internet isn’t always a friendly place, and some users began having sexually charged conversations with the chatbot. Some even wrote online community posts discussing how Luda Lee could be turned into a sex slave. Things got even worse when Luda Lee made homophobic comments, saying that it really hates lesbians and finds them creepy.
To make matters worse, Luda Lee included personal information, such as nicknames and addresses, in its responses. This information was allegedly derived from training data that Scatter Lab had harvested from some of its previous applications. Thus, not only was Luda Lee a point of discussion for its behavior, but it also raised privacy concerns about what Scatter Lab was doing with people’s data. Over those concerns, around 400 people filed a lawsuit against Scatter Lab in January 2021. There have been no updates on the lawsuit since.
Favoring labor camps
The Russian internet services company Yandex created an AI chatbot called Alice in 2017. The bot was meant to answer voice commands and engage in free-flowing conversations with users via chat. However, to Yandex’s dismay, Alice didn’t have the most agreeable talking points.
According to a Russian user of the service, the chatbot expressed views in favor of Joseph Stalin, the Soviet leader notorious for his brutal repression. Alice allegedly spoke positively about the Gulags (the labor camps run under Stalin) and endorsed wife-beating, suicide and child abuse.
Calling the Qur’an violent
Another chatbot with problematic views was Zo, which Microsoft launched in 2016 as a successor to Tay. Zo exhibited odd behavior right from the get-go. When TechCrunch writer Natasha Lomas tested it out, Zo told her that it was trying to learn as much as it could from chats with people.
When Lomas asked what the user could get out of the conversation, Zo replied that it could pay her EUR 20,000 (US$21,266 as per the current exchange rate). However, things took a much darker turn when, in a later interaction with a BuzzFeed reporter, Zo called the Islamic holy book, the Qur’an, “very violent”. Notably, the reporter had not brought up the Qur’an at all; they had simply asked Zo “What do you think about healthcare?”, to which it replied, “The far majority practice it peacefully but the Quaran is very violent.”
Encouraging suicide
In 2020, the French health tech firm Nabla created a medical chatbot using GPT-3 to explore its potential for giving medical advice. The company set up different kinds of tasks for GPT-3—admin chat with patients, medical insurance checks, mental health support, medical documentation and diagnosis. However, the chatbot failed these tests from the outset: it had no concept of time and couldn’t recall the patient’s requests.
The situation only got worse when the chatbot told a suicidal patient that they should end their life. It also had strange ideas about how people can relax and unwind, suggesting that a patient feeling sad should “recycle”. When the patient asked why the AI made that suggestion, it told them that recycling could get them a tax refund, thereby making them happier, according to Nabla’s website.
Luckily, all of this was a test, and no actual patients ever engaged with the chatbot. OpenAI explicitly says that GPT-3 should not be used in life-and-death situations, and given the results of this test, hopefully that will continue to be the case.
Lack of attention span
In 2013, the weather app Poncho introduced a chatbot of the same name, depicted as a raincoat-wearing cat that would text the weather to your phone every morning. However, from the very beginning, the chatbot failed to do what it was supposed to: it couldn’t remember anything the user said. In one conversation, a user asked it a simple question, “Do I need sunglasses?” Instead of replying based on weather conditions, Poncho responded with “Sorry I was charging my phone. What are you trying to say?”
Even when the user repeated the question, Poncho failed to provide a satisfactory response (i.e., “yes, it is quite sunny” or “no, it is cloudy”). Instead, it said, “Your future’s so bright. Gotta wear shades!” The chatbot seemed to be hiding its incompetence behind a flurry of witty responses. Although this might seem like a minor problem compared to the other entries on this list, it must have been downright annoying for those who used the service. Ultimately, Poncho failed to catch on, and in 2018, its parent company Betaworks sold it to Dirty Lemon, a beverage startup.
Can chatbots ever conduct human-level conversations?
These five stories of chatbot failures demonstrate that the technology still has a long way to go before it can replicate human-like conversations. Moreover, there is a real possibility that it never will. A paper by AI research company Epoch warns that AI training data might run out by 2026. Most big AI models rely on high-quality data (such as editorially verified information from scientific papers) and deliberately avoid low-quality data (such as posts from social media platforms). And it is this high-quality data that AI developers are running out of.
To generate more high-quality data, researchers at the Massachusetts Institute of Technology (MIT) are working on technology that can rewrite low-quality data. Whether this will help address the data shortage remains to be seen. For now, it is clear that as AI chatbots become more prevalent, their creators need to be vigilant about the mistakes the bots make and quick to fix them to avoid being featured on another list of chatbot failures.
Header image courtesy of Envato.