Overview
<aside>
💡 Because LLMs fundamentally work by predicting the next word, these models can sometimes produce outputs that are false, misleading, or even nonsensical. These outputs range from incorrect facts to entirely fabricated scenarios, often presented with unwarranted confidence. This phenomenon, commonly called hallucination, affects large language models (LLMs) such as GPT-3, ChatGPT, and Bard.
</aside>
<aside>
This recent study by scientists at the Polytechnic University of Valencia, Spain, points out that as LLMs scale up, they often get worse at answering basic questions.
https://www.newscientist.com/article/2449427-ais-get-worse-at-answering-simple-questions-as-they-get-bigger/
https://www.nature.com/articles/s41586-024-07930-y
</aside>
Most Embarrassing Examples
- Google's Bard AI Blunder
In February 2023, during a public demonstration of Google's Bard AI, the system incorrectly claimed that the James Webb Space Telescope had taken "the very first pictures of a planet outside of our own solar system"[6]. This factual error led to a sharp decline in Alphabet's (Google's parent company) stock price, wiping out approximately $100 billion in market value within a day[6]. This incident highlighted the risks associated with rushing AI technology to market without thorough testing and validation.
- Air Canada's Chatbot Misinformation
In a recent case, Air Canada was ordered to pay damages to a passenger after its AI-powered virtual assistant provided incorrect information about bereavement fares[1][5]. The chatbot falsely stated that passengers could apply for bereavement discounts after purchasing tickets, contradicting the airline's actual policy. This error not only resulted in financial compensation but also raised questions about the reliability of AI-driven customer service tools and the legal implications of AI-generated misinformation.
- Zillow's Algorithmic Home-Buying Disaster
In 2021, Zillow's home-flipping unit, Zillow Offers, faced significant losses due to errors in the machine learning algorithm it used to predict home prices[3]. The algorithm's inaccuracies led to Zillow overpaying for properties, resulting in the company writing down hundreds of millions of dollars and laying off about 25% of its workforce. This case demonstrates how AI errors in critical business operations can have severe financial consequences.
- Legal Consequences of ChatGPT Hallucinations
In a legal context, a lawyer named Steven A. Schwartz faced a $5,000 fine after submitting a legal brief containing non-existent court cases fabricated by ChatGPT[4]. This incident not only resulted in financial penalties but also highlighted the dangers of relying on AI-generated information without proper verification, especially in professional settings.
- Microsoft's Bing Chat Errors
During a public demonstration similar to Google's Bard incident, Microsoft's Bing Chat AI provided inaccurate financial data about major companies like Gap and Lululemon[4]. While the immediate financial impact was not as severe as Google's case, it still resulted in public embarrassment and raised concerns about the reliability of AI-powered search and information tools.
Causes
- Training Data Issues: Insufficient, biased, or unrepresentative training data can lead to hallucinations. If the model encounters scenarios not well-covered in its training data, it may generate inaccurate responses[1][6].
- Model Complexity and Overfitting: Highly complex models can sometimes overfit to their training data, meaning they perform well on known data but poorly on new, unseen data. This can result in the generation of incorrect outputs when faced with unfamiliar inputs[1][6].
- Encoding and Decoding Errors: Errors in how the model encodes and decodes information can lead to hallucinations. For example, if the model incorrectly correlates different parts of the training data, it may generate responses that diverge from the input prompt[4][6].
- Lack of Constraints: Without clear boundaries or constraints, AI models may produce a wide range of outputs, increasing the likelihood of hallucinations. Implementing probabilistic thresholds and filtering tools can help mitigate this issue (see the sketch after this list)[1][6].
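To make the "probabilistic thresholds and filtering" idea concrete, here is a minimal, hypothetical sketch in Python: it flags a response whose average token log-probability falls below a chosen cutoff, so the response can be filtered out or routed to a human reviewer. The function name, the threshold value, and the example log-probabilities are illustrative assumptions, not any specific vendor's API.

```python
def flag_low_confidence(token_logprobs: list[float], threshold: float = -1.5) -> bool:
    """Return True if a response looks unreliable under a simple
    mean-log-probability heuristic. The threshold is illustrative, not tuned."""
    if not token_logprobs:
        return True  # nothing to judge; treat as unreliable
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return mean_logprob < threshold

# Per-token log-probabilities, as many LLM APIs can return alongside the text.
confident_answer = [-0.10, -0.30, -0.20, -0.05]  # model was sure of most tokens
shaky_answer = [-2.40, -3.10, -1.90, -2.70]      # model was effectively guessing

print(flag_low_confidence(confident_answer))  # False -> keep the response
print(flag_low_confidence(shaky_answer))      # True  -> filter or escalate
```

In practice this heuristic is crude (a model can be confidently wrong, as the examples above show), so it is usually combined with retrieval grounding or human review rather than used on its own.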
Mitigation Strategies
To reduce the frequency and impact of AI hallucinations, several strategies can be employed:
- High-Quality Training Data: Ensuring the model is trained on diverse and representative datasets can help improve accuracy and resilience (a minimal data-cleaning sketch follows below)[6].
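As one deliberately simplified illustration of this point, the sketch below removes exact duplicates and near-empty records from a fine-tuning dataset before training. The record schema (a "text" field) and the length threshold are assumptions for illustration, not a prescribed pipeline.

```python
def clean_training_records(records: list[dict], min_chars: int = 20) -> list[dict]:
    """Keep only non-trivial, previously unseen texts from a raw dataset."""
    seen: set[str] = set()
    cleaned = []
    for record in records:
        text = record.get("text", "").strip()
        if len(text) < min_chars:  # too short to teach the model anything useful
            continue
        if text in seen:  # exact duplicate; duplicates skew the training distribution
            continue
        seen.add(text)
        cleaned.append(record)
    return cleaned

raw = [
    {"text": "The James Webb Space Telescope launched in December 2021."},
    {"text": "The James Webb Space Telescope launched in December 2021."},  # duplicate
    {"text": "ok"},                                                          # too short
]
print(len(clean_training_records(raw)))  # 1 record survives the cleaning pass
```

Real data-curation pipelines go much further (near-duplicate detection, toxicity and factuality filters), but even this kind of basic hygiene reduces the chance that the model memorizes low-quality or repeated content.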