Overview
<aside>
💡 GenAI systems or LLMs, despite their impressive capabilities, have significant gaps in domain knowledge compared to human experts. This stems largely from their reliance on internet-based training data: Wikipedia is awesome, but many specialized concepts are not covered in enough depth to be represented accurately in the models.
Solutions to this challenge usually involve RAG (see web search) or custom fine-tuning of base models.
</aside>
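To make the RAG idea concrete, here is a minimal sketch of the pattern: look up the most relevant snippet from a small in-house knowledge base and prepend it to the prompt before calling the model. The knowledge base, the word-overlap scoring, and the prompt wording are all illustrative stand-ins; production systems use embedding similarity, a vector store, and a real LLM API.

```python
# Minimal RAG sketch (illustrative): retrieve the most relevant snippet
# from a toy in-house knowledge base and ground the prompt in it.
# Word-overlap scoring stands in for embedding similarity.

KNOWLEDGE_BASE = [
    "Falcon 9 is a partially reusable rocket developed by SpaceX.",
    "Our warranty policy covers parts and labor for 24 months.",
    "Model X-200 requires firmware 3.1 or later for OTA updates.",
]

def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str) -> str:
    """Assemble a prompt that instructs the model to use the retrieved context."""
    context = retrieve(question, KNOWLEDGE_BASE)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Which firmware does Model X-200 need?"))
```

The key point is that the model never needs the proprietary facts in its weights; they are injected into the prompt at query time, which is why RAG also helps with the recency problem discussed below.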
The specific causes:
- Domain-specific gaps: Certain specialized fields may have limited representation in internet data, leading to poor performance of AI models on tasks requiring deep domain expertise.
- Rapidly changing information: Internet content becomes outdated quickly, especially in fast-moving fields like politics and science, and AI models struggle to stay current without frequent retraining. Ask Claude for “the most recent Falcon 9 launch” and it will confess that its knowledge cutoff is April 2024. NOTE: ChatGPT, like Gemini, uses live web search (see RAG) to mitigate this issue.
- No knowledge of physicality: Trained solely on text data, models lack the embodied sensory understanding gained from physical interaction with the world, limiting their ability to reason about physical concepts or real-world scenarios.
- Misinformation: The internet contains every opinion out there, good, bad, and ugly, and AI models may inadvertently learn and reproduce these inaccuracies if the training data is not carefully filtered. NOTE: the large LLM providers spend substantial resources mitigating this during “Alignment” and “Fine-Tuning” with Reinforcement Learning from Human Feedback (RLHF). In practice, that means low-wage workers in countries such as Nigeria and the Philippines ‘teach’ the LLM to behave.
Mitigation Strategies
Businesses can explore ways to supplement and enhance GenAI knowledge:
- Domain-specific data integration:
Incorporate specialized datasets and knowledge bases relevant to the business's industry or domain. This helps close the domain-specific gaps described above by grounding responses in accurate, contextual information for specific fields.
- Expert knowledge capture:
Develop systems to capture and integrate expert knowledge from within the organization. This can involve creating structured databases of internal expertise, best practices, and institutional knowledge that may not be readily available on the internet.
- Real-world data incorporation:
Integrate real-world data from sensors, IoT devices, or operational systems to provide GenAI models with up-to-date, practical information that goes beyond text-based internet data. This can help address the lack of real-world experience in current models.
- Multimodal learning:
Implement systems that combine text, images, audio, and video data to provide a more comprehensive understanding of concepts. This can help overcome the limitations of purely text-based interfaces and improve the model's ability to understand and generate content across different modalities.
- Continuous learning and updating:
Develop mechanisms for ongoing model updates with the latest information, research, and data relevant to the business. This can help address the recency bias and ensure the model stays current with rapidly changing information.
- Collaborative human-AI systems:
Create interfaces that allow for seamless collaboration between human experts and AI systems. This can help leverage the strengths of both human knowledge and AI processing capabilities, addressing limitations in complex reasoning and domain-specific expertise.
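The collaborative human-AI point above can be sketched as a simple confidence gate: answers the model is unsure about are routed to a human expert queue instead of being returned directly. The model call, the confidence score, and the threshold value here are mocked assumptions for illustration; a real system would call a provider API and use a calibrated confidence measure.

```python
# Human-in-the-loop sketch (illustrative): route low-confidence answers
# to human review. mock_model stands in for a real LLM API call that
# would also return some calibrated confidence signal.

REVIEW_THRESHOLD = 0.8  # assumed cutoff; tune per use case

def mock_model(question: str) -> tuple[str, float]:
    """Stand-in for an LLM call returning (answer, confidence)."""
    if "warranty" in question.lower():
        return "Coverage is 24 months for parts and labor.", 0.95
    return "I'm not sure about that.", 0.40

def answer_with_escalation(question: str) -> dict:
    """Return the answer directly or flag it for a human expert."""
    answer, confidence = mock_model(question)
    route = "auto" if confidence >= REVIEW_THRESHOLD else "human_review"
    return {"answer": answer, "route": route}

print(answer_with_escalation("What is the warranty period?"))
print(answer_with_escalation("Explain our 2031 product roadmap."))
```

Routing on confidence rather than topic keeps the interface simple: the human expert only sees the cases where the model's domain knowledge genuinely runs out.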