What are LLMs and what are their limitations?
The latest advances in Generative Artificial Intelligence (GenAI) are reshaping entire industries. According to the New York Times, more than 56 billion dollars have been invested in GenAI-related startups, a figure that shows how heavily major investors around the world are betting on this technology. In addition, the Gartner Hype Cycle, which tracks the maturity, adoption and application of emerging technologies, placed GenAI at the Peak of Inflated Expectations, evidence of how much is expected of this technology today.
But what exactly is a Large Language Model (LLM)? How does this technology work and what are its limitations? What are its uses in the business world? In this article we answer these questions.
What exactly is a Large Language Model?
An LLM is a natural language model built from deep neural networks that have been trained on large amounts of text.
The application of statistical and prediction models to natural language is not new.
In the 1980s and 1990s, n-grams and hidden Markov models brought probabilistic mathematics to language, giving rise to a variety of tools and methods for building more flexible, data-driven models.
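To make the n-gram idea concrete, here is a minimal sketch of a bigram model, the simplest useful n-gram: it estimates the probability of the next word from counts of adjacent word pairs. The function names and the tiny corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the raw follower counts of `word` into P(next | word)."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never seen with a follower
    return {w: c / total for w, c in followers.items()}


# Hypothetical toy corpus, already split into tokens
corpus = "the dog runs . the dog plays . the cat sleeps .".split()
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

In this corpus "the" is followed by "dog" twice and by "cat" once, so the model assigns "dog" twice the probability of "cat". A model like this only ever looks one word back, which is the limitation that attention-based models were later designed to overcome.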
But it was not until recently that this technology truly matured, with the introduction of the Transformer by Google researchers in the famous paper “Attention Is All You Need”. The Transformer is a neural network architecture whose attention mechanism attempts to mimic the attention we humans pay to the context of a word or set of words in a body of text. Let’s see it with an example:
When we read the previous paragraph we establish a relationship between the words Coco, dog, paws and play. If we only read the last sentence (Coco likes to play tag), we do not know whether Coco is a dog or a person. However, thanks to our innate human attention, we take the context of the whole paragraph into account. This is how the Transformer created by Google works: it calculates the relevance of each word to the other words in a text.
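The relevance calculation described above can be sketched with scaled dot-product attention, the core operation of the Transformer: softmax(QK^T / sqrt(d))V. This is a simplified illustration, not a full Transformer layer. The two-dimensional word vectors are made up, and the same vectors are used as queries, keys and values, whereas a real model learns separate projections for each.

```python
import math


def softmax(xs):
    """Numerically stable softmax: turns scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy vectors: "Coco" and "dog" point in similar directions
tokens = ["Coco", "dog", "table"]
vectors = [[1.0, 0.2], [0.9, 0.3], [-0.5, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)

for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Because the vector for "Coco" is closer to the one for "dog" than to the one for "table", the row of weights for "Coco" gives "dog" more attention than "table".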
This discovery led to ChatGPT, a chatbot built on the GPT-3 family of foundation models (Generative Pre-trained Transformer 3) that took the world by storm and became one of the fastest-growing consumer applications in history. GPT-3 is a neural network with 175 billion parameters, capable of generating text, understanding language and answering questions surprisingly well.
Capabilities such as reading comprehension, logical inference, or even tasks that are more advanced for a machine, such as explaining why a joke is funny, are within reach of the largest models.
Does this mean the end for humans? Will AI take away our jobs because everything can be automated by these models? Not yet, says Meta’s Chief AI Scientist, Yann LeCun, in this interview: LLMs have several limitations that make them unreliable unless they are supported by the right software architecture.
What are their limitations?
One of the major limitations of LLMs is that they cannot reliably answer questions about information that was not in their training data. For example, if you ask ChatGPT who Steve Jobs is, it will provide an answer about the famous tech entrepreneur. However, if you ask it about the latest sales made by your company’s sales department, it will not be able to give you an accurate answer. This happens because LLMs have no direct access to private data or to the most up-to-date information about the world.
But if we give chatbots backed by LLMs access to the right context, they can answer such questions far more accurately, thanks to their writing ability and linguistic understanding.
This is why a software architecture has recently emerged to address this problem. It is called Retrieval-Augmented Generation (RAG), and it connects the LLM to a search engine over a database that contains everything relevant to the user. In this way the LLM can access information that it was not trained on.
This turns the LLM’s lack of context into a problem of information management and search, whose solutions have long been studied and developed in the field of information retrieval.
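Giving the model "the right context" usually comes down to placing the retrieved passages in the prompt ahead of the question. The sketch below shows one possible way to assemble such a prompt. The function name, the prompt wording and the sample passage are assumptions for illustration, and the call to an actual LLM is deliberately left out.

```python
def build_prompt(question, passages):
    """Assemble a prompt that asks the model to answer from the given passages only."""
    # Number each passage so the model (and the reader) can refer back to it
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passage, as it might be returned by the search step
passages = ["The sales department closed twelve deals in March."]
prompt = build_prompt("How many deals were closed in March?", passages)
print(prompt)
```

The instruction to admit when the context is insufficient is a common way to discourage the model from inventing an answer. How well it works depends on the model.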
A RAG architecture is typically composed of:
- An ingestion pipeline that ingests the documents and splits them into smaller parts, commonly called chunks. This pipeline lets us apply different chunking strategies depending on the data the documents contain.
- An embedding model, which the pipeline calls to vectorize both the document chunks going into the database and the user queries searched against it. These models convert text into dense numerical representations (vectors).
- Finally, a vector database, which stores and indexes the vectors for later retrieval. The most common metric for finding the chunks most relevant to a user query is cosine similarity.
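The three components above can be sketched end to end in a few lines. This is a toy illustration under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `VectorStore` is a hypothetical in-memory list rather than a real vector database, and the documents are invented. Only the cosine similarity formula is the real metric.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=12):
    """Ingestion step: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|), here over sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine_similarity(q, it[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


documents = [
    "The sales department closed twelve deals in March.",
    "The office cafeteria opens at nine every morning.",
]
store = VectorStore()
for doc in documents:
    for chunk in chunk_text(doc):
        store.add(chunk)

top = store.search("How many deals did the sales department close?", k=1)
print(top)
```

A production system would replace the word-count vectors with embeddings from a trained model and the linear scan in `search` with an indexed nearest-neighbor lookup, but the flow of chunking, embedding, storing and ranking stays the same.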
Therefore, by grounding answers in up-to-date data, RAG reduces the chances of hallucinations: incorrect information that the model generates because of its tendency to always answer a query, even when it lacks the knowledge to do so. In addition, fine-tuning or re-training the model for specific knowledge areas (such as apps with knowledge of mining practices or the logistics of fashion products) can be explored. Updating the database may be sufficient in general use cases, but there is scientific literature indicating that fine-tuning the LLM can increase the accuracy of a RAG-enhanced application.
However, it is also important to identify some disadvantages:
- The effectiveness of the RAG architecture depends heavily on the quality of the search engine configuration, as well as on a good document preprocessing strategy, including the choice of the right embedding model.
- The context window of LLMs is limited: there is a cap on the amount of text, including instructions and worked examples, that the model can take in to perform its task. According to the scientific literature, as the size of the context increases, the model’s ability to attend to all of it decreases. Therefore, prompts should follow established prompt engineering recommendations so that everything is interpreted and nothing escapes the LLM’s attention.
- Evaluation is notably difficult: the non-deterministic nature of LLMs makes the quality of the generated information variable if the application is not properly tuned. Given the difficulty of applying traditional metrics, these applications require continuous evaluation and monitoring.
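One part of a RAG application that can still be measured deterministically is the retrieval step. The sketch below computes a hit rate at k: the fraction of test questions for which the expected document appears among the top k results. The corpus, the labeled questions and the keyword-overlap retriever are all hypothetical stand-ins. In practice `retrieve` would be the application's own search function.

```python
def hit_rate_at_k(retrieve, labeled_queries, k=3):
    """Fraction of queries whose expected document id is in the top-k results."""
    if not labeled_queries:
        return 0.0
    hits = sum(
        1 for query, expected_id in labeled_queries
        if expected_id in retrieve(query, k)
    )
    return hits / len(labeled_queries)


# Hypothetical toy corpus: document id -> text
corpus = {
    "doc-sales": "sales department deals closed in march",
    "doc-hr": "vacation policy and holiday calendar",
}


def keyword_retrieve(query, k):
    """Stand-in retriever (assumption): rank documents by shared words with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(corpus[d].split())), reverse=True)
    return ranked[:k]


# Hypothetical labeled test set: (question, id of the document that answers it)
labeled = [
    ("march sales deals", "doc-sales"),
    ("holiday calendar", "doc-hr"),
]
score = hit_rate_at_k(keyword_retrieve, labeled, k=1)
print(f"hit rate@1 = {score:.2f}")
```

Tracking a number like this over time covers only the search half of the system. The quality of the generated answers still needs separate, ongoing review.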
In conclusion, the combination of Large Language Models (LLMs) with the Retrieval-Augmented Generation (RAG) architecture has marked a breakthrough in Natural Language Processing by mitigating some of the key limitations of LLMs, such as hallucinations and the lack of access to up-to-date information. RAG improves the accuracy of LLMs by integrating a search engine, without incurring the cost of retraining the LLM. However, the success of this solution depends on the robustness of the vector database search engine and the availability of relevant information.
LLMs can automate repetitive tasks, improve customer service and facilitate content creation, allowing your team to focus on strategic decisions. However, not all tasks benefit from LLMs. For deep analytics or very specific data-driven decisions, RAG can complement the model by providing up-to-date context.
If you want to learn more about how these technologies can transform your business, contact us at Capitole. Our team will help you identify the most effective applications to optimize your daily operations and make the most of artificial intelligence, as well as develop predictive models.
Follow us on social media!
Ignacio Rodriguez Burgos
Tech Lead Consultant at Capitole