Generative AI has seen an unprecedented surge in the market, and the pace of advancement is remarkable. In just 11 months since the launch of ChatGPT, we’ve made tremendous progress. What’s particularly fascinating is how swiftly the open-source community has adopted generative AI, producing numerous Large Language Models (LLMs). These open-source LLMs are rapidly closing the gap with commercial counterparts like ChatGPT and Bard (source: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither).
Among the open-source LLMs, two have captured my attention: Llama 2 and CodeLlama. What sets them apart is their accessibility, especially for users like me who can download and run their smaller models (e.g., 7B or 13B) on a CPU-only laptop. Admittedly, my gaming laptop boasts impressive specs, but it’s equipped with an AMD GPU, lacking CUDA support. While there are potential workarounds like DirectML and WSL, I chose not to delve into those complexities. Instead, I opted to work with my current setup, featuring 16 GB of RAM.
However, this decision came with limitations. It ruled out the possibility of using the base models of Llama, even the 7B variant, which alone would demand a minimum of 28 GB of RAM for full-precision weights — without factoring in gradients and optimizer states, which would substantially increase memory requirements further. In light of these constraints, I found a suitable solution by downloading TheBloke’s 4-bit quantized model from Hugging Face, which met my development needs effectively.
Simultaneously, I am interested in the concept of “Retrieval Augmented Generation” (RAG) and how enterprises can harness this approach to extract information from backend data stores or APIs. For example, when working with CodeLlama, we can furnish the LLM with table schemas and ask questions in natural language, prompting it to generate the necessary SQL script for retrieving the desired information. To enhance this process, we can employ LangChain for multiple purposes: calling upon the LLM, executing the SQL query against a database, retrieving the results, and then feeding them back into the LLM to produce the final output. What’s particularly promising is the synergy between open-source LLMs and this approach. Enterprises can now leverage their own large language models without concerns about sharing sensitive company data with commercial LLM providers like OpenAI. This opens up exciting possibilities for businesses to maximize the value of their data in a secure and efficient manner.
In this blog post, I will walk you through a specific scenario in which I run CodeLlama on my local setup. I’ll demonstrate how to use LangChain to interact with the LLM and execute queries against a MySQL Sakila database. My primary focus will be on showcasing how the few-shot prompting technique can guide the LLM toward generating the results we are looking for.
The Scenario — Interacting with a MySQL Database Using Natural Language
Before we delve into the code, let’s discuss the scenario at hand. Our objective is to engage with the Large Language Model (LLM) in a natural language context, wherein the LLM generates queries based on the questions we pose and subsequently executes these queries using chains. In this specific example, we have chosen to work with the MySQL database, specifically the Sakila database.
While the Sakila database may be relatively small, it presents sufficient complexity for our purposes. We anticipate the need to craft queries that span multiple tables within a single SQL statement to extract the relevant information we seek. To provide a visual representation of the database’s structure, here’s an ER Diagram sourced directly from MySQL. You can see multiple interrelations across the various tables within the database.
The Prerequisites — Setting Up the Environment and Installing Required Packages
If you wish to access the code and run it on your local system, you can find it on GitHub at https://github.com/yernenip/CodeLlama-LangChain-MySql. I have also added instructions for setting up and executing the code in the GitHub repository. Here are a few essential things to consider before you begin:
- Memory Requirements: Initially, I intended to work with 16 GB of RAM, as mentioned earlier in this post. However, I quickly realized that this allocation was insufficient. Windows itself consumes a significant portion of memory, and running a Docker container further reduces available resources. While I did manage to execute the code with 16 GB of RAM during my initial tests, the process was painfully slow, and at times, it terminated prematurely. To overcome this, I upgraded my system to a total of 24 GB of RAM, providing a more reliable, if not faster, environment.
- Setup Instructions: Once you’ve ensured your hardware meets the necessary specifications, I recommend following the instructions outlined in the GitHub link provided above. This will help you set up your Docker container (including MySQL with the Sakila database), install Jupyter if you plan to use it locally, and install the required Python packages.
By addressing these prerequisites, you’ll be well-prepared to proceed with the code implementation and exploration of CodeLlama and LangChain in your local environment.
Zero-shot prompting
Initialization
Let’s now dive into the code. The initial step involves downloading the quantized LLM model and configuring it with a context length of 10,000 tokens. While this context length might seem excessive and can slow down inference, it’s necessary for querying the database: the prompt must carry the schemas for all the tables plus the examples used for few-shot prompting. I recommend experimenting with the context length and the other configuration parameters based on your specific use case.
Additionally, I’m explicitly downloading the model in GGUF format, which the llama.cpp team recently introduced as a replacement for the older GGML format.
Please be patient as the model download may take several minutes, given its size of approximately 3.5 GB. Allow the download to complete before proceeding.
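For reference, here is a minimal sketch of this initialization, assuming LangChain’s CTransformers wrapper around the ctransformers library and TheBloke’s CodeLlama-7B-Instruct-GGUF repository; the exact model file name depends on the quantization variant you pick, so treat it as a placeholder.

```python
from langchain.llms import CTransformers

# Model configuration: context_length matches the 10,000 tokens discussed
# above; the other values are reasonable starting points to tune.
config = {
    "context_length": 10000,
    "max_new_tokens": 512,
    "temperature": 0.1,
}

# Download and load the 4-bit quantized GGUF model. The model_file name is
# an assumption (check the Hugging Face repo for the variant you want).
llm = CTransformers(
    model="TheBloke/CodeLlama-7B-Instruct-GGUF",
    model_file="codellama-7b-instruct.Q4_K_M.gguf",
    model_type="llama",
    config=config,
)
```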
Connecting to database and fetching schema info
Once the model is downloaded, we can establish a database connection using LangChain’s wrapper on SQLAlchemy called “SQLDatabase.” We can then retrieve and print out the table information.
This table information is crucial as we will pass it along with our prompt using LangChain’s database chain.
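As a sketch, assuming the MySQL container from the setup instructions is listening locally and the pymysql driver is installed, the connection looks like this (credentials are placeholders):

```python
from langchain.utilities import SQLDatabase

# Placeholder credentials; substitute the user, password, and port from
# your own Docker/MySQL setup.
db = SQLDatabase.from_uri("mysql+pymysql://root:password@localhost:3306/sakila")

# table_info renders the CREATE TABLE statements (plus a few sample rows)
# that will be injected into the prompt.
print(db.table_info)
```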
With the LLM and the database connection in place, we can now prompt the LLM with a simple question. I’m setting “verbose” to True so we can examine additional details and the generated prompt.
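A minimal sketch of that zero-shot call, using LangChain’s SQLDatabaseChain (the question is just an illustrative example):

```python
from langchain_experimental.sql import SQLDatabaseChain

# Build the default chain; verbose=True prints the generated prompt and
# the intermediate SQL statement along with its result.
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

# An illustrative natural-language question against the Sakila database.
db_chain.run("How many customers do we have?")
```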
LangChain’s default prompt template is comprehensive, providing valuable information to the LLM. Unfortunately, due to space constraints, I’m unable to share the entire prompt here. However, I’ve included a small snippet of the screenshot below. Please feel free to run it yourself to examine the prompt’s details.
Vanilla Prompt Example
After the code has finished executing, here is the final output.
As you can see, CodeLlama generates a SQL query using subqueries. While subqueries are perfectly valid (and sometimes preferred), I have a preference for using JOINs in this context. Additionally, I’ve noticed another issue: when I asked questions that required more complex queries, CodeLlama didn’t provide a query in response. This could potentially be attributed to the 7B version I am using. However, let’s explore if we can address this by employing few-shot prompting.
Few-shot prompting
Setting up the example prompts
Let’s begin by preparing an array of examples, each containing the input prompt, the expected SQL query, the result, and the final answer. We will use these examples to create a template for prompting the LLM, using the PromptTemplate provided by LangChain.
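Here is an illustrative sketch of such an array and its template; the field names follow the Question/SQLQuery/SQLResult/Answer layout of LangChain’s default SQL prompt, and the sample counts are placeholders rather than verified Sakila results:

```python
from langchain.prompts import PromptTemplate

# Illustrative few-shot examples (the result values are placeholders).
examples = [
    {
        "input": "How many films are in the database?",
        "sql_cmd": "SELECT COUNT(*) FROM film;",
        "result": "[(1000,)]",
        "answer": "There are 1000 films in the database.",
    },
    {
        "input": "How many customers are from the United States?",
        "sql_cmd": (
            "SELECT COUNT(*) FROM customer c "
            "JOIN address a ON c.address_id = a.address_id "
            "JOIN city ci ON a.city_id = ci.city_id "
            "JOIN country co ON ci.country_id = co.country_id "
            "WHERE co.country = 'United States';"
        ),
        "result": "[(36,)]",
        "answer": "There are 36 customers from the United States.",
    },
]

# Template describing how each example is rendered inside the prompt.
example_prompt = PromptTemplate(
    input_variables=["input", "sql_cmd", "result", "answer"],
    template="Question: {input}\nSQLQuery: {sql_cmd}\nSQLResult: {result}\nAnswer: {answer}",
)
```

Note that the examples deliberately use JOINs rather than subqueries, nudging the model toward the query style we prefer.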
Vectorizing the examples and using an example selector
Next, we will convert our examples into vector form using sentence transformers and employ a semantic similarity example selector. While vectorization is not strictly necessary, it’s preferred: embeddings capture the meaning and context of each example, so the selector can pick the examples most semantically similar to the incoming question, which leads to better-generated responses.
I am using Chroma here, a vector store that can run in-memory inside a Python script or Jupyter notebook.
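A sketch of this step, assuming a small sentence-transformers checkpoint (the model name is my choice, not prescribed):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma

# Sentence-transformers model used to embed the examples; any similar
# checkpoint should work.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Embed the examples into an in-memory Chroma store; k controls how many
# of the most similar examples are pulled into each prompt.
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    Chroma,
    k=2,
)
```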
Setting up the few-shot prompt
Now, let’s configure the few-shot prompt variable that we’ll be passing to the large language model.
The _mysql_prompt and PROMPT_SUFFIX variables contain additional prompt text that provides instructions to give more context to the LLM. They also include the input variables.
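Putting those pieces together, here is a sketch of the few-shot prompt; the _mysql_prompt prefix and PROMPT_SUFFIX come from LangChain’s built-in SQL prompts:

```python
from langchain.prompts import FewShotPromptTemplate
from langchain.chains.sql_database.prompt import _mysql_prompt, PROMPT_SUFFIX

few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,  # picks semantically similar examples
    example_prompt=example_prompt,      # renders each selected example
    prefix=_mysql_prompt,               # MySQL-specific instructions
    suffix=PROMPT_SUFFIX,               # table info plus the user's question
    input_variables=["input", "table_info", "top_k"],
)
```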
Calling the LLM with few-shot prompting
Finally, let’s prompt the LLM using the few-shot prompt and examine the result. My question differs slightly from the example; I replaced “United States” with “Canada.”
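Wiring the few-shot prompt into the chain reuses the llm and db objects from earlier; a minimal sketch:

```python
# Same chain as before, but now driven by the few-shot prompt.
local_chain = SQLDatabaseChain.from_llm(llm, db, prompt=few_shot_prompt, verbose=True)

# The question mirrors the "United States" example, with the country swapped.
local_chain.run("How many customers are from Canada?")
```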
Here’s a snippet of what the prompt looks like. You can see that the example for “United States” was selected.
Prompt with Examples
And here is the output.
I also wanted to assess whether few-shot prompting had improved the LLM’s broader grasp of the schema, so I posed a question unrelated to the tables mentioned in my example prompts to see how it would respond.
I prompted the LLM with the following question:
To my delight, the LLM provided an answer very close to the expected output shown above. Here is the output for reference.
Conclusion
The rapid advancements in AI, especially with open-source tools like Llama and CodeLlama, are changing how we talk to databases using everyday language. The combination of LangChain and Few Shot Prompting makes it easier for us to ask complex questions and get meaningful answers.
As we explored CodeLlama and LangChain, we saw how AI can help us extract insights from data while keeping that data private. Businesses can now run capable language models on their own infrastructure without sharing sensitive data with external providers, which is a big deal.
This journey shows us that AI can make data tasks simpler and more accessible. Whether it’s asking database questions, finding specific info, or answering tricky queries, open-source AI tools and clever techniques are making it easier for businesses to make the most of their data.