Praveen Yerneni

Scaling Down, Boosting Up: Converting Microsoft Phi-2 to GGUF format for Compact Deployments

Robotic Russian Dolls

In my previous blog post, I discussed how we can fine-tune a small language model, specifically Microsoft Phi-2, and train it effectively on a single GPU with remarkable results. Following that post, a comment caught my attention: a reader asked about generating a GGUF file from the fine-tuned model. While I had some knowledge of the topic, I wanted to explore it further. After a bit of research on the web, I realized there was not much content available and figured it would be worthwhile to write a more detailed article on the subject.

For the uninitiated, GGUF is the successor to GGML (GPT-Generated Model Language), developed by the brilliant Georgi Gerganov and the llama.cpp team. It enables faster, more efficient use of language models for tasks like text generation, translation, and question answering. GGUF is especially popular among macOS users, and thanks to its minimal setup and strong performance, you can run inference on virtually any operating system, including inside a Docker container. How cool is that!
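As a rough sketch of what the full post covers, the conversion and inference workflow with llama.cpp looks something like the following. The model path `./phi-2` is an illustrative assumption, and the exact conversion script name can vary between llama.cpp versions:

```shell
# Get llama.cpp and the conversion script's Python dependencies.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert a local Hugging Face checkout of the fine-tuned Phi-2 model
# to GGUF (./phi-2 is an assumed path; adjust to your model directory).
python convert_hf_to_gguf.py ./phi-2 --outfile phi-2-f16.gguf --outtype f16

# Run inference on the resulting GGUF file with the llama.cpp CLI.
./llama-cli -m phi-2-f16.gguf -p "What is GGUF?" -n 128
```

The same GGUF file can then be mounted into a Docker container or copied to another machine; since all weights and metadata live in one file, deployment is little more than shipping that file alongside the llama.cpp binary.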

read more

Optimizing Phi-2: A Deep Dive into Fine-Tuning Small Language Models

Tuning a Guitar

In my previous blog, I delved into the effective use of few-shot prompting techniques to enhance output quality. In many enterprise scenarios, the RAG technique stands as the primary solution. For those invested in OpenAI, combining GPTs with Functions typically covers the majority of cases. However, there are instances where this falls short, or where you want to create your own fine-tuned model tailored to specific tasks.

Personally, I foresee the future of language model development in the enterprise revolving around the creation of "specialized" smaller models. I expect these models to operate more efficiently and exhibit a higher degree of accuracy than their larger commercial or open-source counterparts. They'll be trained on narrow, specific datasets and engineered to produce constrained outputs aimed at solving precise problems.

Imagine training compact models on a corpus of customer queries and company-specific FAQs. Bots built on such models could then offer more accurate, relevant responses, elevating customer satisfaction and support services. Alternatively, fine-tuning a smaller language model on educational material could help educators generate quizzes or provide personalized feedback and learning content to students, tailored to their progress and educational requirements. The possibilities are endless!

read more