Effortless Fine-Tuning of Falcon Models Using QLoRA
Introduction to Falcon Models
The Falcon models have rapidly gained popularity as some of the leading large language models available today due to several compelling factors:
- They excel at solving complex problems.
- They are more compact than many other LLMs while outperforming them.
- They are completely free to use under the Apache 2.0 License.
- Different versions are available, including an instruct-version that simulates ChatGPT's behavior.
With innovative techniques like QLoRA, fine-tuning Falcon models can now be accomplished on consumer-grade hardware. Previous discussions have covered QLoRA and the fine-tuning process for Falcon models.
Finding Simplicity with Falcontune
Fine-tuning Falcon models using QLoRA is straightforward with the Hugging Face libraries. However, an even simpler solution that requires minimal coding is available: Falcontune.
Falcontune is an open-source initiative (Apache 2.0 license) created by Rumen Mihaylov. According to the project page:
Falcontune enables fine-tuning of FALCON models (e.g., falcon-40b-4bit) using just one consumer-grade A100 40GB GPU.
While fine-tuning a model with 40 billion parameters on a GPU with 40GB of VRAM sounds fantastic, referring to the A100 40GB as “consumer-grade” is a bit misleading, given its price tag of over $5,000. In contrast, we will focus on the Falcon 7B parameter model, which can comfortably run on consumer GPUs like the RTX 3060 with 12GB of VRAM.
Fine-Tuning Falcon-7B and Falcon-40B in One Command
Note: The commands below are tailored for Falcon-7B. Simply replace “7B” with “40B” to adjust for Falcon-40B.
Requirements
I conducted tests using a free instance of Google Colab.
To get started, we first need to clone Falcontune:
Next, install the required dependencies:
cd falcontune
pip install -r requirements.txt
python setup.py install
We also need the Falcon model. For this article, I utilized Falcon-7B provided by TheBloke:
Let's also download some sample datasets:
Now we're prepared!
The Command Line for Fine-Tuning
The “setup.py install” command earlier provided us with a “falcontune” command. To fine-tune Falcon-7B using the Alpaca dataset, run the following command:
falcontune finetune \
    --model=falcon-7b-instruct-4bit \
    --weights=./gptq_model-4bit-64g.safetensors \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-7b-instruct-4bit-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]' \
    --backend=triton
Expect the process to take a while: around 24 hours, which exceeds the 12-hour session limit of a free Google Colab instance, so a full run may not complete there. The Alpaca dataset is sizable, so you may want to reduce it for testing purposes. Thanks to LoRA, we are fine-tuning only 2,359,296 parameters.
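That parameter count is consistent with Falcon-7B's architecture: the model has 32 decoder layers, a hidden size of 4544, and a fused query_key_value projection of width 4672, and a LoRA adapter of rank r adds r × (input dim + output dim) parameters per adapted matrix. A quick sanity check in the shell:

```shell
# LoRA trainable parameters for Falcon-7B with r=8 on query_key_value:
# 32 layers x r x (hidden_size + qkv_width) = 32 x 8 x (4544 + 4672)
echo $((32 * 8 * (4544 + 4672)))   # prints 2359296
```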
If you wish to use your own dataset, refer to the format expected in the “alpaca_data_cleaned.json” file.
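For reference, entries in the Alpaca format are JSON objects with instruction, input, and output fields, where input may be empty (the example text below is illustrative, not taken from the dataset):

```json
[
  {
    "instruction": "Summarize the following text.",
    "input": "The Falcon models are open-source large language models...",
    "output": "Falcon is a family of open-source LLMs."
  },
  {
    "instruction": "What is the capital of France?",
    "input": "",
    "output": "The capital of France is Paris."
  }
]
```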
During the fine-tuning process, the peak memory usage was 4.0 GB for CPU RAM and 8.3 GB for GPU VRAM, which is a manageable setup for home-based fine-tuning. Keep in mind that the 40B version of Falcon will necessitate a more robust machine.
Testing Inference
To test the model's inference capabilities, execute:
falcontune generate \
    --interactive \
    --model=falcon-7b-instruct-4bit \
    --weights=./gptq_model-4bit-64g.safetensors \
    --lora_apply_dir falcon-7b-instruct-4bit-alpaca/ \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "How to prepare pasta?" \
    --backend triton
And that’s all there is to it! You've successfully created an efficient chat model on your local machine.
If you appreciate this content and wish to support my work, consider following me on Medium.