Agents, agents, agents.
Everyone loves agents. Big labs are releasing them (Introducing Deep Research, Introducing Operator), startups are being built around the idea (Launch YC, Manus, Meet ARI), and the tooling around them is maturing (Hands-On Deep Dive Into Model Context Protocol; OpenAI adopts rival Anthropic’s standard for connecting AI models to data).
But at the core of every agent sits its brain: the LLM. For an agent to work, it needs a good brain (a statement that isn’t limited to agents, by the way). Today, we’re going to train our own!
Why Fine-Tune LLM Agents for Tool Use?
- The model is a product again! To gain a competitive edge, you need to train your models!
- Your specific tools! To get the most out of your toolset, you need to train your agent on it: which tool to call, in what order, and with what domain-specific knowledge.
- Any model can use tools! Even if the latest open-source model doesn’t natively support function calling, it can be adapted through simple fine-tuning.
How to Train a Tool-Using LLM Agent?
By training explicitly with dedicated tool tokens (<tools>, <tool_call>, <think>, and friends)! Here’s a simple end-to-end setup (based on the amazing Agents course from Hugging Face). We’re using Modal to get access to an H200!
import modal

app = modal.App("function-calling-finetune")

image = modal.Image.debian_slim().pip_install([
    "transformers==4.50.3",
    "peft==0.15.1",
    "bitsandbytes==0.45.4",
    "trl==0.16.0",
    "datasets==3.5.0",
    "torch==2.2.1",
    "accelerate==1.5.2",
    "wandb==0.19.8",
]).env({"WANDB_PROJECT": "function-calling-finetune"})

with image.imports():
    from enum import Enum

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer
    from peft import LoraConfig, TaskType, PeftConfig, PeftModel

DATASET_NAME = "Jofthomas/hermes-function-calling-thinking-V1"
USERNAME = "truskovskiyk"
MODEL_NAME = "google/gemma-3-1b-it"
OUTPUT_DIR = "gemma-3-1b-it-function-calling"


@app.function(
    image=image,
    gpu="H200",
    timeout=86400,
    secrets=[modal.Secret.from_name("training-config")],
)
def function_calling_finetune():
    set_seed(42)

    dataset_name = DATASET_NAME
    username = USERNAME
    model_name = MODEL_NAME
    output_dir = OUTPUT_DIR

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

    # The template rejects the system role, so merge the system prompt into the
    # first user turn and ask the model to plan inside <think> tags
    def preprocess(sample):
        messages = sample["messages"]
        first_message = messages[0]
        if first_message["role"] == "system":
            system_message_content = first_message["content"]
            messages[1]["content"] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]["content"]
            messages.pop(0)
        return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

    dataset = load_dataset(dataset_name)
    dataset = dataset.rename_column("conversations", "messages")
    dataset = dataset.map(preprocess, remove_columns="messages")
    dataset = dataset["train"].train_test_split(0.1)

    sample = dataset["train"].select(range(1))
    print(f"Sample: {sample['text']}")

    # Special tokens for tools, tool calls, tool responses, and the thinking block
    # (the "<tool_reponse>" spelling is kept as in the original course code)
    class ChatmlSpecialTokens(str, Enum):
        tools = "<tools>"
        eotools = "</tools>"
        think = "<think>"
        eothink = "</think>"
        tool_call = "<tool_call>"
        eotool_call = "</tool_call>"
        tool_response = "<tool_reponse>"
        eotool_response = "</tool_reponse>"
        pad_token = "<pad>"
        eos_token = "<eos>"

        @classmethod
        def list(cls):
            return [c.value for c in cls]

    # Reload the tokenizer with the special tokens registered
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        pad_token=ChatmlSpecialTokens.pad_token.value,
        additional_special_tokens=ChatmlSpecialTokens.list(),
    )
    tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

    model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager", device_map="auto", torch_dtype=torch.bfloat16)
    # Grow the embedding matrix to cover the newly added special tokens
    model.resize_token_embeddings(len(tokenizer))
    model.to(torch.bfloat16)

    # LoRA over attention/MLP projections, plus embed_tokens and lm_head,
    # because the vocabulary was just extended
    rank_dimension = 16
    lora_alpha = 64
    lora_dropout = 0.05
    peft_config = LoraConfig(
        r=rank_dimension,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        target_modules=["gate_proj", "q_proj", "lm_head", "o_proj", "k_proj", "embed_tokens", "down_proj", "up_proj", "v_proj"],
        task_type=TaskType.CAUSAL_LM,
    )

    per_device_train_batch_size = 16
    per_device_eval_batch_size = 16
    gradient_accumulation_steps = 1
    logging_steps = 5
    learning_rate = 1e-4
    max_grad_norm = 1.0
    num_train_epochs = 3.0
    warmup_ratio = 0.1
    lr_scheduler_type = "cosine"
    max_seq_length = 1500

    training_arguments = SFTConfig(
        output_dir=output_dir,
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        save_strategy="no",
        eval_strategy="epoch",
        logging_steps=logging_steps,
        learning_rate=learning_rate,
        max_grad_norm=max_grad_norm,
        weight_decay=0.1,
        warmup_ratio=warmup_ratio,
        lr_scheduler_type=lr_scheduler_type,
        report_to="wandb",
        bf16=True,
        hub_private_repo=False,
        push_to_hub=False,
        num_train_epochs=num_train_epochs,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
        packing=True,
        max_seq_length=max_seq_length,
    )

    trainer = SFTTrainer(
        model=model,
        args=training_arguments,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        processing_class=tokenizer,
        peft_config=peft_config,
    )
    # The KV cache is useless during training and clashes with gradient checkpointing
    trainer.model.config.use_cache = False
    trainer.model.generation_config.use_cache = False

    trainer.train()
    trainer.save_model()
    trainer.push_to_hub(f"{username}/{output_dir}")

    tokenizer.eos_token = "<eos>"
    tokenizer.save_pretrained(f"{username}/{output_dir}")
    tokenizer.push_to_hub(f"{username}/{output_dir}", token=True)

The dataset comes from:
- NousResearch/hermes-function-calling-v1
- Salesforce/xlam-function-calling-60k
- Jofthomas/hermes-function-calling-thinking-V1
- NESTFUL: Nested Function-Calling Dataset
The best course of action, of course, is to build your own dataset. If you have existing usage data and good telemetry — that’s a huge advantage! Analyze API calls, look for common patterns, and create a tool-calling dataset for your agent. Not an easy task — definitely harder than just training the model.
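If you do go that route, the transformation can start small. Here is a minimal, hypothetical sketch (all the telemetry field names are invented for illustration) that turns one API-call log record into a tool-calling chat sample in the same messages format the training script consumes:

```python
import json


def log_to_sample(log_entry: dict) -> dict:
    """Convert one (hypothetical) API telemetry record into a tool-calling
    chat sample: user request -> assistant tool call -> tool response."""
    tool_call = {"name": log_entry["endpoint"], "arguments": log_entry["params"]}
    return {
        "messages": [
            {"role": "user", "content": log_entry["user_query"]},
            # The assistant turn wraps the structured call in <tool_call> tags,
            # matching the special tokens the model is trained on
            {"role": "assistant",
             "content": f"<tool_call>\n{json.dumps(tool_call)}\n</tool_call>"},
            # Note: "<tool_reponse>" spelling follows the training setup above
            {"role": "tool",
             "content": f"<tool_reponse>\n{json.dumps(log_entry['response'])}\n</tool_reponse>"},
        ]
    }


entry = {
    "user_query": "What's the weather in Kyiv?",
    "endpoint": "get_weather",
    "params": {"city": "Kyiv"},
    "response": {"temp_c": 21},
}
sample = log_to_sample(entry)
print(sample["messages"][1]["content"])
```

From there you can batch over your logs, deduplicate, filter out failed calls, and push the result to the Hub as a dataset.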
Let’s run the training and check the W&B logs from the run.
pip install modal && modal setup
modal secret create training-config HF_TOKEN=**** WANDB_API_KEY=****
modal run -d finetune_tool.py::function_calling_finetune
And after just ~10 min on an H200, your model will be saved on Hugging Face:

Source: https://huggingface.co/koml/gemma-3-1b-it-function-calling
Boom! Your agent’s brain is now adapted — by you, for your data, your use case, and your unique toolset. The best part? Only you have it.
Don’t forget to evaluate it — we’ll cover that in our next blog posts!
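Until then, a bare-bones starting point is exact match on the parsed call: compare the tool name and arguments the model produced against the gold call from a held-out sample. A toy sketch (the example calls are invented):

```python
import json


def tool_call_exact_match(predicted: str, gold: str) -> bool:
    """Compare two tool-call JSON strings as parsed objects,
    so key order and whitespace don't affect the score."""
    try:
        return json.loads(predicted) == json.loads(gold)
    except json.JSONDecodeError:
        # Malformed JSON from the model counts as a miss
        return False


pred = '{"name": "get_weather", "arguments": {"city": "Kyiv"}}'
gold = '{"arguments": {"city": "Kyiv"}, "name": "get_weather"}'
print(tool_call_exact_match(pred, gold))  # True
```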
Note: If you liked what you read, consider checking out my ML engineering course — or contact me for consultancy services!