Agents, agents, agents.
Everyone loves agents. Big labs are releasing them (Introducing Deep Research, Introducing Operator), startups are being built around the idea (Launch YC, Manus, Meet ARI), and the tooling around them is maturing (Hands-On Deep Dive Into Model Context Protocol; OpenAI adopts rival Anthropic’s standard for connecting AI models to data).
But at the core of every agent sits its brain: the LLM. For an agent to work, it needs a good brain (a statement that isn’t limited to agents, by the way). Today, we’re going to train our own!
Why Fine-Tune LLM Agents for Tool Use?
- The model is a product again! To gain a competitive edge, you need to train your models!
- Your specific tools! To get the most out of your toolset, you need to train your agent on it: which tool to call, in what order, and with what domain-specific knowledge.
- Any model can use tools! Even if the latest open-source model doesn’t natively support function calling, it can be adapted through simple fine-tuning.
How to Train a Tool-Using LLM Agent?
By training explicitly with dedicated tool tokens (<tools>, <tool_call>, <think>, and friends)! Here’s a simple end-to-end setup (based on the amazing Agents course from Hugging Face). We’re using Modal to get access to an H200!
import modal

app = modal.App("function-calling-finetune")

image = modal.Image.debian_slim().pip_install([
    "transformers==4.50.3",
    "peft==0.15.1",
    "bitsandbytes==0.45.4",
    "trl==0.16.0",
    "datasets==3.5.0",
    "torch==2.2.1",
    "accelerate==1.5.2",
    "wandb==0.19.8",
]).env({"WANDB_PROJECT": "function-calling-finetune"})

with image.imports():
    from enum import Enum

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer
    from peft import LoraConfig, TaskType, PeftConfig, PeftModel

DATASET_NAME = "Jofthomas/hermes-function-calling-thinking-V1"
USERNAME = "truskovskiyk"
MODEL_NAME = "google/gemma-3-1b-it"
OUTPUT_DIR = "gemma-3-1b-it-function-calling"


@app.function(
    image=image,
    gpu="H200",
    timeout=86400,
    secrets=[modal.Secret.from_name("training-config")],
)
def function_calling_finetune():
    set_seed(42)

    dataset_name = DATASET_NAME
    username = USERNAME
    model_name = MODEL_NAME
    output_dir = OUTPUT_DIR

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

    # The template rejects the system role, so merge the system prompt into the
    # first user turn and ask the model to plan inside <think> tags
    def preprocess(sample):
        messages = sample["messages"]
        first_message = messages[0]
        if first_message["role"] == "system":
            system_message_content = first_message["content"]
            messages[1]["content"] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]["content"]
            messages.pop(0)
        return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

    dataset = load_dataset(dataset_name)
    dataset = dataset.rename_column("conversations", "messages")
    dataset = dataset.map(preprocess, remove_columns="messages")
    dataset = dataset["train"].train_test_split(0.1)

    sample = dataset["train"].select(range(1))
    print(f"Sample: {sample['text']}")

    # Special tokens for tools, tool calls, tool responses, and the thinking block
    # (the "<tool_reponse>" spelling is kept as in the original course code)
    class ChatmlSpecialTokens(str, Enum):
        tools = "<tools>"
        eotools = "</tools>"
        think = "<think>"
        eothink = "</think>"
        tool_call = "<tool_call>"
        eotool_call = "</tool_call>"
        tool_response = "<tool_reponse>"
        eotool_response = "</tool_reponse>"
        pad_token = "<pad>"
        eos_token = "<eos>"

        @classmethod
        def list(cls):
            return [c.value for c in cls]

    # Reload the tokenizer with the special tokens registered
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        pad_token=ChatmlSpecialTokens.pad_token.value,
        additional_special_tokens=ChatmlSpecialTokens.list(),
    )
    tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

    model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager", device_map="auto", torch_dtype=torch.bfloat16)
    # Grow the embedding matrix to cover the newly added special tokens
    model.resize_token_embeddings(len(tokenizer))
    model.to(torch.bfloat16)

    # LoRA over attention/MLP projections, plus embed_tokens and lm_head,
    # because the vocabulary was just extended
    rank_dimension = 16
    lora_alpha = 64
    lora_dropout = 0.05
    peft_config = LoraConfig(
        r=rank_dimension,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        target_modules=["gate_proj", "q_proj", "lm_head", "o_proj", "k_proj", "embed_tokens", "down_proj", "up_proj", "v_proj"],
        task_type=TaskType.CAUSAL_LM,
    )

    per_device_train_batch_size = 16
    per_device_eval_batch_size = 16
    gradient_accumulation_steps = 1
    logging_steps = 5
    learning_rate = 1e-4
    max_grad_norm = 1.0
    num_train_epochs = 3.0
    warmup_ratio = 0.1
    lr_scheduler_type = "cosine"
    max_seq_length = 1500

    training_arguments = SFTConfig(
        output_dir=output_dir,
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        save_strategy="no",
        eval_strategy="epoch",
        logging_steps=logging_steps,
        learning_rate=learning_rate,
        max_grad_norm=max_grad_norm,
        weight_decay=0.1,
        warmup_ratio=warmup_ratio,
        lr_scheduler_type=lr_scheduler_type,
        report_to="wandb",
        bf16=True,
        hub_private_repo=False,
        push_to_hub=False,
        num_train_epochs=num_train_epochs,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
        packing=True,
        max_seq_length=max_seq_length,
    )

    trainer = SFTTrainer(
        model=model,
        args=training_arguments,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        processing_class=tokenizer,
        peft_config=peft_config,
    )
    # The KV cache is useless during training and clashes with gradient checkpointing
    trainer.model.config.use_cache = False
    trainer.model.generation_config.use_cache = False

    trainer.train()
    trainer.save_model()
    trainer.push_to_hub(f"{username}/{output_dir}")

    tokenizer.eos_token = "<eos>"
    tokenizer.save_pretrained(f"{username}/{output_dir}")
    tokenizer.push_to_hub(f"{username}/{output_dir}", token=True)

The dataset comes from:
- NousResearch/hermes-function-calling-v1
- Salesforce/xlam-function-calling-60k
- Jofthomas/hermes-function-calling-thinking-V1
- NESTFUL: Nested Function-Calling Dataset
The best course of action, of course, is to build your own dataset. If you have existing usage data and good telemetry — that’s a huge advantage! Analyze API calls, look for common patterns, and create a tool-calling dataset for your agent. Not an easy task — definitely harder than just training the model.
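If you do go that route, the transformation can start small. Here is a minimal, hypothetical sketch (all the telemetry field names are invented for illustration) that turns one API-call log record into a tool-calling chat sample in the same messages format the training script consumes:

```python
import json


def log_to_sample(log_entry: dict) -> dict:
    """Convert one (hypothetical) API telemetry record into a tool-calling
    chat sample: user request -> assistant tool call -> tool response."""
    tool_call = {"name": log_entry["endpoint"], "arguments": log_entry["params"]}
    return {
        "messages": [
            {"role": "user", "content": log_entry["user_query"]},
            # The assistant turn wraps the structured call in <tool_call> tags,
            # matching the special tokens the model is trained on
            {"role": "assistant",
             "content": f"<tool_call>\n{json.dumps(tool_call)}\n</tool_call>"},
            # Note: "<tool_reponse>" spelling follows the training setup above
            {"role": "tool",
             "content": f"<tool_reponse>\n{json.dumps(log_entry['response'])}\n</tool_reponse>"},
        ]
    }


entry = {
    "user_query": "What's the weather in Kyiv?",
    "endpoint": "get_weather",
    "params": {"city": "Kyiv"},
    "response": {"temp_c": 21},
}
sample = log_to_sample(entry)
print(sample["messages"][1]["content"])
```

From there you can batch over your logs, deduplicate, filter out failed calls, and push the result to the Hub as a dataset.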
Let’s run the training and check the W&B logs from the run.
pip install modal && modal setup
modal secret create training-config HF_TOKEN=**** WANDB_API_KEY=****
modal run -d finetune_tool.py::function_calling_finetune
And after just ~10 min on an H200, your model will be saved on Hugging Face:

Source: https://huggingface.co/koml/gemma-3-1b-it-function-calling
Boom! Your agent’s brain is now adapted — by you, for your data, your use case, and your unique toolset. The best part? Only you have it.
Don’t forget to evaluate it — we’ll cover that in our next blog posts!
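Until then, a bare-bones starting point is exact match on the parsed call: compare the tool name and arguments the model produced against the gold call from a held-out sample. A toy sketch (the example calls are invented):

```python
import json


def tool_call_exact_match(predicted: str, gold: str) -> bool:
    """Compare two tool-call JSON strings as parsed objects,
    so key order and whitespace don't affect the score."""
    try:
        return json.loads(predicted) == json.loads(gold)
    except json.JSONDecodeError:
        # Malformed JSON from the model counts as a miss
        return False


pred = '{"name": "get_weather", "arguments": {"city": "Kyiv"}}'
gold = '{"arguments": {"city": "Kyiv"}, "name": "get_weather"}'
print(tool_call_exact_match(pred, gold))  # True
```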
Note: If you liked what you read, consider checking out my ML engineering course — or contact me for consultancy services!