Fine-Tune Qwen2.5-Coder for TypeScript-Only Output on a MacBook Air M1

Here is my journey of attempts to fine-tune a model locally on a MacBook Air.

First of all, an important thing to understand: fine-tuning is about teaching the model a desired output format, not about teaching it new knowledge.

In this example, we are going to give it a try and fine-tune a Qwen model so that it always outputs TypeScript-only code: no JavaScript leakage, no markdown, no explanations.

Training Data Samples

Technically, fine-tuning is quite simple.

First we need to prepare a dataset of samples, where each sample contains a user prompt and the expected response:

{
  "messages": [
    { "role": "system", "content": "system prompt goes here" },
    { "role": "user", "content": "write me a function to add two numbers" },
    { "role": "assistant", "content": "expected output goes here" }
  ]
}

For more or less adequate results we need at minimum ~500 such samples; the more, the better.

Obviously we won't create these manually, so let's use GPT to generate them:

import { writeFileSync } from "fs";

const system = `
You are a TypeScript code generator specialized in React.
CRITICAL RULES:
1. Output ONLY valid TypeScript/TSX code
2. NO markdown code blocks (no \`\`\`)
3. NO explanations or comments outside the code
4. NO JavaScript - always use TypeScript with proper types
5. Include necessary imports
6. Use functional components with hooks
7. Export the component/hook as default
`.trim();

const inputs = [
  "Create a Button component with onClick handler",
  "Create a Counter component with increment and decrement",
  "Create a TodoList component that displays items",
  "Create a TodoItem component with checkbox",
  "Create a SearchInput component with debounce",
  "Create a Modal component with backdrop",
  "Create a Dropdown component with options",
  "Create a Tabs component with panels",
  "Create a Accordion component",
  "Create a Tooltip component",
  "Create a Badge component with variants",
  "Create a Avatar component with fallback",
  "Create a Card component with header and footer",
  "Create a Alert component with dismiss",
  "Create a Spinner component",
  "Create a ProgressBar component",
  "Create a Pagination component",
  "Create a Breadcrumb component",
  "Create a Switch toggle component",
  "Create a Slider component",
  "Create a DatePicker component",
  "Create a TimePicker component",
  "Create a ColorPicker component",
  "Create a FileUpload component",
  "Create a ImageGallery component",
  "Create a Carousel component",
  "Create a Table component with sorting",
  "Create a DataGrid component",
  "Create a Form component with validation",
  "Create a Input component with error state",
  "Create a Select component with search",
  "Create a Checkbox component",
  "Create a RadioGroup component",
  "Create a TextArea component with character count",
  "Create a RichTextEditor component",
  "Create a CodeBlock component with syntax highlighting",
  "Create a Navbar component with links",
  "Create a Sidebar component with navigation",
  "Create a Footer component",
  "Create a Header component with logo",
  "Create a Layout component with slots",
  "Create a Container component",
  "Create a Grid component",
  "Create a Stack component",
  "Create a Divider component",
  "Create a Spacer component",
  "Create a useLocalStorage hook",
  "Create a useFetch hook",
  "Create a useDebounce hook",
  "Create a useClickOutside hook",
  "Create a useMediaQuery hook",
  "Create a useIntersectionObserver hook",
  "Create a useKeyPress hook",
  "Create a useWindowSize hook",
  "Create a useScrollPosition hook",
  "Create a usePrevious hook",
  "Create a useToggle hook",
  "Create a useAsync hook",
  "Create a useInterval hook",
  "Create a useTimeout hook",
  "Create a useCopyToClipboard hook",
  "Create a useOnlineStatus hook",
  "Create a useGeolocation hook",
  "Create a useDarkMode hook",
  "Create a useHover hook",
  "Create a useFocus hook",
  "Create a Skeleton loading component",
  "Create a EmptyState component",
  "Create a ErrorBoundary component",
  "Create a Suspense wrapper component",
  "Create a LazyLoad component",
  "Create a InfiniteScroll component",
  "Create a VirtualList component",
  "Create a Draggable component",
  "Create a Resizable component",
  "Create a ContextMenu component",
  "Create a Toast notification component",
  "Create a Snackbar component",
  "Create a Dialog component",
  "Create a Drawer component",
  "Create a Popover component",
  "Create a Portal component",
  "Create a Overlay component",
  "Create a Stepper component",
  "Create a Timeline component",
  "Create a Rating component",
  "Create a Tag component",
  "Create a Chip component",
  "Create a List component",
  "Create a ListItem component",
  "Create a Menu component",
  "Create a MenuItem component",
  "Create a Toolbar component",
  "Create a IconButton component",
  "Create a FloatingActionButton component",
  "Create a ButtonGroup component",
];

const samples: { messages: { role: string; content: string }[] }[] = []; // {"messages":[{"role":"system","content":"..."},{"role":"user","content":"..."},{"role":"assistant","content":"..."}]}
for (let i = 0; i < inputs.length; i++) {
  const input = inputs[i];
  console.log(`[${i + 1}/${inputs.length}] ${input}`);

  const output = await fetch("https://api.openai.com/v1/responses", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-5.1-codex-mini",
      input: [
        { role: "system", content: system },
        { role: "user", content: input },
      ],
    }),
  })
    .then((r) => r.json())
    .then((data) => data.output[data.output.length - 1].content[0].text);

  if (output.includes("```")) {
    console.log(`  ⚠️  Skipped: contains markdown code blocks`);
    continue;
  }

  if (/^(Here|This|The|I |Let me|Sure|Below)/i.test(output.trim())) {
    console.log(`  ⚠️  Skipped: starts with explanation`);
    continue;
  }

  if (!output.includes(":") && !output.includes("interface") && !output.includes("type ")) {
    console.log(`  ⚠️  Skipped: missing TypeScript types`);
    continue;
  }

  if (!output.includes("import") && !output.includes("React")) {
    console.log(`  ⚠️  Skipped: missing imports`);
    continue;
  }

  samples.push({
    messages: [
      { role: "system", content: system },
      { role: "user", content: input },
      { role: "assistant", content: output },
    ],
  });

  await new Promise((resolve) => setTimeout(resolve, 500)); // just in case
}

const shuffled = samples.sort(() => Math.random() - 0.5); // quick-and-dirty in-place shuffle, good enough here
const split_80_20 = Math.floor(shuffled.length * 0.8);
const train = shuffled.slice(0, split_80_20);
const valid = shuffled.slice(split_80_20);

writeFileSync("train.jsonl", train.map((s) => JSON.stringify(s)).join("\n"));
writeFileSync("valid.jsonl", valid.map((s) => JSON.stringify(s)).join("\n"));

console.log("DONE");

This script will prepare somewhere around one hundred such samples, shuffle them, and store the results in train.jsonl and valid.jsonl.

Both files contain JSON lines, where each line is a sample as described above.
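
Before moving on, it is worth sanity-checking both files: every line should be valid JSON with exactly three messages. A small sketch, using the same Node/TypeScript setup as the generator script above:

import { readFileSync } from "fs";

// Verify every line parses and contains the system/user/assistant triple
for (const file of ["train.jsonl", "valid.jsonl"]) {
  const lines = readFileSync(file, "utf8").split("\n").filter(Boolean);
  for (const [i, line] of lines.entries()) {
    const { messages } = JSON.parse(line);
    if (!Array.isArray(messages) || messages.length !== 3) {
      throw new Error(`${file}, line ${i + 1}: malformed sample`);
    }
  }
  console.log(`${file}: ${lines.length} samples look fine`);
}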

Fine-Tune Qwen with MLX LM

Once the data samples are prepared, we are ready for fine-tuning.

On a Mac we do not have CUDA, but there is mlx-lm.

To avoid seven circles of hell with Python and its dependencies, we can simply

brew install mlx-lm

From what I understand, mlx-lm is like a simplified Ollama: we can use it not only for fine-tuning but for inference as well.
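
For example, plain inference with the base model (no adapter yet) looks something like this, using the same flags we will use later for testing:

mlx_lm.generate \
    --model "Qwen/Qwen2.5-Coder-0.5B-Instruct" \
    --prompt "write me a function to add two numbers"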

Here is a quick example: let's fine-tune the model with our samples.

mlx_lm.lora \
    --model "Qwen/Qwen2.5-Coder-0.5B-Instruct" \
    --train \
    --data . \
    --adapter-path ./adapters \
    --batch-size 2 \
    --num-layers 8 \
    --iters 100

Where:

  • --model "Qwen/Qwen2.5-Coder-0.5B-Instruct" base model that will be fine tuned, note that we using models from Hugging Face not Ollama
  • --data . - path to directory with train.jsonl and valid.jsonl files
  • --adapter-path ./adapters - path to directory where mlx should store results of fine tune
  • --batch-size 1 - samples per training step. Higher - faster but more RAM. Ideally will be somewhere between 2 and 4
  • --num-layers 2 - number of layers to apply LoRA. More - better quality but more RAM. Ideally will be somewhere between 8 and 16
  • --iters 10 - training iterations. More - better learning but takes longer. Should be between 500 and 1000

Note: in this sample we are intentionally using such small values, just to prove the setup works without waiting forever for training to complete.
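
For reference, a more realistic run with values from the recommended ranges above would look something like this (expect it to take much longer on an M1 Air):

mlx_lm.lora \
    --model "Qwen/Qwen2.5-Coder-0.5B-Instruct" \
    --train \
    --data . \
    --adapter-path ./adapters \
    --batch-size 4 \
    --num-layers 16 \
    --iters 500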

Now that our model is fine-tuned, we can test it like so:

mlx_lm.generate \
    --model "Qwen/Qwen2.5-Coder-0.5B-Instruct" \
    --adapter-path ./adapters \
    --system-prompt "You are a TypeScript code generator. Output only valid TypeScript code, no markdown, no explanations." \
    --prompt "Create a function that calculates the average of two numbers"

Note: here we are still passing the system prompt. Without it the model may still output whatever it likes; for the model to truly output only TypeScript we need more training samples and a real, full fine-tuning run. After that it should be possible to drop the system prompt and still get the desired output, without asking the model to respond with TypeScript.

With this, we have a short iteration loop and can play with the samples until we are happy with the results.

Ollama

Technically we could stop after the previous steps, but in my case I wanted to run this model in Ollama, which led to a never-ending stream of issues.

If we were training Llama, Mistral, etc., everything would work out of the box, as easy as

importing the fine-tuned adapter

But in our case we are dealing with an unsupported model, and that is where the problems start.

We need to use MLX Fuse to export our model

Then, we are going to use llama.cpp to convert it to GGUF

And only then will we be able to create an Ollama model from it

And each step can break everything, so we need to verify the results after every step.

MLX Fuse

From my understanding, fuse merges the adapter into the base model and exports the result into a folder.

Ideally, if we were working with a supported model, we could simply run

mlx_lm.fuse \
  --model "Qwen/Qwen2.5-Coder-0.5B-Instruct" \
  --adapter-path ./adapters \
  --export-gguf \
  --gguf-path ./model.gguf

But it will fail with the error: ValueError: Model type qwen2 not supported for GGUF conversion.

So, we should use:

mlx_lm.fuse \
  --model "Qwen/Qwen2.5-Coder-0.5B-Instruct" \
  --adapter-path ./adapters \
  --save-path ./fused
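
As a quick sanity check after this step, the ./fused directory should contain a model.safetensors file, which is what our conversion script will read later:

ls ./fused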

llama.cpp

Now we want to convert our fused model into GGUF, and to do that we are going to use llama.cpp, which can be installed like so:

brew install llama.cpp

It comes with a bunch of helpers, and one of them is a Python script that performs the conversion:

/opt/homebrew/Cellar/llama.cpp/7310/bin/convert_hf_to_gguf.py /Users/mac/Desktop/finetun/fused --outtype f16 --outfile model.gguf

BUT there is a catch. If we now try to test our model like so:

llama-cli -m model.gguf --system-prompt "You are a TypeScript code generator. Output only valid TypeScript code, no markdown, no explanations." --prompt "Create a function that calculates the average of two numbers" -n 100 --no-conversation

it will produce garbage, literally garbage, not even close to code at all, let alone TypeScript.

That's because of the MLX format, which is not supported by llama.cpp, so we still need to perform one more conversion before proceeding.

#!/usr/bin/env python3
import os
import mlx.core as mx
import numpy as np
from safetensors.numpy import save_file

# Load the MLX-format weights (only MLX itself can read these)
weights = mx.load("./fused/model.safetensors")

print(f"Converting {len(weights)} tensors...")
np_weights = {}
for k, v in weights.items():
    # Go through float32 to get a numpy array, then downcast to float16
    arr = np.array(v.astype(mx.float32))
    np_weights[k] = arr.astype(np.float16)

# Re-save with the standard safetensors library, without the MLX metadata
os.makedirs("./fused_standard", exist_ok=True)
save_file(np_weights, "./fused_standard/model.safetensors")
print("Done!")

If you take a closer look, all we are doing is loading the weights with MLX, converting them from float32 to float16, and re-saving them with the standard safetensors library.

Note: as for dependencies, we need something like:

python3 -m venv venv
source venv/bin/activate
pip install mlx numpy safetensors torch transformers sentencepiece

So, unfortunately, Python cannot be avoided entirely.

At the very end we need to do the following:

cp -r fused fused_standard
python convert.py

fused_standard will contain the same files, except for model.safetensors, where we change the types.
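
To verify the conversion actually produced a standard file, we can peek at the safetensors header of both files: the format starts with an 8-byte little-endian header length followed by a JSON header, which may contain a __metadata__ entry (the MLX-written file carries {"format": "mlx"}, as explained at the end of this article). A small Node/TypeScript sketch, assuming the paths from the steps above:

import { openSync, readSync, closeSync } from "fs";

// Read the safetensors JSON header: 8 bytes of little-endian length, then the header itself
function readHeader(path: string): Record<string, unknown> {
  const fd = openSync(path, "r");
  const lenBuf = Buffer.alloc(8);
  readSync(fd, lenBuf, 0, 8, 0);
  const headerLen = Number(lenBuf.readBigUInt64LE(0));
  const headerBuf = Buffer.alloc(headerLen);
  readSync(fd, headerBuf, 0, headerLen, 8);
  closeSync(fd);
  return JSON.parse(headerBuf.toString("utf8"));
}

for (const path of ["./fused/model.safetensors", "./fused_standard/model.safetensors"]) {
  // The fused file should report MLX metadata; the converted one should not
  console.log(path, readHeader(path)["__metadata__"]);
}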

Now we can run the converter once again

/opt/homebrew/Cellar/llama.cpp/7310/bin/convert_hf_to_gguf.py /Users/mac/Desktop/finetun/fused_standard --outtype f16 --outfile model.gguf

and it still won't work: ModuleNotFoundError: No module named 'gguf'

Thankfully, Opus was able to figure out a workaround: clone the llama.cpp repo and use the gguf package bundled with it. That was the only way I could make things work.

git clone https://github.com/ggml-org/llama.cpp
python llama.cpp/convert_hf_to_gguf.py /Users/mac/Desktop/finetun/fused_standard --outtype f16 --outfile model.gguf

and test it

llama-cli -m model.gguf --system-prompt "You are a TypeScript code generator. Output only valid TypeScript code, no markdown, no explanations." --prompt "Create a function that calculates the average of two numbers" -n 100 --no-conversation

And it generates something meaningful.

Ollama

Now that we have a GGUF, we can play with Ollama; this part is actually quite easy.

We need to create a model like so:

echo "FROM ./model.gguf" > Modelfile
ollama create typescript-coder -f Modelfile
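
Optionally, since our lightly trained adapter still relies on the system prompt (see the note in the fine-tuning section above), we can bake that prompt into the Modelfile using Ollama's SYSTEM instruction, roughly like this:

FROM ./model.gguf
SYSTEM """You are a TypeScript code generator. Output only valid TypeScript code, no markdown, no explanations."""

Recreate the model with the same ollama create command and the prompt will be applied to every request automatically.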

And finally:

ollama run typescript-coder "Create a function that calculates the average of two numbers"

Why is Python required for conversion?

Short answer: MLX saves safetensors in an MLX-specific format that only MLX can read.

MLX saves model weights with {"format": "mlx"} metadata in the safetensors file. This format:

  • Cannot be read by the standard safetensors library
  • Cannot be read by llama.cpp's convert_hf_to_gguf.py
  • Can only be loaded using mlx.core.load() (the mx.load() call in the script above)

Additionally, mlx_lm.fuse --export-gguf exists but only supports llama/mixtral/mistral models, not Qwen2.

The short convert.py script above is the minimal solution: it loads the weights via MLX, converts them to numpy, and saves them with the standard safetensors library.