The labs

The course did the explaining; here the knobs are in our hands. The lab holds three kinds of experiment. Real trainings: LoRAs we trained on Klein 9B changing a single parameter each time. Staged demonstrations: examples we recreated and exaggerated so we can get to know each failure properly. And tests: two little games that tell us whether the course actually stuck.

Real trainings

In every experiment the dataset, seed and prompt stay fixed. So whatever changes on screen, the only thing changing it is that one parameter.

Scale: 0 to 4

This knob lives on the inference side. The subject fades in slowly, then settles, and eventually tips over into caricature. Let's scrub through and watch.

[Interactive slider: outputs at LoRA scale 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5 and 4; currently showing scale 1.00]

The seed and the prompt never change: a photo of TOK ceramic cat figurine sitting on a sunny windowsill beside a potted plant
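To make the scale knob concrete, here is a minimal sketch of what it usually does at inference time, assuming the common LoRA formulation in which the adapter's low-rank update is multiplied by the scale and added to the base weight. The tiny matrices and the names `apply_lora`, `down` and `up` are ours, purely for illustration; a real pipeline may also fold in factors such as alpha over rank.

```python
# Sketch only: base weight plus a scaled low-rank update, W' = W + scale * (up @ down).
# The matrices are tiny hand-written examples, not values from any real model.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def apply_lora(base, down, up, scale):
    """Return base + scale * (up @ down), leaving the inputs untouched."""
    delta = matmul(up, down)
    return [
        [base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]

base = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2x2 base weight
down = [[1.0, 1.0]]              # rank-1 "down" projection, shape (1, 2)
up = [[0.5], [0.25]]             # rank-1 "up" projection, shape (2, 1)

# The same nine stops as the slider: 0, 0.5, ..., 4.
scales = [i * 0.5 for i in range(9)]

if __name__ == "__main__":
    for s in scales:
        print(s, apply_lora(base, down, up, s))
```

At scale 0 the base weight comes back unchanged, and every extra step on the slider adds the same update again, which is one way to picture why large scales push the subject toward caricature.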

Steps: 300 to 2500

We ran five separate trainings and sampled each one with an album-like prompt. The differences are very subtle, and that subtlety is exactly why overfitting slips past people; the learning arc below shows what it grows into over time.

[Interactive slider: outputs after 300, 500, 1000, 1500 and 2500 training steps; currently showing 1000 steps]

Everything except steps is identical: a photo of TOK ceramic cat figurine sitting on a sunny windowsill beside a potted plant

Learning rate: three real runs

We ran three trainings at 1e-5, 5e-5 and 2e-4. Steps and seed are the same in all three.

Output trained at learning rate 1e-5

1e-5 · timid

Look at the body patterns: simplified, and partly made up. The identity is on its way, but at this pace 1000 steps wasn't enough time.

Output trained at learning rate 5e-5

5e-5 · default

The trainer's default. This is the closest match to the real figurine, and the model's base skills are right where we left them.

Output trained at learning rate 2e-4

2e-4 · frantic

It still doesn't look bad here, and that's the real lesson: an overly high learning rate often gets by on album-like prompts and falls apart on everything else.

All three trainings share the same dataset, the same 1000 steps, the same prompt and seed: a photo of TOK ceramic cat figurine sitting on a sunny windowsill beside a potted plant
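The one-parameter-at-a-time setup behind these runs can be sketched as a small sweep helper. This is only an illustration: the config keys, the dataset name and the seed value are hypothetical, and `one_factor_sweep` and `changed_keys` are names we made up. Only the step counts, learning rates and prompt come from the experiments above.

```python
from copy import deepcopy

# Baseline run. "dataset" and "seed" values are placeholders; the text only
# says they stay fixed, not what they are.
BASE_RUN = {
    "dataset": "tok_album",  # hypothetical name
    "seed": 42,              # hypothetical value
    "steps": 1000,
    "learning_rate": 5e-5,
    "prompt": (
        "a photo of TOK ceramic cat figurine sitting on a sunny "
        "windowsill beside a potted plant"
    ),
}

def one_factor_sweep(base, key, values):
    """Build one run config per value, changing only `key`."""
    runs = []
    for value in values:
        run = deepcopy(base)
        run[key] = value
        runs.append(run)
    return runs

def changed_keys(base, run):
    """List the keys whose values differ from the baseline."""
    return sorted(k for k in base if base[k] != run[k])

lr_runs = one_factor_sweep(BASE_RUN, "learning_rate", [1e-5, 5e-5, 2e-4])
step_runs = one_factor_sweep(BASE_RUN, "steps", [300, 500, 1000, 1500, 2500])

if __name__ == "__main__":
    for run in lr_runs + step_runs:
        print(changed_keys(BASE_RUN, run), run["steps"], run["learning_rate"])
```

Checking `changed_keys` before launching is a cheap guard: if it ever lists more than one key, the comparison is no longer a clean one.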

Multi-LoRA mixer

We wear the TOK subject bracelet and the TOKSTYLE style bracelet together at different scale values. Let's see how the painter balances the two.

[Interactive grid: subject scale (TOK) and style scale (TOKSTYLE) each at 0.5, 1 and 1.5, nine outputs in all; currently showing TOK 1 · TOKSTYLE 1]

Both bracelets worn on Klein 9B at the same time: TOK ceramic cat figurine on a harbor pier, a TOKSTYLE painting
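The nine cells of the mixer can be enumerated with a short sketch. We assume here that two LoRAs combine by summing their scaled updates, which is a common way to stack adapters but not something this page confirms for the pipeline used. `mixer_grid` and `combined_delta` are hypothetical helper names, and the deltas are flattened to short lists for readability.

```python
from itertools import product

SCALES = [0.5, 1.0, 1.5]  # the three stops used for both bracelets

def mixer_grid(scales=SCALES):
    """Every (subject, style) scale pair, subject scale varying slowest."""
    return [{"TOK": s, "TOKSTYLE": t} for s, t in product(scales, scales)]

def combined_delta(subject_delta, style_delta, subject_scale, style_scale):
    """Assumed combination rule: weighted sum of the two adapter updates."""
    return [
        subject_scale * a + style_scale * b
        for a, b in zip(subject_delta, style_delta)
    ]

if __name__ == "__main__":
    for cell in mixer_grid():
        print(cell)
    # Toy deltas: the subject pulls on the first weight, the style on the second.
    print(combined_delta([1.0, 0.0], [0.0, 2.0], 0.5, 1.5))
```

Under this assumption, raising one scale never weakens the other adapter directly; the balance we see in the grid comes from both updates competing inside the same weights.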

The edit LoRA at work

We take the sketch-to-painting LoRA we trained in chapter 9 and try it on sketches it has never seen before.

[Interactive comparison: input sketch in, edit LoRA output painting out]

Staged demonstrations

We deliberately exaggerated the failure examples here so we can get to know their faces properly. We made them with GPT Image 2 and put a 'staged' label on every one.

The learning arc

One prompt the album never contained, and the entire run on a single slider. First we watch the identity arrive, then we watch the painter stop listening as the album takes over the scene.

[Interactive slider, staged: four stages (not learned yet, the sweet spot, the album starts leaking in, the photocopier) at 100, 1000, 2500 and 5000 steps; currently showing 1000 steps, with identity (does it look like TOK?) at 100% and obedience (does it follow the prompt?) at 100%]

The sweet spot. We have identity and freedom at the same time: this is unmistakably TOK, in a scene the album never showed. This is exactly where we want to stop.

The prompt we used at every checkpoint: a photo of TOK on a striped beach towel by the sea. This is a staged demonstration; we exaggerated it a little so the whole arc fits on one slider. In real trainings the drift is much slower, which is exactly why it slips past so easily.

The eagerness knob

Let's picture the learning rate as a dial with three detents: a whisper, the default, and frantic.

[Interactive dial: outputs trained at learning rates 1e-5, 5e-5 and 2e-4; currently showing learning_rate = 5e-5]

5e-5 = 0.00005 · the default

A real output from our Klein training at the default learning rate. The identity came through cleanly and the model's base skills are still in place. This is exactly what we're aiming for.

The middle detent is a real training output; the two ends are staged. We exaggerated them on purpose so the direction of each failure is obvious. In a real run the drift is far subtler, which is exactly why it slips past people.

The bicycle test

The two-second overfit check from chapter 4: we ask the painter for something the album never showed and see what he does.

prompt: a photo of TOK riding a bicycle

A healthy LoRA can improvise

The painter really learned what TOK is, so he can drop TOK into a scene the album never showed. The collar, the patterns and the proportions all carry over untouched.

An overfit LoRA hands the album back

So where's the bicycle? Gone. Instead of learning TOK as a concept, this LoRA memorized the photos; no matter what we ask for, it gives us back the windowsill it studied.

This one is staged too: we made the left image with an image editor to show what healthy behavior looks like, while the right is a real output from a late checkpoint, standing in for the memorized answer.

Let's test ourselves

We saved two little games for the end. I'd say they teach more than they let on.

Stop the run

This time we watch a training run in real time. We have exactly what a trainer would show us, the loss curve and validation samples, and we decide when to stop. The curve that actually matters only appears after we do.

[Interactive chart: loss against steps. Album loss is what the trainer shows us; quality on new prompts stays hidden until we stop. Four validation samples show up during the run.]

Our job is simple: stop the run at the sweet spot, using only the loss curve and the validation samples. Let's see if we can catch the right moment.
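Why the visible curve can mislead is easier to see with numbers. The sketch below uses invented figures, not measurements from our runs, and assumes the pattern the game is built around: album loss keeps falling while quality on new prompts peaks and then declines. The field names and both helper functions are hypothetical.

```python
# All numbers below are made up for illustration; they are not real results.
checkpoints = [
    {"step": 300,  "album_loss": 0.40, "new_prompt_quality": 0.55},
    {"step": 500,  "album_loss": 0.32, "new_prompt_quality": 0.70},
    {"step": 1000, "album_loss": 0.25, "new_prompt_quality": 0.85},
    {"step": 1500, "album_loss": 0.20, "new_prompt_quality": 0.75},
    {"step": 2500, "album_loss": 0.12, "new_prompt_quality": 0.50},
]

def best_by_album_loss(cps):
    """What we would pick if we trusted only the curve the trainer shows."""
    return min(cps, key=lambda c: c["album_loss"])

def best_by_new_prompt_quality(cps):
    """What we would pick if we could see the hidden curve."""
    return max(cps, key=lambda c: c["new_prompt_quality"])

if __name__ == "__main__":
    print("lowest album loss at step", best_by_album_loss(checkpoints)["step"])
    print("best on new prompts at step", best_by_new_prompt_quality(checkpoints)["step"])
```

With these toy values the two choices disagree, which is the whole point of the game: the validation samples are our only window onto the curve we cannot see.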

The diagnosis clinic

There are six outputs and four possible diagnoses. Let's see if we can read a LoRA's health from a single image.

[Interactive quiz: output to diagnose, case 1 of 6]

prompt: a photo of TOK on a striped beach towel by the sea