So far we've kept training behind a closed door: we hand the album to our painter, we wait, and at some point a bracelet comes back. So what actually goes on behind that door? Time to step through it. The machinery inside is surprisingly simple: one small ritual, repeated thousands of times. Once we can picture that ritual, the parameters stop being folklore we picked up somewhere and turn into knobs we genuinely know how to use.
What actually happens in one step?
Every training step is the same five moves:
- A card is drawn from the album. One of the images comes to the table, caption in tow.
- It gets smudged. A random amount of noise is poured over the image. Sometimes it's a light haze, sometimes there's almost nothing left of the picture but raw static. So the difficulty of each repetition is decided, in effect, by a roll of the dice.
- We ask for it back. We ask our painter, bracelet on, to undo the smudge. The only clues in hand are the ruined image and the caption.
- The attempt is scored. The painter's reconstruction is compared with the original, value by value in the model's internal (latent) space, and the difference collapses into a single number: loss. A zero would mean a perfect guess.
- The bracelet gets fine-tuned. For every weight in the LoRA, the trainer works out which tiny change would lower that score. The tool for this is the gradient, which points in the direction that raises the loss fastest, so each weight takes a step the opposite way, sized gradient times learning_rate. The base model sits frozen the whole time; only the bracelet takes the lesson.
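To make the five moves concrete, here is a deliberately tiny sketch. It is a caricature, not how any real trainer is written: the "image" is a single number, the painter is one frozen weight plus a rank-1 LoRA (two scalars, a and b), captions are left out, and the noise blend (1 - t) * x + t * noise is a simplified scheme we're assuming for illustration.

```python
import random


def training_step(album, w_frozen, lora, learning_rate, rng):
    """One toy training step. The 'image' is a single number, the painter is
    a frozen weight plus a rank-1 LoRA (scalars a and b); captions are left out."""
    # 1. Draw a card from the album.
    x = rng.choice(album)
    # 2. Smudge it: roll a noise level t in [0, 1) and blend in Gaussian noise
    #    (this linear blend is an assumed, simplified noising scheme).
    t = rng.random()
    noisy = (1 - t) * x + t * rng.gauss(0.0, 1.0)
    # 3. Ask for it back: effective weight = frozen base + LoRA product b * a.
    a, b = lora["a"], lora["b"]
    x_hat = (w_frozen + b * a) * noisy
    # 4. Score the attempt with squared error.
    loss = (x_hat - x) ** 2
    # 5. Gradients for the LoRA factors only; w_frozen is never updated.
    dloss_dxhat = 2 * (x_hat - x)
    grad_a = dloss_dxhat * b * noisy
    grad_b = dloss_dxhat * a * noisy
    # Step against the gradient, scaled by the learning rate.
    lora["a"] = a - learning_rate * grad_a
    lora["b"] = b - learning_rate * grad_b
    return loss


rng = random.Random(0)
album = [0.8, 1.0, 1.2]        # made-up "images"
lora = {"a": 0.1, "b": 0.0}    # b starts at 0, so the LoRA begins as a no-op
losses = [training_step(album, 0.5, lora, 0.01, rng) for _ in range(2000)]
```

Looking through `losses`, the per-step values jump around a lot and never settle near zero, for the same dice-roll reason described above: part of every target is noise nobody can predict.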

Two lessons fall out of this ritual. First, the caption isn't decoration; it's half of the question we put to the painter (which is exactly why chapter 3 made such a fuss about it). Second, because the noise amount is re-rolled at every step, some repetitions come easy and some are genuinely hard, and the score bounces around accordingly. Keep that one in your back pocket; we'll need it in a minute.
Reading the loss curve
Plot the score of every step in a run and we get the loss curve: the heart monitor of training, if you like. Almost every chart we'll ever face settles into one of these three rhythms:
Healthy run
The curve is noisy, but the overall trend heads downhill and flattens out at the end. The jitter is perfectly normal: the painter attempts a randomly difficult exercise at every step.
Learning rate too high
There's a fast drop at the start, then the corrections begin overshooting and the loss climbs and thrashes. At this point the run is breaking more than it teaches.
Overfitting
The album loss (indigo) keeps inching down forever. But quality on prompts the album never showed (copper, dashed) turns around and gets worse. The marked dot is where we actually needed to stop.
A few habits make these curves much easier to read:
- Look at the trend, not the dots. That jitter comes from the difficulty dice we just talked about; it isn't a sign of trouble. Squint until fifty steps blur into one.
- A flattening curve isn't a failing curve. Once the trend has gone horizontal, extra steps have little left to give. A flat, low curve usually means the run is finished rather than stuck.
- Loss isn't quality. This number only measures how well the album images get rebuilt. Past the overfit elbow it keeps improving while the LoRA gets worse at everything we actually care about. See that dashed copper line? That's the line we're paying for, and no trainer prints it as a number. The only way to see it is through validation samples.
- Don't chase zero. Part of every repetition goes toward rebuilding noise that is genuinely random, and nobody predicts randomness. Every run has a floor it will oscillate around forever; forcing it toward zero buys us nothing except memorization.
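Squinting can be done on purpose with a trailing moving average. The window of 50 echoes the rule of thumb above and is a tunable choice, not a standard; many training dashboards offer a smoothing slider that does something similar. The raw curve in the example is synthetic.

```python
import math
import random
from collections import deque


def moving_average(values, window=50):
    """Trailing mean of the last `window` values (fewer at the very start)."""
    smoothed, recent, total = [], deque(), 0.0
    for v in values:
        recent.append(v)
        total += v
        if len(recent) > window:
            total -= recent.popleft()
        smoothed.append(total / len(recent))
    return smoothed


# A synthetic noisy curve: a decaying trend plus per-step jitter.
rng = random.Random(0)
raw = [0.3 + 0.4 * math.exp(-s / 300) + rng.uniform(-0.15, 0.15) for s in range(1000)]
trend = moving_average(raw)
```

The raw values jump by up to 0.3 from one step to the next, while `trend` slides smoothly downhill: the same curve, read the way the habits above suggest.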
A tour of the bench
Now that we've watched the loop, that long parameter list loses its bite, and we can walk through it in a single tour of the bench. First, the knobs that decide what gets studied:
image_data_url
The album itself: the URL of the zip we packed, holding the images and any caption files riding along with them.
Some trainers call this field images_data_url or training_data_url. Same album, different label stuck on the shelf.
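Packing the album is ordinary file handling. Here is a minimal sketch, assuming the common convention that a caption lives in a .txt file with the same name as its image (a.png next to a.txt); check what your trainer expects before relying on it. Uploading the zip to get a URL is left out.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def pack_album(folder, zip_path):
    """Zip every image in `folder`, plus a same-named .txt caption if present.
    Assumes the .txt-next-to-image convention (common, but not universal)."""
    folder = Path(folder)
    packed = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for image in sorted(folder.iterdir()):
            if image.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # skip captions here (added with their image) and stray files
            zf.write(image, arcname=image.name)
            packed.append(image.name)
            caption = image.with_suffix(".txt")
            if caption.exists():
                zf.write(caption, arcname=caption.name)
                packed.append(caption.name)
    return packed
```

Calling `pack_album("my_album", "album.zip")` returns the list of files it packed, which is a quick way to spot an image whose caption file went missing.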
trigger_phrase
The call word we work into the captions, so the learned skill gets a name we can summon when writing prompts.
Pick something that isn't in the dictionary, like TOK. Forgetting to type this word at inference is the number one reason a LoRA looks dead.
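If our captions don't already carry the call word, it's easy to add. A minimal sketch with TOK as the assumed trigger; the first helper doubles as a pre-flight check for inference prompts. Some trainers insert the trigger for us, so it's worth checking before adding it by hand.

```python
import re


def has_trigger(text, trigger="TOK"):
    """True if `trigger` appears as a whole word (so 'TOKEN' doesn't count)."""
    return re.search(rf"\b{re.escape(trigger)}\b", text) is not None


def add_trigger(caption, trigger="TOK"):
    """Prepend the trigger phrase unless the caption already has it."""
    return caption if has_trigger(caption, trigger) else f"{trigger} {caption}".strip()


print(add_trigger("a woman in a red coat"))   # TOK a woman in a red coat
print(has_trigger("portrait of a woman"))     # False: this prompt won't summon the LoRA
```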
default_caption
One shared caption applied to every image that doesn't bring its own caption file.
In edit LoRAs, this single line usually carries the entire instruction. Watch out on the Klein trainers: no caption files and no default_caption means a plain old error.
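The caption rule reads naturally as a small lookup: the image's own caption file wins, otherwise default_caption, otherwise an error. This is a hypothetical per-image sketch of that rule, keyed by file stem; the trainers apply their own version of it on their side.

```python
from pathlib import Path


def resolve_captions(image_names, caption_files, default_caption=None):
    """Hypothetical sketch of the caption rule: own caption file first, then
    default_caption, otherwise fail loudly. caption_files maps file stem -> text."""
    captions = {}
    for name in image_names:
        stem = Path(name).stem
        if stem in caption_files:
            captions[name] = caption_files[stem]
        elif default_caption:
            captions[name] = default_caption
        else:
            raise ValueError(f"{name}: no caption file and no default_caption")
    return captions


print(resolve_captions(["a.png", "b.png"], {"a": "TOK, smiling"}, default_caption="TOK"))
# {'a.png': 'TOK, smiling', 'b.png': 'TOK'}
```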
create_masks
Cuts the subject out automatically, so training focuses on the subject itself instead of the background.
Leave it on when training a person or a product. Turn it off for styles, where the lesson lives in the whole image.
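One common way a mask can steer attention is by weighting the loss, so errors on background pixels simply don't count. Whether a given trainer uses its masks exactly this way is an assumption; this is a minimal sketch over flat lists of values, with mask 1 on the subject and 0 on the background.

```python
def masked_mse(prediction, target, mask):
    """Squared error averaged only over positions where mask is 1 (the subject)."""
    if not (len(prediction) == len(target) == len(mask)):
        raise ValueError("prediction, target and mask must have the same length")
    total = sum(m * (p - t) ** 2 for p, t, m in zip(prediction, target, mask))
    weight = sum(mask)
    return total / weight if weight else 0.0


# The big miss at the last position is background, so it doesn't count.
print(masked_mse([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [1, 1, 0]))  # 2.0
```

Without the mask, the same numbers would average to about 4.33, most of it from the background miss.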
Then the ones that decide how hard and how long the studying goes:
steps
How many repetitions our painter gets before the bracelet is sealed.
Chapter 4 in one sentence: around 1000 is a good start for a subject set of 15 to 30 images, tiny sets need fewer, and Klein trainers want multiples of 100.
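Steps mean more once we know how often each image gets studied. A sketch, assuming one image per step (batch size 1; this varies by trainer): 1000 steps over 20 images is 50 passes per image. The rounding helper snaps up to the next multiple of 100 for the Klein trainers; rounding up rather than to the nearest is our own choice.

```python
import math


def passes_per_image(steps, num_images, batch_size=1):
    """Average number of times each album image is studied during the run."""
    return steps * batch_size / num_images


def round_up_to_hundred(steps):
    """Snap a step count up to the next multiple of 100 (at least 100)."""
    return max(100, 100 * math.ceil(steps / 100))


print(passes_per_image(1000, 20))   # 50.0
print(round_up_to_hundred(950))     # 1000
```

The same 1000 steps give a 15-image set about 67 passes and a 30-image set about 33, which is why smaller sets reach the overfit elbow sooner.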
learning_rate
How big each correction is allowed to be.
Chapter 5 in one sentence: trust the default, halve it if the outputs look fried, and whatever we do, never raise it together with steps.
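Overshooting is easy to reproduce on the simplest possible loss, loss(w) = w², whose gradient is 2w. Each step multiplies w by (1 - 2 · learning_rate), so training is stable only while that factor stays between -1 and 1, which here means a learning_rate below 1. The toy numbers show the shape of the problem only; they say nothing about real LoRA defaults.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Gradient descent on loss(w) = w**2 (gradient 2*w); returns every w visited."""
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * 2 * w
        history.append(w)
    return history


calm = descend(0.1)   # shrinks by 0.8x per step and settles toward 0
wild = descend(1.1)   # flips sign and grows by 1.2x per step: overshooting
print(calm[-1], wild[-1])
```

The wild run is the "learning rate too high" chart in miniature: every correction jumps past the target, each time by more than the last.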
rank
The width of the LoRA's correction matrices, in other words how much nuance fits in the bracelet.
Think of it as the bracelet's thickness. 16 is plenty for most subjects. We can go higher for complex styles, knowing the file grows and memorization sets in earlier.
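Rank's effect on file size is plain arithmetic. A LoRA on a weight matrix of shape d_out × d_in stores two thin matrices, B (d_out × rank) and A (rank × d_in), so its size grows linearly with rank. The 3072 width below is an illustrative assumption, not any particular model's dimension, and a real LoRA repeats this for every layer it adapts.

```python
def lora_params(d_out, d_in, rank):
    """Parameters in a LoRA for one d_out x d_in weight: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


d = 3072  # illustrative layer width (an assumption, not a specific model)
for rank in (8, 16, 32, 64):
    print(rank, lora_params(d, d, rank))
print("full matrix:", d * d)
```

At rank 16 that is 98,304 values against 9,437,184 for the full matrix, about 1%, and every doubling of rank doubles it.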
is_style
Flips training from subject mode to style mode, switching off subject-only tricks like masking along the way.
It's the knob version of telling the painter: don't learn that object, learn this way of painting.
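The "switching off subject-only tricks" part can be pictured as a small rule over the options. This is a hypothetical sketch of that rule using this chapter's field names; real trainers apply it on their side, and exact field names vary between benches.

```python
def training_options(is_style, create_masks=True):
    """Hypothetical option rule: style mode switches off subject-only masking."""
    return {
        "is_style": is_style,
        # Masking isolates a subject, which is the wrong lesson for a style.
        "create_masks": create_masks and not is_style,
    }


print(training_options(is_style=True))   # {'is_style': True, 'create_masks': False}
```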
And finally the ones that decide the shape of the practice and the output:
resolution / aspect_ratio
The size and proportions of the images the trainer works on.
Train at the resolution we plan to generate at, or above it. Going higher than that shows up on both the bill and the clock.
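For aspect_ratio, it helps to keep roughly the same pixel budget and let the proportions change. A sketch, assuming a budget of about 1024 × 1024 pixels and sides snapped to multiples of 64 (a common constraint, but check your model's). The pixel ratio gives a feel for the bill: compute per step usually grows at least in proportion to the pixel count.

```python
import math


def snap(value, multiple):
    """Round to the nearest multiple (never below one multiple)."""
    return max(multiple, round(value / multiple) * multiple)


def size_for(aspect_ratio, target_pixels=1024 * 1024, multiple=64):
    """Width and height close to a pixel budget for aspect_ratio = width / height."""
    height = math.sqrt(target_pixels / aspect_ratio)
    width = height * aspect_ratio
    return snap(width, multiple), snap(height, multiple)


def pixel_ratio(size_a, size_b):
    """How many times more pixels size_a has than size_b."""
    return (size_a[0] * size_a[1]) / (size_b[0] * size_b[1])


print(size_for(1.0))                          # (1024, 1024)
print(size_for(16 / 9))                       # (1344, 768)
print(pixel_ratio((1024, 1024), (512, 512)))  # 4.0
```

Doubling the side length quadruples the pixels, which is why a jump from 512 to 1024 is felt on both the bill and the clock.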
validation
Samples generated at set intervals during the run; this is our window into how the skill is coming along.
Cheap insurance. It's how we catch the overfit elbow while there's still time to stop the run.
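Catching the elbow is easier with a simple rule. Trainers don't print the copper line, so this sketch assumes we've turned each validation round into a number ourselves (say, a rating where lower is better, or loss on a few held-out images). The scores in the example are made up. It returns the best checkpoint seen and the point where a watcher who tolerates `patience` non-improving checks in a row would stop.

```python
def pick_checkpoint(scores, patience=2):
    """scores: (step, score) pairs in run order, lower score = better.
    Returns (best_step, stop_step): the best checkpoint seen, and where a
    watcher who tolerates `patience` non-improving checks in a row would stop."""
    best_step, best_score, misses = None, float("inf"), 0
    for step, score in scores:
        if score < best_score:
            best_step, best_score, misses = step, score, 0
        else:
            misses += 1
            if misses >= patience:
                return best_step, step
    return best_step, (scores[-1][0] if scores else None)


# Made-up scores that bottom out and then turn around, like the copper line.
made_up = [(200, 0.9), (400, 0.7), (600, 0.6), (800, 0.65), (1000, 0.7), (1200, 0.8)]
print(pick_checkpoint(made_up))   # (600, 1000)
```

Stopping at 1000 still leaves us the step-600 checkpoint, provided the trainer keeps intermediate weights.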
output_lora_format
The naming scheme of the saved weights: fal for fal endpoints, comfy for ComfyUI.
Classic trap: a LoRA in the wrong key format loads without a peep and does absolutely nothing. If a LoRA seems lifeless, we check here first.
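Why does a mismatched LoRA stay silent? A loader can only apply a LoRA tensor to a module whose name it recognizes, and loaders often skip the rest without complaint. The sketch counts how many of a LoRA's target modules exist in a model. The key suffixes (.lora_A.weight / .lora_B.weight, a PEFT-style convention) and all module names are hypothetical examples, not fal's or ComfyUI's actual schemes.

```python
LORA_SUFFIXES = (".lora_A.weight", ".lora_B.weight")  # one convention (PEFT-style); others exist


def targeted_modules(lora_keys, suffixes=LORA_SUFFIXES):
    """Module names a LoRA's tensors point at, with the LoRA suffix stripped."""
    modules = set()
    for key in lora_keys:
        for suffix in suffixes:
            if key.endswith(suffix):
                modules.add(key[: -len(suffix)])
    return modules


def match_fraction(lora_keys, model_modules):
    """Share of targeted modules that actually exist in the model (0.0 to 1.0)."""
    targets = targeted_modules(lora_keys)
    if not targets:
        return 0.0
    return len(targets & set(model_modules)) / len(targets)


# Hypothetical names: same layer, different prefix convention.
model = {"blocks.0.attn.to_q", "blocks.0.attn.to_k"}
print(match_fraction(["blocks.0.attn.to_q.lora_A.weight"], model))                 # 1.0
print(match_fraction(["diffusion_model.blocks.0.attn.to_q.lora_A.weight"], model))  # 0.0
```

A result of 0.0 is the "lifeless LoRA" signature: the file loaded, but nothing in it landed anywhere.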
number_of_frames / frame_rate
Video trainers only: how much motion each training clip carries.
Subject LoRAs learn from photos; motion LoRAs learn from clips. Same album logic, with a time axis bolted on.
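The two video knobs combine by simple arithmetic: a clip lasts number_of_frames / frame_rate seconds. A small sketch with purely illustrative values; individual video trainers may constrain frame counts further.

```python
import math


def clip_seconds(number_of_frames, frame_rate):
    """Duration of one training clip, in seconds."""
    return number_of_frames / frame_rate


def frames_for(seconds, frame_rate):
    """Frames needed to cover `seconds`; rounding first guards against float fuzz
    such as 1.1 * 10 == 11.000000000000002."""
    return math.ceil(round(seconds * frame_rate, 6))


print(clip_seconds(48, 24))   # 2.0
print(frames_for(1.1, 10))    # 11
```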
So why does our trainer show five knobs instead of fifteen?
Not every bench puts every knob in front of us. The Klein 9B base trainer deliberately shows just five: the album, steps, learning_rate, default_caption and output_lora_format. Every other decision has already been made on our behalf, tuned across a frankly absurd number of datasets. Richer benches like flux-lora-fast-training put more on the table: triggers, masks, the style mode.
