Chapter 9 of 10

Edit LoRAs: before and after

Some LoRAs learn a change rather than an object. We hand the painter a sketch and they hand us back a painting.

Up to now we've always taught our painter a thing: a cat, a sneaker, a way of drawing. With edit LoRAs we drill a change into the painter instead. We're no longer saying "this is TOK"; we're saying "when you see this, turn it into that." It's the difference between teaching a noun and teaching a verb.

The album becomes a deck of before/after cards

An edit album has no room for photos standing alone; everything here comes in pairs. Picture each pair as a playing card: the front shows the world as it is, the back shows it transformed. The one rule the painter asks of us is a simple naming scheme: ROOT_start.ext and ROOT_end.ext.

Here's a real pair from a set we trained on, a pencil sketch and the same scene brought to life in gouache:

p01_start.png
p01_end.png

# dataset.zip
p01_start.png    # the before
p01_end.png      # the after
p02_start.png
p02_end.png
p01_start2.png   # optional: up to 4 reference images per pair
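Before zipping, it can be worth checking that every card really has both faces. The sketch below groups file names by ROOT and reports any root missing its start or its end. The function name `group_pairs` and the regular expression are our own illustration, not part of any trainer, and the pattern assumes names follow the ROOT_start.ext / ROOT_end.ext scheme exactly.

```python
import re
from collections import defaultdict

# Matches ROOT_start.ext, ROOT_startN.ext (reference images) and ROOT_end.ext.
# Assumption: this mirrors the naming scheme described above; adjust if your trainer differs.
NAME_RE = re.compile(r"^(?P<root>.+)_(?P<role>start\d*|end)\.(?P<ext>[A-Za-z0-9]+)$")


def group_pairs(filenames):
    """Group file names by ROOT; return (cards, incomplete roots, unmatched names)."""
    cards = defaultdict(dict)
    unmatched = []
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            unmatched.append(name)  # does not follow the naming scheme
            continue
        cards[match["root"]][match["role"]] = name
    # A card is complete only if it has both a plain start and an end.
    incomplete = sorted(
        root for root, roles in cards.items()
        if "start" not in roles or "end" not in roles
    )
    return dict(cards), incomplete, unmatched


if __name__ == "__main__":
    names = ["p01_start.png", "p01_end.png", "p01_start2.png", "p02_start.png"]
    cards, incomplete, unmatched = group_pairs(names)
    print(incomplete)  # ['p02'] because p02_end.png is missing
```

To check an archive that already exists, the list returned by `zipfile.ZipFile("dataset.zip").namelist()` can be passed in directly.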

After some fifteen cards like this, the painter starts to grasp the mapping itself: how lines turn into brushstrokes, and white paper into deep layers of paint. Once the run is done, we can hand them a sketch they've never seen and they'll apply the very same transformation to it.

The one rule that never bends

Inside a pair, the only thing allowed to change is the thing we want to teach. Same framing, same light, same subject; the only difference between start and end should be the transformation itself. Any small difference that sneaks in (a slightly shifted angle, a different crop) gets mistaken for part of the transformation and learned right along with it.
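One cheap sanity check for this rule is to confirm that both faces of a card have the same pixel dimensions, since a stray crop usually changes them. The sketch below reads the width and height straight from the PNG header using only the standard library. It is a partial check under stated assumptions: it only handles PNG files, the helper names are ours, and matching sizes say nothing about a shifted angle or different lighting, which still need a human eye.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def same_size(start_path, end_path):
    """True if the start and end images of a pair have identical dimensions."""
    with open(start_path, "rb") as f:
        start = png_size(f.read(24))
    with open(end_path, "rb") as f:
        end = png_size(f.read(24))
    return start == end
```

Calling `same_size("p01_start.png", "p01_end.png")` on each card flags the pairs worth a second look before training.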

This is why collecting an edit album is a bit of a chore; two versions of the same scene aligned to the millimeter are hard to come by. In the next chapter we'll see how we manufactured these cards ourselves.

The caption carries the instruction

When we work with pairs, the caption usually describes the transformation itself rather than the image. For our own set we used one shared caption:

turn this pencil sketch into a finished gouache painting in TOKSTYLE style

At painting time, we give the system an input image (for this set, a new sketch) and a prompt in this same spirit; the learned skill takes care of the rest.

Reference images: extra hints

The Klein edit trainer accepts up to four extra references per pair (ROOT_start2 through ROOT_start4). They come to the rescue whenever the transformation needs information from outside the frame: if the edit says "add this vase to that room", a standalone photo of the vase is exactly that kind of hint. Sketch-to-painting needs none of this, but here's what a full card looks like:

p07_start.png
The room as it stands. The starting frame.

p07_start2.png
Reference: a studio shot of the product to be added. Where the painter looks when we say "add this".

p07_end.png
Result: the same room, the product on the floor, consistent down to its shadow.

We staged this trio to show the logic, with the KIVO sneaker you might remember from chapter 7. A set of 10 to 15 cards like this is enough to teach an edit that places the referenced product into a scene.
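When a set mixes plain pairs with cards that carry references, it helps to list which references belong to which card and to catch any card that carries too many. The sketch below is one way to do that. The names `references_by_root` and `over_limit` are hypothetical helpers of ours, and the default limit of 4 is simply the figure quoted in this chapter, kept as a parameter in case the trainer counts differently.

```python
import re
from collections import defaultdict

# Limit taken from the chapter text ("up to four extra references per pair").
MAX_REFERENCES = 4

# Matches reference images only: ROOT_start2.ext, ROOT_start3.ext, ...
# A plain ROOT_start.ext has no digits and is deliberately not matched.
REF_RE = re.compile(r"^(?P<root>.+)_start(?P<index>\d+)\.[A-Za-z0-9]+$")


def references_by_root(filenames):
    """Map each ROOT to its reference images, ordered by their numeric suffix."""
    refs = defaultdict(list)
    for name in filenames:
        match = REF_RE.match(name)
        if match:
            refs[match["root"]].append((int(match["index"]), name))
    return {root: [name for _, name in sorted(items)] for root, items in refs.items()}


def over_limit(filenames, limit=MAX_REFERENCES):
    """Return the roots that carry more reference images than the limit allows."""
    grouped = references_by_root(filenames)
    return sorted(root for root, names in grouped.items() if len(names) > limit)
```

For the trio above, `references_by_root(["p07_start.png", "p07_start2.png", "p07_end.png"])` maps `p07` to its single reference, `p07_start2.png`.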

Where to train it, where to run it