Chapter 2 of 10

So we hand the painter an album

The painter learns whatever he sees in the album, so we need to think a little about what we fill it with.

Our painter has exactly one window to draw inspiration from: the album we hand him. He cannot learn from the intent in our head, and he cannot learn from the prompt we will eventually write; he only believes what he sees. That is why every decision we make about the dataset turns into a sentence of the lesson, and every flaw that sneaks into the album gets studied with the same seriousness as the things we actually meant.

Less but better: quality wins every time

15 to 30 sharp, well-lit, varied images produce a far better subject LoRA than 200 images scraped off the internet. A bad image does not just sit there harmlessly; it actively teaches something wrong. A blurry photo teaches the painter blur, and a frame with the head cut off teaches him cut-off heads.

The real work is in the variety

The painter has to work out, on his own, what the subject is, separated from whatever happens to surround it. And the only way he can do that is by watching the subject survive change with his own eyes:

  • Angles: front, profile, three-quarter, from behind, from above, from below.
  • Light: daylight, lamp light, overcast skies, dramatic shadows.
  • Backgrounds: indoors, outdoors, a plain studio, a crowded shelf.
  • Distance: full views and close-up details.
If the subject sits in the same pose on the same desk in every photo, the desk quietly becomes part of the subject. The painter has no way of knowing that desk was a coincidence.

The album we built for this guide

Every real example in this course runs on a single dataset: TOK, a ceramic cat figurine we generated 20 photos of ourselves. The figurine is identical in every frame, and we deliberately changed everything else. Here is a sample from the album:

TOK dataset image anchor
TOK dataset image v01
TOK dataset image v03
TOK dataset image v05
TOK dataset image v09
TOK dataset image v11
TOK dataset image v13
TOK dataset image v16
TOK dataset image v18
TOK dataset image v19
TOK dataset image v07
TOK dataset image v12

Take a look at what stays the same every time (the figurine itself, its indigo patterns, the copper collar) and what never repeats (angle, light, background, distance). The lesson the trainer will extract lives exactly in that contrast.

The shot list (for anyone who wants to copy it)

For a 20-image subject album, this is the budget we used and will keep using:

  • 8 full views: we take a lap around the subject; front, both profiles, three-quarter from each side, from behind, from above, and from below eye level.
  • 4 close-ups: the details that carry the identity. For TOK those were the face, the collar, the floral pattern, and the paws.
  • 4 lighting variations: otherwise ordinary frames in bright daylight, warm lamp light, the flat light of an overcast day, plus one hard, dramatic shadow.
  • 4 wildcard scenes: places the subject would rarely be found; outdoors on stone, on a crowded shelf, on fabric, by a window. These buy us flexibility later on.

There is also one thing we deliberately did not do: no two photos taken from the same spot with different settings. Near-duplicate frames look like free data at first glance, but they behave like a hidden finger on the scale, quietly multiplying the weight of a single viewpoint.

Practical rules

  1. Use the subject's real proportions, and never mix toy versions with the real thing.
  2. One subject per dataset. If we put two in, what comes out is a hybrid of the pair.
  3. Keep the resolution at or above the resolution we plan to generate at.
  4. Weed out the near-duplicates; as we just said, they quietly multiply the weight of a single viewpoint.
  5. For style LoRAs the rule flips: we vary the subjects and keep the style constant.