Wan 2.2 14B Image Trainer

subject trainer

fal-ai/wan-22-image-trainer

Wan 2.2 as an image model: subject and style LoRAs with video-grade detail.

Trains a text-to-image LoRA on Wan 2.2 14B with subject-focused options: face detection, optional face cropping, masking, and an is_style switch that disables them. A trigger_phrase is required here, not optional. The catalog's Wan 2.2 LoRA endpoint accepts Wan 2.2 LoRAs, so you can also try the result on video.


What goes in the zip

Zip of training images. Captions are optional: the trainer can generate synthetic captions, and your required trigger_phrase ties everything together.

Good starting point

steps: 1000
learning_rate: 0.0007

Parameters

Schema facts come straight from the fal API; the notes are ours.

Required

training_data_url (string, required)

URL to a zip archive of your training images, optionally with matching .txt caption files.

In the atelier: The album you hand the painter. It is the single biggest factor in what the LoRA becomes.

Tip: 15 to 30 sharp, varied images beat 200 sloppy ones. Vary angle, lighting and background; keep the subject consistent.

Watch out: Duplicate or near-duplicate images push the LoRA toward memorizing instead of learning.

Raw schema description

URL to the training data.
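
Before zipping, it can help to check which images would rely on the trigger phrase or synthetic captions because they have no caption file. The sketch below is only an illustration: it assumes "matching" means the same base name (photo01.jpg with photo01.txt), and the list of image extensions is a guess, not something taken from the fal schema. The helper names are hypothetical.

```typescript
// Assumption: these extensions are illustrative; check the trainer docs for the supported set.
const IMAGE_EXTENSIONS = new Set(["jpg", "jpeg", "png", "webp"]);

// Split "photo01.JPG" into { base: "photo01", ext: "jpg" }.
function splitName(fileName: string): { base: string; ext: string } {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return { base: fileName, ext: "" };
  return { base: fileName.slice(0, dot), ext: fileName.slice(dot + 1).toLowerCase() };
}

// Return every image that has no .txt file with the same base name.
// Assumption: caption matching is by base name.
function findUncaptionedImages(fileNames: string[]): string[] {
  const captionBases = new Set(
    fileNames
      .map((name) => splitName(name))
      .filter((file) => file.ext === "txt")
      .map((file) => file.base),
  );
  return fileNames.filter((name) => {
    const { base, ext } = splitName(name);
    return IMAGE_EXTENSIONS.has(ext) && !captionBases.has(base);
  });
}
```

If the returned list is non-empty and include_synthetic_captions is off, those images will be described by the trigger phrase alone.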

trigger_phrase (string, required)

A unique word or phrase baked into your captions that activates the LoRA at inference time.

In the atelier: The skill's calling word. Say it in the prompt and the painter knows to use the bracelet.

Tip: Pick something that is not a real word, like TOK or OHWX, so it does not collide with anything the base model already knows.

Watch out: If you train with a trigger and forget it in your prompts later, the LoRA will seem weak or broken.

Raw schema description

Trigger phrase for the model.
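
One way to avoid the forgotten-trigger problem is to route every inference prompt through a small guard. This is a minimal sketch, not part of the fal client: the function name is hypothetical, and prepending the trigger with a space is an assumption about prompt style rather than a requirement of the model.

```typescript
// Hypothetical helper: make sure the trigger phrase appears in an inference prompt.
// Assumption: an exact, case-sensitive match is what matters.
function ensureTrigger(prompt: string, triggerPhrase: string): string {
  const trigger = triggerPhrase.trim();
  if (trigger.length === 0) {
    throw new Error("trigger_phrase must not be empty");
  }
  // Leave the prompt alone if the trigger is already present; otherwise prepend it.
  return prompt.includes(trigger) ? prompt : `${trigger} ${prompt}`;
}
```

For example, ensureTrigger("portrait in a garden", "OHWX") yields "OHWX portrait in a garden", while a prompt that already contains OHWX passes through unchanged.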

Optional

include_synthetic_captions (boolean, default: false)

Generates captions for your images automatically instead of relying on the trigger phrase alone.

Tip: Turn on for varied datasets; richer captions help the model separate subject from scene.

Raw schema description

Whether to include synthetic captions.

use_face_detection (boolean, default: true)

Centers each training crop on the detected face when resizing. The related use_face_cropping goes further and crops to the face itself.

Tip: Keep detection on for people. Enable cropping only when faces are small in the frame.

Raw schema description

Whether to use face detection for the training data. When enabled, images will use the center of the face as the center of the image when resizing.

use_face_cropping (boolean, default: false)

Whether to use face cropping for the training data. When enabled, images will be cropped to the face before resizing.

use_masks (boolean, default: true)

Whether to use masks for the training data.

steps (integer, default: 1000, range: 10 – 6000)

How many training iterations the model runs on your dataset. More steps means the LoRA sees your images more times.

In the atelier: Practice repetitions. Too few and the painter never picks up the skill. Too many and he stops learning and starts memorizing your exact photos.

Tip: Around 1000 is a solid default for a 15 to 30 image subject dataset. Small datasets need fewer steps, not more.

Watch out: If outputs start reproducing your training photos almost exactly (same pose, same background), you overtrained. Go back down.

Raw schema description

Number of training steps.
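
To get a feel for "sees your images more times", a rough estimate of passes per image can help. The sketch below assumes each step processes one image; the trainer's real batch size is not documented here, so imagesPerStep is a hypothetical knob and the result is only a ballpark.

```typescript
// Rough estimate of how often each image is seen during training.
// Assumption: imagesPerStep defaults to 1; the trainer's actual batch size is unknown.
function approxPassesPerImage(steps: number, imageCount: number, imagesPerStep = 1): number {
  if (imageCount <= 0) {
    throw new Error("imageCount must be positive");
  }
  return (steps * imagesPerStep) / imageCount;
}
```

Under that assumption, 1000 steps on 20 images works out to about 50 passes per image, and the same 1000 steps on 10 images doubles that, which is why smaller datasets tend to overtrain sooner at a fixed step count.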

learning_rate (number, default: 7e-4, range: 0.000001 – 0.1)

How big each learning update is. Controls how aggressively the model changes per step.

In the atelier: The painter's eagerness. A high rate is frantic practice: fast but sloppy, and it can wreck habits he already had. A low rate is careful practice: slow, but precise.

Tip: Stay near the trainer's default unless you have a reason. If results look fried or oversaturated, lower it. If the subject barely shows after many steps, raise it slightly or add steps.

Watch out: Learning rate and steps trade off against each other. Doubling both at once is how datasets get burned.

Raw schema description

Learning rate for training.
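
A local check against the documented ranges can catch a bad value before the job is submitted. This is a sketch with hypothetical names: the limits are the ones listed on this page (steps 10 to 6000, learning_rate 0.000001 to 0.1), and the API remains the authority on what it accepts.

```typescript
// Ranges as listed on this page; the API itself is the source of truth.
const STEPS_RANGE = { min: 10, max: 6000 };
const LEARNING_RATE_RANGE = { min: 0.000001, max: 0.1 };

// Return a list of human-readable problems; an empty list means both values look fine.
function checkHyperparameters(steps: number, learningRate: number): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(steps)) {
    problems.push(`steps must be an integer, got ${steps}`);
  } else if (steps < STEPS_RANGE.min || steps > STEPS_RANGE.max) {
    problems.push(`steps must be between ${STEPS_RANGE.min} and ${STEPS_RANGE.max}, got ${steps}`);
  }
  // Written as a negated "in range" test so that NaN is also rejected.
  if (!(learningRate >= LEARNING_RATE_RANGE.min && learningRate <= LEARNING_RATE_RANGE.max)) {
    problems.push(
      `learning_rate must be between ${LEARNING_RATE_RANGE.min} and ${LEARNING_RATE_RANGE.max}, got ${learningRate}`,
    );
  }
  return problems;
}
```

Calling checkHyperparameters(1000, 0.0007) returns an empty list, so the starting point suggested above passes.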

is_style (boolean, default: false)

Switches training from subject mode to style mode, disabling subject-focused tricks like masking.

In the atelier: Telling the painter: do not learn this person, learn this way of painting.

Tip: Style datasets should show the style across many different subjects, or the style and the subject will fuse.

Raw schema description

Whether the training data is style data. If true, face specific options like masking and face detection will be disabled.
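
Since style mode disables the face options, one tidy approach is to build the input from a single switch and leave those flags out for style runs. The sketch below uses the field names from the schema above, but the TrainerOptions shape and the buildTrainerInput helper are hypothetical, and omitting the face flags is a design choice here, not an API requirement.

```typescript
// Hypothetical options type for this sketch.
interface TrainerOptions {
  trainingDataUrl: string;
  triggerPhrase: string;
  isStyle: boolean;
  useFaceCropping?: boolean;
}

type TrainerInput = Record<string, string | number | boolean>;

// Build the request input; face-related flags are only sent for subject runs.
function buildTrainerInput(options: TrainerOptions): TrainerInput {
  const input: TrainerInput = {
    training_data_url: options.trainingDataUrl,
    trigger_phrase: options.triggerPhrase,
    is_style: options.isStyle,
  };
  if (!options.isStyle) {
    // Subject mode: mirror the schema defaults, with cropping as an opt-in.
    input.use_face_detection = true;
    input.use_masks = true;
    input.use_face_cropping = options.useFaceCropping ?? false;
  }
  return input;
}
```

The returned object can be passed as the input field of a fal.subscribe call to fal-ai/wan-22-image-trainer.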

Call it

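import { fal } from "@fal-ai/client";

// Requires fal credentials, for example the FAL_KEY environment variable.
// fal.subscribe submits the job and waits until it finishes.
const result = await fal.subscribe("fal-ai/wan-22-image-trainer", {
  input: {
    // Zip of training images, optionally with matching .txt captions.
    "training_data_url": "https://your-cdn.com/dataset.zip",
    // Required: the phrase that activates the LoRA at inference time.
    "trigger_phrase": "...",
    "steps": 1000,
    "learning_rate": 0.0007
  },
  logs: true,
});
// The trainer's output payload.
console.log(result.data);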

Run the result with

Wan v2.2 A14B Image-to-Video A14B with LoRAs

fal-ai/wan/v2.2-a14b/image-to-video/lora