LTX-2.3 Quality

video

fal-ai/ltx-2.3-quality/image-to-video/lora

Image-to-video with sound on LTX-2.3, LoRAs on top.

Animates a start image on LTX-2.3 with up to three LoRAs applied, and can generate a matching audio track. The 15-step default follows the distilled workflow; raise it toward the maximum of 30 for more refinement. LoRAs trained with fal-ai/ltx23-video-trainer run on this endpoint.


Parameters

Schema facts from the fal API; the notes are ours.

Required

prompt (string, required)

The text description of what to generate.

Tip: If your LoRA used a trigger word, include it. Describe the scene around the subject normally.

Raw schema description

The prompt to guide the video generation.

image_url (string, required)

The start image the video is animated from, given as a URL.

Raw schema description

The URL of the starting image.

loras (list, required)

List of LoRA weights to load, each with a path (URL or HF repo) and a scale.

In the atelier: Which bracelets the painter wears for this painting, and how hard he leans on each.

Tip: Order does not matter; scales do. Start every LoRA at 1.0 and adjust one at a time.

Watch out: Stacking three strong LoRAs usually degrades all of them. Lower each scale when combining.

Raw schema description

Up to 3 LoRAs to apply on top of LTX-2.3. Each path is downloaded through the registry SSRF-safe downloader before ComfyUI loads it from local disk. Max size: 3 GB per LoRA.
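The constraints above can be sketched as a small pre-flight check before the request is sent. This is a minimal sketch, not part of the fal client: the helper name buildLoras is hypothetical, the second URL is a placeholder, and the default scale of 1.0 simply follows the tip above. It does not check the 3 GB size limit.

```javascript
// Hypothetical helper: builds the `loras` array for the request input.
// Enforces the documented limit of 3 LoRAs; file size is NOT checked here.
function buildLoras(entries) {
  if (!Array.isArray(entries)) {
    throw new TypeError("loras must be a list");
  }
  if (entries.length > 3) {
    throw new RangeError("this endpoint accepts up to 3 LoRAs");
  }
  return entries.map(({ path, scale = 1.0 }) => {
    if (typeof path !== "string" || path.length === 0) {
      throw new TypeError("each LoRA needs a path (URL or HF repo)");
    }
    if (!Number.isFinite(scale)) {
      throw new TypeError("scale must be a finite number");
    }
    return { path, scale };
  });
}

// Placeholder URLs: replace with your own weights.
const loras = buildLoras([
  { path: "https://your-cdn.com/lora.safetensors" }, // scale falls back to 1.0
  { path: "https://your-cdn.com/style.safetensors", scale: 0.7 },
]);
console.log(loras);
```

The returned array can be passed as the loras field of the request input.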

Optional

num_frames (integer, default: 121, range: 9 – 481)

Number of video frames to generate.

Raw schema description

The number of frames to generate.

resolution (object, default: auto)

Output resolution of the generated video.

Tip: Higher resolutions cost more and take longer to generate. Leave it on auto to follow the aspect ratio of the start image.

Raw schema description

The size of the generated video. 'auto' derives the size from the input image aspect ratio.

frames_per_second (number, default: 24, range: 1 – 60)

Frames per second of the generated video.
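Clip length follows from num_frames and frames_per_second by simple division. The sketch below shows the arithmetic in both directions; the function names are illustrative, and the endpoint may impose further constraints on valid frame counts that this page does not state.

```javascript
// Clip length in seconds = frame count / frame rate.
function clipDurationSeconds(numFrames, framesPerSecond) {
  if (framesPerSecond <= 0) {
    throw new RangeError("frames_per_second must be positive");
  }
  return numFrames / framesPerSecond;
}

// Frame count for a target duration, rounded to the nearest whole frame.
function framesForDuration(seconds, framesPerSecond) {
  return Math.round(seconds * framesPerSecond);
}

// 121 frames at 24 fps is a little over five seconds.
console.log(clipDurationSeconds(121, 24).toFixed(2)); // "5.04"
console.log(framesForDuration(10, 24)); // 240
```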

num_inference_steps (integer, default: 15, range: 8 – 30)

Number of denoising steps per video. More steps mean more refinement and more latency.

Tip: Defaults are tuned per model. Lowering the step count is the quickest way to trade quality for speed; on this endpoint the minimum is 8, roughly half the default.

Raw schema description

Number of inference steps. Defaults to 15 for this distilled ComfyUI workflow and can be increased up to 30.

guidance_scale (number, default: 1, range: 1 – 20)

How strictly generation follows the prompt (classifier-free guidance).

In the atelier: How tightly you hold the painter to the brief. Too tight and the work gets stiff and oversaturated; too loose and he wanders.

Tip: Stay near the endpoint default. Adjust in steps of 0.5.

Raw schema description

Classifier-free guidance scale. The default follows the distilled LTX-2.3 workflow.

generate_audio (boolean, default: true)

Includes a generated audio track in the output video. Turn off for a silent MP4.

Raw schema description

Whether to include audio in the returned video. When disabled, the final MP4 is returned without an audio track.

image_strength (number, default: 0.7, range: 0 – 1)

How strongly the video is pinned to the start image: 1.0 reproduces it exactly as frame one, lower gives the model freedom.

Tip: The 0.7 default keeps the subject while allowing motion. Raise it if the first frame drifts.

Raw schema description

Conditioning strength on the start image. 1.0 = exact first-frame match, lower = more freedom for the model.
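The numeric parameters so far each carry a range, so a request can be checked locally before it is sent. The sketch below is one way to do that. The RANGES table is read from the parameter lines above and assumed to be inclusive, and checkRanges is a hypothetical helper, not a fal API.

```javascript
// Ranges as read from the parameter list above (assumed inclusive).
const RANGES = {
  num_frames: [9, 481],
  frames_per_second: [1, 60],
  num_inference_steps: [8, 30],
  guidance_scale: [1, 20],
  image_strength: [0, 1],
};

// Returns a list of problems; an empty list means every checked field is in range.
function checkRanges(input) {
  const problems = [];
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = input[name];
    if (value === undefined) continue; // optional field: the endpoint default applies
    if (!Number.isFinite(value) || value < min || value > max) {
      problems.push(`${name} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return problems;
}

console.log(checkRanges({ num_inference_steps: 30, image_strength: 0.7 }));
console.log(checkRanges({ num_inference_steps: 40 }));
```

The first call reports no problems; the second flags the step count, which exceeds the maximum of 30.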

negative_prompt (string, default: color distortion, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, still, dim overall tone, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards)

What the model should avoid generating.

Raw schema description

The negative prompt to steer generation away from.

seed (integer)

Random seed. Same seed plus same inputs gives a nearly identical video.

Tip: Fix the seed when comparing LoRA scales or parameters, so the only thing changing is the thing you are testing.

Raw schema description

Random seed for reproducibility. If None, a random seed is chosen.
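The fixed-seed comparison described in the tip can be sketched as a list of request inputs that differ only in one LoRA scale. The helper name scaleSweep, the seed value 42, and the chosen scales are arbitrary assumptions for illustration; the URLs are placeholders.

```javascript
// Builds one request input per scale, holding the seed and everything else fixed.
function scaleSweep(baseInput, loraIndex, scales) {
  return scales.map((scale) => ({
    ...baseInput,
    loras: baseInput.loras.map((lora, i) =>
      i === loraIndex ? { ...lora, scale } : { ...lora }
    ),
  }));
}

const base = {
  prompt: "a photo of TOK on a sunny windowsill",
  image_url: "https://your-cdn.com/input.jpg",
  loras: [{ path: "https://your-cdn.com/lora.safetensors", scale: 1 }],
  seed: 42, // arbitrary fixed value; any integer works as long as it stays the same
};

const runs = scaleSweep(base, 0, [0.6, 0.8, 1.0]);
console.log(runs.map((run) => [run.seed, run.loras[0].scale]));
```

Each element of runs can be sent as the input of its own request, so any difference between the resulting videos comes from the scale alone.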

enable_prompt_expansion (boolean, default: true)

Lets the endpoint rewrite your prompt with more detail before generating.

Raw schema description

Whether to enable prompt expansion.

enable_safety_checker (boolean, default: true)

Runs a safety filter on outputs.

Raw schema description

Whether to enable the safety checker.

video_quality (enum, default: high, options: low | medium | high | maximum)

The quality preset of the generated video.

video_write_mode (enum, default: balanced, options: fast | balanced | small)

The write mode of the generated video.

sync_mode (boolean, default: false)

Returns media as a data URI instead of a hosted URL, and skips storing it.

Tip: Useful for privacy; awkward for big files. Most workflows leave it off.

Raw schema description

If True, the media is returned as a data URI inline in the response. Useful for short-lived requests and tests.
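With sync_mode on, the media arrives as a data URI rather than a link, so it has to be decoded before it can be saved or played. The sketch below decodes a base64 data URI in Node.js. The helper name decodeDataUri is hypothetical, and because this page does not state which response field holds the URI, the function takes the string directly. The sample URI is a few arbitrary bytes, not a playable video.

```javascript
// Splits a base64 data URI into its MIME type and raw bytes (Node.js Buffer).
function decodeDataUri(uri) {
  const comma = uri.indexOf(",");
  if (!uri.startsWith("data:") || comma === -1) {
    throw new Error("not a data URI");
  }
  // Everything between "data:" and the first comma is "type;param;...;base64".
  const meta = uri.slice("data:".length, comma).split(";");
  if (meta[meta.length - 1] !== "base64") {
    throw new Error("expected a base64-encoded data URI");
  }
  return {
    mimeType: meta[0] || "text/plain",
    bytes: Buffer.from(uri.slice(comma + 1), "base64"),
  };
}

// Arbitrary sample payload for illustration only.
const decoded = decodeDataUri("data:video/mp4;base64,AAAAIGZ0eXA=");
console.log(decoded.mimeType, decoded.bytes.length);
```

The bytes could then be written to disk, for example with fs.writeFile from Node's standard library.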

Call it

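import { fal } from "@fal-ai/client";

// subscribe() queues the request and resolves once the job has finished.
const result = await fal.subscribe("fal-ai/ltx-2.3-quality/image-to-video/lora", {
  input: {
    // Include your LoRA's trigger word (here "TOK") if it was trained with one.
    "prompt": "a photo of TOK on a sunny windowsill",
    // Placeholder URLs: replace with your own start image and LoRA weights.
    "image_url": "https://your-cdn.com/input.jpg",
    "loras": [
      {
        "path": "https://your-cdn.com/lora.safetensors",
        "scale": 1
      }
    ]
  },
  logs: true, // include run logs in queue status updates
});
console.log(result.data);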

Train the LoRA with

LTX-2.3 22B Video Trainer

fal-ai/ltx23-video-trainer