January 15, 2026

FLUX.2 [klein]: Towards Interactive Visual Intelligence

Today, we release the FLUX.2 [klein] model family, our fastest image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and it runs on consumer hardware with as little as 13GB VRAM.

Try it now for free here

Demo showing editing with FLUX.2 [klein]

Why go [klein]?

Visual intelligence is entering a new era. As AI agents become more capable, they need visual generation that can keep up: models that respond in real time, iterate quickly, and run efficiently on accessible hardware.

The klein name comes from the German word for "small", reflecting both the compact model size and the minimal latency. But FLUX.2 [klein] is anything but limited. These models deliver exceptional performance in text-to-image generation, image editing, and multi-reference generation, capabilities typically reserved for much larger models.

What's New

  • Sub-second inference. Generate or edit images in under 0.5s on modern hardware.
  • Photorealistic outputs and high diversity, especially in the base variants.
  • Unified generation and editing. Text-to-image, image editing, and multi-reference support in a single model while delivering frontier performance.
  • Runs on consumer GPUs. The 4B model fits in ~13GB VRAM (RTX 3090/4070 and above).
  • Developer-friendly and accessible. Apache 2.0 on 4B models, open weights for 9B models. Full open weights for customization and fine-tuning.
  • API and open weights. Production-ready API or run locally with full weights.

Note: The “FLUX [dev] Non-Commercial License” has been renamed to “FLUX Non-Commercial License” (FLUX NCL) and will apply to the 9B [klein] models. No material changes have been made to the license.

Text to Image collage using FLUX.2 [klein]

The FLUX.2 [klein] Model Family

FLUX.2 [klein] 9B

Our flagship small model. It defines the Pareto frontier for quality vs. latency across text-to-image, single-reference editing, and multi-reference generation, and it matches or exceeds models 5x its size in under half a second. Built on a 9B flow model with an 8B Qwen3 text embedder, step-distilled to 4 inference steps.

Combine multiple input images, blend concepts, and iterate on complex compositions - all at sub-second speed with frontier-level quality. No model this fast has ever done this well.

License: FLUX NCL
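
To make "4 inference steps" concrete, here is a toy sketch of how a flow model can be sampled with a fixed number of Euler steps. It is an illustration only, not the FLUX.2 [klein] sampler: the velocity field below is a hand-written stand-in for the trained network, and the names `euler_sample` and `make_toy_velocity` are hypothetical. The toy field moves points along straight lines, which is the idealized case in which a handful of steps is enough.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int = 4,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t takes the values 0, dt, ..., 1 - dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Stand-in for a trained model (assumption): points straight at `target`."""

    def velocity(x: Vector, t: float) -> Vector:
        # Remaining displacement divided by remaining time.
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]

    return velocity


noise = [0.3, -1.2, 0.8]    # arbitrary starting point
target = [1.0, 0.0, -1.0]   # arbitrary end point for the toy field
sample = euler_sample(make_toy_velocity(target), noise, num_steps=4)
print(sample)
```

Because the toy trajectories are straight, four Euler steps reach the target up to floating-point error. A real model's trajectories are not exactly straight, which is why reducing the step count generally requires a dedicated distillation stage.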

Image editing collage using FLUX.2 [klein]


FLUX.2 [klein] 4B

Fully open under Apache 2.0. Our most accessible model, it runs on consumer GPUs like the RTX 3090/4070. Compact but capable: supports T2I, I2I, and multi-reference at quality that punches above its size. Built for local development and edge deployment.

License: Apache 2.0

FLUX.2 [klein] Base 9B / 4B

The full-capacity foundation models. Undistilled, preserving complete training signal for maximum flexibility. Ideal for fine-tuning, LoRA training, research, and custom pipelines where control matters more than speed. Higher output diversity than the distilled models.

License: 4B Base under Apache 2.0, 9B Base under FLUX NCL
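
For tooling that has to respect the license split, the family described above can be written down as a small lookup table. This is a sketch for illustration: the `KleinVariant` class and helper names are hypothetical, and the strings are the display names used in this post, not repository or API identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KleinVariant:
    name: str            # display name as used in this post
    params_billion: int  # size of the flow model
    distilled: bool      # False for the Base (undistilled) variants
    license: str


def license_for(params_billion: int) -> str:
    """License rule stated in the post: 4B is Apache 2.0, 9B is FLUX NCL."""
    if params_billion == 4:
        return "Apache 2.0"
    if params_billion == 9:
        return "FLUX NCL"
    raise ValueError(f"no [klein] variant with {params_billion}B parameters")


VARIANTS: List[KleinVariant] = [
    KleinVariant("FLUX.2 [klein] 9B", 9, True, license_for(9)),
    KleinVariant("FLUX.2 [klein] 4B", 4, True, license_for(4)),
    KleinVariant("FLUX.2 [klein] Base 9B", 9, False, license_for(9)),
    KleinVariant("FLUX.2 [klein] Base 4B", 4, False, license_for(4)),
]


def apache_variants() -> List[str]:
    """Names of the variants released under Apache 2.0."""
    return [v.name for v in VARIANTS if v.license == "Apache 2.0"]


def base_variants() -> List[str]:
    """Names of the undistilled variants intended for fine-tuning."""
    return [v.name for v in VARIANTS if not v.distilled]


print(apache_variants())
print(base_variants())
```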

Output Diversity using FLUX.2 [klein]

Quantized versions

We are also releasing FP8 and NVFP4 versions of all [klein] variants, developed in collaboration with NVIDIA for optimized inference on RTX GPUs. Same capabilities, smaller footprint - compatible with even more hardware.

  • FP8: Up to 1.6x faster, up to 40% less VRAM
  • NVFP4: Up to 2.7x faster, up to 55% less VRAM

Benchmarks on RTX 5080/5090, T2I at 1024×1024
Same licenses apply: Apache 2.0 for 4B variants, FLUX NCL for 9B.
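
The "up to" figures above translate into best-case bounds with simple arithmetic. The sketch below assumes, purely for illustration, that the ~13GB figure quoted for the 4B model is the unquantized baseline; the post does not state which variant or baseline the quantization benchmarks used, so the results are optimistic floors rather than predictions. The helper names are hypothetical.

```python
# Maximum gains quoted in the post (benchmarks on RTX 5080/5090, T2I at 1024x1024).
QUANT_CLAIMS = {
    "FP8": {"max_speedup": 1.6, "max_vram_saving_pct": 40},
    "NVFP4": {"max_speedup": 2.7, "max_vram_saving_pct": 55},
}


def best_case_vram_gb(baseline_gb: float, fmt: str) -> float:
    """Lowest VRAM footprint implied by the 'up to N% less VRAM' claim."""
    saving_pct = QUANT_CLAIMS[fmt]["max_vram_saving_pct"]
    return baseline_gb * (100 - saving_pct) / 100


def best_case_latency_s(baseline_s: float, fmt: str) -> float:
    """Lowest latency implied by the 'up to Nx faster' claim."""
    return baseline_s / QUANT_CLAIMS[fmt]["max_speedup"]


BASELINE_VRAM_GB = 13.0  # assumption: the ~13GB figure quoted for the 4B model

for fmt in QUANT_CLAIMS:
    vram = best_case_vram_gb(BASELINE_VRAM_GB, fmt)
    print(f"{fmt}: at best about {vram:.2f} GB VRAM")
```

Under that assumption the best case works out to roughly 7.8 GB for FP8 and 5.85 GB for NVFP4. Actual savings depend on the GPU, resolution, and which components are quantized.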


Performance Analysis

FLUX.2 [klein] Elo vs Latency (top) and VRAM (bottom) across Text-to-Image, Image-to-Image Single Reference, and Multi-Reference tasks. FLUX.2 [klein] matches or exceeds Qwen's quality at a fraction of the latency and VRAM, and outperforms Z-Image while supporting both text-to-image generation and (multi-reference) image editing in a unified model. The base variants trade some speed for full customizability and fine-tuning, making them better suited for research and adaptation to specific use cases. Speed is measured on a GB200 in bf16.

Into the New

FLUX.2 [klein] is more than a faster model. It's a step toward our vision of interactive visual intelligence. We believe the future belongs to creators and developers with AI that can see, create, and iterate in real time. Such systems enable new categories of applications: real-time design tools, agentic visual reasoning, and interactive content creation.
