Training Data Disclosure
Last Revised on June 9, 2026
The FLUX family of models was trained on a proprietary mix of textual and image data, including data privately acquired from third parties, data provided by data-labeling services and paid contractors, usage data, synthetic data, and data generated internally by Black Forest Labs (BFL). Our models are trained on datasets comprising billions of tokens. We take steps, including deduplication and regularization of datasets, to filter training data for harmful and illegal content, and we engage in practices consistent with applicable law. We minimize the collection of personal data, including through de-identification.
Datasets are used to train BFL's models, develop and test safety mitigations, support reinforcement learning, develop and test new capabilities, and improve the efficacy and performance of our models.
We started collecting data in approximately 2024 and continue to collect data today. This disclosure is published pursuant to California Civil Code Section 3111 (AB 2013) and is subject to update as models are released or substantially modified.