Last Updated | Version | Changes
7/28/2023 | 1.0 | First version! This document will be kept up to date with SDXL 1.0 developments
8/1/2023 | 1.1 |
8/3/2023 | 1.2 | Training SDXL

What is SDXL 1.0?

SDXL 1.0 is a groundbreaking new text-to-image model, released on July 26, 2023. A precursor model, SDXL 0.9, was available to a limited number of testers for a few months before SDXL 1.0’s release.

Technologically, SDXL 1.0 is a leap forward from SD 1.5 and 2.x, with a parameter count (the total number of weights and biases in the model’s neural network) of 3.5 billion for the base model and 6.6 billion for the full pipeline of base model plus second-stage refiner. In contrast, SD 1.4 has just ~890 million parameters. Further information about the inner workings of SDXL can be found in Stability AI’s SDXL research paper:

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
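
To make "parameter count" concrete, here is a minimal sketch that counts the weights and biases of a tiny, made-up stack of fully connected layers. The layer sizes and the helper names (dense_layer_params, count_params) are illustrative assumptions and have nothing to do with SDXL's real architecture; only the counting idea carries over.

```python
from typing import Sequence


def dense_layer_params(n_in: int, n_out: int) -> int:
    """One weight per input/output pair, plus one bias per output unit."""
    return n_in * n_out + n_out


def count_params(layer_sizes: Sequence[int]) -> int:
    """Total weights and biases for a stack of fully connected layers.

    layer_sizes lists the width of each layer, input first.
    """
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical toy network: 768 inputs, two hidden layers of 1024, 4 outputs.
toy_network = [768, 1024, 1024, 4]
print(count_params(toy_network))  # -> 1841156
```

Even this toy network has about 1.8 million parameters. With a model loaded in PyTorch, the usual way to get the same kind of figure is sum(p.numel() for p in model.parameters()).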

Aside from having ~3x more parameters than previous SD models, SDXL uses two CLIP text encoders, including the largest OpenCLIP model trained to date (OpenCLIP ViT-G/14), and has a far higher native resolution of 1024x1024, in contrast to the 512x512 of SD 1.4 and 1.5, allowing for greatly improved fidelity and depth.
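
As a rough illustration of what the jump in native resolution means in practice, the sketch below compares pixel counts and latent sizes at 512x512 and 1024x1024. The 8x downscale factor and the 4 latent channels are assumptions taken from how the Stable Diffusion VAE is commonly described, not from this guide, and latent_shape is a hypothetical helper name.

```python
# Assumed properties of the Stable Diffusion VAE (not stated in this guide):
VAE_DOWNSCALE = 8     # each latent "pixel" covers an 8x8 patch of the image
LATENT_CHANNELS = 4   # channels in the latent image


def latent_shape(width: int, height: int) -> tuple[int, int, int]:
    """Shape (channels, height, width) of the latent for a given image size."""
    return (LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)


sd15 = latent_shape(512, 512)     # (4, 64, 64)
sdxl = latent_shape(1024, 1024)   # (4, 128, 128)

# Doubling both sides means four times as many pixels to generate.
pixel_ratio = (1024 * 1024) // (512 * 512)

print(sd15, sdxl, pixel_ratio)
```

Under these assumptions, SDXL works on a latent with four times as many values per image, which is one reason it needs more memory and time per generation.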

Unlike previous SD models, SDXL uses a two-stage image creation process. The base model generates the initial latent image (txt2img). That output and the same prompt are then passed through a refiner model (essentially an img2img workflow), which cleans up the image and adds fine detail to the generated output.
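
For readers who prefer code, the two-stage flow described above can be sketched with the Hugging Face diffusers library. This is one possible setup, not the only way to run SDXL: it assumes diffusers and torch are installed, a CUDA GPU is available, and the model weights can be downloaded from the Hugging Face Hub. The function name generate and the default of 30 steps are illustrative choices.

```python
BASE_ID = "stabilityai/stable-diffusion-xl-base-1.0"
REFINER_ID = "stabilityai/stable-diffusion-xl-refiner-1.0"


def generate(prompt: str, steps: int = 30):
    """Run the base model (txt2img), then the refiner (img2img) on its latents."""
    # Imported here so that defining the function does not require a GPU setup.
    import torch
    from diffusers import (
        StableDiffusionXLImg2ImgPipeline,
        StableDiffusionXLPipeline,
    )

    base = StableDiffusionXLPipeline.from_pretrained(
        BASE_ID, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    ).to("cuda")

    # The refiner reuses the base model's second text encoder and VAE to save memory.
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        REFINER_ID,
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")

    # Stage 1: the base model turns the prompt into a latent image.
    latents = base(
        prompt=prompt, num_inference_steps=steps, output_type="latent"
    ).images

    # Stage 2: the refiner takes the same prompt plus those latents and adds detail.
    return refiner(prompt=prompt, num_inference_steps=steps, image=latents).images[0]


# Example (needs a GPU and the downloaded weights):
# generate("a photo of a lighthouse at dusk").save("lighthouse.png")
```

The refiner stage is optional in this setup: decoding the base model's output directly also gives a usable image, and the refiner is simply an extra pass for detail.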


What does all that mean for image generation?

The most immediately apparent difference between SDXL and previous SD models is the range and depth of output in photorealistic images.

In short: quality, depth, lighting, flexibility, and fidelity.