Free of the heavy processing load of earlier models, this new image generation model creates images in seconds on consumer GPUs.
Stability AI has released Stable Diffusion to both researchers and the public. This text-to-image generation model runs on consumer GPUs and creates 512×512-pixel images in seconds.
The model greatly speeds up image generation without the heavy processing load of past models. It is released under the Creative ML OpenRAIL-M license, which permits both commercial and non-commercial use. The software package also includes a safety classifier so that users can filter out undesirable outputs.
Both researchers and commercial users are encouraged to provide feedback on the image model and to note discrepancies between prompts and the final images. The organization notes that the models were trained on image-text pairs from a broad internet scrape and may therefore reflect some biases. With feedback, the team is confident it can improve the model to reduce, and eventually eliminate, such biases.
The team plans for future datasets to expand generation options
The release also lays the foundation for future datasets and projects, and its output will provide the basis for an open synthetic dataset for research. The team will continue to share updates as it refines the new models, and it is still accepting benchmark collaborators to work through remaining kinks and refine output.
The ultimate goal is to reduce the processing required to build models and to enable more developers to leverage image generation in their own projects. Patrick Esser from Runway and Robin Rombach from the Machine Vision & Learning research group at LMU Munich (formerly the CompVis lab at Heidelberg University) led the release, building on their prior work on Latent Diffusion Models presented at CVPR '22. Communities at Eleuther AI, LAION, and Stability AI's generative AI team offered full support.
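A rough sense of why the latent-diffusion approach cuts the processing load: the expensive denoising network operates on a compressed latent representation rather than on raw pixels. The sketch below is back-of-envelope arithmetic only, assuming the figures reported for Stable Diffusion's autoencoder (an 8× spatial downsampling factor and 4 latent channels); it is an illustration, not code from the release.

```python
# Back-of-envelope sketch: how much smaller the latent space is than
# pixel space. Assumed numbers (not from this article): an 8x spatial
# downsampling factor and 4 latent channels, as in the Latent Diffusion
# Models work the release builds on.

def pixel_elements(height: int, width: int, channels: int = 3) -> int:
    """Number of values in a raw RGB image."""
    return height * width * channels

def latent_elements(height: int, width: int, factor: int = 8, channels: int = 4) -> int:
    """Number of values in the compressed latent the denoiser actually sees."""
    return (height // factor) * (width // factor) * channels

pixels = pixel_elements(512, 512)    # 786,432 values per 512x512 RGB image
latents = latent_elements(512, 512)  # 16,384 values in latent space
print(pixels // latents)             # -> 48: the denoiser processes ~48x fewer values
```

Under these assumed numbers, each denoising step touches roughly 48 times fewer values than a pixel-space model would, which is what makes second-scale generation on consumer GPUs plausible.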