Modern text-to-image generative models have gained popularity due to their high-quality output and their ability to generate a wide range of images. These models are trained on large datasets scraped from the internet, which allows them to imitate a wide variety of concepts. However, it also means they can generate explicit or otherwise undesirable content.
Researchers from Northeastern University (NEU) and MIT have developed a method to address this issue. Rather than filtering the training dataset or filtering images after generation, their method erases a single chosen concept directly from the weights of a pretrained text-conditional model. This avoids the expensive retraining of large models from scratch.
To make this technology accessible to a larger audience, the Stable Diffusion text-to-image diffusion model was released as open source. The initial version of the software included an NSFW filter to prevent the generation of unsafe images. However, because both the code and the model weights are publicly available, users can easily turn the filter off.
To reduce the amount of sensitive content the model produces, the subsequent SD 2.0 model was trained on filtered data that excludes explicit images. This retraining took a significant amount of time and computational resources.
Despite this effort, the researchers found that the SD 2.0 model still produced explicit images, albeit fewer than the previous version. This illustrates how difficult it is to completely eliminate explicit content from text-to-image models through dataset filtering alone.
Another concern related to these models is the potential for copyright infringement. AI-generated art can imitate the styles of real artists, leading to allegations that the models appropriate their work. To address this, other researchers have explored adding adversarial perturbations to artwork before it is published, so that a model cannot copy it.
In response to these safety and copyright concerns, the researchers introduced the Erased Stable Diffusion (ESD) technique. This technique fine-tunes the model’s parameters using only a text description of the undesired concept, without any additional training data. According to the researchers, erasing an unwanted concept from the model itself is more effective than simple blacklisting or post-filtering.
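To make the fine-tuning idea above more concrete, here is a minimal sketch in Python with NumPy. It assumes the negative-guidance objective described in the ESD paper: the edited model's concept-conditioned noise prediction is trained toward the frozen model's unconditional prediction, shifted away from the concept direction. The function names (esd_target, esd_loss), the guidance scale eta, and the random arrays standing in for noise predictions are illustrative assumptions and are not taken from the authors' released code.

```python
import numpy as np

def esd_target(eps_uncond, eps_cond, eta=1.0):
    """Training target under the assumed negative-guidance objective.

    eps_uncond: frozen model's noise prediction with no prompt
    eps_cond:   frozen model's noise prediction given the concept prompt
    eta:        hypothetical guidance scale controlling erasure strength
    """
    # Step away from the direction that the concept prompt adds.
    return eps_uncond - eta * (eps_cond - eps_uncond)

def esd_loss(eps_student_cond, eps_uncond, eps_cond, eta=1.0):
    """Mean squared error between the edited model's conditioned
    prediction and the negatively guided target."""
    target = esd_target(eps_uncond, eps_cond, eta)
    return float(np.mean((eps_student_cond - target) ** 2))

# Toy stand-ins for noise predictions on a batch of latents.
rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=(2, 4, 8, 8))   # frozen model, no prompt
eps_cond = rng.normal(size=(2, 4, 8, 8))     # frozen model, concept prompt
eps_student = eps_cond.copy()                # edited model before fine-tuning

loss_before = esd_loss(eps_student, eps_uncond, eps_cond, eta=1.0)
print(f"loss before fine-tuning: {loss_before:.4f}")
```

In a real setup the arrays would come from a diffusion model's noise predictor, and a gradient step on this loss would update only the edited copy of the weights while the frozen copy supplies the targets. No images of the concept are needed, which matches the claim that only the concept description is used.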
The researchers conducted user studies to analyze the effects of erasure on the perception of artistic style and image quality. They also compared their approach to Safe Latent Diffusion and found it to be just as effective at removing objectionable images.
The researchers also tested the erasure technique on removing whole object classes, further demonstrating its capabilities. The model weights and code have been open-sourced for further exploration.
In conclusion, these researchers have made significant progress in addressing safety and copyright concerns related to text-to-image generative models. Their Erased Stable Diffusion technique provides a quick and effective way to remove unwanted concepts from the model’s output. However, challenges still exist in completely eliminating explicit content and preventing copyright infringement.