EA - The Slippery Slope from DALLE-2 to Deepfake Anarchy by stecas

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Slippery Slope from DALLE-2 to Deepfake Anarchy, published by stecas on November 5, 2022 on The Effective Altruism Forum.

OpenAI developed DALLE-2. Then StabilityAI made an open source copycat. This is a dangerous dynamic.

Stephen Casper
Phillip Christoffersen
Rui-Jie Yew

Thanks to Tan Zhi-Xuan and Dylan Hadfield-Menell for feedback.

This post talks about NSFW content but does not contain any. All links from this post are SFW.

Abstract

Since OpenAI published their work on DALLE-2 (an AI system that produces images from text prompts) in April, several copycat text-to-image models have been developed, including StabilityAI’s Stable Diffusion. Stable Diffusion is open-source and can be easily misused, including for the almost-effortless development of NSFW images of specific people for blackmail or harassment. We argue that OpenAI and StabilityAI’s efforts to avoid misuse have foreseeably failed and that both share responsibility for harms from these models. And even if one is not concerned about issues specific to text-to-image models, this case study raises concerns about how copycatting and open-sourcing could lead to abuses of more dangerous systems in the future.

To reduce risks, we discuss three design principles that developers should abide by when designing advanced AI systems.
Finally, we conclude that (1) the AI research community should curtail work on risky capabilities, or at the very least more substantially vet released models; (2) the AI governance community should work to quickly adapt to heightened harms posed by copycatting in general and text-to-image models in particular; and (3) public opinion should ideally be critical not only of perpetrators for harms that they cause with AI systems, but also of the originators, copycatters, distributors, etc. who enable them.

What’s wrong?

Recent developments in AI image generation have made text-to-image models very effective at producing highly realistic images from captions. For some examples, see the paper from OpenAI on their DALLE-2 model or the release from Stability AI of their Stable Diffusion model. Deep neural image generators like StyleGAN and manual image editing tools like Photoshop have been on the scene for years. But today, DALLE-2 and Stable Diffusion (which is open source) are uniquely effective at rapidly producing highly realistic images from open-ended prompts.

There are a number of risks posed by these models, and OpenAI acknowledges this. Unlike conventional art and Photoshop, today’s text-to-image models can produce images from open-ended prompts by a user in seconds. Concerns include (1) copyright and intellectual property issues; (2) sensitive data being collected and learned; (3) demographic biases, e.g. producing images of women when given the input, “an image of a nurse”; (4) using these models for disinformation by creating images of fake events; and (5) using these models for producing non-consensual, intimate deepfakes.

These are all important, but producing intimate deepfakes is where abuse of these models seems to be the most striking and possibly where we are least equipped to effectively regulate misuse. Stable Diffusion is already being used to produce realistic pornography.
Reddit recently banned several subreddits dedicated to AI-generated porn, including r/stablediffusionnsfw, r/unstablediffusion, and r/porndiffusion, for violating Reddit’s rules against non-consensual intimate media.

This is not to say that violations of sexual and intimate privacy are new. Even before the introduction of models such as DALLE-2 and Stable Diffusion, individuals were victims of non-consensual deepfakes. Perpetrators often make this content to discredit or humiliate people from marginalized groups, taking advantage of the negative sociocultural ...
