This week, a look at new research into why AI models make the decisions they do and the policy implications; more deepfake audio clips proliferate on X and TikTok; and Adobe releases more tools to create - and detect - generated images.
Do you have a friend or colleague who might be interested? Forward this over and tell them to subscribe!
Inside the Black Box: Why AI Interpretability Matters
This week, we’re writing about “interpretability” – how and why AI models make the decisions they do. Does ChatGPT actually have a liberal bias, and if so, why? Why would YouTube’s content moderation AI erroneously flag a journalist’s account reporting on a war? Understanding exactly how these large AI models work is an important goal for developers and policymakers alike as we collectively grapple with our ever-increasing reliance on these tools and their role in society.
The specific reasoning behind each algorithmic decision is currently inscrutable to the developers (and consumers) of these complex models, but last week Anthropic released exciting new research that offers a glimpse at the inner workings of these neural networks.
Why interpretability is important and has been getting harder
Before neural networks, decision-making tools were usually simple algorithms. Take a hypothetical fraud detection algorithm used by a bank, which might trigger if a credit card is swiped thousands of miles from the cardholder’s home for an amount larger than a typical purchase. If the alert is triggered erroneously, the consumer can call the bank, a bank employee can see which rule fired and explain what happened, the bank’s fraud team can tweak the algorithm if there are too many false positives, and financial regulators can keep tabs on how the rules affect different segments of customers, reducing bias. Simplicity facilitates interpretability.
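To make the contrast concrete, here is a minimal sketch of what such a rule-based check might look like. The thresholds and field names are purely illustrative, not taken from any real bank’s system:

```python
from dataclasses import dataclass

# Illustrative thresholds -- not from any real fraud system.
MAX_DISTANCE_MILES = 1000      # flag swipes far from the cardholder's home
TYPICAL_AMOUNT_MULTIPLE = 5    # flag amounts well above the cardholder's usual purchase

@dataclass
class Transaction:
    distance_from_home_miles: float
    amount: float
    typical_amount: float

def flag_for_fraud(tx: Transaction) -> bool:
    """Trigger an alert only when both simple, human-readable rules fire."""
    far_from_home = tx.distance_from_home_miles > MAX_DISTANCE_MILES
    unusually_large = tx.amount > TYPICAL_AMOUNT_MULTIPLE * tx.typical_amount
    return far_from_home and unusually_large

# Anyone at the bank can read these two rules and explain why an alert fired.
print(flag_for_fraud(Transaction(distance_from_home_miles=2500, amount=900, typical_amount=40)))
```

Every alert traces back to one or two human-readable rules, which is exactly what a customer service rep, a fraud analyst, or a regulator needs.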
But the new generation of large AI models is built in a different way: as huge networks of individual “neurons” trained on massive amounts of data. While their developers are in awe of these models’ capabilities, they have little insight into why their creations spit out one response rather than another. In other words, these large models, while very powerful, have little interpretability.
The basic challenge of interpretability for large neural networks is that billions of numbers, called weights, contribute to each output the model produces, yet each individual neuron plays only a bit part. Which neuron knows that Honolulu is the capital of Hawaii? We don’t know: flipping an individual neuron on or off has little obvious effect on the model’s results.
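As a toy illustration, the sketch below zeroes out a single hidden unit in a small stand-in PyTorch network and compares the outputs before and after. The model and inputs here are placeholders, not anything from a real production system:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A stand-in network: real LLMs have billions of weights, not a few thousand.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
x = torch.randn(4, 16)
baseline = model(x)

# "Flip off" one hidden neuron by zeroing its incoming weights and bias.
neuron = 3
with torch.no_grad():
    model[0].weight[neuron].zero_()
    model[0].bias[neuron] = 0.0

ablated = model(x)

print("output norm:          ", baseline.norm().item())
print("change from ablation: ", (baseline - ablated).norm().item())
# In a trained LLM, knocking out one neuron among millions tends to produce a
# similarly muted, hard-to-interpret shift rather than erasing a single fact.
```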
Anthropic shines a light into AI’s black box
In their research paper, the Anthropic team identified how seemingly unrelated neurons work together to do things like answer questions about Oahu. The researchers found that a single neuron activated in many unrelated contexts, including “academic citations, English dialog, [web] requests, and Korean text.” But with some novel analysis, they were able to identify small combinations of neurons that do appear to have extremely specialized capabilities, like one with a very specific role in parsing DNA sequences.
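For the technically inclined: that analysis is a form of dictionary learning, in which a sparse autoencoder is trained on a layer’s activations so that each activation is rebuilt from a small number of learned “features,” which often turn out to be far more interpretable than any single neuron. Below is a minimal sketch of the idea; the sizes, data, and sparsity penalty are placeholders, and it simplifies the actual research setup considerably:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Decompose activation vectors into a larger set of sparsely used 'features'."""
    def __init__(self, d_act: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_features)
        self.decoder = nn.Linear(d_features, d_act)

    def forward(self, acts):
        features = F.relu(self.encoder(acts))  # non-negative, pushed toward sparsity
        recon = self.decoder(features)         # reconstruct the original activations
        return recon, features

# Placeholder data; in the real setup these are activations captured from an LLM layer.
acts = torch.randn(256, 128)

sae = SparseAutoencoder(d_act=128, d_features=1024)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

for step in range(100):
    recon, features = sae(acts)
    # Reconstruction loss keeps the features faithful to the model; the L1 penalty
    # keeps each input using only a few features, which is what nudges individual
    # features toward single, human-readable roles.
    loss = F.mse_loss(recon, acts) + 1e-3 * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once trained, each learned feature can be characterized by the inputs that activate it most strongly, which is how specialized roles like the DNA-sequence feature get identified.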
Right now, this method works only on small-scale models, but if developers can eventually understand which parts of their model “light up” in response to specific inputs or outputs, that will contribute to better and safer models. For example, model developers might be able to surgically “lobotomize” problematic areas of a model, improving reliability and safety. It will also become easier to diagnose why a model flags legitimate content as harmful, why it accidentally generates instructions for making a bomb out of household materials, or why it exhibits an undesired political bias.
Regulators are carrying forward an old model
With warranted concern about hidden risks and biases in these new generative AI models, policymakers are starting to demand interpretability from model developers. In California, for example, last session’s ultimately failed AB-331 would have required that a developer provide “A description of how the automated decision tool was evaluated for validity and explainability before sale or licensing.” Other early AI policy frameworks like the EU’s AI Act, the White House’s Blueprint for an AI Bill of Rights, and Sen. Schumer’s SAFE Innovation Framework for AI reference interpretability or explainability as an essential AI safety tenet to mitigate the risks of incorrect decision-making.
While the intentions are good, interpretability (or explainability) of large AI models is too nascent a research area to support specific policy guardrails. Since there is no common agreement on what “good” interpretability looks like, it would be too easy for companies to make something up, call it “interpretability,” and check a regulator’s box. For now, interpretability is best thought of as a tool for model developers and companies seeking to reduce risk.
That leaves policymakers to apply more familiar frameworks to regulate large AI models, such as product liability and libel. A useful analogy here is that of the human brain. When creating legal doctrine for humans, we don’t insist on a complete understanding of how each neuron or synapse might contribute to an action; rather we establish rules about the behavior that emerges from billions of neurons firing together.
The exception is for regulators focused on ‘frontier’ models that may outrun human cognition and capabilities, where the safety risks are orders of magnitude larger than those of existing models. Here, we should insist on a deeper understanding of a model’s inner workings before it is let loose in the wild. But that analysis is best left to experts at a new regulatory agency, akin to the FDA, as one part of a comprehensive suite of testing that can develop and adapt along with the research.
Audio deepfakes multiply on Twitter/X and TikTok
A new audio recording circulating on X (formerly Twitter), which allegedly captures UK Labour Party leader Sir Keir Starmer verbally abusing staff, has caused an uproar in the UK. It’s still unclear whether the audio is real or synthetic, but the general consensus is leaning toward a deepfake. This difficulty in determining authenticity illustrates some of the challenges in detecting synthetic audio that we described in last week’s issue. Notably, despite running afoul of X’s manipulated media policy, the clip was neither labeled nor removed. Even if the provenance question is eventually answered definitively in favor of a deepfake, many users who heard the audio will remain under the impression that the Labour Party leader did, in fact, verbally abuse his staff.
A New York Times article also covered deepfake audio recordings that have been proliferating on TikTok in recent months, including one of President Barack Obama. The NYT team reported that ElevenLabs, the company behind the tool used to generate a synthetic Obama voice, has also built audio deepfake detection tools, but those detectors were easily defeated by simple modifications like layering a music track on top of the audio.
Adobe’s new tool to show Internet users where their media comes from
In another small step toward combating visual deepfakes, Adobe announced* a visual implementation of the C2PA spec called Content Credentials. The main product of this effort is a “pin,” visible above, that viewers can click to get more information about an image’s provenance and any AI tools used to create it. The feature is limited to participating companies and users, and the only web browser developer currently listed as a member is Microsoft. Microsoft hasn’t yet announced support for the pin in its Edge browser, but it has added Content Credentials to Bing’s image generation features and to Microsoft Paint. (MSPaint is now bleeding edge!) It’s early days, underscoring the opportunity that large digital distribution platforms have to incentivize participation in these provenance schemes.
*Adobe simultaneously announced the release of new generative AI tools that could be used to create deepfakes. One impressive demo showcases how Premiere, its video editing software, can easily swap out an actor’s tie for another pattern – even while the actor is walking.
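For the technically curious, a Content Credential is, at its core, a cryptographically signed manifest that travels with the image and records who made it and how. The sketch below is a simplified illustration of the kind of summary a browser pin could surface; it does not use Adobe’s or the C2PA SDKs, the field names are paraphrased rather than copied from the spec, and the manifest itself is made up:

```python
# Illustrative only: real Content Credentials are read and verified with C2PA SDKs,
# which also check a cryptographic signature chain. Field names are paraphrased.
example_manifest = {
    "claim_generator": "Adobe Firefly",   # hypothetical values
    "signature_issuer": "Adobe Inc.",
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {"actions": [{"action": "c2pa.created",
                                  "digitalSourceType": "trainedAlgorithmicMedia"}]},
        }
    ],
}

def summarize_credential(manifest: dict) -> str:
    """Produce the sort of one-line summary a 'pin' might show a viewer."""
    issuer = manifest.get("signature_issuer", "unknown issuer")
    tool = manifest.get("claim_generator", "unknown tool")
    ai_generated = any(
        action.get("digitalSourceType") == "trainedAlgorithmicMedia"
        for assertion in manifest.get("assertions", [])
        if assertion.get("label") == "c2pa.actions"
        for action in assertion.get("data", {}).get("actions", [])
    )
    return f"Signed by {issuer}; created with {tool}; AI-generated: {ai_generated}"

print(summarize_credential(example_manifest))
```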
Of Note
Government
IBM CEO: Washington should hold tech firms accountable for AI (POLITICO)
Lawmakers shift gears on TikTok ban (POLITICO)
Elections
Appeals court limits cyberdefense agency's contacts with tech companies (Washington Post) A court has banned the Cybersecurity and Infrastructure Security Agency from communicating with tech companies about election-related topics.
Request for Proposal: Higher Ground Labs Progressive AI Lab (HGL) Democratic incubator Higher Ground Labs is soliciting proposals for AI-based projects that will “directly contribute to winning 2024 programs.”
Technology
Protesters Decry Meta’s “Irreversible Proliferation” of AI (IEEE Spectrum)
Adobe Firefly’s generative AI models can now create vector graphics in Illustrator (TechCrunch)
Want to Trick an LLM? Try Asking It Nicely or Use Argentinian Spanish (The Information)
OpenAI’s Revenue Crossed $1.3 Billion Annualized Rate, CEO Tells Staff (The Information) OpenAI is generating more than $100M / month.
Waymo expands in San Francisco while Cruise feels the heat (TechCrunch) Waymo expanded its driverless car service area and now expects to serve tens of thousands of customers in SF.
Deepfakes
A Doctored Biden Video Is a Test Case for Facebook’s Deepfake Policies (Wired) Meta’s Oversight Board decided that a manipulated video of President Biden didn’t violate their manipulated media policies. “Meta noted that the Biden video didn’t use AI or machine learning to manipulate the footage.”
Stable Signature: A new method for watermarking images created by open source generative AI (Meta) Nifty new research on how to embed watermarks into open source generative models in a way that can’t be easily removed.