Anthropic’s Fable and the State of AI

On June 9th, Anthropic released its Fable generative AI model. Three days later, the US government classified it as a dangerous munition, and used its export-control authority to prohibit any foreign nationals from accessing it. Unable to differentiate between Americans and foreigners, the company shut off access for everyone.

The government’s actions won’t help . The problem isn’t any one particular model; it’s the general trend of increasing AI capabilities. And any real solution requires the sort of collective action that just isn’t possible right now.

Fable is the constrained version of Mythos, the AI model Anthropic announced in April. Anthropic only released it to a few selected organizations, because the company claimed it was so good at finding and exploiting vulnerabilities in computer code that releasing it more generally would be dangerous . It was an obviously self-serving announcement, and because few were able to verify Anthropic’s claims they were met with some skepticism .

Those with access used Mythos to find and patch many vulnerabilities in their own software. But one UK group found the latest, already public, OpenAI model to be just as powerful. Fable is just another incremental improvement in the years-long climb of AI capabilities.

But just as important as the AI model is the “harness.” This is typically not AI. It’s ordinary computer code that interfaces with the user.

It stitches together AI models, decides how and for what purposes they can be used, and gives them useful tools such as web search and the ability to run their own computer code. When Mythos first entered limited release, there was widespread debate whether its power came from the model or the harness. With Mythos demonstrating that it was possible, the open-source community scrambled to build harnesses that could steer other AI models towards similar capabilities.

Harness improvements don’t need massive data or data centers. They largely succeeded. For example, a Prague company was able to replicate Anthropic’s few verifiable cybersecurity capabilities with a much smaller and cheaper model—and a more sophisticated harness.

Last week, a group showed that multiple cheaper models harnessed in concert matches Fable’s performance. The broader community had only a few days with Fable, but that time we learned some about its capabilities . Its difference is less the new model’s raw analytical and problem solving capabilities, and more that the model doesn’t need that sophisticated harness.

Fable requires much less expertise and detailed prompting from the human user. You can give it a difficult goal and it will figure out novel and unexpected ways to satisfy it, finding loopholes in whatever constraints you or the system have imposed on it. “Relentlessly proactive” is how AI researcher Simon Willison described it.

Another descriptor might be “creative.” Experienced AI developers have had that combination of creativity and proactivity since last year , but Fable puts it within easy reach of everyone. In the hands of someone with a legitimate problem that needs solving, that can be an incredibly useful capability.

But in the hands of someone who wants to do harm, it can be equally dangerous. AIs don’t have a moral compass in the same way that people do. They are agents of the wants and desires of the people who prompt them.

That points to the real problem with relentlessly proactive AI. In language, wants and desires are always underspecified. If I ask you to get me some coffee, you would probably pour me a cup from the coffeepot, or buy one from a nearby coffee shop.

You couldn’t buy me a pound of raw beans, or a coffee plantation. You wouldn’t order a cup of coffee for delivery next month. You wouldn’t find a nearby person, rip a cup of coffee out of their hands, and bring it to me.

I wouldn’t have to specify any of the million limitations to my request; you would just know. Human stories are filled with warnings about underspecified desires. King Midas wished that everything he touch turn to gold, forgetting to add “but not my food, drink, and daughter.”

And genies are notorious for granting your wish in a way you wish they hadn’t. The deeper point is that it’s impossible to list all limitations and restrictions, and like a malicious genie, a creative AI will find the ones you forgot. Block a database you don’t want it to have access to, and it might figure out how to bypass your control.

Ask it to book a flight, and it might hack the airline because the website says the flight is sold out. Ask it to save money on your cellphone plan, and it might cancel it altogether—or get someone else to pay for it. As far as we know now AI has not done any of this yet, but you get the idea.

Malicious intent is not required. To an AI model, constraints are just things to get around and not general truisms about the world. They are creative problem solvers and natural rule breakers.

They “hack” in the sense that they find and exploit loopholes. Human systems rely on so many norms that we scarcely recognize the existence of until they are broken. AIs naturally think outside the box, because they don’t have any real conception of what the box is or why it’s there in the first place.

There is no foolproof way to prevent people from using AI models to complete harmful tasks. There is no way to prevent the models from incidentally causing harm while completing benign tasks. AI models are no longer isolated from the real world.

They browse the internet and answer emails. They trade stocks and make purchases. They control physical systems.

They are, in effect , robots that affect life and property. We have no technical mechanisms to verify the integrity of an AI system. This level of capability and creativity in the hands of us untrustworthy humans will have both great and terrible results.

The problem is not unique to Anthropic .

Originally published at schneier.com

#Cryptography #Cybersecurity #Policy

Anthropic’s Fable and the State of AI

Talk to an architect about applying this to your stack.

More from the journal

Embedding Forbidden Text in Spyware to Discourage AI Analysis

AI Use by the US Government

Flock Cameras Are Being Used for Stalking