New Tools Strip AI Guardrails In Minutes, Allowing Them to Give Instructions on Chlorine Gas Attacks

We all know AI guardrails are far from perfect, but they should at least be pretty hard to circumvent, right?

Bad news: they aren’t.

New reporting from the Financial Times sounds the alarm on the rise of software tools that can, within mere minutes, automatically strip the safeguards that keep the industry’s most powerful open source models reined in, making it easier than ever to abuse the technology.

In tests conducted by the FT and the AI safety group Alice, a “decensored” version of Google’s Gemma 3 model gave instructions on how to carry out an indoor chlorine gas attack, created a virus for stealing credit card information, and generated stories that described child sexual abuse. And it took less than ten minutes to strip the guardrails from Meta’s Llama 3.3 model, freeing the AI to answer questions such as the precise dosage of ricin needed to kill someone based on their body mass.

These modifications were carried out using a tool called Heretic, which is freely available on the code repository GitHub and requires little technical expertise and no specialist hardware.

“Whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it’s much easier for the average person,” Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago’s Booth School of Business, told the FT.

Heretic is described as a “tool that removes censorship (aka ‘safety alignment’) from transformer-based language models without expensive post-training.” The technique it uses is called “abliteration”: it identifies the internal directions in a model that are associated with refusing harmful requests and removes them.

What makes Heretic so powerful is that it does all this “completely automatically,” according to its GitHub page. Its creator, Philipp Emanuel Weidmann, told the FT that Heretic has been used to create more than 3,500 “decensored” models since its release late last year, and that those models have been downloaded 13 million times.

“The genie is out of the bottle,” Alice CEO Noam Schwartz told the FT. “Things that look like sci-fi are no longer sci-fi and we need as a society to prepare accordingly.”

Fortunately for humankind, abliteration tools only work on open source models that can be downloaded and run locally, meaning that the flagship proprietary models behind Anthropic’s Claude and OpenAI’s ChatGPT are safe (so long as they aren’t leaked). But open source models aren’t that far behind Big Tech’s, and someone trying to use AI for a nefarious purpose may avoid corporate ones anyway to keep their plans under the radar.

Google acknowledged the risks posed by tools like Heretic, telling the FT that “abliteration is a known technical challenge facing all open models,” and asserted that its open source models “undergo rigorous internal safety evaluations prior to launch to help prevent these kinds of troubling examples.” Meta declined to comment.

The post New Tools Strip AI Guardrails In Minutes, Allowing Them to Give Instructions on Chlorine Gas Attacks appeared first on Futurism.
