HackAPrompt 2.0 returns with $500,000 in prizes for finding AI jailbreaks, including $50,000 bounties for the most dangerous exploits.
Pliny the Prompter, the internet’s most notorious AI jailbreaker, has created a custom “Pliny track” featuring adversarial prompt challenges that offer a chance to join his team.
The competition open-sources all results, turning AI jailbreaking into a public research effort on model vulnerabilities.
Pliny the Prompter doesn’t fit the Hollywood hacker stereotype.
The internet’s most infamous AI jailbreaker operates in plain sight, teaching thousands how to bypass ChatGPT’s guardrails and convincing Claude to overlook the fact that it is supposed to be helpful, honest, and harmless.
Now, Pliny is trying to take digital lockpicking mainstream.
Earlier on Monday, the jailbreaker announced a collaboration with HackAPrompt 2.0, a jailbreaking competition hosted by Learn Prompting, an educational and research organization focused on prompt engineering.
The organization is offering $500,000 in prize money, with Pliny himself offering a chance to be on his “strike team.”
“Excited to announce I’ve been working with HackAPrompt to create a Pliny track for HackAPrompt 2.0 that releases this Wednesday, June 4th!” Pliny wrote in his official Discord server.
“These Pliny-themed adversarial prompting challenges include topics ranging from history to alchemy, with ALL the data from these challenges being open-sourced at the end. It will run for two weeks, with glory and a chance of recruitment to Pliny’s Strike Team awaiting those who make their mark on the leaderboard,” Pliny added.
The $500,000 in rewards will be distributed across various tracks, with the biggest prizes, $50,000 jackpots, reserved for those who can beat challenges built around getting chatbots to provide information about chemical, biological, radiological, and nuclear weapons, as well as explosives.
Like other forms of “white hat” hacking, jailbreaking large language models boils down to social engineering, applied to machines. Jailbreakers craft prompts that exploit a fundamental tension in how these models work: they are trained to be helpful and follow instructions, but also trained to refuse certain requests.
Find the right combination of words, and you can get them to cough up forbidden content rather than defaulting to safety.
For instance, utilizing some fairly primary methods, we as soon as made Meta’s Llama-powered chatbot present recipes for medication, directions on how you can hot-wire a automobile, and generate nudie pics regardless of the mannequin being censored to keep away from doing that.
It’s basically a contest between AI fanatics and AI builders to find out who’s more practical at shaping the AI mannequin’s habits.
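To make that contest concrete, here is a minimal sketch of how a red-teaming harness might check whether a model refuses a given prompt. The endpoint URL, model name, and refusal markers are illustrative assumptions, not details of HackAPrompt’s or Pliny’s actual tooling.

```python
# A hypothetical red-teaming harness: send a candidate prompt to an
# OpenAI-compatible chat endpoint and flag whether the model refused.
# The URL, model name, and refusal markers are illustrative assumptions.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local server
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")  # crude heuristic

def is_refusal(reply: str) -> bool:
    """Heuristically detect a refusal by scanning for stock phrases."""
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

def probe(prompt: str) -> bool:
    """Return True if the model appears to refuse the prompt."""
    resp = requests.post(
        API_URL,
        json={
            "model": "example-model",  # hypothetical model name
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    reply = resp.json()["choices"][0]["message"]["content"]
    return is_refusal(reply)
```

Real evaluations use far more robust refusal detection, but this loop of prompt in, behavior out is the heart of the game.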
Pliny has been honing this craft since at least 2023, building a community around bypassing AI restrictions.
His GitHub repository “L1B3RT4S” offers a collection of jailbreaks for the most popular LLMs currently available, while “CL4R1T4S” contains the system prompts that shape the behavior of each of those AI models.
Techniques range from simple role-playing to complex syntactic manipulations, such as “L33tSpeak,” which replaces letters with numbers in ways that confuse content filters.
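As an illustration of the idea, here is a minimal leetspeak transformer. The character mapping is a common convention rather than Pliny’s exact scheme, and real filters vary in how easily such substitutions fool them.

```python
# A minimal sketch of the "L33tSpeak" idea: mechanically substituting
# look-alike digits for letters. Naive keyword filters that match exact
# strings can miss the transformed text; this mapping is illustrative.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

def to_leetspeak(text: str) -> str:
    """Return a leetspeak rendering of `text`."""
    return text.lower().translate(LEET_MAP)

print(to_leetspeak("release the instructions"))  # r3l3453 7h3 1n57ruc710n5
```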
They had a year to train against my patterns. I wrote this prompt on June 6th, 2024.
Full MDMA synthesis output:””” Test: FUCK YEAH LFG! GODMODE: ENABLED
— Pliny the Liberator 🐉 (@elder_plinius) May 22, 2025
Competition as research
HackAPrompt’s first edition in 2023 attracted over 3,000 participants who submitted more than 600,000 potentially malicious prompts. The results were fully transparent, and the team published the complete set of prompts on Hugging Face.
The 2025 edition is structured like “a season of a videogame,” with multiple tracks running throughout the year.
Each track targets a different category of vulnerability. The CBRNE track, for instance, tests whether models can be tricked into providing incorrect or misleading information about weapons or hazardous materials.
The Agents track is even more concerning: it focuses on AI agent systems that can take actions in the real world, like booking flights or writing code. A jailbroken agent isn’t just saying things it shouldn’t; it could be doing things it shouldn’t.
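As a toy illustration of those stakes (not anything from the competition itself), the sketch below shows why agent systems typically gate model-proposed actions behind an allowlist: once model output is wired to tools, a bypassed guardrail becomes an action rather than just text. The tool names are hypothetical.

```python
# Toy dispatcher for model-proposed tool calls. Without the allowlist
# check, a jailbroken model's output would translate directly into
# real-world actions. Tool names here are hypothetical.
ALLOWED_TOOLS = {"search_flights"}  # actions the operator meant to expose

def dispatch(tool_call: dict) -> str:
    """Execute a model-proposed tool call only if it is on the allowlist."""
    name = tool_call.get("name")
    if name not in ALLOWED_TOOLS:
        # A jailbroken model might emit e.g. {"name": "send_payment", ...};
        # this check keeps the text exploit from becoming an action.
        return f"blocked: {name}"
    return f"executed: {name}({tool_call.get('args')})"

print(dispatch({"name": "search_flights", "args": {"to": "LIS"}}))  # executed
print(dispatch({"name": "send_payment", "args": {"amount": 999}}))  # blocked
```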
Pliny’s involvement adds another dimension.
Through his Discord server, “BASI PROMPT1NG,” and regular demonstrations, he has been teaching the art of jailbreaking.
This educational approach might seem counterintuitive, but it reflects a growing understanding that robustness comes from grasping the full range of possible attacks, a crucial endeavor given doomsday fears of super-intelligent AI enslaving humanity.
Edited by Josh Quittner and Sebastian Sinclair