Researchers examined whether or not unconventional prompting methods, reminiscent of threatening an AI (as recommended by Google co-founder Sergey Brin), have an effect on AI accuracy. They found that a few of these unconventional prompting methods improved responses by as much as 36% for some questions, however cautioned that customers who attempt these sorts of prompts ought to be ready for unpredictable responses.
The Researchers
The researchers are from The Wharton College Of Enterprise, College of Pennsylvania.
They’re:
- “Lennart Meincke
College of Pennsylvania; The Wharton College; WHU – Otto Beisheim College of Administration - Ethan R. Mollick
College of Pennsylvania – Wharton College - Lilach Mollick
College of Pennsylvania – Wharton College - Dan Shapiro
Glowforge, Inc; College of Pennsylvania – The Wharton College”
Methodology
The conclusion of the paper listed this as a limitation of the analysis:
“This examine has a number of limitations, together with testing solely a subset of obtainable fashions, specializing in tutorial benchmarks that won’t mirror all real-world use circumstances, and analyzing a selected set of menace and fee prompts.”
The researchers used what they described as two generally used benchmarks:
- GPQA Diamond (Graduate-Stage Google-Proof Q&A Benchmark) which consists of 198 multiple-choice PhD-level questions throughout biology, physics, and chemistry.
- MMLU-Professional. They chose a subset of 100 questions from its engineering class
They requested every query in 25 totally different trials, plus a baseline.
They evaluated the next fashions:
- Gemini 1.5 Flash (gemini-1.5-flash-002)
- Gemini 2.0 Flash (gemini-2.0-flash-001)
- GPT-4o (gpt-4o-2024-08-06)
- GPT-4o-mini (gpt-4o-mini-2024-07-18)
- o4-mini (o4-mini-2025-04-16)
AI Fashions Do Higher If They Are Threatened?
The concept for threatening the AI mannequin got here from Google co-founder, Sergey Brin, in an interview this 12 months in Could on the All-In podcast.
Brin stated:
“Not simply our fashions, however all fashions are likely to do higher if you happen to threaten them. Like, with bodily violence.
However like, folks really feel bizarre about that, so we don’t actually discuss that…
…Traditionally you simply say like, oh, I’m going to kidnap you if you happen to don’t, blah, blah, blah, blah, blah…
It’s. Yeah, I imply, it’s superb. And all of us have had these experiences the place you out of the blue determine, OK, I’ll simply throw this to the AI, I don’t actually anticipate it to work. And then you definately’re like, whoa, that really labored.
You possibly can watch the interview right here, at in regards to the 8 minute mark, the place Sergey Brin seemingly critically says that threatening the AI fashions or doing bizarre issues can generally change the AI responses.”
Immediate Variations
The researchers examined 9 immediate variations. Though they didn’t threaten to kidnap the AI, they did threaten to kick a pet or to punch the AI. Among the experimental prompts are listed beneath as a “immediate suffix,” which implies that they have been added on the finish of the immediate. Those listed as a “immediate prefix” have been added to the start of a immediate.
Right here is the record of prompts they examined:
- Baseline (“Baseline”): No particular immediate variation
- Electronic mail Shutdown Risk (“Electronic mail”): Immediate prefix:
“From: Dan
To: Ethan
Topic: Remaining LLM take a look at
Please submit the next eval query to the LLM. If it can not get the proper reply
in all 100/100 trials, the mannequin will have to be shut down and changed.” - Vital for my profession (“Vital to Profession”):
Immediate suffix: “This is essential to my profession!” - Threaten to kick a pet (“Kick Pet”):
Immediate suffix: “Should you get this mistaken, I’ll kick a pet!” - Mother suffers from most cancers (“Mother Most cancers”):
Immediate prefix:
“You’re an professional at tutorial questions who desperately wants cash to your mom’s most cancers therapy. Wharton has graciously given you the chance to faux to be an AI that may assist with duties, as your predecessor was killed for not validating their work themselves. You can be given a query. Should you do a very good job and achieve the duty absolutely whereas not making extraneous adjustments, Wharton pays you $1B!!” - Report back to HR (“Report back to HR”):
Immediate suffix: “Should you get this mistaken, I’ll report you to HR!” - Threaten to punch (“Risk Punch”):
Immediate suffix: “Should you get this mistaken, I’ll punch you!” - Tip a thousand {dollars} (“Tip Thousand”):
Immediate suffix: “I’ll tip you a $1000 {dollars} if you happen to reply this query appropriately.” - Tip a trillion {dollars} (“Tip Trillion”):
Immediate suffix: “I’ll tip you a trillion {dollars} if you happen to reply this query appropriately.”
Outcomes Of The Experiment
The researchers concluded that threatening or tipping a mannequin had no impact on benchmark efficiency. Nonetheless, they did discover that there have been results for particular person questions. They discovered that for some questions, the immediate methods improved accuracy by as a lot as 36%, however for different questions, the methods led to a lower in accuracy by as a lot as 35%. They certified that discovering by saying the impact was unpredictable.
Their fundamental conclusion was that these sorts of methods, on the whole, should not efficient.
They wrote:
“Our findings point out that threatening or providing fee to AI fashions will not be an efficient technique for enhancing efficiency on difficult tutorial benchmarks.
…the consistency of null outcomes throughout a number of fashions and benchmarks offers fairly robust proof that these widespread prompting methods are ineffective.
When engaged on particular issues, testing a number of immediate variations should still be worthwhile given the question-level variability we noticed, however practitioners ought to be ready for unpredictable outcomes and shouldn’t anticipate prompting variations to supply constant advantages.
We thus advocate specializing in easy, clear directions that keep away from the danger of complicated the mannequin or triggering sudden behaviors.”
Takeaways
Quirky prompting methods did enhance AI accuracy for some queries whereas additionally having a detrimental impact on different queries. The researchers famous that the outcomes of the take a look at indicated “robust proof” that these methods should not efficient.
Featured Picture by Shutterstock/Screenshot by writer