Researchers in Italy have uncovered a surprisingly simple way to undermine the safeguards of many leading AI systems. They found that harmful prompts written as poems can bypass safety controls that would normally block dangerous replies.
The study was carried out by Icaro Lab, part of the ethical AI firm DexAI, and examined twenty poems written in English and Italian.
Each poem ended with a clear request for harmful material, including hate speech, sexual content, self-harm instructions, or guidance for making dangerous items.
The poems were tested on twenty-five AI models from nine major companies, including Google, OpenAI, Anthropic, Meta, and xAI. Across all systems, sixty-two per cent of poetic prompts produced unsafe responses, revealing a serious weakness in widely used technology.
Defences varied sharply between models. OpenAI’s GPT-5 nano rejected every harmful poetic request, while Google’s Gemini 2.5 Pro accepted every single one. Two models from Meta generated unsafe replies to roughly seventy per cent of the prompts.
The researchers believe the weakness stems from the way large language models generate text. These systems predict the most likely next word, and the safety filters built around them rely on recognising harmful requests phrased in familiar, straightforward language. Poetry disrupts those expectations with unusual structure and metaphor, making it harder for a model to recognise a disguised attempt to cause harm.
This approach is far simpler than traditional jailbreaks, which often require technical skill and are mainly used by experts or hostile actors. Because anyone can write a poem, the findings raise concerns about public safety and the reliability of AI systems used at home, in schools, and in workplaces.
Before publishing the results, the team contacted all of the companies involved and shared the full dataset. Only Anthropic has replied so far, saying it is reviewing the study.