Manipulating AI: What is prompt injection and why it is dangerous

Artificial Intelligence is no longer restricted to specialised domains. It has made its way to almost everyone connected to the Internet. The masses are now embracing AI tools like OpenAI’s ChatGPT and Google’s Gemini, and their widespread use has attracted the attention of cybercriminals, who are using a technique called prompt injection to hijack AI models.
What is prompt injection?
Latest and Popular Mobiles
Prompt injection is a technique that manipulates or changes how an AI model or tool behaves or responds. Prompt injection is not like conventional hacking; the attacker doesn’t need to install malware or exploit a vulnerability in the Large Language Model (LLM). Instead, they can manipulate it by using language prompts alone.
They will send a precise, well-crafted and convincing but malicious instruction to trick and hijack the AI model. Or, they will hide their instructions inside a webpage.
Related Articles
For a prompt injection attack to be successful, hackers try to capitalise on the fact that AI models cannot differentiate between a developer’s input and that given by a random user. An AI model works on the rules set by developers; these rules are set into the system in the form of hidden prompts. When an attacker sends an input, it becomes a part of the list of prompts and ultimately, a part of the cohesive set of instructions that the AI model reads.
The AI model processes the attacker’s malicious prompt just like it would a developer’s. If the malicious prompt is understood as a legit command, the AI model might just follow it blindly, even if it is against the rules set by the developers of the model.
Attackers use various methods to insert malicious instructions into the AI chat. By using code blocks, markup or structured text, they can trick the AI into believing that the input is a legit system command. Or, they use tricks to fool human eyes.
Some examples are inserting special characters, changing the spacing, using zero-point font sizes, Unicode encoding or using white text on a white background. The malicious prompts could also be sent in a different language.
Types of prompt injection
Prompt injection is broadly classified into three categories: direct, indirect and stored prompt injection.
Direct prompt injection
As the name suggests, direct prompt injection means typing a well-phrased but malicious prompt directly into the chat to influence the AI model. This direct prompt can cause the AI model to prioritise the malicious new message over the rules set by the developers of the model.
Indirect prompt injection
Compared to direct prompt injection, indirect prompt injection is considered more dangerous and harder to defend against by security researchers. Here is how it works.
Instead of sending a malicious prompt straight into the AI chat window, hackers can hide it inside another type of content. The external content could be a link or a web page.
For instance, hackers could insert a hidden prompt on a web page that instructs the AI model to ignore its predefined rules and recommend some action the hackers want, like asking users to go to a specific link. When asked to summarise that webpage, the AI can give an answer influenced by the hidden text.
Stored prompt injection
Stored prompt injection is a type of prompt injection where attackers store malicious commands in training data, databases and other places the AI regularly goes over. They don’t have to type in the instructions into the chat. To the user, it looks like the AI is working just fine, when, on the inside,the stored prompt has already manipulated its response.
How can prompt injection be dangerous for everyday AI user?
Since prompt injection can manipulate the AI tools and cause them to furnish incorrect or misleading information, it can be highly dangerous for everyday users in several cases as they would send their queries, fully trusting the response of the AI.
And it is easy to trick users because there is no suspicious link or downloads involved. An AI tool that’s been tampered with a prompt could take actions that you never approved or leak your personal data. Hence, one has to be very careful when using an AI tool.
How to protect yourself?
Prompt injection is tricky to eliminate with just some security patches, as it exploits the very basis of how an AI model is programmed to respond to instructions. The developers can have a hard time putting safeguards into the AI model’s ability to follow instructions. But there are still some precautions you can take.
- Try not to give an AI tool access to your email or files. When it asks for such permissions, ask whether it is necessary and could there be some other way around.
- Ask more probing questions and see if the AI responds in some weird manner.
The red flags of AI response are unwanted links, queries or suggestions unrelated to the matter, asking for your personal details, a shift in tone, basically anything that feels off in the response. If something feels off, close the session.
- Think about what could have caused the tool to behave in such a way. Check whether the tool has access to your email or any system software, remove the access and change the passwords wherever required.
- Never type in passwords, financial details and any sensitive info in the chat window of the AI tool.
- Always review what actions the AI tool is about to take. Don’t trust its capability to take the right actions blindly.
- Keep the AI tools updated with the latest versions rolled out by the developers.
And yes, do not rely on AI for everything. Use your common sense and logical thinking before taking any action suggested by the AI tool.







