Open Access Open Access  Restricted Access Subscription Access

Prompt injection and Data Exfiltration Attacks in Large Language Model Applications: Detection and Mitigation Framework

Lahari B A

Abstract


Large Language Models (LLMs) such as GPT-4, Gemini, Claude, and open-source transformer systems are rapidly being embedded into real-world applications including chatbots, enterprise knowledge assistants, healthcare systems, and developer tools. While these models offer unprecedented capabilities in natural language reasoning, they introduce a new class of security vulnerabilities known as prompt injection and data exfiltration attacks. Unlike traditional software exploits that target code or network layers, these attacks manipulate the instruction-following behavior of LLMs through carefully crafted textual inputs. Recent studies demonstrate that malicious prompts can override system rules, expose hidden prompts, and retrieve sensitive data from connected tools and documents. This paper presents an in-depth study of prompt injection mechanisms, analyzes data leakage paths in LLM-powered systems, and proposes a structured detection and mitigation framework for secure LLM deployment. Experimental simulations validate the effectiveness of the framework in reducing adversarial success rates. The findings emphasize the need for AI-aware security practices in modern software architectures.


Full Text:

PDF

References


Brown, T., et al(2020). Language models are few-shot learners. In Advances in Neural Information Processing Systems.

OpenAI. (2023). GPT-4 technical report.

Perez, F., & Ribeiro, I. (2022). Ignore previous instructions: Prompt injection attacks in LLMs.

Greshake, K., Abdelnabi, S., Mishra, S., & Fritz, M. (2023). Not what you’ve signed up for: Compromising LLM applications with prompt injection.

Ganguli, D., Askell, A., Schiefer, N., Liao, T., Joseph, N., & others. (2022). Red teaming language models to reduce harms.

OWASP Foundation. (2023). OWASP top 10 for large language model applications.

Anthropic. (2023). Constitutional AI: Harmlessness from AI feedback.

Lakera AI. (2023). Prompt injection attack taxonomy.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models.

Zou, A., Wang, Z., Kolter, J. Z., & others. (2023). Universal and transferable adversarial attacks on aligned language models.


Refbacks

  • There are currently no refbacks.