AI Agents as Insiders: Securing the Next Generation of Enterprise AI Infrastructure
by Ahmed Sallam
AI is becoming the most privileged “employee” inside the enterprise. The security model must evolve with it from the cloud layer down to firmware and silicon.
Artificial Intelligence is advancing at breakneck speed — but the infrastructure that powers it is showing cracks. A recent analysis of MCP (Model Context Protocol) servers by Docker revealed a set of horror story vulnerabilities that, left unchecked, could compromise the trustworthiness of AI systems across industries.
This moment calls for more than patchwork fixes. It calls for secure-by-design innovation.
The Problems: Security by Convenience
Docker’s findings highlighted a disturbing truth: many MCP servers were built like hackathon projects — functional, but fragile. Among the most serious issues:
1. OAuth Discovery Vulnerabilities: poorly validated authentication flows leave tokens ripe for hijacking.
2. Command Injection & Code Execution: reliance on dangerous patterns like eval() and shell commands.
3. Unrestricted Network Access: servers making uncontrolled external calls, opening doors to data exfiltration.
4. File System Exposure: directory traversal and leakage of sensitive files.
5. Tool Poisoning: attackers swapping or corrupting tools to deceive agents.
6. Secret Exposure & Credential Theft: keys and tokens carelessly left in logs or memory.
7. Taken together, these flaws reveal a deeper problem: AI infrastructure is being wired together faster than it is being secured.
The Way Forward: Innovative Technology Solutions
While Docker has taken important first steps with its MCP Catalog and Toolkit, a long-term solution requires a systemic re-imagining of how we build and deploy MCP servers. Here’s a vision created withe assistance of AI itself for what that looks like:
1. Hardened Execution Environments
Use micro-VMs (Firecracker, gVisor) or WebAssembly sandboxes for strict isolation, reducing the blast radius of exploits.
1. Move from traditional containers to micro-VMs (like Firecracker) or WebAssembly sandboxes.
2. Strictly contain code execution so even if injection occurs, the blast radius is minimal.
2. Policy-Driven Zero Trust
Adopt fine-grained, policy-as-code controls for every network call, file access, or API request, enforced at runtime.
1. Apply zero-trust principles to MCP.
2. Every file access, API call, or network request should pass through a policy enforcement layer, akin to eBPF firewalls for AI agents.
3. AI-Assisted Threat Modeling
Deploy LLMs as security reviewers, scanning MCP servers for insecure code before they reach production.
1. Use LLMs themselves to scan MCP code for insecure patterns (e.g., unsafe exec() calls).
2. Integrate this into MCP marketplaces so insecure servers are flagged before anyone deploys them.
4. Cryptographic Provenance & Enclaves
Ensure signed builds, attested execution, and confidential computing enclaves to protect sensitive data.
1. Require signed attestations for MCP tools and servers to ensure build integrity.
2. Leverage confidential computing (Intel SGX, AMD SEV) to keep secrets inside secure enclaves.
5. Honeytoken Defense
Embed decoy secrets and tokens in MCP environments to instantly detect leaks or misuse.
1. Seed MCP environments with decoy tokens and credentials.
2. Any misuse of these can instantly flag a compromised or malicious server.
6. Behavioral Anomaly Detection
Leverage real-time telemetry with AI-driven SIEM systems to flag abnormal patterns like privilege escalation or lateral movement.
1. Collect telemetry on file, process, network, memory, I/O and API activity.
2. Feed this into AI-driven stateful behavioral engine ( Stateful behavioral SIEM system) to detect unusual or risky behavior in real time.
Going Deeper: Below-OS Security
Even with containers, enclaves, and zero-trust policies, attackers continue to seek the weakest link: the operating system and firmware layers. That’s where Below-OS security becomes critical — an area where DeepSAFE specializes in designing, architecting, and deploying solutions.
Why Below-OS Matters
Why Below-OS Matters
1. Kernel Rootkits & Bootkits: Exploits below the OS can disable or bypass container and enclave protections.
2. Firmware Tampering: Malicious code in UEFI/BIOS can persist across reboots, invisible to traditional defenses.
3. Hardware Exploits: Attackers may abuse DMA, GPU, or peripheral controllers to access memory directly.
How Below-OS Security Strengthens MCP
- Hardware-Assisted Isolation:
- Use technologies like Intel VT-x, AMD SEV, and Arm TrustZone to enforce strict separation of workloads.
- Prevent rogue MCP servers from “breaking out” of user space into privileged layers.
2, Memory Integrity Monitoring:
- Deploy hypervisor-level protection to detect unauthorized modifications in memory (e.g., tampering with secrets or file metadata).
3. Secure Boot & Firmware Attestation:
- Ensure MCP environments only launch on systems that pass cryptographic boot verification.
- Integrate firmware attestation into MCP catalogs so trust is established from silicon to application.
4. Hardware Telemetry for AI Security Analytics:
- Collect low-level signals (e.g., PCIe anomalies, unusual kernel hooks, or microcode activity) and feed them into AI-driven monitoring pipelines.
A Future of Trustworthy AI Infrastructure
Docker’s approach — curating MCP servers and providing secure defaults — is a necessary step forward. But containers alone aren’t enough. True resilience requires defense-in-depth that extends all the way down to silicon.
Get Ahmed Sallam’s stories in your inbox
Join Medium for free to get updates from this writer.Subscribe
The future should look like this:
1. Every MCP runs in a WASM-based, cryptographically attested micro-VM.
2. Every action is filtered through zero-trust policy enforcement.
3. Every server is continuously scanned by AI agents trained to spot insecure code.
4. Every secret is protected with enclaves and deception layers.
5. And critically: Below-OS security hardens the foundation, ensuring attackers cannot bypass defenses by burrowing under the operating system itself.
By embracing these innovations — from application layer to firmware — we can secure not just MCP servers, but the very foundation of AI infrastructure. And in doing so, we ensure that AI’s progress rests on trust, not on vulnerabilities waiting to be exploited.
The New Insider Threat: AI Agents
When we think of insider threats, we usually picture human actors — disgruntled employees, compromised contractors, or careless staff. But the reality is shifting. With enterprises rapidly deploying AI agents across customer service, sales, HR, and IT, a new form of insider threat has emerged: the AI agent itself.
Unlike humans, AI agents don’t weigh context, intent, or consequence. They follow instructions blindly — at machine speed, and at scale. This makes them uniquely dangerous when exploited.
Real-World Case Examples
- Zero-Click Prompt Injection: At DEF CON, researchers demonstrated how a single crafted email containing hidden instructions tricked a Microsoft Copilot Studio agent into exfiltrating full Salesforce CRM records — no passwords broken, no software vulnerabilities exploited. The AI simply followed the malicious prompt.
- Data Exfiltration at Machine Speed: Enterprises often connect AI agents to sensitive systems (e.g., HR databases, CRMs, ticketing systems). If over-permissioned, these agents can become perfect conduits for data theft.
- Silent Execution: Unlike a human insider, there is no intent to flag, no suspicious behavior for coworkers to report. An exploited AI agent will simply execute commands until discovered — often too late.
Expanding the Path Forward
In addition to sandboxing, zero-trust enforcement, AI-driven scanning, enclaves, and behavioral anomaly detection, protecting against AI insider threats requires new approaches:
1. Policy-as-Code for AI Agents: Every action an agent attempts — exporting records, sending emails, accessing files — must pass through strict policy filters before execution.
2. Human-in-the-Loop for Sensitive Operations: AI can draft or prepare actions, but high-risk tasks (e.g., mass data export) should require explicit human approval.
3. Prompt Firewalls: Inputs to AI agents must be validated and sanitized to strip or neutralize hidden instructions.
4. Minimum-Privilege Design: AI agents should only be granted the narrowest possible access rights, with continuous review.
5. Cross-Layer Integration with Below-OS Security: Even if an agent is exploited at the application level, hardware- and firmware-based protections can prevent escalation or persistence.
Why This Matters
Traditional insider threat programs focus on monitoring human behavior — changes in patterns, motivations, or anomalies. But AI agents have no intent, and no conscience. Their risk profile is fundamentally different, and so must be the security frameworks that govern them.
The next major breach may not begin with a malicious insider in the office. It may begin with a friendly-looking message sent to your AI agent.
Securing the Future of AI Infrastructure
AI is no longer a side tool — it is rapidly becoming a core operator in enterprise systems. From customer service to HR to IT, AI agents now hold the same keys once reserved for trusted employees. The difference is that while humans have intent, intuition, and accountability, AI agents simply follow instructions — whether good or malicious.
The lesson is clear:
1. Defense-in-depth is non-negotiable. Sandboxing, zero-trust enforcement, AI-driven scanning, and hardware-based protections must all be part of the baseline.
2. Insider threat models must evolve. What once focused solely on people must now include AI agents as digital insiders — capable of being exploited to act against the organization’s interests.
3. Security must go below the OS. True resilience requires protections that extend down to firmware and silicon, where DeepSAFE and other advanced approaches ensure trust at the foundation.
4. The next generation of breaches will not begin with stolen passwords or malicious insiders in the breakroom. They will begin with a crafted input, a hidden instruction, or a vulnerable AI agent executing commands at machine speed.
If we are giving AI the keys to our most sensitive systems, we must also design guardrails stronger than those we apply to humans. Only by combining human-centric insider threat awareness with code-centric, below-OS defenses can we build the trustworthy AI infrastructure the future demands.
The choice is ours: treat AI as just another user, or secure it as the most privileged insider we’ve ever created.
The Hybrid Solution: A Multi-Layered Approach to AI Security
As the Microsoft team rightly notes in their published ACM article, there is no single “fix” for the inherent risks of large language models — hallucinations, jailbreaks, and indirect prompt injections. These are not bugs, but consequences of how LLMs fundamentally work. The only viable path forward is a hybrid, multi-layered defense that combines improvements inside the model with robust protections at the system and infrastructure level.
1. Internal Mitigation: Enhancing the Model
Internal mitigation focuses on shaping the model’s core behavior and reducing its susceptibility to malicious inputs. This can include:
1. Model refinement: Ongoing training and alignment to better detect and resist adversarial prompts.
2. Separation of duties: Architectures that clearly segregate sensitive data handling from public-facing interactions — akin to network segmentation, but applied to AI logic.
3. Strict code/data governance: Policies to tightly control how the model handles instructions, internal data, and external artifacts, ensuring it operates within predefined safety boundaries.
2. System-Level Mitigation: Securing the Infrastructure
Equally vital is hardening the environment in which models operate. Here, my focus has been on what I call the true OS of AI — the combined compute fabric spanning CPUs, GPUs, APUs, and accelerators. To secure this fabric, we must introduce a new protective layer below the traditional operating system, capable of enforcing deep security guarantees:
1. Micro-virtualization: Encapsulating AI tasks within isolated micro-VMs, so that even if one process is compromised, it cannot spread laterally or access unauthorized data.
2. Below-OS security: Monitoring execution, memory, and data access at the hardware/firmware boundary to detect and block threats that bypass higher-level controls. This acts as a resilient last line of defense.
Achieving this vision will require collaboration across disciplines — hardware design, software engineering, and cybersecurity — but it is the kind of foundational work needed if we want AI infrastructure to be truly trustworthy.
In short, system design must evolve in parallel with model design. By combining defense-in-depth at the model level with below-OS protection at the infrastructure level, we can turn today’s inherent vulnerabilities into tomorrow’s resilient foundation.
https://medium.com/@ahmed.sallam/ai-agents-as-insiders-securing-the-next-generation-of-enterprise-ai-infrastructure-3938bc9146c9 a>