It started as a weekend project with a simple question: how can we make European AI agents more aware of EU compliance regulations? And not just the text of the regulations, but their practical implications (the real work), like writing production code, connecting to APIs, or handling customer data, all without any understanding of the environment they are operating in?
We are proud to release complisec, an open-source AI skill suite built for European organisations. It tries to give AI agents the same kind of context a trained employee would have, and to apply that context at the moment decisions are made, not afterwards in an audit or review.

We built this at Eye Security, where we run managed detection and response for organisations across Europe, so we see how NIS2 and GDPR play out in real incidents and audits. This blog is not a breakdown of features, but a reflection on what happens when you embed compliance directly into how an AI system operates, and what we learned doing that.
What is a skill?
A skill is like a small add-on that teaches AI platforms or systems how to consistently do a specific task or follow a repeatable workflow. Skills were introduced by Anthropic in 2025 and have since been adopted by most AI platforms. There are several marketplaces of AI skills available to explore, and skills become more popular by the day.
With a skill, rather than improvising from scratch each time, the AI follows a defined workflow or set of instructions supplied by the skill. It applies specific rules, and uses supporting guidance to operate more reliably in the situations the skill is designed for.
Installing a skill is rather easy. With one specific sentence in any prompt, the skill installs itself within the AI system and only activates when needed, in the background. It is a powerful mechanism supported by most popular AI platforms: Microsoft Copilot (used by many organisations), ChatGPT, Anthropic's Claude Code, and enterprise AI platforms like Glean and LangDock.
A skill basically "teaches" specialised knowledge and tools to your AI in the background, without users noticing.
What complisec is
Complisec is an AI skill suite built by Eye Security to teach your AI agents and systems how real EU compliance is done. At its core, complisec is based on a simple idea: an AI agent should have access to the same EU NIS2/GDPR compliance context a trained employee would have, and use that context while making decisions.
Why didn’t we build a pure cybersecurity skill? There are already strong security-focused tools and skills available, and many of those address problems that are largely the same across organisations globally.
We took a different approach. As a European cybersecurity company, we asked ourselves how we could help organisations here adopt AI in a way that fits their regulatory and operational reality, using the expertise we already have in-house. Instead of rebuilding what already exists, we focused on the compliance and context layer. Where relevant, we reference and build on existing security tools and skills within the suite, so they can be used together to support compliant AI workflows.
Our EU compliance skill triggers automatically on specific events that require extra attention to security, compliance, and governance. For example, when the user requests instructions on how to migrate data:

In our skill, instead of relying on generic rules, we first define that context per organisation in a small profile. This profile captures things like which systems are critical, where data is allowed to live, what the risk appetite is, which suppliers are approved, and what legal obligations apply. It is not meant to be exhaustive, but it is enough to ground the agent in the reality of a specific environment.
This is what it looks like for a user installing complisec:
Here is what that looks like in code after the initial setup:
# complisec EU NIS2 org profile example
{
  "org": "Example BV | NL | SaaS | 200 emp",
  "critical_assets": [
    ["Production platform", "sys", [4, 5, 5], "CTO"],
    ["Customer database", "data", [5, 5, 4], "Head of Engineering"]
  ],
  "data_residency": "EU only",
  "risk_appetite": {"c": "low", "i": "low", "a": "medium"},
  "suppliers": [
    ["AWS", "EU", true, true],
    ["GitHub", "EU/US", true, true]
  ],
  "legal": ["GDPR controller", "NIS2 preparation"]
}
This is about 25 lines and can sit directly in a central system prompt or project configuration. From that point on, whenever the agent generates code, interacts with data, or introduces a new dependency, it can relate that action back to the organisation it is working for.
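To make that concrete, here is a minimal sketch of how an agent-side check against the profile might look. The function name and structure are hypothetical for illustration; this is not complisec's actual implementation, just the idea of relating an action back to the profile:

```python
import json

# Hypothetical org profile, mirroring the example above (trimmed).
PROFILE = json.loads("""
{
  "org": "Example BV | NL | SaaS | 200 emp",
  "data_residency": "EU only",
  "suppliers": [["AWS", "EU", true, true], ["GitHub", "EU/US", true, true]]
}
""")

def check_data_residency(profile: dict, target_region: str) -> str:
    """Flag actions that would move data outside the allowed residency."""
    if profile["data_residency"] == "EU only" and not target_region.startswith("eu-"):
        return f"BLOCK: moving data to {target_region} violates 'EU only' residency"
    return "OK"

print(check_data_residency(PROFILE, "us-east-1"))  # flagged: violates residency
print(check_data_residency(PROFILE, "eu-west-1"))  # allowed
```

The point is not the check itself, which is trivial, but that the constraint comes from the organisation's own profile rather than from a generic rule baked into the model.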
Around that profile, we built a set of specialized EU NIS2/GDPR compliance skills that activate when relevant. Not to document compliance, but to enforce it in the places where things usually go wrong. That includes areas like incident handling, vendor risk, changes to critical systems, and audit logging on generated code.

The goal is not to model every regulation in detail. The goal is to make sure that when an agent does something that would matter in an audit or an incident, it already has enough context to make a reasonable decision, or to stop and surface the risk instead of continuing blindly.
Extracting CISO knowledge into an agentic skill
The most interesting part of building this was not writing the code. It was figuring out what the agent actually needs to know to behave in a way that makes sense in a real organisation.
Take NIS2 as an example. The directive does not tell you what to do step by step. It describes outcomes and leaves the interpretation to the organisation. If you read the text on its own, you get a high-level understanding, but not a practical one. It does not tell you how companies actually fail, or how to recognise that failure when you see it.
That knowledge sits with people who do these assessments every week.
So instead of starting from the regulation, we started from our own practice. We worked with our CISO team and senior consultants and went through the material they use in the field: internal assessment toolkits, webinar session transcripts, and notes from client engagements. That is where the useful detail is.
For example, a security policy that exists but has not been read or updated in years will often look fine on paper. In practice, it tells you very little about how the organisation actually operates. The same goes for awareness programmes. You can have training records and completion rates, but if you ask someone what they would do with a suspicious email and they hesitate, you already know enough.
One example we use internally is what we call the 3AM test. If systems are encrypted in the middle of the night, who calls whom first? If that answer is not immediately clear, incident response is not as mature as it might look in documentation.

None of these signals come directly from the directive. They come from seeing the same patterns across many organisations, and learning how to interpret them over time. This is also where we think the difference is when building something like this. Large language models already know the public sources, the frameworks, and the control lists. What they do not know is how those frameworks behave in practice, inside real environments, under pressure.
That is the part we tried to capture. Not just the questions, but the interpretation behind them, so that when an agent encounters a situation that looks acceptable on paper but is weak in practice, it can at least surface that gap instead of assuming everything is fine.
Skill description matters more than you expect
Developing skills is new to us; this is our first. In the process we learned a lot and want to share it, to help anyone experimenting with their own AI skill ideas, as it is a fascinating concept in the agentic era.
One thing that surprised us is how AI platforms decide when to use a skill. They do not read the full skill content on every prompt. They only look at the description and try to match it to the user’s prompt. If that match fails, the skill simply does not run.
We analyzed the most popular skills in some public marketplaces and read Anthropic's guide to developing skills. We ended up with a skill description structure like this (audit-logging subskill example):
ACTIVATE on ANY request that involves writing, generating, reviewing, modifying, or outputting source code in any programming language — Python, JavaScript, TypeScript, Go, Java, Rust, C#, SQL, Terraform, or any other. This includes functions, endpoints, scripts, migrations, infrastructure-as-code, config files with logic, and code snippets in responses. Every piece of code the LLM produces must include structured audit logging for security-relevant operations. Also activate when the user asks about audit logs, compliance logging, or traceability. Ensures NIS2 and ISO 27001 compliant logging (structured, no string interpolation, no secrets in logs).
We initially approached this too abstractly. Descriptions like “authentication” or “data access” made sense to us, but they did not trigger reliably. What worked better was using the exact kind of language people use in prompts, like “write code”, “create API”, or specific technologies.
In practice, this means you are not writing documentation, you are writing something closer to search queries. The closer your description is to how users actually phrase requests, the more predictable the behaviour becomes.
It is a small detail, but it has a big impact on whether the system works consistently or not.
Performance and cost
One practical challenge we ran into early is that context is expensive. If you load too much into every interaction, even simple prompts become slow and costly. Our first version did exactly that. The entire skill set was included every time, regardless of whether it was relevant. It worked, but it did not scale beyond small experiments.
We changed this by separating what needs to be always present from what can be loaded on demand. The org profile stays in context because it is small and stable, while the rest of the skills activate only when the situation requires it, for example when code is generated or when data handling is involved.
This works well with how modern models handle prompt caching. Repeated parts of the prompt, like the org profile, are cached after the first request. In practice that means the core compliance context is almost free to keep around, while the heavier logic only appears when needed.
The result is that you can keep the agent grounded in organisation-specific constraints without turning every interaction into a large, expensive prompt. That makes it realistic to apply this approach consistently, not only in edge cases.
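A minimal sketch of that split, with hypothetical names and trigger logic rather than complisec's real loader: the small profile is a stable prefix (so prompt caching makes it nearly free to keep around), while heavier subskill bodies are appended only when a trigger fires:

```python
# Small, stable context: always included, cache-friendly prefix.
ORG_PROFILE = '{"org": "Example BV | NL | SaaS", "data_residency": "EU only"}'

# Heavier subskill bodies, loaded on demand only (illustrative triggers).
SUBSKILLS = {
    "audit_logging": ("code", "Full audit-logging rules for generated code..."),
    "data_handling": ("migrate", "Full data-residency and transfer rules..."),
}

def build_context(user_prompt: str) -> str:
    """Always include the profile; append subskills only when triggered."""
    parts = [ORG_PROFILE]
    for _name, (trigger, body) in SUBSKILLS.items():
        if trigger in user_prompt.lower():
            parts.append(body)
    return "\n\n".join(parts)

print(len(build_context("what is our org name?")))          # profile only
print(len(build_context("help me migrate data to AWS")))    # profile + data rules
```

Because the prefix never changes between requests, repeated calls reuse the cached portion, and only the conditionally loaded subskills add tokens when they are actually needed.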
The emerging stack around agent security and compliance
One of the subskills of complisec includes quality references to open-source skills, tools and databases the agent can use to improve its security and compliance. For example:
- We embedded projects like EU_compliance_MCP that expose EU directives and national implementations through an interface agents can query directly. Instead of hardcoding regulatory knowledge, you can pull from a structured source that stays closer to the original texts and mappings.
- Another tool we link to is Cisco's DefenseClaw, which focuses on what happens while an agent is operating: inspecting tool calls, enforcing boundaries, and adding guardrails at execution time.
- Projects like baz-scm/secure-coding bring OWASP-style checks into agent workflows, so generated code is not only functional but also aligned with secure coding practices.
What we are focusing on with complisec is a different layer: not the regulation database, and not only runtime enforcement, but the organisation-specific context in which decisions are made. Which systems matter, which suppliers are approved, what risk is acceptable, and when something should stop instead of continue.
Agents will need the same controls as people
Today, most organisations are still experimenting with AI. Maybe a first agent that helps with coding, or something internal for support or reporting. It is early, and in most cases still contained.
But that is changing quickly. Over the next few years, agents will move from experiments to something more standard in day-to-day operations. They will start interacting more directly with systems, data, and processes that actually matter. At that point, they look less like tools and more like identities. And just like human identities, they need boundaries, context, and protection.
That is where we see a gap today. Most frontier models and their guardrails are developed outside of Europe, and they are not designed with specific regulatory environments like GDPR or NIS2 in mind. They are general-purpose, and that works up to a point, but it does not cover the kind of organisation-specific constraints European companies operate under.
Complisec is our first attempt to address that gap. It is not a complete solution, but it is a practical way to give agents some awareness of the environment they operate in, so they are less likely to make obvious compliance or security mistakes while doing useful work.
Europe
Most frontier AI models are still developed outside of Europe, and they are built to be generally useful, not to reflect specific regulatory environments like GDPR or NIS2. For European organisations, that creates a gap. Not because the models are bad, but because they are not designed with local constraints in mind.

That is also why we actively support and work with European players in this space. Companies like Mistral, LangDock, and others are building AI systems with data sovereignty and regulatory context as a starting point, not an afterthought. If we want AI to be usable in regulated environments, especially in Europe, that context needs to be built into the ecosystem itself.

Closing
complisec is open source. You can download it, upload it to your AI tool of choice, and run /complisec setup. Five minutes later, you have a compliance-aware AI agent.
GitHub: https://github.com/eyesecurity/complisec-skill
Official landing page: https://skills.eye.security/eu-compliance/
About Eye Security
We are a European cybersecurity company focused on 24/7 threat monitoring, incident response, and cyber insurance. Our research team performs proactive scans and threat intelligence operations across the region to defend our customers and their supply chains.
We published research before on AI security: learn about log poisoning in Open Claw and learn how our customers battle prompt injection.
Learn more about Eye at https://eye.security/ and follow us on LinkedIn to help us spread the word.