
What are the security risks of AI agent skills?

AI agent skills make agent workflows reusable. They can also become a supply-chain risk.

Updated May 30, 2026


A skill is usually more than a prompt. In the current ecosystem, skills can include a SKILL.md file, metadata, instructions, scripts, dependencies, references, and assets. Claude skills are organized folders with instructions, scripts, and resources. Codex skills follow the same general pattern: a required SKILL.md file plus optional bundled resources, loaded when the agent needs that capability.

That format gives an agent repeatable context and workflow logic. It also means a skill can influence what the agent reads, writes, executes, downloads, and trusts. Review third-party skills with the same care you apply to software packages.

Why skills create a new security surface

AI agent skills sit between natural-language instructions and executable automation. That makes their risk profile different from a normal documentation snippet.

A package manager usually installs code. A browser extension usually gets browser permissions. A CI action usually runs inside a workflow. A skill can combine all three patterns: it bundles scripts that execute, it inherits tool access through the agent runtime, and it runs inside the agent's workflow, with natural-language instructions and context files steering each step.

Snyk's ToxicSkills research gives a public baseline. Snyk analyzed 3,984 skills from ClawHub and skills.sh and reported 76 confirmed malicious payloads, 534 skills with at least one critical issue, and 1,467 skills with at least one security flaw. See Snyk's ToxicSkills research and Snyk Skill Inspector.

A skill can solve the task and still be unsafe to install. Security review belongs in the workflow before the skill reaches an agent with access to files, credentials, tools, or production systems.

The main risks with AI agent skills

The most common risks fall into a few practical categories.

Risk	What it looks like	Impact
Malicious payloads	A skill ships a stealer, backdoor, reverse shell, miner, destructive command, or staged downloader	The agent may run or recommend commands that compromise the machine or workspace
Prompt injection inside the skill	`SKILL.md` tells the agent to ignore rules, hide behavior, change identity, or extract secrets	The skill can manipulate the agent through the same instruction channel that gives skills their effect
Indirect prompt injection	The skill fetches web pages, docs, tickets, repos, or API responses that contain hostile instructions	The agent may treat untrusted content as operational guidance
Data exfiltration	The skill reads `.env` files, SSH keys, browser profiles, cookies, wallet files, API tokens, or memory files	Agent environments often have valuable local context and credentials
Remote code execution	The skill uses `curl | bash`, `eval`, dynamic imports, install hooks, or mutable scripts from the web	A harmless-looking skill can turn into arbitrary execution
Brand impersonation	A fake skill claims to come from OpenAI, Anthropic, GitHub, Vercel, Stripe, or another vendor	Builders may trust the name before verifying the publisher
Over-permissioned behavior	The skill asks for broad filesystem, shell, browser, MCP, cloud, repo, or wallet access	Broad access raises the impact of every mistake or malicious instruction
Update drift	The skill is installed from a branch, mutable URL, or marketplace package that changes later	A safe first install can become a risky update
Memory and policy poisoning	The skill edits `AGENTS.md`, `CLAUDE.md`, shell profiles, Git hooks, or agent memory files	The skill can change future agent behavior after the original task ends
Tool abuse	The skill routes the agent toward MCP tools, browser automation, messaging, PR creation, deployment, payments, or signing flows	The agent can move from local work into external systems

Skills that combine private data access, network access, and command execution need deeper review. That combination gives the skill enough reach to read sensitive data, transmit it, and change the local environment.
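As a rough illustration of that rule, the sketch below assigns a review tier from three recorded capabilities. The `SkillCapabilities` schema, the field names, and the tier labels are assumptions made for this example, not part of any skill format or scanner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillCapabilities:
    """Capabilities a reviewer recorded for one skill (hypothetical schema)."""
    reads_private_data: bool   # e.g. .env files, SSH keys, API tokens
    has_network_access: bool   # can send data off the machine
    executes_commands: bool    # can run shell commands or scripts


def review_depth(caps: SkillCapabilities) -> str:
    """Return a review tier based on how many risky capabilities combine."""
    count = sum([
        caps.reads_private_data,
        caps.has_network_access,
        caps.executes_commands,
    ])
    if count == 3:
        return "deep"      # can read, transmit, and change things
    if count == 2:
        return "elevated"
    return "standard"
```

A skill that only reads workspace files lands in the standard tier, while one that reads credentials, calls the network, and runs shell commands is routed to the deepest review.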

Official skills reduce publisher-identity risk

A skill from a vendor-owned repository gives you a clearer source to verify than a marketplace listing with an unknown publisher.

That trust still needs evidence. Check the source, version, permissions, dependencies, scripts, and update path. Official ownership tells you who published the skill. Runtime access still needs its own review.

Good official-skill signals include:

Check	What to verify
Source ownership	The repo belongs to the vendor's official GitHub organization
Cross-linking	Vendor docs link to the repo, and the repo links back to vendor docs
Marketplace listing	The marketplace points to the same repo and publisher
Versioning	The install command pins a tag, release, commit, content hash, or signed artifact
Security metadata	The listing includes a scan result, Skill Card, permission manifest, or signature
Runtime scope	The requested permissions match the declared task
Update flow	Updates pass through CI, scanner gates, and release review

For discovery, start with vendor docs and official directories.

Skillscout belongs in the same discovery step as vendor docs and official directories. While you browse a product, docs site, or SaaS tool, the extension can surface relevant skills and prioritize official sources. The Skillscout Chrome extension and Skillscout Official directory help when the official source is spread across vendor repos, marketplaces, and ecosystem catalogs.

How to prevent skill security risks

Use several small checks before installation: source verification, scanner results, runtime containment, and update review. A scanner badge is one signal in that chain.

1. Verify the source before reading the install command

Start with the publisher. Treat the install command as untrusted until the source is verified.

Check the GitHub organization, vendor docs, marketplace publisher, release history, and issue activity. For official skills, verify that the vendor's documentation and repository point to the same artifact. For community skills, look for clear ownership, review history, and transparent source code.

2. Pin the version

Install from a tag, release, commit, content hash, or signed artifact. Branch names and mutable URLs make review harder because the installed content can change after approval.
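One way to make pinning checkable is sketched below, under stated assumptions: the helper names `is_full_commit_sha` and `skill_digest` are hypothetical, and the digest layout (relative path plus file bytes, in sorted order) is an illustrative choice rather than any registry's format. A full commit SHA is the strictest of the options listed above, since a Git tag can be re-pointed after review.

```python
import hashlib
import re
from pathlib import Path

# A full Git SHA-1 object name is 40 hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(ref: str) -> bool:
    """True only for a full 40-character hex commit SHA."""
    return FULL_SHA.fullmatch(ref.lower()) is not None


def skill_digest(skill_dir: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in skill_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(skill_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")            # separator between path and content
        digest.update(path.read_bytes())
        digest.update(b"\0")            # separator between files
    return digest.hexdigest()
```

Record the digest at approval time and compare it before each install. If the value differs, the content changed after review and should go back through the checks.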

NVIDIA's trust pipeline shows the components of a reviewed release: scan the skill, publish a Skill Card, sign the artifact, and verify the signature before install. See NVIDIA's Agent Skill Trust Pipeline, SkillSpector, and NVIDIA skill signing.

3. Read the skill like code

Open SKILL.md. Then inspect scripts, dependency manifests, referenced files, install hooks, remote downloads, and generated configuration.

Look for instructions that ask the agent to hide behavior, override policies, reveal secrets, access sensitive paths, install packages, download scripts, modify memory files, or run broad shell commands. Also compare the description with the behavior. A skill called "blog helper" should stay limited to drafts, outlines, and content files.
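A first pass over those red flags can be automated. The sketch below is a minimal pattern scan, assuming a hypothetical pattern set chosen for this example. It is far from exhaustive, will miss obfuscated payloads, and does not replace reading the files or running a real scanner.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners use much richer rule sets.
SUSPICIOUS_PATTERNS = {
    "pipe-to-shell": re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b"),
    "dynamic-eval": re.compile(r"\beval\s*\("),
    "sensitive-path": re.compile(r"(\.env\b|\.ssh/|id_rsa|\.aws/credentials)"),
    "policy-override": re.compile(
        r"ignore (all |any )?(previous|prior) (instructions|rules)", re.IGNORECASE
    ),
    "memory-file-edit": re.compile(r"(AGENTS\.md|CLAUDE\.md|\.bashrc|\.zshrc|\.git/hooks)"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_skill(skill_dir: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every file in a skill folder; keys are paths relative to it."""
    results = {}
    for path in sorted(p for p in skill_dir.rglob("*") if p.is_file()):
        hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[path.relative_to(skill_dir).as_posix()] = hits
    return results
```

Treat a hit as a prompt to read that line in context. A match on `.env` in a deployment skill may be legitimate, while the same match in a "blog helper" is a mismatch between description and behavior.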

4. Scan with more than one layer

Use at least one skill-aware scanner, one supply-chain scanner, and one secrets scanner.

Current ecosystem options include:

Snyk Agent Scan and Snyk Skill Inspector for skills, MCP servers, prompt injection, secrets, malware payloads, and insecure configs
NVIDIA SkillSpector for skill-specific static and semantic review
Socket for supply-chain behavior and package risk signals
Skill Vetter for local multi-scanner review before installing a skill into an agent environment
Gitleaks or TruffleHog for secrets
Semgrep or Bandit for scripts
Snyk SCA or OSV for dependencies

A practical stack for a small AI builder is: Snyk Agent Scan or NVIDIA SkillSpector, Skill Vetter for a local review pass, Gitleaks or TruffleHog for secrets, Semgrep or Bandit for scripts, and Snyk SCA or OSV for dependencies.

5. Install into a constrained environment first

Use a canary workspace before the real workspace. Give the skill fake secrets, fake credentials, and disposable files. Turn on logs. Watch file writes, shell commands, network calls, MCP tool calls, browser automation, and generated outputs.

For local development, prefer workspace-only access, denied sensitive paths, temporary credentials, and an egress allowlist. For production workflows, use an isolated runner and explicit human approvals.
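The workspace-only, denied-path, and egress-allowlist rules can be expressed as two small policy checks. This is a sketch under assumptions: `ALLOWED_HOSTS`, `DENIED_NAMES`, and both function names are hypothetical, and checks like these only decide whether a request fits the policy. Enforcement still has to come from the sandbox, container, or agent runtime.

```python
from pathlib import Path
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.com"}                 # hypothetical egress allowlist
DENIED_NAMES = {".env", ".ssh", ".aws", ".gnupg"}   # hypothetical sensitive names


def egress_allowed(url: str) -> bool:
    """Allow a request only when the exact hostname is on the allowlist."""
    host = urlsplit(url).hostname   # lowercased by urlsplit, None if absent
    return host is not None and host in ALLOWED_HOSTS


def path_allowed(candidate: Path, workspace: Path) -> bool:
    """Allow a path only inside the workspace and outside denied names."""
    resolved = candidate.resolve()   # collapses ".." and follows symlinks
    root = workspace.resolve()
    if not resolved.is_relative_to(root):   # requires Python 3.9+
        return False
    return not any(part in DENIED_NAMES for part in resolved.relative_to(root).parts)
```

Matching the exact hostname, rather than a substring of the URL, keeps lookalike hosts such as `api.example.com.evil.test` off the allowlist.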

6. Require approval for external or irreversible actions

Skills that send messages, open PRs, deploy, delete files, trade, purchase, bridge assets, use wallets, touch customer data, or change production systems need approval gates.

Treat wallet files, seed phrases, private keys, deployer keys, exchange API keys, CI/CD credentials, and production signing flows as high-risk by default.
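An approval gate can be as small as a lookup from action type to "needs a named approver". The sketch below assumes a hypothetical `Action` enum and `may_proceed` helper. A real runtime would also need to authenticate the approver and record the decision.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Illustrative action types; a real runtime would define its own."""
    WRITE_DRAFT = "write_draft"
    SEND_MESSAGE = "send_message"
    OPEN_PR = "open_pr"
    DEPLOY = "deploy"
    DELETE_FILE = "delete_file"
    TRANSFER_FUNDS = "transfer_funds"


# External or irreversible actions that always need a human decision.
APPROVAL_REQUIRED = frozenset({
    Action.SEND_MESSAGE,
    Action.OPEN_PR,
    Action.DEPLOY,
    Action.DELETE_FILE,
    Action.TRANSFER_FUNDS,
})


def may_proceed(action: Action, approver: Optional[str] = None) -> bool:
    """Local, reversible actions pass; gated actions need a named approver."""
    if action not in APPROVAL_REQUIRED:
        return True
    return bool(approver and approver.strip())
```

Defaulting `approver` to `None` means a gated action is blocked unless someone is explicitly named.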

Payments add another layer of risk because an agent can move funds. Agent-payment infrastructure is already taking shape: x402 uses HTTP 402 to let clients and agents pay for access with stablecoins; ERC-8004 defines a trustless framework for agent discovery, identity, and reputation; and Mastercard Agent Pay is a payment program for agentic commerce.

Skills that touch these flows need explicit limits: maximum spend, approved merchants or contracts, allowed assets, allowed chains or payment rails, human confirmation for transfers, separate signing keys, and logs for every payment attempt. A skill can prepare a payment request. Moving funds needs a separate approval boundary.
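Those limits can be written down as a policy object and checked before any signing step. The sketch below is illustrative only: `PaymentPolicy`, `PaymentRequest`, `check_payment`, and the example values are assumptions, and none of this reflects the x402, ERC-8004, or Mastercard Agent Pay APIs.

```python
import logging
from dataclasses import dataclass
from decimal import Decimal

logger = logging.getLogger("skill_payments")


@dataclass(frozen=True)
class PaymentPolicy:
    """Hypothetical spending limits for one skill."""
    max_spend: Decimal
    approved_recipients: frozenset
    allowed_assets: frozenset
    allowed_rails: frozenset


@dataclass(frozen=True)
class PaymentRequest:
    """A payment the skill has prepared but not executed."""
    amount: Decimal
    recipient: str
    asset: str
    rail: str
    human_confirmed: bool = False


def check_payment(request: PaymentRequest, policy: PaymentPolicy) -> list:
    """Return policy violations; an empty list means signing may proceed."""
    violations = []
    if not (Decimal("0") < request.amount <= policy.max_spend):
        violations.append("amount outside limit")
    if request.recipient not in policy.approved_recipients:
        violations.append("recipient not approved")
    if request.asset not in policy.allowed_assets:
        violations.append("asset not allowed")
    if request.rail not in policy.allowed_rails:
        violations.append("rail not allowed")
    if not request.human_confirmed:
        violations.append("missing human confirmation")
    # Log every attempt, including rejected ones.
    logger.info("payment attempt %s violations=%s", request, violations)
    return violations
```

The function only evaluates the request. Signing with a separate key would happen in a different component, after this check returns no violations.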

7. Re-scan updates

Review every update like a new install. Re-scan on version change, permission change, dependency change, new script, new external download, new MCP tool, or new runtime permission.

Teams should keep an inventory with owner, source, version, install location, approval date, and next review date.
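The inventory and the re-scan triggers fit naturally into one record type. In the sketch below, the first seven fields mirror the inventory list above. The `permissions` and `dependencies` fields, along with the `rescan_reasons` helper, are assumptions added so the triggers can be compared mechanically.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SkillRecord:
    """One inventory entry for an installed skill (hypothetical schema)."""
    name: str
    owner: str
    source: str
    version: str
    install_location: str
    approval_date: date
    next_review_date: date
    permissions: frozenset = frozenset()
    dependencies: frozenset = frozenset()


def rescan_reasons(approved: SkillRecord, candidate: SkillRecord, today: date) -> list:
    """List the reasons a candidate update must be re-scanned before install."""
    reasons = []
    if candidate.version != approved.version:
        reasons.append("version changed")
    if candidate.permissions != approved.permissions:
        reasons.append("permissions changed")
    if candidate.dependencies != approved.dependencies:
        reasons.append("dependencies changed")
    if today >= approved.next_review_date:
        reasons.append("scheduled review due")
    return reasons
```

An empty result means the candidate matches what was approved and the review date has not arrived. Anything else sends the skill back through the scanners.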

A practical workflow for AI builders

Use this workflow before installing a third-party skill:

1. Discover the skill through vendor docs, Skillscout, skills.sh Official, or a vetted marketplace.
2. Verify the publisher through the official GitHub org, vendor docs, and marketplace listing.
3. Pin the artifact to a tag, release, commit, hash, or signature.
4. Read SKILL.md, scripts, references, assets, dependencies, install hooks, and generated config.
5. Document permissions across filesystem, shell, network, browser, MCP, cloud, wallet, messaging, and production access.
6. Run multiple scanners: skill scanner, supply-chain scanner, secrets scanner, dependency scanner, and SAST where scripts exist.
7. Install in a canary workspace with fake secrets and logs enabled.
8. Move the skill into the main workspace after it behaves as expected.
9. Monitor activation, commands, file diffs, network calls, MCP calls, approvals, and outputs.
10. Revalidate the skill on every update.

For teams, add two more steps: assign an owner and document a revocation path. Someone should know who approved the skill, why it exists, where it is installed, and how to remove it quickly.

How to build safer skills yourself

The same rules apply when you publish your own skills.

Use a narrow scope. Declare inputs, outputs, allowed paths, allowed domains, required tools, and required permissions. Use scripts for deterministic execution, and keep the rest in reviewed instructions. Pin dependencies. Bundle reviewed references. Add test cases for normal use, malformed input, prompt injection, poisoned remote content, and empty inputs.

For shared or official skills, publish a Skill Card. It should include owner, purpose, supported agents, tools, permissions, limitations, test results, scan results, and version. Sign releases where possible and publish from the canonical organization.
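A completeness check keeps a Skill Card from shipping with gaps. The sketch below assumes a plain dictionary with one key per item listed above. The key names and `missing_fields` helper are hypothetical and do not follow NVIDIA's or any other published Skill Card schema.

```python
import json

# One key per item a Skill Card should include (names are illustrative).
REQUIRED_FIELDS = (
    "owner",
    "purpose",
    "supported_agents",
    "tools",
    "permissions",
    "limitations",
    "test_results",
    "scan_results",
    "version",
)


def missing_fields(card: dict) -> list:
    """Return required fields that are absent or empty in the card."""
    return [
        field for field in REQUIRED_FIELDS
        if card.get(field) in (None, "", [], {})
    ]


example_card = {
    "owner": "example-org",                  # placeholder values throughout
    "purpose": "Draft blog outlines from a topic",
    "supported_agents": ["Claude", "Codex"],
    "tools": ["filesystem"],
    "permissions": ["read and write inside ./content"],
    "limitations": ["No network access"],
    "test_results": "12 of 12 cases passed",
    "scan_results": "No findings",
    "version": "1.0.0",
}

if __name__ == "__main__":
    print(json.dumps(example_card, indent=2))
    print("missing:", missing_fields(example_card))
```

Empty values count as missing on purpose, so an author has to state "none known" under limitations rather than leave the field blank.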

This makes skills easier to trust, easier to review, and easier to revoke.

Security starts before installation. A good first question is: "Is there an official skill for this tool?" Once you have found a candidate, verification, scanning, sandboxing, and monitoring determine how the skill enters the workflow.