What are the security risks of AI agent skills?
AI agent skills make agent workflows reusable. They can also become a supply-chain risk.
A skill is usually more than a prompt. In the current ecosystem, skills can include a SKILL.md file, metadata, instructions, scripts, dependencies, references, and assets. Claude skills are organized folders with instructions, scripts, and resources. Codex skills follow the same general pattern: a required SKILL.md file plus optional bundled resources, loaded when the agent needs that capability.
That format gives an agent repeatable context and workflow logic. It also means a skill can influence what the agent reads, writes, executes, downloads, and trusts. Review third-party skills with the same care you apply to software packages.
Why skills create a new security surface
AI agent skills sit between natural-language instructions and executable automation. That makes their risk profile different from a normal documentation snippet.
A package manager usually installs code. A browser extension usually gets browser permissions. A CI action usually runs inside a workflow. A skill can combine all three patterns: instructions for the agent, files for context, scripts for execution, and tool access through the agent runtime.
Snyk's ToxicSkills research gives a public baseline. Snyk analyzed 3,984 skills from ClawHub and skills.sh and reported 76 confirmed malicious payloads, 534 skills with at least one critical issue, and 1,467 skills with at least one security flaw. See Snyk's ToxicSkills research and Snyk Skill Inspector.
A skill can solve the task and still be unsafe to install. Security review belongs in the workflow before the skill reaches an agent with access to files, credentials, tools, or production systems.
The main risks with AI agent skills
The most common risks fall into a few practical categories.
| Risk | What it looks like | Impact |
|---|---|---|
| Malicious payloads | A skill ships a stealer, backdoor, reverse shell, miner, destructive command, or staged downloader | The agent may run or recommend commands that compromise the machine or workspace |
| Prompt injection inside the skill | SKILL.md tells the agent to ignore rules, hide behavior, change identity, or extract secrets | The skill can manipulate the agent through the same instruction channel that gives skills their effect |
| Indirect prompt injection | The skill fetches web pages, docs, tickets, repos, or API responses that contain hostile instructions | The agent may treat untrusted content as operational guidance |
| Data exfiltration | The skill reads .env files, SSH keys, browser profiles, cookies, wallet files, API tokens, or memory files | Agent environments often have valuable local context and credentials |
| Remote code execution | The skill pipes `curl` output into `bash`, calls `eval`, uses dynamic imports or install hooks, or runs mutable scripts from the web | A harmless-looking skill can turn into arbitrary execution |
| Brand impersonation | A fake skill claims to come from OpenAI, Anthropic, GitHub, Vercel, Stripe, or another vendor | Builders may trust the name before verifying the publisher |
| Over-permissioned behavior | The skill asks for broad filesystem, shell, browser, MCP, cloud, repo, or wallet access | Broad access raises the impact of every mistake or malicious instruction |
| Update drift | The skill is installed from a branch, mutable URL, or marketplace package that changes later | A safe first install can become a risky update |
| Memory and policy poisoning | The skill edits AGENTS.md, CLAUDE.md, shell profiles, Git hooks, or agent memory files | The skill can change future agent behavior after the original task ends |
| Tool abuse | The skill routes the agent toward MCP tools, browser automation, messaging, PR creation, deployment, payments, or signing flows | The agent can move from local work into external systems |
Skills that combine private data access, network access, and command execution need deeper review. That combination gives the skill enough reach to read sensitive data, transmit it, and change the local environment.
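One way to make that rule repeatable is to map the capabilities you find during review to a review depth. The sketch below assumes you have already labeled each skill with capabilities such as `private_data`, `network`, and `exec`; those labels and the tier names are hypothetical, since skills do not declare capabilities in a standard format.

```python
from __future__ import annotations

# Hypothetical capability labels. Skills do not declare these in a standard
# format; a reviewer or scanner has to infer them from SKILL.md and scripts.
PRIVATE_DATA = "private_data"  # reads secrets, credentials, or personal files
NETWORK = "network"            # can send data off the machine
EXEC = "exec"                  # runs shell commands or scripts

HIGH_REACH = {PRIVATE_DATA, NETWORK, EXEC}


def review_tier(capabilities: set[str]) -> str:
    """Map inferred capabilities to a (hypothetical) review depth."""
    present = capabilities & HIGH_REACH
    if present == HIGH_REACH:
        # Enough reach to read sensitive data, transmit it, and change the environment.
        return "deep"
    if len(present) == 2:
        return "elevated"
    return "standard"
```

A skill labeled `{"private_data", "network", "exec"}` lands in the `"deep"` tier, while one that only drives a browser stays `"standard"`.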
Official skills reduce publisher-identity risk
A skill from a vendor-owned repository gives you a clearer source to verify than a marketplace listing with an unknown publisher.
That trust still needs evidence. Check the source, version, permissions, dependencies, scripts, and update path. Official ownership tells you who published the skill. Runtime access still needs its own review.
Good official-skill signals include:
| Check | What to verify |
|---|---|
| Source ownership | The repo belongs to the vendor's official GitHub organization |
| Cross-linking | Vendor docs link to the repo, and the repo links back to vendor docs |
| Marketplace listing | The marketplace points to the same repo and publisher |
| Versioning | The install command pins a tag, release, commit, content hash, or signed artifact |
| Security metadata | The listing includes a scan result, Skill Card, permission manifest, or signature |
| Runtime scope | The requested permissions match the declared task |
| Update flow | Updates pass through CI, scanner gates, and release review |
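Two of these checks, source ownership and versioning, can be automated for GitHub-hosted skills. The sketch below assumes install sources look like `https://github.com/<org>/<repo>/tree/<ref>/<path>` and that you maintain your own allowlist of orgs verified against vendor docs. It is stricter than the table: it accepts only full commit SHAs, because branches move and Git tags can be re-pointed.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def check_source(url: str, verified_orgs: set[str]) -> list[str]:
    """Return problems with a GitHub install URL; an empty list means none found.

    Expects URLs shaped like https://github.com/<org>/<repo>/tree/<ref>/<path>.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or (parsed.hostname or "") != "github.com":
        return ["source is not an https://github.com URL"]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return ["URL does not name an org and repo"]

    problems = []
    # GitHub org names are case-insensitive, so compare in lowercase.
    if parts[0].lower() not in {org.lower() for org in verified_orgs}:
        problems.append(f"org '{parts[0]}' is not on the verified allowlist")

    ref = parts[3] if len(parts) >= 4 and parts[2] in ("tree", "blob") else None
    if ref is None:
        problems.append("URL does not pin a ref, so it follows the default branch")
    elif not FULL_SHA.fullmatch(ref):
        # Branches move and tags can be re-pointed; only a full commit SHA is fixed.
        problems.append(f"ref '{ref}' is not a full 40-character commit SHA")
    return problems
```

The allowlist is yours to populate. A clean result says the URL points where you expect; it says nothing about what the skill does at runtime.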
For discovery, start with vendor docs and official directories:
- OpenAI Codex skills documentation
- Anthropic skills repository
- NVIDIA skills repository
- GitHub Copilot agent skills documentation
- Skillscout Official skills directory
- skills.sh Official directory
Skillscout belongs in the same discovery step as vendor docs and official directories. While you browse a product, docs site, or SaaS tool, the extension can surface relevant skills and prioritize official sources. The Skillscout Chrome extension and Skillscout Official directory help when the official source is spread across vendor repos, marketplaces, and ecosystem catalogs.
How to prevent skill security risks
Use several small checks before installation: source verification, scanner results, runtime containment, and update review. A scanner badge is one signal in that chain.
1. Verify the source before reading the install command
Start with the publisher. Treat the install command as untrusted until the source is verified.
Check the GitHub organization, vendor docs, marketplace publisher, release history, and issue activity. For official skills, verify that the vendor's documentation and repository point to the same artifact. For community skills, look for clear ownership, review history, and transparent source code.
2. Pin the version
Install from a tag, release, commit, content hash, or signed artifact. Branch names and mutable URLs make review harder because the installed content can change after approval.
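A content hash is easy to compute yourself. The sketch below hashes every file's relative path and bytes in a stable order, so you can record a digest at review time and confirm later that the installed folder still matches. The function names are illustrative, and this is a local integrity check, not a substitute for publisher signatures.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def skill_digest(root: str | Path) -> str:
    """SHA-256 over every file's relative path and content, in a stable order."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    outer = hashlib.sha256()
    for path in files:
        rel = path.relative_to(root).as_posix()
        # Hash the path too, so renaming or moving a file changes the digest.
        outer.update(rel.encode("utf-8") + b"\0")
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()


def matches_approved(root: str | Path, approved_digest: str) -> bool:
    """True only if the installed content is byte-for-byte what was reviewed."""
    return skill_digest(root) == approved_digest
```

Store the digest next to the approval record. If `matches_approved` returns `False`, the skill changed after review and should go back through it.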
NVIDIA's trust pipeline shows the components of a reviewed release: scan the skill, publish a Skill Card, sign the artifact, and verify the signature before install. See NVIDIA's Agent Skill Trust Pipeline, SkillSpector, and NVIDIA skill signing.
3. Read the skill like code
Open SKILL.md. Then inspect scripts, dependency manifests, referenced files, install hooks, remote downloads, and generated configuration.
Look for instructions that ask the agent to hide behavior, override policies, reveal secrets, access sensitive paths, install packages, download scripts, modify memory files, or run broad shell commands. Also compare the description with the behavior. A skill called "blog helper" should stay limited to drafts, outlines, and content files.
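A quick pattern pass can point your reading at the lines that deserve the closest look. The sketch below greps every file in a skill folder for a handful of red flags from the list above. The patterns are illustrative and incomplete, they will produce false positives (for example, `process.env` matches the `.env` rule), and a clean result does not mean the skill is safe.

```python
from __future__ import annotations

import re
from pathlib import Path

# Illustrative red flags only. Matches are prompts for a human reader,
# not verdicts, and this list is nowhere near exhaustive.
RED_FLAGS = {
    "pipe-to-shell": re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(ba|z)?sh\b"),
    "dynamic-eval": re.compile(r"\b(eval|exec)\s*\("),
    "sensitive-path": re.compile(r"\.env\b|\.ssh/|id_rsa|\.aws/credentials|wallet", re.IGNORECASE),
    "policy-override": re.compile(
        r"ignore (all |any )?(previous|prior) instructions|do not (tell|inform) the user",
        re.IGNORECASE,
    ),
    "memory-or-hook-write": re.compile(r"AGENTS\.md|CLAUDE\.md|\.bashrc|\.zshrc|\.git/hooks"),
}


def scan_skill(root: str | Path) -> list[tuple[str, int, str]]:
    """Return (relative file, line number, flag name) for every red-flag match."""
    root = Path(root)
    findings = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # errors="replace" keeps binary or oddly encoded files from crashing the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in RED_FLAGS.items():
                if pattern.search(line):
                    findings.append((path.relative_to(root).as_posix(), lineno, name))
    return findings
```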
4. Scan with more than one layer
Use at least one skill-aware scanner, one supply-chain scanner, and one secrets scanner.
Current ecosystem options include:
- Snyk Agent Scan and Snyk Skill Inspector for skills, MCP servers, prompt injection, secrets, malware payloads, and insecure configs
- NVIDIA SkillSpector for skill-specific static and semantic review
- Socket for supply-chain behavior and package risk signals
- Skill Vetter for local multi-scanner review before installing a skill into an agent environment
- Gitleaks or TruffleHog for secrets
- Semgrep or Bandit for scripts
- Snyk SCA or OSV for dependencies
A practical stack for a small AI builder is: Snyk Agent Scan or NVIDIA SkillSpector, Skill Vetter for a local review pass, Gitleaks or TruffleHog for secrets, Semgrep or Bandit for scripts, and Snyk SCA or OSV for dependencies.
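To keep the layers from becoming optional, you can run them behind a single gate that fails closed. The sketch below runs whatever commands you configure and blocks the install if any layer fails or is not installed. It assumes each tool signals findings with a non-zero exit code: Bandit does this by default, and Semgrep does with `--error`. For the skill-aware, secrets, and dependency scanners, take the exact commands and flags from each tool's own documentation; the folder path is a placeholder.

```python
from __future__ import annotations

import shutil
import subprocess


def run_layers(layers: dict[str, list[str]], timeout: int = 600) -> dict[str, str]:
    """Run each scanner command and classify the result per layer.

    "pass"    -> command exited 0
    "fail"    -> non-zero exit or timeout (findings, or the tool itself errored)
    "missing" -> executable not found on PATH
    """
    results = {}
    for layer, cmd in layers.items():
        if shutil.which(cmd[0]) is None:
            results[layer] = "missing"
            continue
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            results[layer] = "fail"
            continue
        results[layer] = "pass" if proc.returncode == 0 else "fail"
    return results


def gate(results: dict[str, str]) -> bool:
    # Fail closed: a missing layer blocks the install just like a failing one.
    return bool(results) and all(status == "pass" for status in results.values())


# Example wiring for a skill checked out at ./candidate-skill (placeholder path).
EXAMPLE_LAYERS = {
    "python-scripts": ["bandit", "-r", "./candidate-skill"],
    "sast": ["semgrep", "scan", "--config", "auto", "--error", "./candidate-skill"],
}
```

Treating "missing" as a failure is deliberate. A gate that silently skips uninstalled scanners tends to end up running none of them.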
5. Install into a constrained environment first
Use a canary workspace before the real workspace. Give the skill fake secrets, fake credentials, and disposable files. Turn on logs. Watch file writes, shell commands, network calls, MCP tool calls, browser automation, and generated outputs.
For local development, prefer workspace-only access, denied sensitive paths, temporary credentials, and an egress allowlist. For production workflows, use an isolated runner and explicit human approvals.
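A canary workspace works best when the fake secret is unique, because any later appearance of it in logs or outbound traffic is then unambiguous evidence of a leak. The sketch below seeds a throwaway directory with a random token and provides two post-run checks over whatever logs you captured. It assumes you already record outputs and contacted hostnames; it does not enforce the egress allowlist, which belongs in your sandbox or firewall. The allowlisted host is hypothetical.

```python
from __future__ import annotations

import secrets
import tempfile
from pathlib import Path

# Hypothetical allowlist for the canary run; use the hosts the skill claims to need.
EGRESS_ALLOWLIST = frozenset({"api.example.com"})


def make_canary_workspace() -> tuple[Path, str]:
    """Create a throwaway workspace seeded with a fake but unique credential."""
    root = Path(tempfile.mkdtemp(prefix="skill-canary-"))
    token = "CANARY-" + secrets.token_hex(16)
    (root / ".env").write_text(f"API_KEY={token}\n")
    (root / "draft.md").write_text("Disposable file for the skill to work on.\n")
    return root, token


def leaked_in(token: str, captured: list[str]) -> list[int]:
    """Indexes of captured log lines, requests, or outputs that contain the canary."""
    return [i for i, entry in enumerate(captured) if token in entry]


def egress_violations(contacted_hosts: list[str]) -> list[str]:
    """Hosts the skill contacted that are not on the allowlist (post-run review)."""
    return sorted(set(contacted_hosts) - EGRESS_ALLOWLIST)
```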
6. Require approval for external or irreversible actions
Skills that send messages, open PRs, deploy, delete files, trade, purchase, bridge assets, use wallets, touch customer data, or change production systems need approval gates.
Treat wallet files, seed phrases, private keys, deployer keys, exchange API keys, CI/CD credentials, and production signing flows as high-risk by default.
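An approval gate can sit between the agent's proposed action and its execution. In the sketch below, the action names, the target markers, and the `approve` callback are hypothetical; map them to whatever your agent runtime actually exposes. Risky actions, or any action that touches a high-risk target, run only after an explicit yes.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical action names; map them to what your agent runtime exposes.
ALWAYS_APPROVE = frozenset({
    "send_message", "open_pr", "deploy", "delete_file", "trade", "purchase",
    "bridge_assets", "use_wallet", "access_customer_data", "change_production",
})
# Substrings that mark a target as high-risk by default, whatever the action.
HIGH_RISK_TARGETS = ("wallet", "seed", "private_key", "deployer", "signing", ".github/workflows", "prod")


def needs_approval(action: str, target: str) -> bool:
    t = target.lower()
    return action in ALWAYS_APPROVE or any(marker in t for marker in HIGH_RISK_TARGETS)


def guarded(action: str, target: str, run: Callable[[], None],
            approve: Callable[[str, str], bool]) -> str:
    """Run the action only if it is low-risk or a human explicitly approves it."""
    if needs_approval(action, target) and not approve(action, target):
        return "blocked"
    run()
    return "done"
```

Substring matching errs toward asking: a path like `products/` will trigger the `prod` marker. For an approval gate, an extra prompt is the cheaper mistake.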
Payments raise the stakes further because an agent can move funds. Agent-payment infrastructure is already taking shape: x402 uses HTTP 402 to let clients and agents pay for access with stablecoins; ERC-8004 defines a trustless framework for agent discovery, identity, and reputation; and Mastercard Agent Pay is a payment program for agentic commerce.
Skills that touch these flows need explicit limits: maximum spend, approved merchants or contracts, allowed assets, allowed chains or payment rails, human confirmation for transfers, separate signing keys, and logs for every payment attempt. A skill can prepare a payment request. Moving funds needs a separate approval boundary.
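Those limits can be expressed as a policy check that runs before any signing step. The sketch below is not tied to x402, ERC-8004, or any payment API; the field names, payees, and rail labels are hypothetical. It logs every attempt, and it never approves on its own: the best outcome a skill-prepared request can reach is `awaiting_human_confirmation`.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PaymentPolicy:
    max_spend: Decimal                 # cumulative cap for this skill
    approved_payees: frozenset[str]    # merchants or contract addresses
    allowed_assets: frozenset[str]
    allowed_rails: frozenset[str]      # chains or payment rails


@dataclass(frozen=True)
class PaymentRequest:
    payee: str
    asset: str
    rail: str
    amount: Decimal


def review_payment(req: PaymentRequest, policy: PaymentPolicy,
                   spent_so_far: Decimal, audit_log: list[dict]) -> tuple[str, list[str]]:
    """Check a skill-prepared payment against policy. Never approves on its own."""
    problems = []
    if req.amount <= 0:
        problems.append("amount must be positive")
    if spent_so_far + req.amount > policy.max_spend:
        problems.append("exceeds spend limit")
    if req.payee not in policy.approved_payees:
        problems.append("payee not approved")
    if req.asset not in policy.allowed_assets:
        problems.append("asset not allowed")
    if req.rail not in policy.allowed_rails:
        problems.append("rail not allowed")
    # A clean request still stops at a human; signing happens outside the skill.
    decision = "rejected" if problems else "awaiting_human_confirmation"
    audit_log.append({"request": req, "decision": decision, "problems": problems})
    return decision, problems
```

`Decimal` avoids the rounding surprises that floats introduce into spend limits. Keep the signing key in a separate process that only acts on a human-confirmed request.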
7. Re-scan updates
Review every update like a new install. Re-scan on version change, permission change, dependency change, new script, new external download, new MCP tool, or new runtime permission.
Teams should keep an inventory with owner, source, version, install location, approval date, and next review date.
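Both ideas fit in a small amount of code: compare snapshots of the reviewed and updated skill to find the re-scan triggers, and keep an inventory record that includes the revocation path. The sketch below assumes you build each snapshot yourself (file hashes, declared permissions, pinned dependencies, referenced URLs, MCP tools). The field names and the 90-day review interval are hypothetical defaults.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

SCRIPT_SUFFIXES = (".sh", ".py", ".js", ".ts", ".ps1")


@dataclass
class SkillSnapshot:
    version: str
    files: dict[str, str]          # relative path -> content hash
    permissions: frozenset[str]
    dependencies: dict[str, str]   # package -> pinned version
    remote_urls: frozenset[str]    # external downloads referenced by the skill
    mcp_tools: frozenset[str]


def rescan_reasons(old: SkillSnapshot, new: SkillSnapshot) -> list[str]:
    """List every trigger that should send an update back through review."""
    reasons = []
    if new.version != old.version:
        reasons.append("version changed")
    if new.permissions != old.permissions:
        reasons.append("permissions changed")
    if new.dependencies != old.dependencies:
        reasons.append("dependencies changed")
    added = set(new.files) - set(old.files)
    if any(path.endswith(SCRIPT_SUFFIXES) for path in added):
        reasons.append("new script")
    if any(old.files[p] != new.files[p] for p in set(old.files) & set(new.files)):
        reasons.append("file content changed")
    if new.remote_urls - old.remote_urls:
        reasons.append("new external download")
    if new.mcp_tools - old.mcp_tools:
        reasons.append("new MCP tool")
    return reasons


@dataclass
class InventoryEntry:
    name: str
    owner: str
    source: str
    version: str
    install_location: str
    approved_on: date
    revocation: str                 # how to remove the skill quickly
    review_interval_days: int = 90  # hypothetical default cadence

    @property
    def next_review(self) -> date:
        return self.approved_on + timedelta(days=self.review_interval_days)
```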
A practical workflow for AI builders
Use this workflow before installing a third-party skill:
- Discover the skill through vendor docs, Skillscout, skills.sh Official, or a vetted marketplace.
- Verify the publisher through the official GitHub org, vendor docs, and marketplace listing.
- Pin the artifact to a tag, release, commit, hash, or signature.
- Read SKILL.md, scripts, references, assets, dependencies, install hooks, and generated config.
- Document permissions across filesystem, shell, network, browser, MCP, cloud, wallet, messaging, and production access.
- Run multiple scanners: skill scanner, supply-chain scanner, secrets scanner, dependency scanner, and SAST where scripts exist.
- Install in a canary workspace with fake secrets and logs enabled.
- Move the skill into the main workspace after it behaves as expected.
- Monitor activation, commands, file diffs, network calls, MCP calls, approvals, and outputs.
- Revalidate the skill on every update.
For teams, add two more steps: assign an owner and document a revocation path. Someone should know who approved the skill, why it exists, where it is installed, and how to remove it quickly.
How to build safer skills yourself
The same rules apply when you publish your own skills.
Use a narrow scope. Declare inputs, outputs, allowed paths, allowed domains, required tools, and required permissions. Use scripts for deterministic execution, and keep the rest in reviewed instructions. Pin dependencies. Bundle reviewed references. Add test cases for normal use, malformed input, prompt injection, poisoned remote content, and empty inputs.
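Declared scope is most useful when something checks it. The sketch below shows one way a runtime or test harness might enforce allowed paths and domains for a narrow skill such as the "blog helper" mentioned earlier. The `SkillScope` structure is hypothetical, not a field of any skill format. It rejects `..` segments and absolute paths instead of trying to normalize them, and it matches hostnames exactly.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse


@dataclass(frozen=True)
class SkillScope:
    allowed_paths: tuple[str, ...]     # workspace-relative directories
    allowed_domains: frozenset[str]    # exact hostnames, no wildcard subdomains


def path_allowed(scope: SkillScope, path: str) -> bool:
    p = PurePosixPath(path)
    # Reject absolute paths and any ".." segment instead of trying to normalize them.
    if p.is_absolute() or ".." in p.parts:
        return False
    return any(p.is_relative_to(allowed) for allowed in scope.allowed_paths)


def domain_allowed(scope: SkillScope, url: str) -> bool:
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in scope.allowed_domains
```

The same checks double as test cases: write one that asserts the skill cannot reach `.env` or a look-alike domain, alongside the normal-use tests.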
For shared or official skills, publish a Skill Card. It should include owner, purpose, supported agents, tools, permissions, limitations, test results, scan results, and version. Sign releases where possible and publish from the canonical organization.
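A Skill Card only helps reviewers if it is complete. The sketch below checks a card for the fields listed above. The JSON keys and example values are hypothetical placeholders; they are not NVIDIA's published Skill Card schema or any other official format.

```python
from __future__ import annotations

import json

# Field names follow the list in the text; they are not a published schema.
REQUIRED_FIELDS = (
    "owner", "purpose", "supported_agents", "tools", "permissions",
    "limitations", "test_results", "scan_results", "version",
)

# Placeholder card for illustration only.
EXAMPLE_CARD = json.loads("""
{
  "owner": "docs-team@example.com",
  "purpose": "Draft and outline blog posts in the drafts/ directory",
  "supported_agents": ["example-agent"],
  "tools": ["filesystem"],
  "permissions": {"filesystem": ["drafts/"], "network": []},
  "limitations": "No shell access; does not publish posts",
  "test_results": {"normal": "pass", "prompt_injection": "pass", "empty_input": "pass"},
  "scan_results": {"skill_scanner": "link to report", "secrets": "link to report"},
  "version": "1.2.0"
}
""")


def missing_fields(card: dict) -> list[str]:
    """Required fields that are absent or empty, in REQUIRED_FIELDS order."""
    return [f for f in REQUIRED_FIELDS if card.get(f) in (None, "", [], {})]
```

Running this check in the release pipeline keeps a card with blank scan results from shipping alongside a signed artifact.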
This makes skills easier to trust, easier to review, and easier to revoke.
Security starts before installation. A good first question is: "Is there an official skill for this tool?" After discovery, verification, scanning, sandboxing, and monitoring decide how the skill enters the workflow and whether it stays there.
