2026 · Autonomous AI security orchestrator

Rudra

An autonomous multi-agent offensive security platform for creative, non-templated vulnerability exploitation.

Parked 2026

Why I built this

Tools like Metasploit work from rigid, templated modules — you pick an exploit and fire it. I wanted a system that reasons about a specific target surface, writes custom exploit code for that surface, tests it in a sandbox, and iterates on failures the way a skilled human would. The interesting engineering problem was keeping that autonomy safe: scope enforcement that cannot be overridden by an LLM, sandboxed execution with network isolation, and a full audit trail.

Agent types designed: 4
Agents fully implemented: 2 of 4
Total cluster VRAM: 22 GB

Current status

Frontier LLMs (Claude, GPT-4) have tightened restrictions on offensive security tasks, and smaller local models served via Ollama lack the reasoning depth the exploit loop needs. The Recon and Analyst agents are fully implemented; the project is parked until the model landscape shifts.

Architecture

Client → Orchestrator

The CLI (rakshak) or FastAPI endpoint accepts a target hostname/IP and a scope definition. Scope is validated at input: RFC1918, loopback, and link-local addresses are rejected before any agent is spawned. A pre-scan health check verifies the Ray cluster, Ollama, Docker, Redis, Kafka, Cassandra, and Qdrant.
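
The input-time scope check could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function name `is_allowed_target` is hypothetical, and the sketch handles IP literals only (a hostname would need to be resolved first, and every resolved address checked).

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_allowed_target(address: str) -> bool:
    """Return False for RFC 1918, loopback, and link-local addresses.

    Hypothetical helper. Raises ValueError if `address` is not an IP literal.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_loopback or ip.is_link_local:
        return False
    if ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS):
        return False
    return True
```

Because the check is plain Python that runs before any agent exists, no prompt or model output is in a position to relax it.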

Orchestrator (FastAPI + Ray actor)

Manages agent lifecycle, concurrency budget (Redis counter), and heartbeat monitoring (120s timeout). Scope enforcement is hard-coded Python — never delegated to an LLM instruction.
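
As an illustration of the heartbeat side, the sketch below tracks the last beat per agent and reports any agent silent for longer than the 120s timeout. It is an in-memory stand-in under stated assumptions: the class `HeartbeatMonitor` and its methods are hypothetical names, and the real orchestrator may keep this state in Redis or inside a Ray actor instead.

```python
import time
from typing import Callable

HEARTBEAT_TIMEOUT_S = 120.0  # timeout stated in the description above


class HeartbeatMonitor:
    """Hypothetical in-memory heartbeat tracker."""

    def __init__(
        self,
        timeout_s: float = HEARTBEAT_TIMEOUT_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen: dict[str, float] = {}

    def beat(self, agent_id: str) -> None:
        """Record that `agent_id` is alive right now."""
        self._last_seen[agent_id] = self._clock()

    def stale_agents(self) -> list[str]:
        """Return agents whose last beat is older than the timeout, sorted by id."""
        now = self._clock()
        return sorted(
            agent_id
            for agent_id, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )
```

An orchestrator loop would call `stale_agents()` periodically and tear down or restart whatever it returns.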

Shared Bus

Kafka for event routing between agents. Redis for concurrency budget and live state. Cassandra for persistent findings and attempt history. Qdrant for semantic CVE search and partial-win similarity matching.

Recon Agent ✅ Complete

Fingerprints open ports, services, versions, frameworks, auth mechanisms, and API endpoints. Writes surface map to Redis and publishes to Kafka rudra.recon.discovered. Results flow into Cassandra target_intelligence.

Analyst Agent ✅ Complete

Consumes surface map, queries Qdrant CVE knowledge base, fetches CVSS scores from NVD API. Scores CVEs by confidence. CVSS scores are always fetched from NVD — never estimated by LLM.

Exploit Agent — Parked

5-retry LLM loop: reason → write Python exploit → execute in sandbox → interpret result → iterate with failure context. Budget extended if progress_score > 0.7 on attempt 4. AST-based code validator runs before every execution.
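
The control flow of that loop, stripped of any LLM or sandbox detail, might look like the sketch below. Several things here are assumptions: "5-retry" is read as five attempts in total, the size of the budget extension (`EXTRA_ATTEMPTS`) is invented because the description does not state it, and `attempt_fn` is a hypothetical callback standing in for the whole reason, write, validate, execute, interpret step.

```python
from dataclasses import dataclass
from typing import Callable

BASE_ATTEMPTS = 5            # assumption: "5-retry" means five attempts in total
EXTENSION_CHECK_ATTEMPT = 4  # from the description
PROGRESS_THRESHOLD = 0.7     # from the description
EXTRA_ATTEMPTS = 2           # assumption: the extension size is not specified


@dataclass
class AttemptResult:
    success: bool
    progress_score: float  # 0.0 to 1.0, how close the attempt came
    feedback: str          # failure context carried into the next attempt


def run_exploit_loop(
    attempt_fn: Callable[[int, list[str]], AttemptResult],
) -> tuple[bool, int]:
    """Run attempts until success or budget exhaustion.

    Returns (succeeded, attempts_used).
    """
    budget = BASE_ATTEMPTS
    history: list[str] = []
    attempt = 1
    while attempt <= budget:
        result = attempt_fn(attempt, history)
        if result.success:
            return True, attempt
        history.append(result.feedback)
        # Extend the budget once, only if attempt 4 showed strong progress.
        if (
            attempt == EXTENSION_CHECK_ATTEMPT
            and result.progress_score > PROGRESS_THRESHOLD
        ):
            budget += EXTRA_ATTEMPTS
        attempt += 1
    return False, attempt - 1
```

Keeping the budget arithmetic outside the model call means the number of attempts is bounded by code, whatever the model outputs.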

Sandbox

Ephemeral Docker container per run, destroyed after 300s or on completion. iptables rules on Linux (M1) whitelist only target IPs. Windows machines (M2/M3) route through a tinyproxy traffic cop. Any scope breach kills the container immediately.
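
For the Linux path, a target-only egress whitelist can be expressed as an ordered list of iptables rules: accept traffic to each in-scope IP, then drop everything else. The sketch below only builds the argument lists and does not run anything. The function name `build_egress_rules` is hypothetical, and the choice of the `OUTPUT` chain is an assumption; the description does not say where the project installs its rules.

```python
import ipaddress


def build_egress_rules(target_ips: list[str]) -> list[list[str]]:
    """Build iptables argv lists that allow egress only to `target_ips`.

    Hypothetical helper: returns commands, does not execute them.
    Raises ValueError if any entry is not a valid IP literal.
    """
    rules: list[list[str]] = []
    for raw in target_ips:
        ip = ipaddress.ip_address(raw)  # validates the literal before use
        rules.append(["iptables", "-A", "OUTPUT", "-d", str(ip), "-j", "ACCEPT"])
    # Final catch-all: anything not matched above is dropped.
    rules.append(["iptables", "-A", "OUTPUT", "-j", "DROP"])
    return rules
```

Rule order matters here, since iptables evaluates a chain top to bottom and the catch-all `DROP` must come last.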

Tech stack

Technologies used

core

Python 3.11, FastAPI, Ray (distributed agents), LiteLLM + Ollama, LangGraph

infra

Kafka (event bus), Redis (state), Cassandra (findings), Qdrant (semantic search), Docker (sandbox)

tools

impacket, scapy, pwntools, paramiko, NVD API (CVSS), ruff, mypy, pytest

Key highlights

Proof points

  1. Recon Agent fully implemented: fingerprints ports, services, versions, auth mechanisms, and API endpoints.

  2. Analyst Agent fully implemented: maps findings to CVEs via Qdrant semantic search, with CVSS scores from NVD, never LLM-estimated.

  3. Scope enforcement is hard-coded Python: RFC1918 and loopback always blocked regardless of target configuration.

  4. AST-based code validator checks every generated exploit for syntax, import whitelist, and blocked patterns before sandbox execution.

  5. 3-machine Ray cluster provides 22 GB total VRAM (RTX 3080 Ti + A1000 + T600) for distributed agent workloads.
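
The AST-based validator in the list above performs three checks (syntax, import whitelist, blocked patterns), which could be sketched as below. This is an illustration, not the project's validator: `ALLOWED_IMPORTS` and `BLOCKED_CALLS` are hypothetical placeholder sets, and "blocked patterns" is narrowed here to direct calls of a few built-in names.

```python
import ast

ALLOWED_IMPORTS = {"socket", "struct", "json", "re"}       # hypothetical whitelist
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}  # hypothetical blocklist


def validate_source(source: str) -> list[str]:
    """Return a list of problems found in `source`; an empty list means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems: list[str] = []
    for node in ast.walk(tree):
        # Collect the module names this node imports, if any.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {name}")

        # Flag direct calls to blocked built-ins, e.g. eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            problems.append(f"blocked call: {node.func.id}")
    return problems
```

A static check like this is a first filter only; it complements the sandbox's network isolation and does not replace it.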

Focus areas

AI security, Sandbox design, Distributed systems, Event-driven architecture, API integration
