NACC: Proving That AI Can Orchestrate Real Infrastructure at Scale

Community Article Published November 30, 2025

How I built a multi-node agentic system that turned cloud platform constraints into innovation, and what this means for the future of infrastructure


🚀 The Problem Nobody's Solving

Every day, engineers at scale face the same nightmare:

  • Developers managing 50+ microservices across multiple clouds
  • DevOps teams SSH-ing between servers, manually running commands
  • Security teams manually coordinating vulnerability scans across environments
  • MLOps engineers deploying models to 100+ edge devices via scripts
  • Result: bottlenecks, errors, and massive operational overhead

We've automated everything EXCEPT orchestrating and reasoning about multiple machines at once.

That's the problem NACC solves: Making AI agents the orchestration layer for distributed infrastructure.


💡 The Core Innovation: AI as Infrastructure Brain

Instead of building another monitoring dashboard or automation script, NACC asks: What if you could just talk to all your infrastructure in plain English?

User: "Check all prod servers for the XZ vulnerability and generate a report"

NACC:
1. Identifies 47 production nodes
2. Plans vulnerability scan workflow
3. Coordinates execution across all nodes
4. Aggregates results
5. Generates actionable report

All autonomously. Using natural language.

This isn't a dashboard. This isn't a script. This is AI as your infrastructure co-pilot.
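
The five steps above amount to a fan-out and aggregate pattern. Here is a minimal Python sketch of that shape, not NACC's actual implementation. The inventory, the node names, and the `scan_node` stub are all hypothetical stand-ins for real remote execution.

```python
import asyncio

# Hypothetical inventory: node id -> environment tag (not from the NACC source).
INVENTORY = {"web-01": "prod", "web-02": "prod", "dev-01": "dev"}

async def scan_node(node_id: str) -> dict:
    """Stand-in for a remote scan; a real node would run a command over MCP."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"node": node_id, "vulnerable": node_id.endswith("02")}

async def scan_environment(env: str) -> dict:
    # Step 1: identify the target nodes.
    targets = sorted(n for n, tag in INVENTORY.items() if tag == env)
    # Steps 2-3: run the same scan on every target concurrently.
    results = await asyncio.gather(*(scan_node(n) for n in targets))
    # Steps 4-5: aggregate the per-node results into a small report.
    flagged = [r["node"] for r in results if r["vulnerable"]]
    return {"scanned": len(targets), "flagged": flagged}

report = asyncio.run(scan_environment("prod"))
print(report)
```

In a real deployment the planning step would come from the LLM; here it is hard-coded so the control flow stays visible.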


🏗️ Why This Matters (Business Perspective)

The TAM (Total Addressable Market)

  • DevOps/Infrastructure: $50B+ market
  • Enterprise Automation: $200B+ market
  • Cloud Management: Growing 25%+ YoY
  • Security Operations: $60B+ market

NACC's Opportunity: Become the conversational interface for ALL infrastructure management.

Instead of Terraform (code), Ansible (YAML), or Kubernetes (manifests), just talk to your infrastructure.

Competitive Advantages

| Approach | Current State | With NACC |
| --- | --- | --- |
| DevOps Tasks | Manual SSH or scripts | Natural language commands |
| Deployment Coordination | Scheduled jobs + manual oversight | AI-driven autonomous orchestration |
| Incident Response | Page on-call engineers | AI automatically diagnoses + fixes |
| Multi-Cloud Management | Separate tools per cloud | Unified agentic interface |

🛠️ What I Built (Technical Excellence)

The Two-Space Architecture

Faced with the constraint that HuggingFace Spaces run in isolated containers, I split NACC across two Spaces that communicate using MCP over HTTP.

  • Main Space: Orchestrator + AI brain (using Blaxel for <25ms LLM inference)
  • VM Space: Simulated production node with command execution + file management
  • Communication: Custom JSON-RPC implementation over HTTPS

Result: Fully functional distributed system demo on a free platform.
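
To make the wire format concrete, here is a minimal sketch of an MCP tool call framed as JSON-RPC 2.0. It is not NACC's code. The `tools/call` method and the `name`/`arguments` params follow the MCP specification. The helper names and the hand-written reply are assumptions, and the HTTPS transport is left out.

```python
import json
from itertools import count

_ids = count(1)  # each request gets a fresh id so replies can be matched to it

def build_tool_call(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

def parse_response(raw: str) -> dict:
    """Return the result payload, or raise if the peer reported an error."""
    msg = json.loads(raw)
    if "error" in msg:
        raise RuntimeError(f"{msg['error']['code']}: {msg['error']['message']}")
    return msg["result"]

request = build_tool_call("execute_command", {"command": "ls"})

# A hand-written reply standing in for what the VM Space might send back.
reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "demo.txt"}]},
})
print(parse_response(reply)["content"][0]["text"])
```

The Main Space would POST the `request` string to the VM Space and pass the response body to `parse_response`.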

Custom MCP Implementation

Built the entire MCP stack from scratch:

class MCPServer:
    def __init__(self):
        self.tools = {}  # tool name -> tool definition

    def register_tool(self, name, schema, handler):
        """Register MCP-compliant tools"""
        self.tools[name] = {
            "name": name,
            "description": schema["description"],
            "inputSchema": schema["parameters"],
            "handler": handler,
        }

    async def handle_call_tool(self, name, arguments):
        """Execute tools via MCP protocol"""
        if name not in self.tools:
            raise KeyError(f"Unknown tool: {name}")
        return await self.tools[name]["handler"](**arguments)

Why from scratch? Security, performance, and to deeply understand the protocol. No black boxes on critical infrastructure.
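
As a usage illustration, the sketch below registers one tool and calls it. The class is repeated in compact form so the example runs on its own. The `word_count` tool and its schema are hypothetical and are not among NACC's tools.

```python
import asyncio

class MCPServer:
    def __init__(self):
        self.tools = {}

    def register_tool(self, name, schema, handler):
        self.tools[name] = {
            "name": name,
            "description": schema["description"],
            "inputSchema": schema["parameters"],
            "handler": handler,
        }

    async def handle_call_tool(self, name, arguments):
        return await self.tools[name]["handler"](**arguments)

async def word_count(text: str) -> int:
    """Hypothetical tool: count whitespace-separated words."""
    return len(text.split())

server = MCPServer()
server.register_tool(
    "word_count",
    {
        "description": "Count words in a string",
        # JSON Schema describing the tool's arguments
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
    word_count,
)

result = asyncio.run(server.handle_call_tool("word_count", {"text": "list all nodes"}))
print(result)
```

Because handlers are awaited, every registered handler must be an `async` function in this design.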

Cross-Space Node Management

The innovation: Abstract multiple HF Spaces as "nodes" in a distributed system.

class NodeManager:
    nodes = {
        "hf-space-local": LocalNode(),
        "vm-node-01": RemoteNode("https://huggingface.co/spaces/.../NACC-VM")
    }
    
    async def route_command(self, node_id, command):
        """Seamlessly execute commands on any node"""
        node = self.nodes[node_id]
        return await node.execute(command)

What this proves: You can build real distributed system demos without expensive infrastructure.
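
The routing idea can be tried without any network. In the sketch below, `LocalNode`, `RemoteNode`, `fake_send`, and the `example.invalid` URL are hypothetical stand-ins. The transport is injected so that a real HTTPS client could replace it later.

```python
import asyncio
from typing import Awaitable, Callable

class LocalNode:
    """Hypothetical node that handles commands in-process."""
    async def execute(self, command: str) -> str:
        return f"local: {command}"

class RemoteNode:
    """Hypothetical node that forwards commands through an injected transport."""
    def __init__(self, url: str, send: Callable[[str, str], Awaitable[str]]):
        self.url = url
        self._send = send

    async def execute(self, command: str) -> str:
        return await self._send(self.url, command)

async def fake_send(url: str, command: str) -> str:
    # Stand-in for an HTTPS request so the sketch runs offline.
    return f"remote({url}): {command}"

class NodeManager:
    def __init__(self, nodes: dict):
        self.nodes = nodes

    async def route_command(self, node_id: str, command: str) -> str:
        node = self.nodes.get(node_id)
        if node is None:
            raise KeyError(f"unknown node: {node_id}")
        return await node.execute(command)

manager = NodeManager({
    "hf-space-local": LocalNode(),
    "vm-node-01": RemoteNode("https://example.invalid/vm", fake_send),
})
print(asyncio.run(manager.route_command("vm-node-01", "ls")))
```

The caller never needs to know whether a node is local or remote. Both expose the same `execute` coroutine.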


🎯 Real-World Applications

For Enterprises

  • Multi-cloud deployment orchestration: "Deploy frontend to AWS, backend to GCP, run tests"
  • Security automation: "Scan all prod systems, compile vulnerability report"
  • Incident response: "Find root cause, collect logs, trigger alerts"

For DevOps Teams

  • Kubernetes orchestration: Manage clusters through conversation
  • Database migrations: Coordinate across multiple database instances
  • Infrastructure provisioning: "Set up 3-tier app on AWS, configure auto-scaling"

For AI/ML Engineers

  • Model deployment: "Deploy v2.0 to 50 edge devices, rollback if accuracy < 90%"
  • Distributed training: Orchestrate multi-node training jobs
  • A/B testing: "Run experiment on 25% of prod, monitor metrics"

📊 The Technical Challenges (What I Learned)

Challenge 1: State Management Across Boundaries

Problem: Each HF Space is stateless. How do you maintain context when commands span multiple spaces?

Solution: Session-based state persistence with context injection

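class SessionManager:
    def __init__(self):
        self.sessions = {}  # session_id -> context dict

    def get_context(self, session_id):
        # Create the context on first use; the values below are example defaults
        return self.sessions.setdefault(session_id, {
            "current_node": "vm-node-01",  # Remember which node user is on
            "current_path": "/app/src",    # Remember directory
            "last_command": "ls",          # Remember history
        })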

Impact: Users can seamlessly switch between nodes, and the system remembers context.
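
One way to read "context injection" is that the stored session state is prepended to each user message before it reaches the model. The sketch below assumes that reading. `SessionContext`, `SessionStore`, `inject_context`, and the bracketed prefix format are illustrative names and choices, not NACC's actual ones.

```python
from dataclasses import dataclass, field

@dataclass
class SessionContext:
    current_node: str = "hf-space-local"
    current_path: str = "/app"
    history: list = field(default_factory=list)

class SessionStore:
    def __init__(self):
        self._sessions = {}  # session_id -> SessionContext

    def get(self, session_id: str) -> SessionContext:
        # Create a fresh context the first time a session id is seen.
        return self._sessions.setdefault(session_id, SessionContext())

def inject_context(ctx: SessionContext, user_message: str) -> str:
    """Prefix the message with the state needed to resolve relative requests."""
    return f"[node={ctx.current_node} cwd={ctx.current_path}]\n{user_message}"

store = SessionStore()
ctx = store.get("abc")
ctx.current_node = "vm-node-01"            # e.g. after a switch_node tool call
ctx.history.append("switch to vm-node-01")

prompt = inject_context(store.get("abc"), "read file demo.txt")
print(prompt)
```

Because the state lives in the store and not in the Space's process memory per request, a later message in the same session resolves "demo.txt" against the node the user last switched to.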

Challenge 2: Security Without Sacrificing Functionality

Problem: Can't give full root access in a public demo, but also can't cripple the system.

Solution: Intelligent whitelisting + path restrictions

ALLOWED_COMMANDS = {"ls", "cat", "python3", "find", "grep", "wc"}
ROOT_DIR = "/app"  # All operations sandboxed
TIMEOUT = 30       # Prevent runaway processes

Result: Safe for public use, still demonstrates core capabilities.
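
Here is a minimal sketch of how these three settings might be enforced, assuming POSIX paths and execution without a shell. The `validate` and `run` helpers are hypothetical and are not taken from the NACC source.

```python
import os
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "python3", "find", "grep", "wc"}
ROOT_DIR = "/app"
TIMEOUT = 30

def validate(command: str, cwd: str = ROOT_DIR) -> list:
    """Return argv if the command passes both checks, else raise ValueError."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {command!r}")
    for arg in argv[1:]:
        if arg.startswith("-"):
            continue  # flags are not treated as paths
        # Resolve relative segments and ".." against cwd, then check containment.
        resolved = os.path.normpath(os.path.join(cwd, arg))
        if os.path.commonpath([ROOT_DIR, resolved]) != ROOT_DIR:
            raise ValueError(f"path escapes sandbox: {arg!r}")
    return argv

def run(command: str) -> subprocess.CompletedProcess:
    """Run a validated command with no shell and a hard timeout (not called here)."""
    argv = validate(command)
    return subprocess.run(argv, cwd=ROOT_DIR, capture_output=True,
                          text=True, timeout=TIMEOUT)

print(validate("cat src/main.py"))
```

Two caveats apply to this sketch. `os.path.normpath` does not resolve symlinks, so a stricter version would use `os.path.realpath`. Also, checking only the binary name does not limit what an interpreter such as `python3` can do once it starts.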

Challenge 3: MCP Protocol Compliance at Scale

Problem: Implementing MCP for multi-node orchestration wasn't in any documentation.

Solution: Custom tool definitions that scale horizontally

tools.register("execute_command", execute_on_node)
tools.register("read_file", read_from_node)
tools.register("switch_node", change_active_node)
tools.register("sync_files", multi_node_sync)  # Custom innovation

Learning: Standard protocols need custom extensions for novel use cases. That's okay; it shows deep understanding.
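
The source does not show how `sync_files` works, so the following is only a guess at its shape. It reads a file once from a source node and writes it to every target concurrently. `InMemoryNode` and the function signature are hypothetical.

```python
import asyncio

class InMemoryNode:
    """Hypothetical node backed by a dict of path -> file contents."""
    def __init__(self, files=None):
        self.files = dict(files or {})

    async def read_file(self, path: str) -> str:
        return self.files[path]

    async def write_file(self, path: str, data: str) -> None:
        self.files[path] = data

async def sync_files(nodes: dict, source: str, targets: list, path: str) -> dict:
    """Copy one file from the source node to each target; report per-node status."""
    data = await nodes[source].read_file(path)
    outcomes = await asyncio.gather(
        *(nodes[t].write_file(path, data) for t in targets),
        return_exceptions=True,  # one failing node should not abort the rest
    )
    return {
        t: f"error: {o}" if isinstance(o, Exception) else "ok"
        for t, o in zip(targets, outcomes)
    }

nodes = {
    "hf-space-local": InMemoryNode({"demo.txt": "hello"}),
    "vm-node-01": InMemoryNode(),
}
status = asyncio.run(sync_files(nodes, "hf-space-local", ["vm-node-01"], "demo.txt"))
print(status)
```

Returning a per-node status rather than raising on the first failure lets the agent report partial success to the user.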


🎓 Why This Matters for Hiring/Collaboration

What NACC Demonstrates

✅ Full-Stack Capability: Architected, implemented, deployed, and documented a complex system
✅ Problem-Solving Under Constraints: Turned platform limitations into innovation
✅ Deep Protocol Understanding: Built MCP from scratch to prove comprehension
✅ Production Thinking: Security, scalability, state management, not just "does it work"
✅ Communication: Explained complex tech to multiple audiences
✅ Grit: Balanced hackathon coding with semester exams and still shipped

For Potential Collaborators

NACC is open for contributions. If you're interested in:

  • Kubernetes integration
  • Real-time monitoring dashboard
  • Security hardening
  • Blockchain node orchestration
  • Plugin system for custom tools

Let's build this together. This is just the v1 demo.


🚀 The Vision: Generational Shift

In 5 years, I believe:

  1. Infrastructure code becomes infrastructure conversation
    • No more Terraform for beginners: just ask AI agents
    • Democratizes DevOps for junior engineers
  2. AI agents become first-class infrastructure citizens
    • Like Docker changed deployment, agentic interfaces change orchestration
    • Every company runs autonomous AI orchestration
  3. MCP becomes the standard infrastructure protocol
    • HTTP standardized the web
    • MCP will standardize AI-to-infrastructure communication

NACC is my bet that this shift is coming, and it starts with proving it's possible.


🔗 Get Involved

Try NACC

  • Main Space: Live Demo
  • GitHub: Open Source
  • Quick Commands: list nodes → switch to vm-node-01 → read file demo.txt

Collaborate

Interested in working on NACC together?

  • Internship opportunities: I'm open to summer/full-time roles in infrastructure/DevOps/AI
  • Collaboration: Let's build the orchestration future
  • Discussion: Questions about agentic systems, MCP, or distributed infrastructure?

Connect on LinkedIn: @vasanthadithya-mundrathi
Code on GitHub: @Vasanthadithya-mundrathi


📝 The Bottom Line

NACC isn't just a hackathon project. It's proof that AI can be the reasoning layer for infrastructure orchestration.

For recruiters: I ship complex systems. I learn fast. I think deeply about problems.
For collaborators: Let's build the future of infrastructure together.
For the industry: This is coming. Will you be ready?

Built by: Vasanthadithya Mundrathi (3rd year CS student, CBIT Hyderabad)
For: MCP 1st Birthday Hackathon
With: Blaxel, HuggingFace, Anthropic, Gradio
Goal: Prove that conversational AI is the future of infrastructure
