---
title: Legacy Code Modernizer - Autonomous AI Agent
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Autonomous AI agent for code modernization with MCP tools
tags:
  - mcp-in-action-track-enterprise
  - mcp-in-action-track-consumer
  - code-modernization
  - autonomous-agent
  - mcp
  - gradio
  - gemini
  - modal
  - llama-index
  - nebius
  - chromadb
---

# 🤖 Legacy Code Modernizer - Autonomous AI Agent

**Track 2: MCP in Action - Enterprise Applications**

An autonomous AI agent that modernizes legacy codebases through intelligent planning, reasoning, and execution using Model Context Protocol (MCP) tools.

## 🎯 Project Overview

Legacy Code Modernizer is a complete autonomous agent system that transforms outdated code into modern, secure, and maintainable software. The agent autonomously:

1. **Plans** - Analyzes codebases and creates modernization strategies
2. **Reasons** - Makes intelligent decisions about transformation priorities
3. **Executes** - Applies transformations, generates tests, and validates changes
4. **Integrates** - Creates GitHub PRs with comprehensive documentation

## 🏆 Why This Project Stands Out

### Autonomous Agent Capabilities

**Multi-Phase Planning & Reasoning:**
- **Phase 1**: Intelligent file discovery and classification using AI pattern detection
- **Phase 2**: Semantic code analysis with vector-based similarity search (LlamaIndex + Chroma)
- **Phase 3**: Deep pattern analysis using multiple AI models (Gemini, Nebius AI)
- **Phase 4**: Autonomous code transformation with context-aware reasoning
- **Phase 5**: Automated testing in isolated sandbox + GitHub PR creation
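
The five phases can be read as a sequential pipeline in which each phase enriches a shared context. The sketch below shows that shape with `asyncio`. The phase functions (`discover`, `group_files`, and so on) are hypothetical placeholders, not the project's code; the real phases would call Gemini, Nebius, LlamaIndex, and Modal where these stubs do trivial work.

```python
import asyncio
from typing import Any, Awaitable, Callable

Context = dict[str, Any]
Phase = Callable[[Context], Awaitable[Context]]

SUPPORTED = (".py", ".java", ".js", ".ts")


async def discover(ctx: Context) -> Context:
    # Phase 1 stub: keep only files in a supported language.
    ctx["files"] = [f for f in ctx["files"] if f.endswith(SUPPORTED)]
    return ctx


async def group_files(ctx: Context) -> Context:
    # Phase 2 stub: group by extension (the real agent groups by embedding similarity).
    groups: dict[str, list[str]] = {}
    for f in ctx["files"]:
        groups.setdefault(f.rsplit(".", 1)[-1], []).append(f)
    ctx["groups"] = groups
    return ctx


async def analyze_patterns(ctx: Context) -> Context:
    # Phase 3 stub: an LLM call per group would go here.
    ctx["findings"] = {ext: len(files) for ext, files in ctx["groups"].items()}
    return ctx


async def transform(ctx: Context) -> Context:
    # Phase 4 stub: pretend every file was rewritten.
    ctx["transformed"] = list(ctx["files"])
    return ctx


async def validate_and_publish(ctx: Context) -> Context:
    # Phase 5 stub: sandboxed tests and PR creation would go here.
    ctx["pr_ready"] = bool(ctx["transformed"])
    return ctx


PHASES: list[tuple[str, Phase]] = [
    ("discover", discover),
    ("group", group_files),
    ("analyze", analyze_patterns),
    ("transform", transform),
    ("validate", validate_and_publish),
]


async def run_pipeline(files: list[str]) -> Context:
    ctx: Context = {"files": files, "log": []}
    for name, phase in PHASES:
        ctx = await phase(ctx)
        ctx["log"].append(name)  # record progress, e.g. for a UI status panel
    return ctx


result = asyncio.run(run_pipeline(["app.py", "Main.java", "README.md"]))
```

Passing a single context dict keeps each phase independently replaceable, which suits swapping a stub for a real model call.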

**Context Engineering & RAG:**
- Vector embeddings for semantic code search
- Pattern grouping across similar files
- Historical transformation caching via MCP Memory
- Real-time migration guide retrieval via MCP Search
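
Historical transformation caching amounts to memoizing results by content. A minimal sketch, assuming a content hash of the source plus target version is an acceptable cache key. `TransformationCache` is a hypothetical in-process stand-in for the Memory MCP server, which would persist entries across runs rather than in a dict.

```python
import hashlib
from typing import Callable


class TransformationCache:
    """Hypothetical in-memory stand-in for the Memory MCP server."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(source: str, target: str) -> str:
        # Same code modernized toward the same target yields the same key.
        return hashlib.sha256(f"{target}\0{source}".encode("utf-8")).hexdigest()

    def get_or_compute(self, source: str, target: str, compute: Callable[[str], str]) -> str:
        k = self.key(source, target)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = compute(source)
        return self._store[k]


def fake_llm_upgrade(code: str) -> str:
    # Placeholder for an expensive model call.
    return code.replace("print 'hi'", "print('hi')")


cache = TransformationCache()
first = cache.get_or_compute("print 'hi'", "python3.12", fake_llm_upgrade)
second = cache.get_or_compute("print 'hi'", "python3.12", fake_llm_upgrade)  # cache hit
```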

### MCP Tools Integration

The agent is designed around **4 MCP servers** used as autonomous tools (three integrated, one planned):

1. **GitHub MCP** - Autonomous PR creation with comprehensive documentation
2. **Tavily Search MCP** - Real-time migration guide discovery
3. **Memory MCP** - Pattern analysis caching and learning
4. **Filesystem MCP** - Safe file operations (planned)
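
Under the hood, each MCP tool invocation is a JSON-RPC 2.0 `tools/call` request, and over the stdio transport each message is a single line of JSON. The sketch below only builds that message with the standard library; in practice the `mcp` Python SDK's `ClientSession.call_tool` handles this framing. The tool name and arguments in the example are illustrative assumptions and should be checked against the server's `tools/list` response.

```python
import itertools
import json
from typing import Any

_request_ids = itertools.count(1)


def tools_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),  # unique per request so the response can be matched
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps without indent emits no embedded newlines, as stdio framing requires.
    return json.dumps(message) + "\n"


line = tools_call(
    "create_pull_request",  # assumed GitHub MCP tool name
    {"owner": "example-org", "repo": "legacy-app", "title": "Modernize to Python 3.12",
     "head": "modernize/python-3.12", "base": "main"},
)
```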

### Real-World Enterprise Value

- **Multi-language support**: Python, Java, JavaScript, TypeScript
- **Secure execution**: Modal sandbox with isolated test environments
- **Production-ready**: Comprehensive test generation with coverage reporting

## 🚀 Demo

### Video Demo
**[Demo video](https://drive.google.com/file/d/1ph0NK8QKXRStjydqBV9w6HJaViirswE2/view?usp=sharing)**

### Social Media Post
**[Post on X](https://x.com/naazimhussain02/status/1994786125110710567?s=46&t=SdhRmvogISrVhMiZB_HDJQ)**

## 🎬 Quick Start

### Try It Live on Hugging Face Spaces

1. **Upload a code file** (Python, Java, JavaScript, TypeScript)
2. **Select target version** (auto-detected from your code)
3. **Click "Start Modernization"**
4. **Watch the autonomous agent work** through all 5 phases
5. **Download modernized code, tests, and reports**

### Local Installation

```bash
# Clone repository
git clone https://huggingface.co/spaces/MCP-1st-Birthday/legacy_code_modernizer
cd legacy_code_modernizer

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys:
# - GEMINI_API_KEY (required)
# - GITHUB_TOKEN (for PR creation)
# - TAVILY_API_KEY (for search)
# - MODAL_TOKEN_ID & MODAL_TOKEN_SECRET (for sandbox)

# Create a Python virtual environment, then activate it
# (run only the activation command for your shell)
python -m venv venv
#   On macOS / Linux:
source venv/bin/activate
#   On Windows PowerShell:
.\venv\Scripts\Activate.ps1
#   On Windows CMD:
venv\Scripts\activate.bat

# Install dependencies
pip install -r requirements.txt

# Run the Gradio app
python app.py
```
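
Before launching, it can help to confirm which of the keys above are set. A minimal sketch, assuming the variables are already in the process environment (a `.env` file is not read automatically; a loader such as python-dotenv would be needed). `check_env` is a hypothetical helper; the key names come from the setup comments above.

```python
import os
from collections.abc import Mapping

REQUIRED = ["GEMINI_API_KEY"]
# Optional keys and the feature each one enables (per the .env comments above).
OPTIONAL = {
    "GITHUB_TOKEN": "PR creation",
    "TAVILY_API_KEY": "migration-guide search",
    "MODAL_TOKEN_ID": "sandboxed test execution",
    "MODAL_TOKEN_SECRET": "sandboxed test execution",
}


def check_env(env: Mapping[str, str] = os.environ) -> tuple[list[str], list[str]]:
    """Return (missing required keys, features disabled by missing optional keys)."""
    missing = [key for key in REQUIRED if not env.get(key)]
    disabled = sorted({feature for key, feature in OPTIONAL.items() if not env.get(key)})
    return missing, disabled
```

For example, with only `GEMINI_API_KEY` and `GITHUB_TOKEN` set, `check_env()` reports no missing keys and lists search and sandboxed execution as disabled.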

## 🧠 Autonomous Agent Architecture

### Planning Phase
```
Input: Legacy codebase
↓
Agent analyzes file structure and content
↓
Classifies files by modernization priority
↓
Creates transformation roadmap
```
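
The planning steps can be sketched as: detect each file's language, score it by how many legacy markers it contains, and order the roadmap by that score. The marker lists are hypothetical and deliberately crude (plain substring counts); the agent itself relies on AI pattern detection rather than fixed strings.

```python
from pathlib import PurePath

LANGUAGES = {".py": "python", ".java": "java", ".js": "javascript", ".ts": "typescript"}

# Hypothetical legacy markers per language (substring matches only).
LEGACY_MARKERS = {
    "python": ["print ", "MySQLdb"],
    "java": ["new Vector", "new Date("],
    "javascript": ["var ", "require("],
    "typescript": ["namespace ", "<any>"],
}


def build_roadmap(files: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (path, language, legacy score) tuples, highest score first."""
    roadmap = []
    for path, source in files.items():
        language = LANGUAGES.get(PurePath(path).suffix)
        if language is None:
            continue  # unsupported file type: left out of the plan
        score = sum(source.count(marker) for marker in LEGACY_MARKERS[language])
        roadmap.append((path, language, score))
    # Sort by descending score, then by path for a stable, readable order.
    return sorted(roadmap, key=lambda item: (-item[2], item[0]))


plan = build_roadmap({
    "db.py": "import MySQLdb\nprint 'connected'\n",
    "util.js": "var fs = require('fs');\n",
    "new.py": "print('ok')\n",
    "notes.txt": "todo",
})
```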

### Reasoning Phase
```
Agent groups similar patterns using vector search
↓
Retrieves migration guides via Tavily MCP
↓
Checks cached analyses via Memory MCP
↓
Prioritizes transformations by risk/impact
```
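
Prioritizing by risk and impact can be as simple as a weighted score. A minimal sketch, assuming impact and risk have already been normalized to 0..1 by earlier analysis; the `Candidate` fields and the linear formula are illustrative assumptions, not the agent's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    impact: float  # 0..1: benefit of modernizing (e.g. a security fix scores high)
    risk: float    # 0..1: chance of breakage (e.g. many dependants)


def priority(c: Candidate, risk_weight: float = 0.5) -> float:
    # Favor high-impact changes, penalized by how risky they are.
    return c.impact - risk_weight * c.risk


def prioritize(candidates: list[Candidate], risk_weight: float = 0.5) -> list[str]:
    ranked = sorted(candidates, key=lambda c: priority(c, risk_weight), reverse=True)
    return [c.path for c in ranked]


order = prioritize([
    Candidate("auth.py", impact=0.9, risk=0.9),     # score 0.45
    Candidate("queries.py", impact=0.9, risk=0.2),  # score 0.80
    Candidate("format.py", impact=0.6, risk=0.0),   # score 0.60
])
```

Raising `risk_weight` pushes risky rewrites further down the queue.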

### Execution Phase
```
Agent transforms code with AI models
↓
Generates comprehensive test suites
↓
Validates in isolated Modal sandbox
↓
Auto-fixes export/import issues
```
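
The validation step runs generated tests in a fresh, disposable environment. Modal provides the real isolation; as a local stand-in, the sketch below writes files to a temporary directory and runs a Python command there with a timeout. It offers no network or resource isolation, and `run_in_scratch_dir` is a hypothetical helper. For a pytest suite, `args` would be `["-m", "pytest", "-q"]`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_dir(files: dict[str, str], args: list[str],
                       timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Write files into a throwaway directory and run `python <args>` inside it.

    Local stand-in only: fresh directory plus timeout, but NO network or
    CPU/memory isolation (that is what the Modal sandbox provides).
    """
    with tempfile.TemporaryDirectory() as workdir:
        for name, content in files.items():
            path = Path(workdir, name)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content, encoding="utf-8")
        # The directory (and everything written to it) is deleted on exit.
        return subprocess.run([sys.executable, *args], cwd=workdir,
                              capture_output=True, text=True, timeout=timeout)


result = run_in_scratch_dir(
    {"calc.py": "def add(a, b):\n    return a + b\n",
     "check_calc.py": "from calc import add\nassert add(2, 3) == 5\nprint('passed')\n"},
    ["check_calc.py"],
)
```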

### Integration Phase
```
Agent creates GitHub branch via GitHub MCP
↓
Commits transformed files
↓
Generates PR with deployment checklist
↓
Adds rollback plan and test results
```
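
The integration steps map onto a short sequence of GitHub MCP tool calls. This sketch only assembles the calls as data. It assumes the tool names `create_branch`, `push_files`, and `create_pull_request`, and their argument names, from the reference `@modelcontextprotocol/server-github` server; confirm them via `tools/list`. The `modernize/<target>` branch naming is a hypothetical convention.

```python
from typing import Any


def integration_plan(owner: str, repo: str, base: str, target: str,
                     files: dict[str, str], pr_body: str) -> list[tuple[str, dict[str, Any]]]:
    """Return (tool name, arguments) pairs in the order they should be called."""
    branch = f"modernize/{target}"  # hypothetical branch naming convention
    repo_args = {"owner": owner, "repo": repo}
    return [
        ("create_branch", {**repo_args, "branch": branch, "from_branch": base}),
        ("push_files", {
            **repo_args,
            "branch": branch,
            "files": [{"path": p, "content": c} for p, c in files.items()],
            "message": f"Modernize code for {target}",
        }),
        ("create_pull_request", {
            **repo_args,
            "title": f"Modernize code for {target}",
            "body": pr_body,  # change summary, checklist, rollback plan, test results
            "head": branch,
            "base": base,
        }),
    ]


plan = integration_plan("example-org", "legacy-app", "main", "python-3.12",
                        {"app/db.py": "import pymysql\n"}, "## Summary\nUpgrade DB driver.")
```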

## 🛠️ Technical Stack

### AI & LLM
- **Google Gemini** - Primary reasoning engine with large context window
- **Nebius AI** - Alternative model for diverse perspectives
- **LlamaIndex** - RAG framework for semantic code search
- **Chroma** - Vector database for embeddings
- **bge-large-en** - Embedding model deployed on Modal for inference

### MCP Integration
- **mcp** (v1.22.0) - Model Context Protocol SDK
- **@modelcontextprotocol/server-github** - GitHub operations
- **@modelcontextprotocol/server-tavily** - Web search
- **@modelcontextprotocol/server-memory** - Persistent storage

### Execution & Testing
- **Modal** - Serverless sandbox for secure test execution
- **pytest/Jest/JUnit** - Language-specific test frameworks
- **Coverage.py/JaCoCo** - Code coverage analysis

### UI & Orchestration
- **Gradio 6.0** - Interactive web interface
- **LangGraph** - Agent workflow orchestration
- **asyncio** - Asynchronous execution

## 📊 Features Showcase

### 1. Intelligent Pattern Detection
```python
# Agent automatically detects legacy patterns:
# - Deprecated libraries (MySQLdb → PyMySQL)
# - Security vulnerabilities (SQL injection)
# - Python 2 syntax → Python 3
# - Missing type hints
# - Old-style string formatting
```
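
A rough, regex-based version of this detection for Python can be sketched as follows. The patterns and their names are illustrative heuristics, not the agent's method (it uses AI models, which handle far more cases); regexes like these miss multi-line constructs and can produce false positives.

```python
import re

PATTERNS = {
    "deprecated-library": re.compile(r"^[ \t]*(?:import|from)[ \t]+MySQLdb\b", re.M),
    # Python 2 print statement: `print x`, but not `print(x)` or `print = ...`.
    "python2-print": re.compile(r"^[ \t]*print[ \t]+[^\s(=]", re.M),
    # Old-style formatting: a string literal containing %s/%d/%r/%f followed by `%`.
    "percent-format": re.compile(r"['\"][^'\"\n]*%[sdrf][^'\"\n]*['\"][ \t]*%"),
    # SQL built by concatenation or % inside execute(): a common injection risk.
    "sql-string-concat": re.compile(r"execute\([ \t]*['\"][^'\"\n]*['\"][ \t]*[+%]"),
}


def detect(source: str) -> dict[str, list[int]]:
    """Map each pattern name to the 1-based line numbers where it matches."""
    hits: dict[str, list[int]] = {}
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(source):
            line_no = source.count("\n", 0, match.start()) + 1
            hits.setdefault(name, []).append(line_no)
    return hits


legacy = (
    "import MySQLdb\n"
    "print 'connecting'\n"
    "msg = 'user %s' % name\n"
    "cur.execute('SELECT * FROM t WHERE id=' + uid)\n"
)
modern = "import pymysql\nprint('ok')\nmsg = f'user {name}'\n"
```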

### 2. Semantic Code Search
```python
# Vector-based similarity search finds:
# - Files with similar legacy patterns
# - Related security vulnerabilities
# - Common refactoring opportunities
```
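
At its core, this search ranks stored embeddings by cosine similarity to a query embedding. A minimal brute-force sketch with toy 3-dimensional vectors; in the project the vectors would come from bge-large-en and be stored in Chroma via LlamaIndex, which index them instead of scanning every entry.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k paths whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda path: cosine(query, index[path]), reverse=True)
    return ranked[:k]


# Toy embeddings: the first axis loosely stands for "raw SQL access".
index = {
    "db/users.py": [0.9, 0.1, 0.0],
    "db/orders.py": [0.8, 0.2, 0.1],
    "ui/view.js": [0.0, 0.1, 0.9],
}
```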

### 3. Autonomous Test Generation
```python
# Agent generates:
# - Unit tests with pytest/Jest/JUnit
# - Integration tests
# - Edge case coverage
# - Performance benchmarks
```

### 4. GitHub Integration via MCP
```python
# Automated PR includes:
# - Comprehensive change summary
# - Test results with coverage
# - Risk assessment
# - Deployment checklist
# - Rollback plan
```
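
One way to assemble such a PR description is plain Markdown templating. A minimal sketch; `pr_body` and its section layout are hypothetical, and the rollback line assumes the PR is merged with a merge commit (hence `git revert -m 1`).

```python
def pr_body(summary: str, passed: int, failed: int, coverage_pct: float,
            risk: str, checklist: list[str]) -> str:
    """Render a PR description with the sections listed above."""
    status = "all passing" if failed == 0 else f"{failed} failing"
    sections = [
        "## Summary", summary, "",
        "## Test results",
        f"- {passed} passed, {failed} failed ({status})",
        f"- Coverage: {coverage_pct:.1f}%", "",
        "## Risk assessment", risk, "",
        "## Deployment checklist", *[f"- [ ] {item}" for item in checklist], "",
        "## Rollback plan",
        "Revert the merge commit: `git revert -m 1 <merge-commit-sha>`",
    ]
    return "\n".join(sections)


body = pr_body("Replace MySQLdb with PyMySQL.", passed=42, failed=0, coverage_pct=87.5,
               risk="Low: changes are isolated to the database layer.",
               checklist=["Install PyMySQL in production", "Run smoke tests"])
```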

## 🎯 Supported Languages & Versions

### Python
- **Versions**: 3.10, 3.11, 3.12, 3.13, 3.14
- **Frameworks**: Django 5.2 LTS, Flask 3.1, FastAPI 0.122
- **Testing**: pytest with coverage

### Java
- **Versions**: Java 17 LTS, 21 LTS, 23, 25 LTS
- **Frameworks**: Spring Boot 3.4, 4.0
- **Testing**: Maven + JUnit 5 + JaCoCo

### JavaScript
- **Standards**: ES2024, ES2025
- **Runtimes**: Node.js 22 LTS, 24 LTS, 25
- **Frameworks**: React 19, Angular 21, Vue 3.5, Express 5.1, Next.js 16
- **Testing**: Jest with coverage

### TypeScript
- **Versions**: 5.6, 5.7, 5.8, 5.9
- **Frameworks**: React 19, Angular 21, Next.js 16
- **Testing**: Jest with ts-jest

## 🔒 Security & Isolation

### Modal Sandbox Execution
- **Network isolation**: No external network access during tests
- **Filesystem isolation**: Temporary containers per execution
- **Resource limits**: CPU and memory constraints
- **Automatic cleanup**: Containers destroyed after execution

### Code Validation
- **Syntax checking**: Pre-execution validation
- **Import/export fixing**: Automatic resolution of module issues
- **Security scanning**: Detection of vulnerabilities
- **Type checking**: Language-specific validation


## 🎓 Advanced Features

### Context Engineering
- **Sliding window context**: Manages large files efficiently
- **Cross-file analysis**: Understands dependencies
- **Pattern learning**: Improves with usage via Memory MCP
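
A sliding window over lines is one simple way to keep large files within a model's context: split the file into fixed-size, overlapping chunks so that code near a boundary appears in two windows. A minimal sketch; the window and overlap sizes are arbitrary assumptions, and the project's actual chunking may differ.

```python
def sliding_windows(source: str, window: int = 200,
                    overlap: int = 20) -> list[tuple[int, int, str]]:
    """Split source into overlapping line windows as (first_line, last_line, text), 1-based."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    lines = source.splitlines()
    step = window - overlap
    chunks = []
    for start in range(0, len(lines), step):
        chunk = lines[start:start + window]
        chunks.append((start + 1, start + len(chunk), "\n".join(chunk)))
        if start + window >= len(lines):
            break  # this window already reaches the end of the file
    return chunks


src = "\n".join(f"line{i}" for i in range(1, 11))  # a 10-line file
windows = sliding_windows(src, window=4, overlap=1)
```

Each window shares its first line with the last line of the previous one, so a construct split at a boundary is still seen whole in at least one window when it is shorter than the overlap.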

### RAG Implementation
- **Semantic chunking**: Intelligent code splitting
- **Vector similarity**: Finds related patterns
- **Hybrid search**: Combines keyword + semantic search
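
Hybrid search blends a lexical score with a semantic one. A minimal sketch, assuming a linear blend controlled by `alpha` and a keyword score equal to the fraction of query terms present in the file; both choices are illustrative, not the project's documented method, and the vectors are toy values.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    # Fraction of distinct query terms that also appear in the text.
    q_terms = set(re.findall(r"\w+", query.lower()))
    t_terms = set(re.findall(r"\w+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_rank(query_text: str, query_vec: list[float],
                docs: dict[str, tuple[str, list[float]]], alpha: float = 0.5) -> list[str]:
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score."""
    scores = {
        path: alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query_text, text)
        for path, (text, vec) in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "a.py": ("import MySQLdb\nconn = MySQLdb.connect()", [0.2, 0.8]),
    "b.py": ("import pymysql", [0.9, 0.1]),
}
```

With `alpha=0.5` the exact keyword match lifts `a.py` above `b.py`; with `alpha=1.0` (pure semantic) the order flips.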

### Agent Reasoning
- **Priority scoring**: Risk vs. impact analysis
- **Dependency tracking**: Understands file relationships

## 📝 License

Apache 2.0. See the LICENSE file for details.

## 🙏 Acknowledgments

Built for **MCP's 1st Birthday Hackathon** hosted by Anthropic and Gradio.

**Powered by:**
- Google Gemini & Nebius AI
- Model Context Protocol (MCP)
- LlamaIndex & Chroma
- Modal
- Gradio

---

*Autonomous agents + MCP tools = The future of software development*