|
|
--- |
|
|
title: Legacy Code Modernizer - Autonomous AI Agent |
|
|
emoji: π€ |
|
|
colorFrom: purple |
|
|
colorTo: blue |
|
|
sdk: gradio |
|
|
sdk_version: 6.0.1 |
|
|
app_file: app.py |
|
|
pinned: false |
|
|
license: apache-2.0 |
|
|
short_description: Autonomous AI agent for code modernization with MCP tools |
|
|
tags: |
|
|
- mcp-in-action-track-enterprise |
|
|
- mcp-in-action-track-consumer |
|
|
- code-modernization |
|
|
- autonomous-agent |
|
|
- mcp |
|
|
- gradio |
|
|
- gemini |
|
|
- modal |
|
|
- llama-index |
|
|
- nebius |
|
|
- chromadb |
|
|
--- |
|
|
|
|
|
# π€ Legacy Code Modernizer - Autonomous AI Agent |
|
|
|
|
|
**Track 2: MCP in Action - Enterprise Applications** |
|
|
|
|
|
An autonomous AI agent that modernizes legacy codebases through intelligent planning, reasoning, and execution using Model Context Protocol (MCP) tools. |
|
|
|
|
|
## π― Project Overview |
|
|
|
|
|
Legacy Code Modernizer is a complete autonomous agent system that transforms outdated code into modern, secure, and maintainable software. The agent autonomously: |
|
|
|
|
|
1. **Plans** - Analyzes codebases and creates modernization strategies |
|
|
2. **Reasons** - Makes intelligent decisions about transformation priorities |
|
|
3. **Executes** - Applies transformations, generates tests, and validates changes |
|
|
4. **Integrates** - Creates GitHub PRs with comprehensive documentation |
|
|
|
|
|
## π Why This Project Stands Out |
|
|
|
|
|
### Autonomous Agent Capabilities |
|
|
|
|
|
**Multi-Phase Planning & Reasoning:** |
|
|
- **Phase 1**: Intelligent file discovery and classification using AI pattern detection |
|
|
- **Phase 2**: Semantic code analysis with vector-based similarity search (LlamaIndex + Chroma) |
|
|
- **Phase 3**: Deep pattern analysis using multiple AI models (Gemini, Nebius AI) |
|
|
- **Phase 4**: Autonomous code transformation with context-aware reasoning |
|
|
- **Phase 5**: Automated testing in isolated sandbox + GitHub PR creation |
|
|
|
|
|
**Context Engineering & RAG:** |
|
|
- Vector embeddings for semantic code search |
|
|
- Pattern grouping across similar files |
|
|
- Historical transformation caching via MCP Memory |
|
|
- Real-time migration guide retrieval via MCP Search |
|
|
|
|
|
### MCP Tools Integration |
|
|
|
|
|
The agent uses **4 MCP servers** as autonomous tools: |
|
|
|
|
|
1. **GitHub MCP** - Autonomous PR creation with comprehensive documentation |
|
|
2. **Tavily Search MCP** - Real-time migration guide discovery |
|
|
3. **Memory MCP** - Pattern analysis caching and learning |
|
|
4. **Filesystem MCP** - Safe file operations (planned) |
|
|
|
|
|
### Real-World Enterprise Value |
|
|
|
|
|
- **Multi-language support**: Python, Java, JavaScript, TypeScript |
|
|
- **Secure execution**: Modal sandbox with isolated test environments |
|
|
- **Production-ready**: Comprehensive test generation with coverage reporting |
|
|
|
|
|
## π Demo |
|
|
|
|
|
### Video Demo |
|
|
**[Demo video](https://drive.google.com/file/d/1ph0NK8QKXRStjydqBV9w6HJaViirswE2/view?usp=sharing)** |
|
|
|
|
|
### Social Media Post |
|
|
**[Post on X](https://x.com/naazimhussain02/status/1994786125110710567?s=46&t=SdhRmvogISrVhMiZB_HDJQ)** |
|
|
|
|
|
## π¬ Quick Start |
|
|
|
|
|
### Try It Live on Hugging Face Spaces |
|
|
|
|
|
1. **Upload a code file** (Python, Java, JavaScript, TypeScript) |
|
|
2. **Select target version** (auto-detected from your code) |
|
|
3. **Click "Start Modernization"** |
|
|
4. **Watch the autonomous agent work** through all 5 phases |
|
|
5. **Download modernized code, tests, and reports** |
|
|
|
|
|
### Local Installation |
|
|
|
|
|
```bash |
|
|
# Clone repository |
|
|
git clone https://huggingface.co/spaces/MCP-1st-Birthday/legacy_code_modernizer |
|
|
cd legacy_code_modernizer |
|
|
|
|
|
# Set up environment variables |
|
|
cp .env.example .env |
|
|
# Edit .env with your API keys: |
|
|
# - GEMINI_API_KEY (required) |
|
|
# - GITHUB_TOKEN (for PR creation) |
|
|
# - TAVILY_API_KEY (for search) |
|
|
# - MODAL_TOKEN_ID & MODAL_TOKEN_SECRET (for sandbox) |
|
|
|
|
|
# Set up Python virtual environment |
|
|
# On macOS / Linux: |
|
|
source venv/bin/activate |
|
|
# On Windows PowerShell: |
|
|
.\venv\Scripts\Activate.ps1 |
|
|
# On Windows CMD: |
|
|
venv\Scripts\activate.bat |
|
|
|
|
|
# Install dependencies |
|
|
pip install -r requirements.txt |
|
|
|
|
|
# Run the Gradio app |
|
|
python app.py |
|
|
``` |
|
|
|
|
|
## π§ Autonomous Agent Architecture |
|
|
|
|
|
### Planning Phase |
|
|
``` |
|
|
Input: Legacy codebase |
|
|
β |
|
|
Agent analyzes file structure and content |
|
|
β |
|
|
Classifies files by modernization priority |
|
|
β |
|
|
Creates transformation roadmap |
|
|
``` |
|
|
|
|
|
### Reasoning Phase |
|
|
``` |
|
|
Agent groups similar patterns using vector search |
|
|
β |
|
|
Retrieves migration guides via Tavily MCP |
|
|
β |
|
|
Checks cached analyses via Memory MCP |
|
|
β |
|
|
Prioritizes transformations by risk/impact |
|
|
``` |
|
|
|
|
|
### Execution Phase |
|
|
``` |
|
|
Agent transforms code with AI models |
|
|
β |
|
|
Generates comprehensive test suites |
|
|
β |
|
|
Validates in isolated Modal sandbox |
|
|
β |
|
|
Auto-fixes export/import issues |
|
|
``` |
|
|
|
|
|
### Integration Phase |
|
|
``` |
|
|
Agent creates GitHub branch via GitHub MCP |
|
|
β |
|
|
Commits transformed files |
|
|
β |
|
|
Generates PR with deployment checklist |
|
|
β |
|
|
Adds rollback plan and test results |
|
|
``` |
|
|
|
|
|
## π οΈ Technical Stack |
|
|
|
|
|
### AI & LLM |
|
|
- **Google Gemini** - Primary reasoning engine with large context window |
|
|
- **Nebius AI** - Alternative model for diverse perspectives |
|
|
- **LlamaIndex** - RAG framework for semantic code search |
|
|
- **Chroma** - Vector database for embeddings |
|
|
- **bge-large-en** - Embedding model deployed on Modal for inference |
|
|
|
|
|
### MCP Integration |
|
|
- **mcp** (v1.22.0) - Model Context Protocol SDK |
|
|
- **@modelcontextprotocol/server-github** - GitHub operations |
|
|
- **@modelcontextprotocol/server-tavily** - Web search |
|
|
- **@modelcontextprotocol/server-memory** - Persistent storage |
|
|
|
|
|
### Execution & Testing |
|
|
- **Modal** - Serverless sandbox for secure test execution |
|
|
- **pytest/Jest/JUnit** - Language-specific test frameworks |
|
|
- **Coverage.py/JaCoCo** - Code coverage analysis |
|
|
|
|
|
### UI & Orchestration |
|
|
- **Gradio 6.0** - Interactive web interface |
|
|
- **LangGraph** - Agent workflow orchestration |
|
|
- **asyncio** - Asynchronous execution |
|
|
|
|
|
## π Features Showcase |
|
|
|
|
|
### 1. Intelligent Pattern Detection |
|
|
```python |
|
|
# Agent automatically detects legacy patterns: |
|
|
- Deprecated libraries (MySQLdb β PyMySQL) |
|
|
- Security vulnerabilities (SQL injection) |
|
|
- Python 2 syntax β Python 3 |
|
|
- Missing type hints |
|
|
- Old-style string formatting |
|
|
``` |
|
|
|
|
|
### 2. Semantic Code Search |
|
|
```python |
|
|
# Vector-based similarity search finds: |
|
|
- Files with similar legacy patterns |
|
|
- Related security vulnerabilities |
|
|
- Common refactoring opportunities |
|
|
``` |
|
|
|
|
|
### 3. Autonomous Test Generation |
|
|
```python |
|
|
# Agent generates: |
|
|
- Unit tests with pytest/Jest/JUnit |
|
|
- Integration tests |
|
|
- Edge case coverage |
|
|
- Performance benchmarks |
|
|
``` |
|
|
|
|
|
### 4. GitHub Integration via MCP |
|
|
```python |
|
|
# Automated PR includes: |
|
|
- Comprehensive change summary |
|
|
- Test results with coverage |
|
|
- Risk assessment |
|
|
- Deployment checklist |
|
|
- Rollback plan |
|
|
``` |
|
|
|
|
|
## π― Supported Languages & Versions |
|
|
|
|
|
### Python |
|
|
- **Versions**: 3.10, 3.11, 3.12, 3.13, 3.14 |
|
|
- **Frameworks**: Django 5.2 LTS, Flask 3.1, FastAPI 0.122 |
|
|
- **Testing**: pytest with coverage |
|
|
|
|
|
### Java |
|
|
- **Versions**: Java 17 LTS, 21 LTS, 23, 25 LTS |
|
|
- **Frameworks**: Spring Boot 3.4, 4.0 |
|
|
- **Testing**: Maven + JUnit 5 + JaCoCo |
|
|
|
|
|
### JavaScript |
|
|
- **Standards**: ES2024, ES2025 |
|
|
- **Runtimes**: Node.js 22 LTS, 24 LTS, 25 |
|
|
- **Frameworks**: React 19, Angular 21, Vue 3.5, Express 5.1, Next.js 16 |
|
|
- **Testing**: Jest with coverage |
|
|
|
|
|
### TypeScript |
|
|
- **Versions**: 5.6, 5.7, 5.8, 5.9 |
|
|
- **Frameworks**: React 19, Angular 21, Next.js 16 |
|
|
- **Testing**: Jest with ts-jest |
|
|
|
|
|
## π Security & Isolation |
|
|
|
|
|
### Modal Sandbox Execution |
|
|
- **Network isolation**: No external network access during tests |
|
|
- **Filesystem isolation**: Temporary containers per execution |
|
|
- **Resource limits**: CPU and memory constraints |
|
|
- **Automatic cleanup**: Containers destroyed after execution |
|
|
|
|
|
### Code Validation |
|
|
- **Syntax checking**: Pre-execution validation |
|
|
- **Import/export fixing**: Automatic resolution of module issues |
|
|
- **Security scanning**: Detection of vulnerabilities |
|
|
- **Type checking**: Language-specific validation |
|
|
|
|
|
|
|
|
## π Advanced Features |
|
|
|
|
|
### Context Engineering |
|
|
- **Sliding window context**: Manages large files efficiently |
|
|
- **Cross-file analysis**: Understands dependencies |
|
|
- **Pattern learning**: Improves with usage via Memory MCP |
|
|
|
|
|
### RAG Implementation |
|
|
- **Semantic chunking**: Intelligent code splitting |
|
|
- **Vector similarity**: Finds related patterns |
|
|
- **Hybrid search**: Combines keyword + semantic search |
|
|
|
|
|
### Agent Reasoning |
|
|
- **Priority scoring**: Risk vs. impact analysis |
|
|
- **Dependency tracking**: Understands file relationships |
|
|
|
|
|
## π License |
|
|
|
|
|
Apache 2.0 - See LICENSE file for details |
|
|
|
|
|
## π Acknowledgments |
|
|
|
|
|
Built for **MCP's 1st Birthday Hackathon** hosted by Anthropic and Gradio. |
|
|
|
|
|
**Powered by:** |
|
|
- Google Gemini & Nebius AI |
|
|
- Model Context Protocol (MCP) |
|
|
- LlamaIndex & Chroma |
|
|
- Modal |
|
|
- Gradio |
|
|
|
|
|
--- |
|
|
|
|
|
*Autonomous agents + MCP tools = The future of software development* |