---
title: Legacy Code Modernizer - Autonomous AI Agent
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Autonomous AI agent for code modernization with MCP tools
tags:
- mcp-in-action-track-enterprise
- mcp-in-action-track-consumer
- code-modernization
- autonomous-agent
- mcp
- gradio
- gemini
- modal
- llama-index
- nebius
- chromadb
---
# 🤖 Legacy Code Modernizer - Autonomous AI Agent
**Track 2: MCP in Action - Enterprise Applications**
An autonomous AI agent that modernizes legacy codebases through intelligent planning, reasoning, and execution using Model Context Protocol (MCP) tools.
## 🎯 Project Overview
Legacy Code Modernizer is a complete autonomous agent system that transforms outdated code into modern, secure, and maintainable software. The agent autonomously:
1. **Plans** - Analyzes codebases and creates modernization strategies
2. **Reasons** - Makes intelligent decisions about transformation priorities
3. **Executes** - Applies transformations, generates tests, and validates changes
4. **Integrates** - Creates GitHub PRs with comprehensive documentation
## Why This Project Stands Out
### Autonomous Agent Capabilities
**Multi-Phase Planning & Reasoning:**
- **Phase 1**: Intelligent file discovery and classification using AI pattern detection
- **Phase 2**: Semantic code analysis with vector-based similarity search (LlamaIndex + Chroma)
- **Phase 3**: Deep pattern analysis using multiple AI models (Gemini, Nebius AI)
- **Phase 4**: Autonomous code transformation with context-aware reasoning
- **Phase 5**: Automated testing in isolated sandbox + GitHub PR creation
**Context Engineering & RAG:**
- Vector embeddings for semantic code search
- Pattern grouping across similar files
- Historical transformation caching via MCP Memory
- Real-time migration guide retrieval via MCP Search
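The caching bullet above can be pictured as a content-addressed lookup. This is a minimal sketch, assuming an in-memory dictionary stands in for the Memory MCP server; `cache_key` and `TransformationCache` are hypothetical names, not part of the project.

```python
import hashlib
from typing import Callable


def cache_key(source: str, target_version: str) -> str:
    """Derive a stable key from the file content and the requested target."""
    digest = hashlib.sha256()
    digest.update(target_version.encode("utf-8"))
    digest.update(b"\0")  # separator so the two fields cannot run together
    digest.update(source.encode("utf-8"))
    return digest.hexdigest()


class TransformationCache:
    """In-memory stand-in for a persistent analysis store (hypothetical)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(
        self, source: str, target_version: str, analyze: Callable[[str], str]
    ) -> str:
        key = cache_key(source, target_version)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: run the (expensive) analysis once and remember it.
        self.misses += 1
        result = analyze(source)
        self._store[key] = result
        return result
```

Keying on both the content hash and the target version means an unchanged file is analyzed once per target, while any edit to the file invalidates its entry automatically.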
### MCP Tools Integration
The agent is designed around **4 MCP servers** as autonomous tools (three integrated, one planned):
1. **GitHub MCP** - Autonomous PR creation with comprehensive documentation
2. **Tavily Search MCP** - Real-time migration guide discovery
3. **Memory MCP** - Pattern analysis caching and learning
4. **Filesystem MCP** - Safe file operations (planned)
### Real-World Enterprise Value
- **Multi-language support**: Python, Java, JavaScript, TypeScript
- **Secure execution**: Modal sandbox with isolated test environments
- **Production-ready**: Comprehensive test generation with coverage reporting
## Demo
### Video Demo
**[Demo video](https://drive.google.com/file/d/1ph0NK8QKXRStjydqBV9w6HJaViirswE2/view?usp=sharing)**
### Social Media Post
**[Post on X](https://x.com/naazimhussain02/status/1994786125110710567?s=46&t=SdhRmvogISrVhMiZB_HDJQ)**
## 🎬 Quick Start
### Try It Live on Hugging Face Spaces
1. **Upload a code file** (Python, Java, JavaScript, TypeScript)
2. **Select target version** (auto-detected from your code)
3. **Click "Start Modernization"**
4. **Watch the autonomous agent work** through all 5 phases
5. **Download modernized code, tests, and reports**
### Local Installation
```bash
# Clone repository
git clone https://huggingface.co/spaces/MCP-1st-Birthday/legacy_code_modernizer
cd legacy_code_modernizer
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys:
# - GEMINI_API_KEY (required)
# - GITHUB_TOKEN (for PR creation)
# - TAVILY_API_KEY (for search)
# - MODAL_TOKEN_ID & MODAL_TOKEN_SECRET (for sandbox)
# Set up Python virtual environment
python -m venv venv
# Activate it on macOS / Linux:
source venv/bin/activate
# On Windows PowerShell, run instead:
#   .\venv\Scripts\Activate.ps1
# On Windows CMD, run instead:
#   venv\Scripts\activate.bat
# Install dependencies
pip install -r requirements.txt
# Run the Gradio app
python app.py
```
## 🧠 Autonomous Agent Architecture
### Planning Phase
```
Input: Legacy codebase
↓
Agent analyzes file structure and content
↓
Classifies files by modernization priority
↓
Creates transformation roadmap
```
### Reasoning Phase
```
Agent groups similar patterns using vector search
↓
Retrieves migration guides via Tavily MCP
↓
Checks cached analyses via Memory MCP
↓
Prioritizes transformations by risk/impact
```
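The final step of the flow above, prioritizing by risk and impact, could look roughly like this. It is a sketch under stated assumptions: the 0.0 to 1.0 scales, the `risk_weight` parameter, and the names `Candidate`, `priority`, and `rank` are all illustrative and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str
    impact: float  # expected benefit, 0.0 to 1.0 (assumed scale)
    risk: float    # chance of breaking behaviour, 0.0 to 1.0 (assumed scale)


def priority(candidate: Candidate, risk_weight: float = 0.5) -> float:
    """Higher is better: reward impact, penalise risk."""
    return candidate.impact - risk_weight * candidate.risk


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from highest to lowest priority."""
    return sorted(candidates, key=priority, reverse=True)
```

A linear score keeps the ordering easy to explain in a PR description; a real agent might instead ask the model for a ranking or use a richer cost model.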
### Execution Phase
```
Agent transforms code with AI models
↓
Generates comprehensive test suites
↓
Validates in isolated Modal sandbox
↓
Auto-fixes export/import issues
```
### Integration Phase
```
Agent creates GitHub branch via GitHub MCP
↓
Commits transformed files
↓
Generates PR with deployment checklist
↓
Adds rollback plan and test results
```
## 🛠️ Technical Stack
### AI & LLM
- **Google Gemini** - Primary reasoning engine with large context window
- **Nebius AI** - Alternative model for diverse perspectives
- **LlamaIndex** - RAG framework for semantic code search
- **Chroma** - Vector database for embeddings
- **bge-large-en** - Embedding model deployed on Modal for inference
### MCP Integration
- **mcp** (v1.22.0) - Model Context Protocol SDK
- **@modelcontextprotocol/server-github** - GitHub operations
- **@modelcontextprotocol/server-tavily** - Web search
- **@modelcontextprotocol/server-memory** - Persistent storage
### Execution & Testing
- **Modal** - Serverless sandbox for secure test execution
- **pytest/Jest/JUnit** - Language-specific test frameworks
- **Coverage.py/JaCoCo** - Code coverage analysis
### UI & Orchestration
- **Gradio 6.0** - Interactive web interface
- **LangGraph** - Agent workflow orchestration
- **asyncio** - Asynchronous execution
## Features Showcase
### 1. Intelligent Pattern Detection
```python
# Agent automatically detects legacy patterns:
# - Deprecated libraries (MySQLdb → PyMySQL)
# - Security vulnerabilities (SQL injection)
# - Python 2 syntax → Python 3
# - Missing type hints
# - Old-style string formatting
```
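To make the list above concrete, here is a deliberately simple regex-based detector for a few of those patterns. It is only an illustration: the project describes AI-driven detection, and the rule names and expressions below are assumptions chosen for the example, with the false positives and misses that heuristics of this kind usually have.

```python
import re

# Hypothetical rule set; each regex is a rough heuristic, not a parser.
LEGACY_PATTERNS = {
    "deprecated-library": re.compile(r"^\s*(?:import|from)\s+MySQLdb\b", re.MULTILINE),
    "python2-print": re.compile(r"^\s*print\s+[^(\s=]", re.MULTILINE),
    "percent-formatting": re.compile(r"""["']\s*%\s*[\w(]"""),
    "sql-string-building": re.compile(r"""execute\(\s*["'].*["']\s*[+%]"""),
}


def detect_patterns(source: str) -> list[str]:
    """Return the sorted names of all rules that match the source text."""
    return sorted(name for name, rx in LEGACY_PATTERNS.items() if rx.search(source))
```

Cheap textual rules like these are mostly useful as a pre-filter that decides which files are worth sending to a model for deeper analysis.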
### 2. Semantic Code Search
```python
# Vector-based similarity search finds:
# - Files with similar legacy patterns
# - Related security vulnerabilities
# - Common refactoring opportunities
```
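The core operation behind vector-based search is ranking stored embeddings by cosine similarity to a query embedding. The sketch below uses plain Python lists as toy vectors; in the project this role is filled by LlamaIndex and Chroma with bge-large-en embeddings, and the function names here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: list[float], index: dict[str, list[float]], top_k: int = 2
) -> list[tuple[str, float]]:
    """Return the top_k (path, score) pairs, best match first."""
    scored = [(path, cosine_similarity(query, vec)) for path, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

A vector database performs the same ranking with approximate nearest-neighbour indexes, so it scales beyond the brute-force loop shown here.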
### 3. Autonomous Test Generation
```python
# Agent generates:
# - Unit tests with pytest/Jest/JUnit
# - Integration tests
# - Edge case coverage
# - Performance benchmarks
```
### 4. GitHub Integration via MCP
```python
# Automated PR includes:
# - Comprehensive change summary
# - Test results with coverage
# - Risk assessment
# - Deployment checklist
# - Rollback plan
```
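A PR description with the sections listed above can be assembled from structured results before it is handed to the GitHub MCP server. This is a sketch only: `build_pr_body` and its parameters are hypothetical, and the section headings are one possible layout rather than the project's actual template.

```python
def build_pr_body(
    summary: str,
    tests_passed: int,
    tests_total: int,
    coverage_pct: float,
    risk: str,
    checklist: list[str],
    rollback: str,
) -> str:
    """Render a Markdown PR description from structured modernization results."""
    lines = [
        "## Summary",
        summary,
        "",
        "## Test Results",
        f"- Passed: {tests_passed}/{tests_total}",
        f"- Coverage: {coverage_pct:.1f}%",
        "",
        "## Risk Assessment",
        risk,
        "",
        "## Deployment Checklist",
    ]
    # Render each checklist item as an unchecked Markdown task.
    lines += [f"- [ ] {item}" for item in checklist]
    lines += ["", "## Rollback Plan", rollback]
    return "\n".join(lines)
```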
## 🎯 Supported Languages & Versions
### Python
- **Versions**: 3.10, 3.11, 3.12, 3.13, 3.14
- **Frameworks**: Django 5.2 LTS, Flask 3.1, FastAPI 0.122
- **Testing**: pytest with coverage
### Java
- **Versions**: Java 17 LTS, 21 LTS, 23, 25 LTS
- **Frameworks**: Spring Boot 3.4, 4.0
- **Testing**: Maven + JUnit 5 + JaCoCo
### JavaScript
- **Standards**: ES2024, ES2025
- **Runtimes**: Node.js 22 LTS, 24 LTS, 25
- **Frameworks**: React 19, Angular 21, Vue 3.5, Express 5.1, Next.js 16
- **Testing**: Jest with coverage
### TypeScript
- **Versions**: 5.6, 5.7, 5.8, 5.9
- **Frameworks**: React 19, Angular 21, Next.js 16
- **Testing**: Jest with ts-jest
## Security & Isolation
### Modal Sandbox Execution
- **Network isolation**: No external network access during tests
- **Filesystem isolation**: Temporary containers per execution
- **Resource limits**: CPU and memory constraints
- **Automatic cleanup**: Containers destroyed after execution
### Code Validation
- **Syntax checking**: Pre-execution validation
- **Import/export fixing**: Automatic resolution of module issues
- **Security scanning**: Detection of vulnerabilities
- **Type checking**: Language-specific validation
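For Python sources, the pre-execution syntax check can be done with the standard library alone. This is a minimal sketch, assuming the check runs on the same Python version the code targets; `check_python_syntax` is a hypothetical helper, and other languages would need their own parsers.

```python
import ast


def check_python_syntax(source: str) -> tuple[bool, str]:
    """Parse without executing; report the first syntax error, if any."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, "ok"
```

Because `ast.parse` never runs the code, this check is safe to perform outside the sandbox and lets the agent reject a broken transformation before spending sandbox time on it.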
## Advanced Features
### Context Engineering
- **Sliding window context**: Manages large files efficiently
- **Cross-file analysis**: Understands dependencies
- **Pattern learning**: Improves with usage via Memory MCP
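A sliding window over a large file might be implemented as overlapping line ranges, so that each model call sees a bounded slice plus a little shared context. The window and overlap sizes below are illustrative defaults, and `sliding_windows` is a hypothetical name.

```python
def sliding_windows(
    lines: list[str], window: int = 200, overlap: int = 20
) -> list[tuple[int, list[str]]]:
    """Split lines into overlapping windows, returned as (start_index, lines)."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    windows = []
    start = 0
    while start < len(lines):
        end = min(start + window, len(lines))
        windows.append((start, lines[start:end]))
        if end == len(lines):
            break  # the last window already reaches the end of the file
        start += step
    return windows
```

The overlap keeps statements that straddle a boundary visible in both neighbouring windows, at the cost of processing those lines twice.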
### RAG Implementation
- **Semantic chunking**: Intelligent code splitting
- **Vector similarity**: Finds related patterns
- **Hybrid search**: Combines keyword + semantic search
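One plausible reading of semantic chunking for Python code is to split at top-level function and class boundaries instead of at fixed character counts. The sketch below does this with the standard `ast` module; it is an assumption about the approach, not the project's implementation, and `chunk_by_definition` is a hypothetical name.

```python
import ast


def chunk_by_definition(source: str) -> list[tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact text spanned by the node.
            segment = ast.get_source_segment(source, node)
            chunks.append((node.name, segment))
    return chunks
```

Chunks that follow definition boundaries tend to embed better than arbitrary slices, because each vector then describes one coherent unit of behaviour.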
### Agent Reasoning
- **Priority scoring**: Risk vs. impact analysis
- **Dependency tracking**: Understands file relationships
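Once file relationships are known, dependency tracking can feed a topological sort so that a module is transformed before the files that import it. This sketch uses the standard `graphlib` module (Python 3.9+); the dependency map format and the name `transformation_order` are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def transformation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order files so each one comes after everything it depends on.

    `dependencies` maps a file to the set of files it imports.
    Raises graphlib.CycleError if the imports form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, if `app.py` imports `db.py` and `utils.py`, and `db.py` imports `utils.py`, the resulting order is `utils.py`, `db.py`, `app.py`.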
## License
Apache 2.0 - See LICENSE file for details
## Acknowledgments
Built for **MCP's 1st Birthday Hackathon** hosted by Anthropic and Gradio.
**Powered by:**
- Google Gemini & Nebius AI
- Model Context Protocol (MCP)
- LlamaIndex & Chroma
- Modal
- Gradio
---
*Autonomous agents + MCP tools = The future of software development*