---
title: Legacy Code Modernizer - Autonomous AI Agent
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Autonomous AI agent for code modernization with MCP tools
tags:
  - mcp-in-action-track-enterprise
  - mcp-in-action-track-consumer
  - code-modernization
  - autonomous-agent
  - mcp
  - gradio
  - gemini
  - modal
  - llama-index
  - nebius
  - chromadb
---

# 🤖 Legacy Code Modernizer - Autonomous AI Agent

**Track 2: MCP in Action - Enterprise Applications**

An autonomous AI agent that modernizes legacy codebases through intelligent planning, reasoning, and execution using Model Context Protocol (MCP) tools.

## 🎯 Project Overview

Legacy Code Modernizer is a complete autonomous agent system that transforms outdated code into modern, secure, and maintainable software. The agent autonomously:

1. **Plans** - Analyzes codebases and creates modernization strategies
2. **Reasons** - Makes intelligent decisions about transformation priorities
3. **Executes** - Applies transformations, generates tests, and validates changes
4. **Integrates** - Creates GitHub PRs with comprehensive documentation

## 🏆 Why This Project Stands Out

### Autonomous Agent Capabilities

**Multi-Phase Planning & Reasoning:**
- **Phase 1**: Intelligent file discovery and classification using AI pattern detection
- **Phase 2**: Semantic code analysis with vector-based similarity search (LlamaIndex + Chroma)
- **Phase 3**: Deep pattern analysis using multiple AI models (Gemini, Nebius AI)
- **Phase 4**: Autonomous code transformation with context-aware reasoning
- **Phase 5**: Automated testing in isolated sandbox + GitHub PR creation

**Context Engineering & RAG:**
- Vector embeddings for semantic code search
- Pattern grouping across similar files
- Historical transformation caching via MCP Memory
- Real-time migration guide retrieval via MCP Search

### MCP Tools Integration

The agent is designed around **4 MCP servers** used as autonomous tools (three integrated, one planned):

1. **GitHub MCP** - Autonomous PR creation with comprehensive documentation
2. **Tavily Search MCP** - Real-time migration guide discovery
3. **Memory MCP** - Pattern analysis caching and learning
4. **Filesystem MCP** - Safe file operations (planned)

### Real-World Enterprise Value

- **Multi-language support**: Python, Java, JavaScript, TypeScript
- **Secure execution**: Modal sandbox with isolated test environments
- **Production-ready**: Comprehensive test generation with coverage reporting

## 🚀 Demo

### Video Demo
**[Demo video](https://drive.google.com/file/d/1ph0NK8QKXRStjydqBV9w6HJaViirswE2/view?usp=sharing)**

### Social Media Post
**[Post on X](https://x.com/naazimhussain02/status/1994786125110710567?s=46&t=SdhRmvogISrVhMiZB_HDJQ)**

## 🎬 Quick Start

### Try It Live on Hugging Face Spaces

1. **Upload a code file** (Python, Java, JavaScript, TypeScript)
2. **Select target version** (auto-detected from your code)
3. **Click "Start Modernization"**
4. **Watch the autonomous agent work** through all 5 phases
5. **Download modernized code, tests, and reports**

### Local Installation

```bash
# Clone repository
git clone https://huggingface.co/spaces/MCP-1st-Birthday/legacy_code_modernizer
cd legacy_code_modernizer

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys:
# - GEMINI_API_KEY (required)
# - GITHUB_TOKEN (for PR creation)
# - TAVILY_API_KEY (for search)
# - MODAL_TOKEN_ID & MODAL_TOKEN_SECRET (for sandbox)

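# Create a Python virtual environment, then activate it
python -m venv venv
#   On macOS / Linux:
source venv/bin/activate
#   On Windows PowerShell (use instead of the line above):
# .\venv\Scripts\Activate.ps1
#   On Windows CMD (use instead of the line above):
# venv\Scripts\activate.bat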

# Install dependencies
pip install -r requirements.txt

# Run the Gradio app
python app.py
```

## 🧠 Autonomous Agent Architecture

### Planning Phase
```
Input: Legacy codebase
↓
Agent analyzes file structure and content
↓
Classifies files by modernization priority
↓
Creates transformation roadmap
```
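
The planning flow above could be approximated with a few lines of standard-library Python. This is only a sketch under stated assumptions: `LANGUAGE_BY_EXTENSION`, `LEGACY_MARKERS`, `RoadmapEntry`, and `build_roadmap` are hypothetical names, and the marker-counting heuristic stands in for the AI-based classification the agent is described as using.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical lookup tables for illustration; the agent itself is
# described as classifying files with AI pattern detection.
LANGUAGE_BY_EXTENSION = {
    ".py": "python",
    ".java": "java",
    ".js": "javascript",
    ".ts": "typescript",
}
LEGACY_MARKERS = ("MySQLdb", "urllib2", "var ")


@dataclass
class RoadmapEntry:
    path: str
    language: str
    priority: int  # higher means "modernize sooner"


def build_roadmap(files: dict[str, str]) -> list[RoadmapEntry]:
    """Classify files by language and order them by a naive priority."""
    entries = []
    for path, source in files.items():
        language = LANGUAGE_BY_EXTENSION.get(PurePosixPath(path).suffix)
        if language is None:
            continue  # skip files outside the supported languages
        # Illustrative heuristic: count occurrences of known legacy markers.
        priority = sum(source.count(marker) for marker in LEGACY_MARKERS)
        entries.append(RoadmapEntry(path, language, priority))
    return sorted(entries, key=lambda entry: entry.priority, reverse=True)
```

Calling `build_roadmap` with a mapping of paths to file contents returns the supported files, most legacy-heavy first.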

### Reasoning Phase
```
Agent groups similar patterns using vector search
↓
Retrieves migration guides via Tavily MCP
↓
Checks cached analyses via Memory MCP
↓
Prioritizes transformations by risk/impact
```
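
The last step, prioritizing by risk and impact, might look like the sketch below. The scoring formula, the 0 to 1 scales, and the names `Transformation`, `priority_score`, and `prioritize` are assumptions made for illustration; the README does not specify how the agent weighs the two factors.

```python
from dataclasses import dataclass


@dataclass
class Transformation:
    name: str
    impact: float  # assumed scale: 0.0 (negligible) to 1.0 (critical)
    risk: float    # assumed scale: 0.0 (safe) to 1.0 (likely to break things)


def priority_score(item: Transformation, risk_weight: float = 0.5) -> float:
    """Hypothetical score: reward impact, penalize risk."""
    return item.impact - risk_weight * item.risk


def prioritize(items: list[Transformation], risk_weight: float = 0.5) -> list[Transformation]:
    """Return transformations ordered from highest to lowest score."""
    return sorted(items, key=lambda item: priority_score(item, risk_weight), reverse=True)
```

Raising `risk_weight` pushes risky changes further down the queue, which may suit codebases with thin test coverage.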

### Execution Phase
```
Agent transforms code with AI models
↓
Generates comprehensive test suites
↓
Validates in isolated Modal sandbox
↓
Auto-fixes export/import issues
```

### Integration Phase
```
Agent creates GitHub branch via GitHub MCP
↓
Commits transformed files
↓
Generates PR with deployment checklist
↓
Adds rollback plan and test results
```
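
As a rough illustration of the final two steps, a PR description with a deployment checklist and rollback plan can be assembled from plain strings. `build_pr_body`, its parameters, and the checklist wording are hypothetical; the sketch only builds the Markdown text and does not call the GitHub MCP server.

```python
def build_pr_body(
    summary: str,
    changed_files: list[str],
    tests_passed: int,
    tests_total: int,
    rollback_steps: list[str],
) -> str:
    """Assemble a Markdown PR description (illustrative layout only)."""
    lines = ["## Summary", summary, "", "## Changed files"]
    lines += [f"- `{path}`" for path in changed_files]
    lines += ["", "## Test results", f"{tests_passed}/{tests_total} tests passed"]
    # Hypothetical checklist items; a real checklist would be project-specific.
    lines += [
        "",
        "## Deployment checklist",
        "- [ ] Review transformed files",
        "- [ ] Run the test suite in staging",
    ]
    lines += ["", "## Rollback plan"]
    lines += [f"{number}. {step}" for number, step in enumerate(rollback_steps, start=1)]
    return "\n".join(lines)
```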

## 🛠️ Technical Stack

### AI & LLM
- **Google Gemini** - Primary reasoning engine with large context window
- **Nebius AI** - Alternative model for diverse perspectives
- **LlamaIndex** - RAG framework for semantic code search
- **Chroma** - Vector database for embeddings
- **bge-large-en** - Embedding model deployed on Modal for inference

### MCP Integration
- **mcp** (v1.22.0) - Model Context Protocol SDK
- **@modelcontextprotocol/server-github** - GitHub operations
- **@modelcontextprotocol/server-tavily** - Web search
- **@modelcontextprotocol/server-memory** - Persistent storage

### Execution & Testing
- **Modal** - Serverless sandbox for secure test execution
- **pytest/Jest/JUnit** - Language-specific test frameworks
- **Coverage.py/JaCoCo** - Code coverage analysis

### UI & Orchestration
- **Gradio 6.0** - Interactive web interface
- **LangGraph** - Agent workflow orchestration
- **asyncio** - Asynchronous execution

## 📊 Features Showcase

### 1. Intelligent Pattern Detection
```python
# Agent automatically detects legacy patterns:
# - Deprecated libraries (MySQLdb → PyMySQL)
# - Security vulnerabilities (SQL injection)
# - Python 2 syntax → Python 3
# - Missing type hints
# - Old-style string formatting
```
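
A few of these patterns can be spotted with plain regular expressions. The sketch below is a simplified stand-in, not the agent's detection logic (which is described as AI-based); the `LEGACY_PATTERNS` table and `detect_legacy_patterns` are hypothetical, and the regexes will miss many real-world cases.

```python
import re

# Hypothetical, deliberately small pattern table for illustration.
LEGACY_PATTERNS = {
    # "import MySQLdb" or "from MySQLdb import ..."
    "deprecated-library": re.compile(r"^\s*(?:import|from)\s+MySQLdb\b", re.MULTILINE),
    # Python 2 print statement: "print" followed by something other than "("
    "python2-print": re.compile(r"^\s*print\s+[^(\s]", re.MULTILINE),
    # "..." % value style formatting
    "old-style-format": re.compile(r"""["']\s*%\s*[\w(]"""),
    # execute("..." + value): query built by string concatenation
    "sql-string-concat": re.compile(r"""execute\(\s*["'].*["']\s*[+%]"""),
}


def detect_legacy_patterns(source: str) -> list[str]:
    """Return the sorted names of all patterns found in the source text."""
    return sorted(
        name for name, pattern in LEGACY_PATTERNS.items() if pattern.search(source)
    )
```

Because the check is purely textual, it also works on Python 2 source that a Python 3 parser would reject.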

### 2. Semantic Code Search
```python
# Vector-based similarity search finds:
# - Files with similar legacy patterns
# - Related security vulnerabilities
# - Common refactoring opportunities
```
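
The core of vector-based search is a similarity ranking over embeddings. The project delegates this to LlamaIndex and Chroma; the sketch below only shows the underlying idea with hand-written toy vectors, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the paths of the top_k files whose embeddings are closest to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [path for path, _ in ranked[:top_k]]
```

In practice the vectors would come from an embedding model such as bge-large-en rather than being written by hand.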

### 3. Autonomous Test Generation
```python
# Agent generates:
# - Unit tests with pytest/Jest/JUnit
# - Integration tests
# - Edge case coverage
# - Performance benchmarks
```

### 4. GitHub Integration via MCP
```python
# Automated PR includes:
# - Comprehensive change summary
# - Test results with coverage
# - Risk assessment
# - Deployment checklist
# - Rollback plan
```

## 🎯 Supported Languages & Versions

### Python
- **Versions**: 3.10, 3.11, 3.12, 3.13, 3.14
- **Frameworks**: Django 5.2 LTS, Flask 3.1, FastAPI 0.122
- **Testing**: pytest with coverage

### Java
- **Versions**: Java 17 LTS, 21 LTS, 23, 25 LTS
- **Frameworks**: Spring Boot 3.4, 4.0
- **Testing**: Maven + JUnit 5 + JaCoCo

### JavaScript
- **Standards**: ES2024, ES2025
- **Runtimes**: Node.js 22 LTS, 24 LTS, 25
- **Frameworks**: React 19, Angular 21, Vue 3.5, Express 5.1, Next.js 16
- **Testing**: Jest with coverage

### TypeScript
- **Versions**: 5.6, 5.7, 5.8, 5.9
- **Frameworks**: React 19, Angular 21, Next.js 16
- **Testing**: Jest with ts-jest

## 🔒 Security & Isolation

### Modal Sandbox Execution
- **Network isolation**: No external network access during tests
- **Filesystem isolation**: Temporary containers per execution
- **Resource limits**: CPU and memory constraints
- **Automatic cleanup**: Containers destroyed after execution

### Code Validation
- **Syntax checking**: Pre-execution validation
- **Import/export fixing**: Automatic resolution of module issues
- **Security scanning**: Detection of vulnerabilities
- **Type checking**: Language-specific validation
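
For Python sources, the pre-execution syntax check could be as small as the sketch below, which relies on the standard-library `ast` module. `check_python_syntax` is a hypothetical helper; other languages would need their own parsers, and `ast.parse` only accepts syntax valid for the interpreter running the check.

```python
import ast


def check_python_syntax(source: str) -> tuple[bool, str]:
    """Return (True, "ok") if the source parses, else (False, error description)."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, "ok"
```

Running this on transformed code before it reaches the sandbox avoids spending a container start on a file that cannot even be parsed.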


## 🎓 Advanced Features

### Context Engineering
- **Sliding window context**: Manages large files efficiently
- **Cross-file analysis**: Understands dependencies
- **Pattern learning**: Improves with usage via Memory MCP
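
A sliding window over a large file can be sketched as overlapping slices of its lines, so that each chunk fits a model's context while keeping some shared lines with its neighbor. The function name `sliding_windows` and the line-based granularity are assumptions; the agent may well window by tokens instead.

```python
def sliding_windows(lines: list[str], window_size: int, overlap: int) -> list[list[str]]:
    """Split lines into windows of window_size that share `overlap` lines."""
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("need window_size > 0 and 0 <= overlap < window_size")
    step = window_size - overlap
    windows = []
    for start in range(0, len(lines), step):
        windows.append(lines[start:start + window_size])
        if start + window_size >= len(lines):
            break  # the last window already reaches the end of the file
    return windows
```

The overlap gives each window a little context from the previous one, which helps when a function or class straddles a boundary.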

### RAG Implementation
- **Semantic chunking**: Intelligent code splitting
- **Vector similarity**: Finds related patterns
- **Hybrid search**: Combines keyword + semantic search
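
Hybrid search is commonly implemented as a weighted blend of a keyword score and a vector similarity score. The sketch below assumes that approach; `keyword_score`, `cosine`, `hybrid_score`, and the `alpha` weight are illustrative names, and the README does not state which blending scheme the project uses.

```python
import math
import re


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query words that also appear in the text."""
    query_words = set(re.findall(r"\w+", query.lower()))
    text_words = set(re.findall(r"\w+", text.lower()))
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(
    query: str,
    query_vec: list[float],
    text: str,
    text_vec: list[float],
    alpha: float = 0.5,
) -> float:
    """Blend keyword and semantic scores; alpha=1.0 means keyword-only."""
    return alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, text_vec)
```

Keyword matching catches exact identifiers such as `MySQLdb`, while the vector term can surface code that is related but worded differently.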

### Agent Reasoning
- **Priority scoring**: Risk vs. impact analysis
- **Dependency tracking**: Understands file relationships
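
For Python modules, dependency tracking can be approximated by reading imports with `ast` and ordering modules with `graphlib.TopologicalSorter` (Python 3.9+) so that dependencies are handled before their dependents. `local_imports` and `transformation_order` are hypothetical names, and the sketch assumes sources that parse under the running interpreter, which Python 2 code may not.

```python
import ast
from graphlib import TopologicalSorter


def local_imports(source: str, known_modules: set[str]) -> set[str]:
    """Return the top-level imported names that belong to the codebase."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & known_modules


def transformation_order(modules: dict[str, str]) -> list[str]:
    """Order modules so each one comes after the local modules it imports."""
    known = set(modules)
    graph = {
        name: local_imports(source, known) - {name}
        for name, source in modules.items()
    }
    # TopologicalSorter expects {node: predecessors}; it raises CycleError
    # if the modules import each other in a cycle.
    return list(TopologicalSorter(graph).static_order())
```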

## 📝 License

Apache 2.0 - See LICENSE file for details

## 🙏 Acknowledgments

Built for **MCP's 1st Birthday Hackathon** hosted by Anthropic and Gradio.

**Powered by:**
- Google Gemini & Nebius AI
- Model Context Protocol (MCP)
- LlamaIndex & Chroma
- Modal
- Gradio

---

*Autonomous agents + MCP tools = The future of software development*