Spaces: Running on Zero

Commit · b9ed1c8 — Initial commit

Files changed:
- .gitattributes (+35, -0)
- DEPLOYMENT.md (+151, -0)
- README.md (+213, -0)
- app.py (+140, -0)
- client.html (+460, -0)
- client.py (+119, -0)
- client_requirements.txt (+2, -0)
- requirements.txt (+11, -0)
.gitattributes
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
DEPLOYMENT.md
ADDED
@@ -0,0 +1,151 @@
# Deployment Guide

## 🚀 Deploying to Hugging Face Spaces

### Step 1: Create a New Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Fill in the details:
   - **Space name**: `whisper-api` (or your preferred name)
   - **License**: MIT
   - **SDK**: Docker
   - **Hardware**: **ZeroGPU** (this is crucial!)
   - **Visibility**: Public or Private

### Step 2: Upload Files

Upload these files to your Space (a scripted alternative follows the list):

1. **app.py** - Main application file
2. **requirements.txt** - Python dependencies
3. **README.md** - Space documentation (optional)
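As an alternative to the web UI, the `huggingface_hub` library can push these files from the command line. A minimal sketch, assuming you have authenticated with `huggingface-cli login` first; the Space ID `your-username/whisper-api` is a placeholder:

```python
# upload_space.py - sketch: push the app files to a Space with huggingface_hub
from huggingface_hub import HfApi

api = HfApi()
for filename in ["app.py", "requirements.txt", "README.md"]:
    api.upload_file(
        path_or_fileobj=filename,             # local file to upload
        path_in_repo=filename,                # keep the same name in the repo
        repo_id="your-username/whisper-api",  # placeholder Space ID
        repo_type="space",
    )
```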
### Step 3: Configure Space Settings

In your Space settings, ensure:
- **Hardware**: ZeroGPU is selected
- **Environment variables**: None required
- **Secrets**: None required

### Step 4: Deploy

The Space will build and deploy automatically. This process may take 5-10 minutes.

### Step 5: Test Your Deployment

Once deployed, test your API:

```bash
# Health check
curl https://your-username-whisper-api.hf.space/health

# Test transcription (replace with your actual endpoint)
curl -X POST "https://your-username-whisper-api.hf.space/transcribe" \
  -F "file=@test_audio.mp3"
```

## 🔧 Local Testing

### Test the Web Client

1. Open `client.html` in your browser
2. Update the endpoint URL to your Space URL
3. Upload an audio file and test transcription

### Test the Python Client

```bash
# Install dependencies
pip install -r client_requirements.txt

# Test health
python client.py --health --endpoint https://your-username-whisper-api.hf.space

# Test transcription
python client.py audio.mp3 --endpoint https://your-username-whisper-api.hf.space
```

### Run the Test Suite

```bash
# Install additional dependencies for testing
pip install numpy soundfile

# Run comprehensive tests (note: test_api.py is not included in this initial commit)
python test_api.py
```

## 🐛 Troubleshooting

### Common Issues

1. **Space fails to build**
   - Check that all required files are uploaded
   - Verify `requirements.txt` has the correct dependencies
   - Check the Space logs for error messages

2. **Model loading errors**
   - Ensure ZeroGPU is enabled
   - Check that the model ID is correct
   - Verify internet connectivity for the model download

3. **API not responding**
   - Check the Space status (it should be "Running")
   - Verify the Space URL is correct
   - Check the Space logs for runtime errors

4. **CORS errors**
   - The app is configured for all origins (`*`)
   - If you need specific origins, modify the CORS settings in `app.py`

### Monitoring Your Space

- **Logs**: Check the "Logs" tab in your Space
- **Metrics**: Monitor CPU/GPU usage in the "Metrics" tab
- **Health**: Use the `/health` endpoint to check API status (see the polling sketch below)
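For hands-off monitoring, a short script can poll `/health` on a schedule. A minimal sketch using only the `requests` library; the endpoint URL and the 60-second interval are placeholder choices:

```python
# health_poll.py - sketch: periodically poll the /health endpoint
import time
import requests

ENDPOINT = "https://your-username-whisper-api.hf.space"  # placeholder URL

while True:
    try:
        r = requests.get(f"{ENDPOINT}/health", timeout=10)
        data = r.json()
        ok = r.status_code == 200 and data.get("model_loaded")
        print(f"{time.strftime('%H:%M:%S')} healthy={ok} payload={data}")
    except requests.RequestException as e:
        print(f"{time.strftime('%H:%M:%S')} unreachable: {e}")
    time.sleep(60)  # poll once a minute
```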

## 📊 Performance Optimization

### For Better Performance

1. **Use ZeroGPU**: Essential for fast inference
2. **Optimize file sizes**: Smaller files process faster
3. **Batch processing**: Use the `batch_size` parameter for multiple files (see the sketch below)
4. **Model caching**: The model stays loaded between requests
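On the batching point: the transformers ASR pipeline accepts a `batch_size` argument when called with a list of files. The committed `app.py` does not expose this through the HTTP API, so the following is a server-side sketch only; the file list is a placeholder:

```python
# Sketch: batched transcription with the transformers ASR pipeline.
# Assumes `pipe` is the pipeline created in app.py's load_model().
audio_files = ["clip1.mp3", "clip2.mp3", "clip3.mp3"]  # placeholder paths

# Passing a list plus batch_size lets the pipeline batch GPU work.
results = pipe(audio_files, batch_size=4)

for path, result in zip(audio_files, results):
    print(f"{path}: {result['text']}")
```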

### Resource Limits

- **ZeroGPU**: Limited GPU time per month
- **Memory**: ~2GB RAM usage
- **Storage**: Model files (~1.6GB)
- **Timeout**: 5-minute request timeout

## 🔄 Updates and Maintenance

### Updating Your Space

1. Edit files in your Space repository
2. Commit changes
3. The Space will rebuild automatically

### Monitoring Usage

- Check your ZeroGPU usage in your HF account
- Monitor API calls and performance
- Set up alerts if needed

## 📈 Scaling Considerations

### For Production Use

1. **Rate limiting**: Consider implementing rate limiting
2. **Authentication**: Add API keys for production use (see the sketch below)
3. **Monitoring**: Set up proper logging and monitoring
4. **Backup**: Keep local copies of your code
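One way to cover the authentication point is a FastAPI dependency that rejects requests without a valid key before they reach the model. A minimal sketch, not part of the committed `app.py`; the `X-API-Key` header name and the `API_KEY` environment variable are assumptions:

```python
# Sketch: API-key check as a FastAPI dependency (not in the committed app.py).
import os
from fastapi import Depends, Header, HTTPException

API_KEY = os.environ.get("API_KEY", "")  # assumed to be set as a Space secret

async def require_api_key(x_api_key: str = Header(default="")):
    """Reject requests whose X-API-Key header does not match."""
    if not API_KEY or x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

# Attach the dependency to the transcription route:
# @app.post("/transcribe", dependencies=[Depends(require_api_key)])
```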

### Alternative Deployments

- **Self-hosted**: Deploy on your own infrastructure
- **Cloud providers**: Use AWS, GCP, or Azure
- **Docker**: Use the Dockerfile for containerized deployment
README.md
ADDED
@@ -0,0 +1,213 @@
---
title: Whisper Large V3 Turbo API
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: Whisper Large V3 Turbo ASR API
---

# Whisper API Project

## Features

- **Hugging Face Space Deployment**: Deploy Whisper Large V3 Turbo with ZeroGPU
- **Fast API**: RESTful API endpoints for transcription
- **CORS Support**: Works with any frontend application
- **Web Interface**: Beautiful Gradio interface for easy testing
- **Local Clients**: Both web and Python clients for integration
- **Multiple Formats**: Supports audio and video files

## Project Structure

```
WhisperAPI/
├── app.py                   # Main Hugging Face Space application
├── requirements.txt         # Python dependencies for the HF Space
├── DEPLOYMENT.md            # Deployment guide
├── client.html              # Web-based client interface
├── client.py                # Python command-line client
├── client_requirements.txt  # Python client dependencies
└── README.md                # This file (also serves as the HF Space README)
```

## Setup Instructions

### 1. Deploy to Hugging Face Spaces

1. Create a new Space on [Hugging Face](https://huggingface.co/spaces)
2. Choose "Docker" as the SDK
3. Upload the following files to your Space:
   - `app.py`
   - `requirements.txt`
   - `README.md` (for the Space)
4. Set the Space to use **ZeroGPU** hardware
5. The Space will build and deploy automatically

### 2. Local Client Setup

#### Web Client
1. Open `client.html` in your web browser
2. Enter your Hugging Face Space URL (default: `https://binkhoale1812-whisperapi.hf.space`)
3. Upload audio/video files and get transcriptions

#### Python Client
1. Install dependencies:
```bash
pip install -r client_requirements.txt
```

2. Use the command-line client:
```bash
# Transcribe a file
python client.py audio.mp3

# Check API health
python client.py --health

# Save transcription to file
python client.py audio.mp3 --output transcription.txt

# Use custom endpoint
python client.py audio.mp3 --endpoint https://your-space.hf.space
```

## API Endpoints

### POST /transcribe
Transcribe an audio/video file.

**Request:**
- Method: POST
- Content-Type: multipart/form-data
- Body: File upload

**Response:**
```json
{
  "text": "Transcribed text here...",
  "success": true
}
```

### GET /health
Check API health status.

**Response:**
```json
{
  "status": "healthy",
  "model_loaded": true
}
```

## Usage Examples

### cURL Example
```bash
curl -X POST "https://your-space.hf.space/transcribe" \
  -F "file=@audio.mp3"
```

### Python Example
```python
import requests

# Transcribe audio file
with open('audio.mp3', 'rb') as f:
    files = {'file': ('audio.mp3', f, 'audio/mpeg')}
    response = requests.post('https://your-space.hf.space/transcribe', files=files)
    result = response.json()

if result['success']:
    print(result['text'])
else:
    print(f"Error: {result['error']}")
```

### JavaScript Example
```javascript
const formData = new FormData();
formData.append('file', audioFile);

fetch('https://your-space.hf.space/transcribe', {
    method: 'POST',
    body: formData
})
.then(response => response.json())
.then(result => {
    if (result.success) {
        console.log(result.text);
    } else {
        console.error(result.error);
    }
});
```

## Supported File Formats

### Audio Formats
- MP3
- WAV
- FLAC
- M4A
- OGG

### Video Formats
- MP4
- AVI
- MOV
- MKV

*Note: For video files, only the audio track will be processed; the sketch below shows how to extract it up front.*
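If a video container gives the API trouble, you can extract the audio track yourself before uploading. A minimal sketch using ffmpeg through Python's `subprocess` (assumes ffmpeg is installed; the file names are placeholders):

```python
# Sketch: extract a video's audio track with ffmpeg before uploading.
import subprocess

def extract_audio(video_path: str, audio_path: str = "audio.wav") -> str:
    """Strip the video stream and write 16 kHz mono WAV (Whisper's input rate)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vn",           # drop the video stream
         "-ar", "16000",  # resample to 16 kHz
         "-ac", "1",      # downmix to mono
         audio_path],
        check=True,
    )
    return audio_path

# extract_audio("meeting.mp4")  # then upload audio.wav to /transcribe
```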

## Performance Notes

- **Model**: Whisper Large V3 Turbo (809M parameters)
- **Speed**: ~4x faster than standard Large V3
- **GPU**: Uses ZeroGPU for efficient inference
- **Languages**: Supports 99 languages
- **Accuracy**: Maintains high accuracy despite the speed optimizations

## Security & CORS

The API is configured with CORS enabled for all origins (`*`), so any frontend application can make requests to it. For production use, consider restricting CORS origins to specific domains, as sketched below.
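A sketch of what the restricted configuration could look like in `app.py`'s middleware setup; the listed origins are placeholders:

```python
# Sketch: restrict CORS in app.py to known frontends instead of "*".
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "https://myapp.example.com",  # placeholder production frontend
        "http://localhost:3000",      # placeholder local development
    ],
    allow_credentials=True,
    allow_methods=["POST", "GET"],    # only the verbs the API uses
    allow_headers=["*"],
)
```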

## Troubleshooting

### Common Issues

1. **Model loading time**: The first request may take longer while the model loads
2. **File size limits**: Large files may time out (5-minute limit)
3. **Format support**: Ensure your file format is supported
4. **Network issues**: Check your internet connection and the API endpoint

### Error Messages

- `"Model not loaded"`: Wait a moment and try again
- `"File not found"`: Check the file path and permissions
- `"Transcription failed"`: The file may be corrupted or in an unsupported format
- `"Network error"`: Check your internet connection and the endpoint URL

## Monitoring

Use the `/health` endpoint to monitor your API:
```bash
curl https://your-space.hf.space/health
```

## Contributing

Feel free to submit issues and enhancement requests!

## License

This project is licensed under the MIT License.

## Acknowledgments

- OpenAI for the Whisper model
- Hugging Face for the Spaces platform
- The open-source community for the various libraries used
app.py
ADDED
@@ -0,0 +1,140 @@
import gradio as gr
import spaces
import torch
import tempfile
import os
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from fastapi import FastAPI, UploadFile, File
from fastapi.middleware.cors import CORSMiddleware
import uvicorn

# Initialize the model and processor globally
model_id = "openai/whisper-large-v3-turbo"
model = None
processor = None
pipe = None

@spaces.GPU
def load_model():
    """Load the Whisper model on GPU"""
    global model, processor, pipe

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    print(f"Loading model on device: {device}")

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id,
        torch_dtype=torch_dtype,
        low_cpu_mem_usage=True,
        use_safetensors=True
    )
    model.to(device)

    processor = AutoProcessor.from_pretrained(model_id)

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        torch_dtype=torch_dtype,
        device=device,
    )

    print("Model loaded successfully!")
    return True

@spaces.GPU
def transcribe_audio(audio_file):
    """Transcribe an audio file (path or file-like object) using Whisper"""
    global pipe

    if pipe is None:
        return {"error": "Model not loaded. Please wait and try again.", "success": False}

    try:
        # Handle different audio file inputs
        if isinstance(audio_file, str):
            # File path
            result = pipe(audio_file)
        else:
            # File object with a .name attribute (e.g. a tempfile)
            result = pipe(audio_file.name)

        return {
            "text": result["text"],
            "success": True
        }
    except Exception as e:
        return {
            "error": f"Transcription failed: {str(e)}",
            "success": False
        }

# Load model on startup
load_model()

# Create Gradio interface
def gradio_transcribe(audio_file):
    """Gradio interface for transcription"""
    if audio_file is None:
        return "Please upload an audio file."

    result = transcribe_audio(audio_file)

    if result.get("success"):
        return result["text"]
    else:
        return f"Error: {result.get('error', 'Unknown error')}"

# Create the Gradio interface
demo = gr.Interface(
    fn=gradio_transcribe,
    inputs=gr.Audio(
        sources=["upload", "microphone"],
        type="filepath",
        label="Upload Audio File or Record"
    ),
    outputs=gr.Textbox(
        label="Transcription",
        lines=10,
        placeholder="Transcribed text will appear here..."
    ),
    title="Whisper Large V3 Turbo - Speech Recognition",
    description="Upload an audio file or record audio to get transcription using OpenAI's Whisper model.",
    allow_flagging="never"
)

# Create FastAPI app for API endpoints
app = FastAPI(title="Whisper API", description="API for Whisper speech recognition")

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allow all origins
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/transcribe")
async def api_transcribe(file: UploadFile = File(None)):
    """API endpoint for transcription (multipart upload, saved to a temp file)"""
    if file is None:
        return {"error": "No file provided", "success": False}

    # Persist the upload to disk so the ASR pipeline can read it by path
    suffix = os.path.splitext(file.filename or "")[1] or ".bin"
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        result = transcribe_audio(tmp_path)
    finally:
        os.remove(tmp_path)
    return result

@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy", "model_loaded": pipe is not None}

# Mount Gradio app
app = gr.mount_gradio_app(app, demo, path="/")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)
client.html
ADDED
@@ -0,0 +1,460 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Whisper API Client</title>
    <style>
        * {
            margin: 0;
            padding: 0;
            box-sizing: border-box;
        }

        body {
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            min-height: 100vh;
            display: flex;
            align-items: center;
            justify-content: center;
            padding: 20px;
        }

        .container {
            background: white;
            border-radius: 20px;
            box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
            padding: 40px;
            max-width: 600px;
            width: 100%;
        }

        .header {
            text-align: center;
            margin-bottom: 30px;
        }

        .header h1 {
            color: #333;
            font-size: 2.5em;
            margin-bottom: 10px;
        }

        .header p {
            color: #666;
            font-size: 1.1em;
        }

        .upload-area {
            border: 3px dashed #ddd;
            border-radius: 15px;
            padding: 40px;
            text-align: center;
            margin-bottom: 30px;
            transition: all 0.3s ease;
            cursor: pointer;
        }

        .upload-area:hover {
            border-color: #667eea;
            background-color: #f8f9ff;
        }

        .upload-area.dragover {
            border-color: #667eea;
            background-color: #f0f4ff;
            transform: scale(1.02);
        }

        .upload-icon {
            font-size: 3em;
            color: #667eea;
            margin-bottom: 20px;
        }

        .upload-text {
            font-size: 1.2em;
            color: #333;
            margin-bottom: 10px;
        }

        .upload-subtext {
            color: #666;
            font-size: 0.9em;
        }

        .file-input {
            display: none;
        }

        .file-info {
            background: #f8f9fa;
            border-radius: 10px;
            padding: 20px;
            margin-bottom: 20px;
            display: none;
        }

        .file-info.show {
            display: block;
        }

        .file-name {
            font-weight: bold;
            color: #333;
            margin-bottom: 5px;
        }

        .file-size {
            color: #666;
            font-size: 0.9em;
        }

        .button-group {
            display: flex;
            gap: 15px;
            margin-bottom: 30px;
        }

        .btn {
            flex: 1;
            padding: 15px 30px;
            border: none;
            border-radius: 10px;
            font-size: 1.1em;
            font-weight: bold;
            cursor: pointer;
            transition: all 0.3s ease;
        }

        .btn-primary {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
        }

        .btn-primary:hover {
            transform: translateY(-2px);
            box-shadow: 0 10px 20px rgba(102, 126, 234, 0.3);
        }

        .btn-primary:disabled {
            background: #ccc;
            cursor: not-allowed;
            transform: none;
            box-shadow: none;
        }

        .btn-secondary {
            background: #f8f9fa;
            color: #333;
            border: 2px solid #ddd;
        }

        .btn-secondary:hover {
            background: #e9ecef;
            border-color: #adb5bd;
        }

        .result-area {
            background: #f8f9fa;
            border-radius: 15px;
            padding: 25px;
            min-height: 150px;
            display: none;
        }

        .result-area.show {
            display: block;
        }

        .result-title {
            font-weight: bold;
            color: #333;
            margin-bottom: 15px;
            font-size: 1.2em;
        }

        .result-text {
            color: #555;
            line-height: 1.6;
            white-space: pre-wrap;
            word-wrap: break-word;
        }

        .loading {
            display: none;
            text-align: center;
            padding: 20px;
        }

        .loading.show {
            display: block;
        }

        .spinner {
            border: 4px solid #f3f3f3;
            border-top: 4px solid #667eea;
            border-radius: 50%;
            width: 40px;
            height: 40px;
            animation: spin 1s linear infinite;
            margin: 0 auto 15px;
        }

        @keyframes spin {
            0% { transform: rotate(0deg); }
            100% { transform: rotate(360deg); }
        }

        .error {
            background: #f8d7da;
            color: #721c24;
            border: 1px solid #f5c6cb;
            border-radius: 10px;
            padding: 15px;
            margin-top: 20px;
            display: none;
        }

        .error.show {
            display: block;
        }

        .endpoint-config {
            background: #f8f9fa;
            border-radius: 10px;
            padding: 20px;
            margin-bottom: 20px;
        }

        .endpoint-config label {
            display: block;
            font-weight: bold;
            color: #333;
            margin-bottom: 8px;
        }

        .endpoint-config input {
            width: 100%;
            padding: 12px;
            border: 2px solid #ddd;
            border-radius: 8px;
            font-size: 1em;
            transition: border-color 0.3s ease;
        }

        .endpoint-config input:focus {
            outline: none;
            border-color: #667eea;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="header">
            <h1>🎤 Whisper API Client</h1>
            <p>Upload audio/video files for transcription using OpenAI's Whisper Large V3 Turbo</p>
        </div>

        <div class="endpoint-config">
            <label for="endpoint">API Endpoint:</label>
            <input type="text" id="endpoint" value="https://binkhoale1812-whisperapi.hf.space" placeholder="Enter your Hugging Face Space URL">
        </div>

        <div class="upload-area" id="uploadArea">
            <div class="upload-icon">📁</div>
            <div class="upload-text">Click to upload or drag & drop</div>
            <div class="upload-subtext">Supports MP3, WAV, FLAC, M4A, MP4, AVI, MOV files</div>
            <input type="file" id="fileInput" class="file-input" accept="audio/*,video/*">
        </div>

        <div class="file-info" id="fileInfo">
            <div class="file-name" id="fileName"></div>
            <div class="file-size" id="fileSize"></div>
        </div>

        <div class="button-group">
            <button class="btn btn-primary" id="transcribeBtn" disabled>Transcribe</button>
            <button class="btn btn-secondary" id="clearBtn">Clear</button>
        </div>

        <div class="loading" id="loading">
            <div class="spinner"></div>
            <div>Processing your audio file...</div>
        </div>

        <div class="result-area" id="resultArea">
            <div class="result-title">Transcription Result:</div>
            <div class="result-text" id="resultText"></div>
        </div>

        <div class="error" id="errorArea">
            <div id="errorText"></div>
        </div>
    </div>

    <script>
        const uploadArea = document.getElementById('uploadArea');
        const fileInput = document.getElementById('fileInput');
        const fileInfo = document.getElementById('fileInfo');
        const fileName = document.getElementById('fileName');
        const fileSize = document.getElementById('fileSize');
        const transcribeBtn = document.getElementById('transcribeBtn');
        const clearBtn = document.getElementById('clearBtn');
        const loading = document.getElementById('loading');
        const resultArea = document.getElementById('resultArea');
        const resultText = document.getElementById('resultText');
        const errorArea = document.getElementById('errorArea');
        const errorText = document.getElementById('errorText');
        const endpointInput = document.getElementById('endpoint');

        let selectedFile = null;

        // Upload area click handler
        uploadArea.addEventListener('click', () => {
            fileInput.click();
        });

        // File input change handler
        fileInput.addEventListener('change', (e) => {
            const file = e.target.files[0];
            if (file) {
                handleFileSelect(file);
            }
        });

        // Drag and drop handlers
        uploadArea.addEventListener('dragover', (e) => {
            e.preventDefault();
            uploadArea.classList.add('dragover');
        });

        uploadArea.addEventListener('dragleave', () => {
            uploadArea.classList.remove('dragover');
        });

        uploadArea.addEventListener('drop', (e) => {
            e.preventDefault();
            uploadArea.classList.remove('dragover');
            const file = e.dataTransfer.files[0];
            if (file) {
                handleFileSelect(file);
            }
        });

        // File selection handler
        function handleFileSelect(file) {
            selectedFile = file;
            fileName.textContent = file.name;
            fileSize.textContent = formatFileSize(file.size);
            fileInfo.classList.add('show');
            transcribeBtn.disabled = false;
            hideError();
        }

        // Format file size
        function formatFileSize(bytes) {
            if (bytes === 0) return '0 Bytes';
            const k = 1024;
            const sizes = ['Bytes', 'KB', 'MB', 'GB'];
            const i = Math.floor(Math.log(bytes) / Math.log(k));
            return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
        }

        // Transcribe button handler
        transcribeBtn.addEventListener('click', async () => {
            if (!selectedFile) return;

            const endpoint = endpointInput.value.trim();
            if (!endpoint) {
                showError('Please enter a valid API endpoint');
                return;
            }

            showLoading();
            hideError();
            hideResult();

            try {
                const formData = new FormData();
                formData.append('file', selectedFile);

                const response = await fetch(`${endpoint}/transcribe`, {
                    method: 'POST',
                    body: formData,
                });

                const result = await response.json();

                hideLoading();

                if (result.success) {
                    showResult(result.text);
                } else {
                    showError(result.error || 'Transcription failed');
                }
            } catch (error) {
                hideLoading();
                showError(`Network error: ${error.message}`);
            }
        });

        // Clear button handler
        clearBtn.addEventListener('click', () => {
            selectedFile = null;
            fileInput.value = '';
            fileInfo.classList.remove('show');
            transcribeBtn.disabled = true;
            hideResult();
            hideError();
        });

        // Utility functions
        function showLoading() {
            loading.classList.add('show');
            transcribeBtn.disabled = true;
        }

        function hideLoading() {
            loading.classList.remove('show');
            transcribeBtn.disabled = false;
        }

        function showResult(text) {
            resultText.textContent = text;
            resultArea.classList.add('show');
        }

        function hideResult() {
            resultArea.classList.remove('show');
        }

        function showError(message) {
            errorText.textContent = message;
            errorArea.classList.add('show');
        }

        function hideError() {
            errorArea.classList.remove('show');
        }

        // Test endpoint connectivity on page load
        window.addEventListener('load', async () => {
            const endpoint = endpointInput.value.trim();
            if (endpoint) {
                try {
                    const response = await fetch(`${endpoint}/health`);
                    if (response.ok) {
                        console.log('API endpoint is accessible');
                    } else {
                        console.warn('API endpoint returned non-200 status');
                    }
                } catch (error) {
                    console.warn('Could not reach API endpoint:', error.message);
                }
            }
        });
    </script>
</body>
</html>
client.py
ADDED
@@ -0,0 +1,119 @@
#!/usr/bin/env python3
"""
Whisper API Client - Python script to interact with the Hugging Face Space API
"""

import requests
import argparse
import mimetypes
import os
import sys

class WhisperAPIClient:
    def __init__(self, endpoint="https://binkhoale1812-whisperapi.hf.space"):
        self.endpoint = endpoint.rstrip('/')

    def transcribe(self, file_path):
        """
        Transcribe an audio/video file using the Whisper API

        Args:
            file_path (str): Path to the audio/video file

        Returns:
            dict: Response containing transcription text or error
        """
        if not os.path.exists(file_path):
            return {"error": f"File not found: {file_path}", "success": False}

        try:
            with open(file_path, 'rb') as f:
                # Guess the MIME type from the file name instead of
                # hardcoding audio/mpeg for every upload
                mime = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
                files = {'file': (os.path.basename(file_path), f, mime)}

                response = requests.post(
                    f"{self.endpoint}/transcribe",
                    files=files,
                    timeout=300  # 5-minute timeout for large files
                )

                if response.status_code == 200:
                    return response.json()
                else:
                    return {
                        "error": f"HTTP {response.status_code}: {response.text}",
                        "success": False
                    }

        except requests.exceptions.RequestException as e:
            return {"error": f"Request failed: {str(e)}", "success": False}
        except Exception as e:
            return {"error": f"Unexpected error: {str(e)}", "success": False}

    def health_check(self):
        """
        Check if the API endpoint is healthy

        Returns:
            dict: Health status
        """
        try:
            response = requests.get(f"{self.endpoint}/health", timeout=10)
            if response.status_code == 200:
                return response.json()
            else:
                return {"status": "unhealthy", "error": f"HTTP {response.status_code}"}
        except requests.exceptions.RequestException as e:
            return {"status": "unhealthy", "error": str(e)}

def main():
    parser = argparse.ArgumentParser(description="Whisper API Client")
    parser.add_argument("file", nargs="?", help="Audio/video file to transcribe")
    parser.add_argument("--endpoint", default="https://binkhoale1812-whisperapi.hf.space",
                        help="API endpoint URL")
    parser.add_argument("--health", action="store_true", help="Check API health")
    parser.add_argument("--output", "-o", help="Output file for transcription")

    args = parser.parse_args()

    client = WhisperAPIClient(args.endpoint)

    # Health check
    if args.health:
        print("Checking API health...")
        health = client.health_check()
        print(f"Status: {health.get('status', 'unknown')}")
        if 'error' in health:
            print(f"Error: {health['error']}")
        return

    # Transcribe file
    if not args.file:
        print("Error: Please provide a file to transcribe")
        print("Usage: python client.py <file> [options]")
        print("       python client.py --health")
        sys.exit(1)

    print(f"Transcribing: {args.file}")
    print("This may take a while for large files...")

    result = client.transcribe(args.file)

    if result.get("success"):
        transcription = result["text"]
        print("\n" + "="*50)
        print("TRANSCRIPTION RESULT:")
        print("="*50)
        print(transcription)
        print("="*50)

        # Save to file if requested
        if args.output:
            with open(args.output, 'w', encoding='utf-8') as f:
                f.write(transcription)
            print(f"\nTranscription saved to: {args.output}")
    else:
        print(f"Error: {result.get('error', 'Unknown error')}")
        sys.exit(1)

if __name__ == "__main__":
    main()
client_requirements.txt
ADDED
@@ -0,0 +1,2 @@
requests>=2.31.0
pathlib2>=2.3.7  # unused by client.py (the stdlib handles paths); safe to drop
requirements.txt
ADDED
@@ -0,0 +1,11 @@
gradio>=4.0.0
transformers>=4.35.0
torch>=2.0.0
torchaudio>=2.0.0
accelerate>=0.20.0
datasets[audio]>=2.14.0
fastapi>=0.100.0
uvicorn>=0.20.0
python-multipart>=0.0.6
librosa>=0.10.0
soundfile>=0.12.0