LiamKhoaLe committed · commit b9ed1c8 · 0 parents

Initial commit

Files changed (8):
  1. .gitattributes +35 -0
  2. DEPLOYMENT.md +151 -0
  3. README.md +213 -0
  4. app.py +140 -0
  5. client.html +460 -0
  6. client.py +119 -0
  7. client_requirements.txt +2 -0
  8. requirements.txt +11 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
DEPLOYMENT.md ADDED
@@ -0,0 +1,151 @@
+ # Deployment Guide
+
+ ## 🚀 Deploying to Hugging Face Spaces
+
+ ### Step 1: Create a New Space
+
+ 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Click "Create new Space"
+ 3. Fill in the details:
+    - **Space name**: `whisper-api` (or your preferred name)
+    - **License**: MIT
+    - **SDK**: Docker
+    - **Hardware**: **ZeroGPU** (this is crucial!)
+    - **Visibility**: Public or Private
+
+ ### Step 2: Upload Files
+
+ Upload these files to your Space:
+
+ 1. **app.py** - Main application file
+ 2. **requirements.txt** - Python dependencies
+ 3. **README.md** - Space documentation (optional)
+
+ ### Step 3: Configure Space Settings
+
+ In your Space settings, ensure:
+ - **Hardware**: ZeroGPU is selected
+ - **Environment variables**: None required
+ - **Secrets**: None required
+
+ ### Step 4: Deploy
+
+ The Space will automatically build and deploy. This process may take 5-10 minutes.
+
+ ### Step 5: Test Your Deployment
+
+ Once deployed, test your API:
+
+ ```bash
+ # Health check
+ curl https://your-username-whisper-api.hf.space/health
+
+ # Test transcription (replace with your actual endpoint)
+ curl -X POST "https://your-username-whisper-api.hf.space/transcribe" \
+   -F "file=@test_audio.mp3"
+ ```
+
+ ## 🔧 Local Testing
+
+ ### Test the Web Client
+
+ 1. Open `client.html` in your browser
+ 2. Update the endpoint URL to your Space URL
+ 3. Upload an audio file and test transcription
+
+ ### Test the Python Client
+
+ ```bash
+ # Install dependencies
+ pip install -r client_requirements.txt
+
+ # Test health
+ python client.py --health --endpoint https://your-username-whisper-api.hf.space
+
+ # Test transcription
+ python client.py audio.mp3 --endpoint https://your-username-whisper-api.hf.space
+ ```
+
+ ### Run the Test Suite
+
+ ```bash
+ # Install additional dependencies for testing
+ pip install numpy soundfile
+
+ # Run comprehensive tests
+ python test_api.py
+ ```
+
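+ Note that `test_api.py` is not part of this commit's file list, so treat the command above as a placeholder. A minimal stand-in under that assumption, which synthesizes a one-second tone with numpy/soundfile and posts it to the API (the endpoint URL is also a placeholder), might look like:
+
+ ```python
+ # Hypothetical stand-in for the missing test_api.py.
+ import numpy as np
+ import requests
+ import soundfile as sf
+
+ ENDPOINT = "https://your-username-whisper-api.hf.space"  # placeholder
+
+ # Generate one second of a 440 Hz tone and save it as WAV
+ sr = 16000
+ tone = 0.1 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
+ sf.write("test_audio.wav", tone, sr)
+
+ # Health check, then transcription
+ print(requests.get(f"{ENDPOINT}/health", timeout=10).json())
+ with open("test_audio.wav", "rb") as f:
+     resp = requests.post(f"{ENDPOINT}/transcribe",
+                          files={"file": ("test_audio.wav", f, "audio/wav")},
+                          timeout=300)
+ print(resp.json())
+ ```
+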
+ ## 🐛 Troubleshooting
+
+ ### Common Issues
+
+ 1. **Space fails to build**
+    - Check that all required files are uploaded
+    - Verify `requirements.txt` has correct dependencies
+    - Check Space logs for error messages
+
+ 2. **Model loading errors**
+    - Ensure ZeroGPU is enabled
+    - Check that the model ID is correct
+    - Verify internet connectivity for model download
+
+ 3. **API not responding**
+    - Check Space status (should be "Running")
+    - Verify the Space URL is correct
+    - Check Space logs for runtime errors
+
+ 4. **CORS errors**
+    - The app is configured to allow all origins (`*`)
+    - If you need specific origins, modify the CORS settings in `app.py` (see the sketch after this list)
+
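+ A minimal sketch of restricting CORS to a known frontend, replacing the wide-open middleware block in `app.py`; the origin URL is a placeholder:
+
+ ```python
+ from fastapi.middleware.cors import CORSMiddleware
+
+ # Replace allow_origins=["*"] with an explicit allow-list.
+ # "https://myapp.example.com" is a placeholder for your frontend's origin.
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["https://myapp.example.com"],
+     allow_credentials=True,
+     allow_methods=["GET", "POST"],
+     allow_headers=["*"],
+ )
+ ```
+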
+ ### Monitoring Your Space
+
+ - **Logs**: Check the "Logs" tab in your Space
+ - **Metrics**: Monitor CPU/GPU usage in the "Metrics" tab
+ - **Health**: Use the `/health` endpoint to check API status
+
+ ## 📊 Performance Optimization
+
+ ### For Better Performance
+
+ 1. **Use ZeroGPU**: Essential for fast inference
+ 2. **Optimize file sizes**: Smaller files process faster
+ 3. **Batch processing**: The underlying pipeline accepts a `batch_size` parameter for multiple files (see the sketch after this list)
+ 4. **Model caching**: The model stays loaded between requests
+
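+ The HTTP API in `app.py` does not currently expose batching, so this is a server-side sketch of how the `transformers` pipeline created in `load_model()` could batch several files; the file names are placeholders:
+
+ ```python
+ # Sketch only: batch several clips through the ASR pipeline from app.py.
+ files = ["clip1.mp3", "clip2.mp3", "clip3.mp3"]  # placeholder names
+ results = pipe(files, batch_size=4)  # several inputs per forward pass
+ for path, result in zip(files, results):
+     print(path, "->", result["text"])
+ ```
+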
+ ### Resource Limits
+
+ - **ZeroGPU**: Limited GPU time per month
+ - **Memory**: ~2GB RAM usage
+ - **Storage**: Model files (~1.6GB)
+ - **Timeout**: 5-minute request timeout
+
+ ## 🔄 Updates and Maintenance
+
+ ### Updating Your Space
+
+ 1. Edit files in your Space repository
+ 2. Commit changes
+ 3. The Space will automatically rebuild
+
+ ### Monitoring Usage
+
+ - Check your ZeroGPU usage in your HF account
+ - Monitor API calls and performance
+ - Set up alerts if needed
+
+ ## 📈 Scaling Considerations
+
+ ### For Production Use
+
+ 1. **Rate limiting**: Consider implementing rate limiting
+ 2. **Authentication**: Add API keys for production use (a sketch follows this list)
+ 3. **Monitoring**: Set up proper logging and monitoring
+ 4. **Backup**: Keep local copies of your code
+
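+ A minimal sketch of API-key authentication as a FastAPI dependency; the `X-API-Key` header name and the `API_KEY` environment variable are assumptions, not part of the current app:
+
+ ```python
+ import os
+
+ from fastapi import Depends, HTTPException
+ from fastapi.security import APIKeyHeader
+
+ # Hypothetical scheme: clients send an X-API-Key header, which is compared
+ # against an API_KEY environment variable (set as a Space secret).
+ api_key_header = APIKeyHeader(name="X-API-Key")
+
+ def require_api_key(key: str = Depends(api_key_header)) -> None:
+     if key != os.environ.get("API_KEY"):
+         raise HTTPException(status_code=401, detail="Invalid API key")
+
+ # Then guard the endpoint in app.py with:
+ #   @app.post("/transcribe", dependencies=[Depends(require_api_key)])
+ ```
+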
+ ### Alternative Deployments
+
+ - **Self-hosted**: Deploy on your own infrastructure
+ - **Cloud providers**: Use AWS, GCP, or Azure
+ - **Docker**: Package the app with a Dockerfile for containerized deployment
README.md ADDED
@@ -0,0 +1,213 @@
+ ---
+ title: Whisper Large V3 Turbo API
+ emoji: 🎤
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ pinned: false
+ license: mit
+ short_description: Whisper Large V3 Turbo ASR API
+ ---
+
+ # Whisper API Project
+
+ ## Features
+
+ - **Hugging Face Space Deployment**: Deploy Whisper Large V3 Turbo with ZeroGPU
+ - **FastAPI Backend**: RESTful API endpoints for transcription
+ - **CORS Support**: Works with any frontend application
+ - **Web Interface**: Gradio interface for easy in-browser testing
+ - **Local Clients**: Both web and Python clients for integration
+ - **Multiple Formats**: Supports audio and video files
+
+ ## Project Structure
+
+ ```
+ WhisperAPI/
+ ├── app.py                    # Main Hugging Face Space application
+ ├── requirements.txt          # Python dependencies for HF Space
+ ├── README.md                 # Space documentation (this file)
+ ├── DEPLOYMENT.md             # Step-by-step deployment guide
+ ├── client.html               # Web-based client interface
+ ├── client.py                 # Python command-line client
+ └── client_requirements.txt   # Python client dependencies
+ ```
+
+ ## Setup Instructions
+
+ ### 1. Deploy to Hugging Face Spaces
+
+ 1. Create a new Space on [Hugging Face](https://huggingface.co/spaces)
+ 2. Choose "Docker" as the SDK
+ 3. Upload the following files to your Space:
+    - `app.py`
+    - `requirements.txt`
+    - `README.md` (for the Space)
+ 4. Set the Space to use **ZeroGPU** hardware
+ 5. The Space will automatically build and deploy
+
+ ### 2. Local Client Setup
+
+ #### Web Client
+ 1. Open `client.html` in your web browser
+ 2. Enter your Hugging Face Space URL (default: `https://binkhoale1812-whisperapi.hf.space`)
+ 3. Upload audio/video files and get transcriptions
+
+ #### Python Client
+ 1. Install dependencies:
+    ```bash
+    pip install -r client_requirements.txt
+    ```
+
+ 2. Use the command-line client:
+    ```bash
+    # Transcribe a file
+    python client.py audio.mp3
+
+    # Check API health
+    python client.py --health
+
+    # Save transcription to file
+    python client.py audio.mp3 --output transcription.txt
+
+    # Use custom endpoint
+    python client.py audio.mp3 --endpoint https://your-space.hf.space
+    ```
+
+ ## API Endpoints
+
+ ### POST /transcribe
+ Transcribe an audio/video file.
+
+ **Request:**
+ - Method: POST
+ - Content-Type: multipart/form-data
+ - Body: file upload in a form field named `file`
+
+ **Response:**
+ ```json
+ {
+   "text": "Transcribed text here...",
+   "success": true
+ }
+ ```
+
+ ### GET /health
+ Check API health status.
+
+ **Response:**
+ ```json
+ {
+   "status": "healthy",
+   "model_loaded": true
+ }
+ ```
+
+ ## Usage Examples
+
+ ### cURL Example
+ ```bash
+ curl -X POST "https://your-space.hf.space/transcribe" \
+   -F "file=@audio.mp3"
+ ```
+
+ ### Python Example
+ ```python
+ import requests
+
+ # Transcribe audio file
+ with open('audio.mp3', 'rb') as f:
+     files = {'file': ('audio.mp3', f, 'audio/mpeg')}
+     response = requests.post('https://your-space.hf.space/transcribe', files=files)
+     result = response.json()
+
+ if result['success']:
+     print(result['text'])
+ else:
+     print(f"Error: {result['error']}")
+ ```
+
+ ### JavaScript Example
+ ```javascript
+ const formData = new FormData();
+ formData.append('file', audioFile);
+
+ fetch('https://your-space.hf.space/transcribe', {
+     method: 'POST',
+     body: formData
+ })
+ .then(response => response.json())
+ .then(result => {
+     if (result.success) {
+         console.log(result.text);
+     } else {
+         console.error(result.error);
+     }
+ });
+ ```
+
+ ## Supported File Formats
+
+ ### Audio Formats
+ - MP3
+ - WAV
+ - FLAC
+ - M4A
+ - OGG
+
+ ### Video Formats
+ - MP4
+ - AVI
+ - MOV
+ - MKV
+
+ *Note: For video files, only the audio track will be processed.*
+
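+ If upload size or the request timeout is a concern, you can strip the audio track out of a video locally before uploading. A sketch using Python's `subprocess` to drive ffmpeg (assumes `ffmpeg` is installed; file names are placeholders):
+
+ ```python
+ import subprocess
+
+ # Extract a 16 kHz mono WAV track from a video file with ffmpeg.
+ # "input.mp4" and "audio.wav" are placeholder names.
+ subprocess.run(
+     ["ffmpeg", "-y", "-i", "input.mp4", "-vn", "-ac", "1", "-ar", "16000", "audio.wav"],
+     check=True,
+ )
+ ```
+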
+ ## Performance Notes
+
+ - **Model**: Whisper Large V3 Turbo (809M parameters)
+ - **Speed**: ~4x faster than standard Large V3
+ - **GPU**: Uses ZeroGPU for efficient inference
+ - **Languages**: Supports 99 languages
+ - **Accuracy**: Maintains high accuracy despite speed optimizations
+
+ ## Security & CORS
+
+ The API is configured with CORS enabled for all origins (`*`). This allows any frontend application to make requests to the API. For production use, consider restricting CORS origins to specific domains (see `DEPLOYMENT.md` for a sketch).
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ 1. **Model Loading Time**: The first request may take longer while the model loads (a readiness-poll sketch follows this list)
+ 2. **File Size Limits**: Large files may time out (5-minute limit)
+ 3. **Format Support**: Ensure your file format is supported
+ 4. **Network Issues**: Check your internet connection and API endpoint
+
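+ A small sketch that polls `/health` until the model reports loaded before sending the first file; the endpoint URL is a placeholder:
+
+ ```python
+ import time
+
+ import requests
+
+ ENDPOINT = "https://your-space.hf.space"  # placeholder
+
+ # Poll /health until model_loaded is true, giving up after ~2 minutes.
+ for _ in range(24):
+     try:
+         if requests.get(f"{ENDPOINT}/health", timeout=10).json().get("model_loaded"):
+             print("Model is ready")
+             break
+     except requests.RequestException:
+         pass  # the Space may still be waking up
+     time.sleep(5)
+ else:
+     print("Model did not become ready in time")
+ ```
+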
188
+
189
+ - `"Model not loaded"`: Wait a moment and try again
190
+ - `"File not found"`: Check file path and permissions
191
+ - `"Transcription failed"`: File may be corrupted or unsupported format
192
+ - `"Network error"`: Check internet connection and endpoint URL
193
+
194
+ ## Monitoring
195
+
196
+ Use the `/health` endpoint to monitor your API:
197
+ ```bash
198
+ curl https://your-space.hf.space/health
199
+ ```
200
+
201
+ ## Contributing
202
+
203
+ Feel free to submit issues and enhancement requests!
204
+
205
+ ## License
206
+
207
+ This project is licensed under the MIT License.
208
+
209
+ ## Acknowledgments
210
+
211
+ - OpenAI for the Whisper model
212
+ - Hugging Face for the Spaces platform
213
+ - The open-source community for various libraries used
app.py ADDED
@@ -0,0 +1,140 @@
+ import gradio as gr
+ import spaces
+ import torch
+ import tempfile
+ import os
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ from fastapi import FastAPI, File, UploadFile
+ from fastapi.middleware.cors import CORSMiddleware
+ import uvicorn
+
+ # Initialize the model and processor globally
+ model_id = "openai/whisper-large-v3-turbo"
+ model = None
+ processor = None
+ pipe = None
+
+ @spaces.GPU
+ def load_model():
+     """Load the Whisper model on GPU"""
+     global model, processor, pipe
+
+     device = "cuda:0" if torch.cuda.is_available() else "cpu"
+     torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+     print(f"Loading model on device: {device}")
+
+     model = AutoModelForSpeechSeq2Seq.from_pretrained(
+         model_id,
+         torch_dtype=torch_dtype,
+         low_cpu_mem_usage=True,
+         use_safetensors=True
+     )
+     model.to(device)
+
+     processor = AutoProcessor.from_pretrained(model_id)
+
+     pipe = pipeline(
+         "automatic-speech-recognition",
+         model=model,
+         tokenizer=processor.tokenizer,
+         feature_extractor=processor.feature_extractor,
+         torch_dtype=torch_dtype,
+         device=device,
+     )
+
+     print("Model loaded successfully!")
+     return True
+
+ @spaces.GPU
+ def transcribe_audio(audio_file):
+     """Transcribe an audio file using Whisper"""
+     global pipe
+
+     if pipe is None:
+         return {"error": "Model not loaded. Please wait and try again.", "success": False}
+
+     try:
+         # Accept either a file path or a file-like object with a .name attribute
+         if isinstance(audio_file, str):
+             result = pipe(audio_file)
+         else:
+             result = pipe(audio_file.name)
+
+         return {
+             "text": result["text"],
+             "success": True
+         }
+     except Exception as e:
+         return {
+             "error": f"Transcription failed: {str(e)}",
+             "success": False
+         }
+
+ # Load model on startup
+ load_model()
+
+ # Create Gradio interface
+ def gradio_transcribe(audio_file):
+     """Gradio interface for transcription"""
+     if audio_file is None:
+         return "Please upload an audio file."
+
+     result = transcribe_audio(audio_file)
+
+     if result.get("success"):
+         return result["text"]
+     else:
+         return f"Error: {result.get('error', 'Unknown error')}"
+
+ # Create the Gradio interface
+ demo = gr.Interface(
+     fn=gradio_transcribe,
+     inputs=gr.Audio(
+         sources=["upload", "microphone"],
+         type="filepath",
+         label="Upload Audio File or Record"
+     ),
+     outputs=gr.Textbox(
+         label="Transcription",
+         lines=10,
+         placeholder="Transcribed text will appear here..."
+     ),
+     title="Whisper Large V3 Turbo - Speech Recognition",
+     description="Upload an audio file or record audio to get transcription using OpenAI's Whisper model.",
+     allow_flagging="never"
+ )
+
+ # Create FastAPI app for API endpoints
+ app = FastAPI(title="Whisper API", description="API for Whisper speech recognition")
+
+ # Add CORS middleware
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],  # Allow all origins
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ @app.post("/transcribe")
+ async def api_transcribe(file: UploadFile = File(...)):
+     """API endpoint for transcription"""
+     # Persist the upload to a temporary file so the ASR pipeline can read it by path
+     suffix = os.path.splitext(file.filename or "")[1] or ".wav"
+     with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
+         tmp.write(await file.read())
+         tmp_path = tmp.name
+
+     try:
+         result = transcribe_audio(tmp_path)
+     finally:
+         os.unlink(tmp_path)
+
+     return result
+
+ @app.get("/health")
+ async def health_check():
+     """Health check endpoint"""
+     return {"status": "healthy", "model_loaded": pipe is not None}
+
+ # Mount Gradio app
+ app = gr.mount_gradio_app(app, demo, path="/")
+
+ if __name__ == "__main__":
+     uvicorn.run(app, host="0.0.0.0", port=7860)
client.html ADDED
@@ -0,0 +1,460 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>Whisper API Client</title>
+     <style>
+         * {
+             margin: 0;
+             padding: 0;
+             box-sizing: border-box;
+         }
+
+         body {
+             font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+             background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+             min-height: 100vh;
+             display: flex;
+             align-items: center;
+             justify-content: center;
+             padding: 20px;
+         }
+
+         .container {
+             background: white;
+             border-radius: 20px;
+             box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
+             padding: 40px;
+             max-width: 600px;
+             width: 100%;
+         }
+
+         .header {
+             text-align: center;
+             margin-bottom: 30px;
+         }
+
+         .header h1 {
+             color: #333;
+             font-size: 2.5em;
+             margin-bottom: 10px;
+         }
+
+         .header p {
+             color: #666;
+             font-size: 1.1em;
+         }
+
+         .upload-area {
+             border: 3px dashed #ddd;
+             border-radius: 15px;
+             padding: 40px;
+             text-align: center;
+             margin-bottom: 30px;
+             transition: all 0.3s ease;
+             cursor: pointer;
+         }
+
+         .upload-area:hover {
+             border-color: #667eea;
+             background-color: #f8f9ff;
+         }
+
+         .upload-area.dragover {
+             border-color: #667eea;
+             background-color: #f0f4ff;
+             transform: scale(1.02);
+         }
+
+         .upload-icon {
+             font-size: 3em;
+             color: #667eea;
+             margin-bottom: 20px;
+         }
+
+         .upload-text {
+             font-size: 1.2em;
+             color: #333;
+             margin-bottom: 10px;
+         }
+
+         .upload-subtext {
+             color: #666;
+             font-size: 0.9em;
+         }
+
+         .file-input {
+             display: none;
+         }
+
+         .file-info {
+             background: #f8f9fa;
+             border-radius: 10px;
+             padding: 20px;
+             margin-bottom: 20px;
+             display: none;
+         }
+
+         .file-info.show {
+             display: block;
+         }
+
+         .file-name {
+             font-weight: bold;
+             color: #333;
+             margin-bottom: 5px;
+         }
+
+         .file-size {
+             color: #666;
+             font-size: 0.9em;
+         }
+
+         .button-group {
+             display: flex;
+             gap: 15px;
+             margin-bottom: 30px;
+         }
+
+         .btn {
+             flex: 1;
+             padding: 15px 30px;
+             border: none;
+             border-radius: 10px;
+             font-size: 1.1em;
+             font-weight: bold;
+             cursor: pointer;
+             transition: all 0.3s ease;
+         }
+
+         .btn-primary {
+             background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+             color: white;
+         }
+
+         .btn-primary:hover {
+             transform: translateY(-2px);
+             box-shadow: 0 10px 20px rgba(102, 126, 234, 0.3);
+         }
+
+         .btn-primary:disabled {
+             background: #ccc;
+             cursor: not-allowed;
+             transform: none;
+             box-shadow: none;
+         }
+
+         .btn-secondary {
+             background: #f8f9fa;
+             color: #333;
+             border: 2px solid #ddd;
+         }
+
+         .btn-secondary:hover {
+             background: #e9ecef;
+             border-color: #adb5bd;
+         }
+
+         .result-area {
+             background: #f8f9fa;
+             border-radius: 15px;
+             padding: 25px;
+             min-height: 150px;
+             display: none;
+         }
+
+         .result-area.show {
+             display: block;
+         }
+
+         .result-title {
+             font-weight: bold;
+             color: #333;
+             margin-bottom: 15px;
+             font-size: 1.2em;
+         }
+
+         .result-text {
+             color: #555;
+             line-height: 1.6;
+             white-space: pre-wrap;
+             word-wrap: break-word;
+         }
+
+         .loading {
+             display: none;
+             text-align: center;
+             padding: 20px;
+         }
+
+         .loading.show {
+             display: block;
+         }
+
+         .spinner {
+             border: 4px solid #f3f3f3;
+             border-top: 4px solid #667eea;
+             border-radius: 50%;
+             width: 40px;
+             height: 40px;
+             animation: spin 1s linear infinite;
+             margin: 0 auto 15px;
+         }
+
+         @keyframes spin {
+             0% { transform: rotate(0deg); }
+             100% { transform: rotate(360deg); }
+         }
+
+         .error {
+             background: #f8d7da;
+             color: #721c24;
+             border: 1px solid #f5c6cb;
+             border-radius: 10px;
+             padding: 15px;
+             margin-top: 20px;
+             display: none;
+         }
+
+         .error.show {
+             display: block;
+         }
+
+         .endpoint-config {
+             background: #f8f9fa;
+             border-radius: 10px;
+             padding: 20px;
+             margin-bottom: 20px;
+         }
+
+         .endpoint-config label {
+             display: block;
+             font-weight: bold;
+             color: #333;
+             margin-bottom: 8px;
+         }
+
+         .endpoint-config input {
+             width: 100%;
+             padding: 12px;
+             border: 2px solid #ddd;
+             border-radius: 8px;
+             font-size: 1em;
+             transition: border-color 0.3s ease;
+         }
+
+         .endpoint-config input:focus {
+             outline: none;
+             border-color: #667eea;
+         }
+     </style>
+ </head>
+ <body>
+     <div class="container">
+         <div class="header">
+             <h1>🎤 Whisper API Client</h1>
+             <p>Upload audio/video files for transcription using OpenAI's Whisper Large V3 Turbo</p>
+         </div>
+
+         <div class="endpoint-config">
+             <label for="endpoint">API Endpoint:</label>
+             <input type="text" id="endpoint" value="https://binkhoale1812-whisperapi.hf.space" placeholder="Enter your Hugging Face Space URL">
+         </div>
+
+         <div class="upload-area" id="uploadArea">
+             <div class="upload-icon">📁</div>
+             <div class="upload-text">Click to upload or drag & drop</div>
+             <div class="upload-subtext">Supports MP3, WAV, FLAC, M4A, MP4, AVI, MOV files</div>
+             <input type="file" id="fileInput" class="file-input" accept="audio/*,video/*">
+         </div>
+
+         <div class="file-info" id="fileInfo">
+             <div class="file-name" id="fileName"></div>
+             <div class="file-size" id="fileSize"></div>
+         </div>
+
+         <div class="button-group">
+             <button class="btn btn-primary" id="transcribeBtn" disabled>Transcribe</button>
+             <button class="btn btn-secondary" id="clearBtn">Clear</button>
+         </div>
+
+         <div class="loading" id="loading">
+             <div class="spinner"></div>
+             <div>Processing your audio file...</div>
+         </div>
+
+         <div class="result-area" id="resultArea">
+             <div class="result-title">Transcription Result:</div>
+             <div class="result-text" id="resultText"></div>
+         </div>
+
+         <div class="error" id="errorArea">
+             <div id="errorText"></div>
+         </div>
+     </div>
+
+     <script>
+         const uploadArea = document.getElementById('uploadArea');
+         const fileInput = document.getElementById('fileInput');
+         const fileInfo = document.getElementById('fileInfo');
+         const fileName = document.getElementById('fileName');
+         const fileSize = document.getElementById('fileSize');
+         const transcribeBtn = document.getElementById('transcribeBtn');
+         const clearBtn = document.getElementById('clearBtn');
+         const loading = document.getElementById('loading');
+         const resultArea = document.getElementById('resultArea');
+         const resultText = document.getElementById('resultText');
+         const errorArea = document.getElementById('errorArea');
+         const errorText = document.getElementById('errorText');
+         const endpointInput = document.getElementById('endpoint');
+
+         let selectedFile = null;
+
+         // Upload area click handler
+         uploadArea.addEventListener('click', () => {
+             fileInput.click();
+         });
+
+         // File input change handler
+         fileInput.addEventListener('change', (e) => {
+             const file = e.target.files[0];
+             if (file) {
+                 handleFileSelect(file);
+             }
+         });
+
+         // Drag and drop handlers
+         uploadArea.addEventListener('dragover', (e) => {
+             e.preventDefault();
+             uploadArea.classList.add('dragover');
+         });
+
+         uploadArea.addEventListener('dragleave', () => {
+             uploadArea.classList.remove('dragover');
+         });
+
+         uploadArea.addEventListener('drop', (e) => {
+             e.preventDefault();
+             uploadArea.classList.remove('dragover');
+             const file = e.dataTransfer.files[0];
+             if (file) {
+                 handleFileSelect(file);
+             }
+         });
+
+         // File selection handler
+         function handleFileSelect(file) {
+             selectedFile = file;
+             fileName.textContent = file.name;
+             fileSize.textContent = formatFileSize(file.size);
+             fileInfo.classList.add('show');
+             transcribeBtn.disabled = false;
+             hideError();
+         }
+
+         // Format file size
+         function formatFileSize(bytes) {
+             if (bytes === 0) return '0 Bytes';
+             const k = 1024;
+             const sizes = ['Bytes', 'KB', 'MB', 'GB'];
+             const i = Math.floor(Math.log(bytes) / Math.log(k));
+             return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
+         }
+
+         // Transcribe button handler
+         transcribeBtn.addEventListener('click', async () => {
+             if (!selectedFile) return;
+
+             const endpoint = endpointInput.value.trim();
+             if (!endpoint) {
+                 showError('Please enter a valid API endpoint');
+                 return;
+             }
+
+             showLoading();
+             hideError();
+             hideResult();
+
+             try {
+                 const formData = new FormData();
+                 formData.append('file', selectedFile);
+
+                 const response = await fetch(`${endpoint}/transcribe`, {
+                     method: 'POST',
+                     body: formData,
+                 });
+
+                 const result = await response.json();
+
+                 hideLoading();
+
+                 if (result.success) {
+                     showResult(result.text);
+                 } else {
+                     showError(result.error || 'Transcription failed');
+                 }
+             } catch (error) {
+                 hideLoading();
+                 showError(`Network error: ${error.message}`);
+             }
+         });
+
+         // Clear button handler
+         clearBtn.addEventListener('click', () => {
+             selectedFile = null;
+             fileInput.value = '';
+             fileInfo.classList.remove('show');
+             transcribeBtn.disabled = true;
+             hideResult();
+             hideError();
+         });
+
+         // Utility functions
+         function showLoading() {
+             loading.classList.add('show');
+             transcribeBtn.disabled = true;
+         }
+
+         function hideLoading() {
+             loading.classList.remove('show');
+             transcribeBtn.disabled = false;
+         }
+
+         function showResult(text) {
+             resultText.textContent = text;
+             resultArea.classList.add('show');
+         }
+
+         function hideResult() {
+             resultArea.classList.remove('show');
+         }
+
+         function showError(message) {
+             errorText.textContent = message;
+             errorArea.classList.add('show');
+         }
+
+         function hideError() {
+             errorArea.classList.remove('show');
+         }
+
+         // Test endpoint connectivity on page load
+         window.addEventListener('load', async () => {
+             const endpoint = endpointInput.value.trim();
+             if (endpoint) {
+                 try {
+                     const response = await fetch(`${endpoint}/health`);
+                     if (response.ok) {
+                         console.log('API endpoint is accessible');
+                     } else {
+                         console.warn('API endpoint returned non-200 status');
+                     }
+                 } catch (error) {
+                     console.warn('Could not reach API endpoint:', error.message);
+                 }
+             }
+         });
+     </script>
+ </body>
+ </html>
client.py ADDED
@@ -0,0 +1,119 @@
+ #!/usr/bin/env python3
+ """
+ Whisper API Client - Python script to interact with the Hugging Face Space API
+ """
+
+ import argparse
+ import mimetypes
+ import os
+ import sys
+
+ import requests
+
+ class WhisperAPIClient:
+     def __init__(self, endpoint="https://binkhoale1812-whisperapi.hf.space"):
+         self.endpoint = endpoint.rstrip('/')
+
+     def transcribe(self, file_path):
+         """
+         Transcribe an audio/video file using the Whisper API
+
+         Args:
+             file_path (str): Path to the audio/video file
+
+         Returns:
+             dict: Response containing transcription text or error
+         """
+         if not os.path.exists(file_path):
+             return {"error": f"File not found: {file_path}", "success": False}
+
+         # Guess the MIME type from the extension instead of assuming MP3
+         content_type = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
+
+         try:
+             with open(file_path, 'rb') as f:
+                 files = {'file': (os.path.basename(file_path), f, content_type)}
+
+                 response = requests.post(
+                     f"{self.endpoint}/transcribe",
+                     files=files,
+                     timeout=300  # 5-minute timeout for large files
+                 )
+
+             if response.status_code == 200:
+                 return response.json()
+             else:
+                 return {
+                     "error": f"HTTP {response.status_code}: {response.text}",
+                     "success": False
+                 }
+
+         except requests.exceptions.RequestException as e:
+             return {"error": f"Request failed: {str(e)}", "success": False}
+         except Exception as e:
+             return {"error": f"Unexpected error: {str(e)}", "success": False}
+
+     def health_check(self):
+         """
+         Check if the API endpoint is healthy
+
+         Returns:
+             dict: Health status
+         """
+         try:
+             response = requests.get(f"{self.endpoint}/health", timeout=10)
+             if response.status_code == 200:
+                 return response.json()
+             else:
+                 return {"status": "unhealthy", "error": f"HTTP {response.status_code}"}
+         except requests.exceptions.RequestException as e:
+             return {"status": "unhealthy", "error": str(e)}
+
+ def main():
+     parser = argparse.ArgumentParser(description="Whisper API Client")
+     parser.add_argument("file", nargs="?", help="Audio/video file to transcribe")
+     parser.add_argument("--endpoint", default="https://binkhoale1812-whisperapi.hf.space",
+                         help="API endpoint URL")
+     parser.add_argument("--health", action="store_true", help="Check API health")
+     parser.add_argument("--output", "-o", help="Output file for transcription")
+
+     args = parser.parse_args()
+
+     client = WhisperAPIClient(args.endpoint)
+
+     # Health check
+     if args.health:
+         print("Checking API health...")
+         health = client.health_check()
+         print(f"Status: {health.get('status', 'unknown')}")
+         if 'error' in health:
+             print(f"Error: {health['error']}")
+         return
+
+     # Transcribe file
+     if not args.file:
+         print("Error: Please provide a file to transcribe")
+         print("Usage: python client.py <file> [options]")
+         print("       python client.py --health")
+         sys.exit(1)
+
+     print(f"Transcribing: {args.file}")
+     print("This may take a while for large files...")
+
+     result = client.transcribe(args.file)
+
+     if result.get("success"):
+         transcription = result["text"]
+         print("\n" + "=" * 50)
+         print("TRANSCRIPTION RESULT:")
+         print("=" * 50)
+         print(transcription)
+         print("=" * 50)
+
+         # Save to file if requested
+         if args.output:
+             with open(args.output, 'w', encoding='utf-8') as f:
+                 f.write(transcription)
+             print(f"\nTranscription saved to: {args.output}")
+     else:
+         print(f"Error: {result.get('error', 'Unknown error')}")
+         sys.exit(1)
+
+ if __name__ == "__main__":
+     main()
client_requirements.txt ADDED
@@ -0,0 +1,2 @@
+ requests>=2.31.0
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ gradio>=4.0.0
+ transformers>=4.35.0
+ torch>=2.0.0
+ torchaudio>=2.0.0
+ accelerate>=0.20.0
+ datasets[audio]>=2.14.0
+ fastapi>=0.100.0
+ uvicorn>=0.20.0
+ python-multipart>=0.0.6
+ librosa>=0.10.0
+ soundfile>=0.12.0
+ spaces  # ZeroGPU helper imported by app.py