Hugging Face – Posts

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

All HF Hub posts

MonsterMMORPG

posted an update 1 day ago

Post

2156

Whisper-WebUI Premium - Ultra Fast and High Accuracy Speech to Text Transcripton App for All Languages - Windows, RunPod, Massed Compute 1-Click Installers - Supporting RTX 1000 to 5000 series

Latest installer zip file : https://www.patreon.com/posts/145395299

New Features

Password protected version, password is just 1 : WhisperWeb_UI_v1_password_is_1.zip

It has better interface, more features, default settings set for maximum accuracy

It will show transcription realtime both on Gradio interface and also on CMD

It will show better status and output at the cmd like starting time, starting file, etc

It will save every generated transcription properly with same name as input file name with proper name sanitization

After deep scan of the entire pipeline, default parameters are set for maximum accuracy and quality

1-Click installers for Windows local PC, RunPod (Linux-Cloud) and Massed Compute (Linux-Cloud)

The app the installers are made for RTX 1000 series to RTX 5000 series with pre-compiled libraries

We install with Torch 2.8, CUDA 12.9, latest Flash Attention, Sage Attention, xFormers - all precompiled

As low as 6 GB VRAM GPUs can use

OpenAI Whisper Supported Models:

tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, large-v3-turbo, turbo

Distil-Whisper Supported Models (Faster-Whisper & Insanely-Fast-Whisper):

distil-large-v2, distil-large-v3, distil-medium.en, distil-small.en

100 languages are supported

Kseniase

posted an update 3 days ago

Post

3779

15 Outstanding Research Papers from NeurIPS 2025

NeurIPS 2025, as a premier annual event in machine learning and computational neuroscience, tackles major topics like the future of AI, current research, and the most difficult challenges. While we’re not attending this year, we’re closely following the updates and today we pull together a quick, easy-to-digest roundup of a few standout papers so you can jump in without getting overwhelmed.

Here is a list of 15 papers from NeurIPS 2025, including 8 top research papers that received awards, along with 7 others that caught our attention:

1. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks → https://neurips.cc/virtual/2025/loc/san-diego/test-of-time/128328
Test of Time Award winner. Introduces the RPN, a small convnet that predicts objectness and boxes on shared features, enabling Faster R-CNN to share computation and run around 5 fps on a GPU

2. Artificial Hivemind: The Open-Ended Homogeneity of LMs (and Beyond) → https://neurips.cc/virtual/2025/loc/san-diego/poster/121421
Releases a huge open-ended prompt dataset, showing that LLMs often fall into an “artificial hivemind” – generate surprisingly similar answers – and measuring diversity collapse

3. Optimal Mistake Bounds for Transductive Online Learning → https://neurips.cc/virtual/2025/loc/san-diego/poster/119098
Settles a 30-year-old question by showing how much unlabeled data helps in online learning – it gives a precise quadratic advantage with tight matching bounds

4. Gated Attention for LLMs: Non-linearity, Sparsity, and Attention-Sink-Free → https://neurips.cc/virtual/2025/loc/san-diego/poster/120216
Demonstrates how gating actually affects attention: a simple sigmoid gate after Scaled Dot-Product Attention (SDPA) boosts performance, stability, and long-context behavior by adding useful nonlinearity and sparse modulation

Read further below ⬇️
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

sequelbox

posted an update 1 day ago

Post

2201

Two new releases today!

Firstly, our new Raiden-Mini dataset, powered by DeepSeek's newest deepseek-ai/DeepSeek-V3.2-Speciale model!
- A V3.2-Speciale reasoning showcase: the Raiden prompts test the model's creative, analytic, and general reasoning skills!
- HEAD TO HEAD: a comparison subset pits V3.2-Speciale against V3.2 with the same prompts, providing a direct look at each model's advantages!

Get the new Raiden-Mini dataset: sequelbox/Raiden-Mini-DeepSeek-V3.2-Speciale

On the model side, we've also brought Shining Valiant 3 to Ministral 3!
- Science-reasoning: sequelbox/Celestia3-DeepSeek-R1-0528 for physics, biology, chemistry, compsci, astronomy, Earth science, and information theory.
- AI to build AI: the sequelbox/Mitakihara-DeepSeek-R1-0528 dataset for high-quality reasoning performance on AI, MLOps, math and CUDA, complex adaptive and agentic systems, cognition, logic, linguistics, simulation, knowledge management, and more!
- Creative reasoning and general chat performance supplemented with sequelbox/Raiden-DeepSeek-R1

Get the newest SV3: ValiantLabs/Ministral-3-14B-Reasoning-2512-ShiningValiant3

Esper 3.1 is available for Ministral 3 as well: ValiantLabs/Ministral-3-14B-Reasoning-2512-Esper3.1

We're working hard on our next Big New Release, coming out in the next few weeks :)

Help support our releases, donations used for models and datasets: sequelbox/SupportOpenSource

Open source matters. Fight for it with us.

with love and friendship,
allegra

1 reply

melvindave

posted an update about 14 hours ago

Post

658

Currently having a blast learning the transformers library.

I noticed that model cards usually have Transformers code as usage examples.

So I tried to figure out how to load a model just using the transformers library without using ollama, lmstudio, or llamacpp.

Learned how to install dependencies required to make it work like pytorch and CUDA. I also used Conda for python environment dependencies.

Once I got the model loaded and sample inference working, I made an API to serve it.

I know it's very basic stuff for machine learning experts here in HF but I'm completely new to this so I'm happy to get it working!

Model used: Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090

Here's the result of my experimentation

6 replies

RakshitAralimatti

posted an update about 20 hours ago

Post

955

I built something crazy you never saw before.

Please check - https://huggingface.co/blog/RakshitAralimatti/streaming-data-rag

A real-time Streaming Data to RAG system that listens to live radio, transcribes it on-the-fly, and lets you query across TIME.

Not just "what was discussed" – but "what happened in the last 10 minutes on channel 0?" or "at 9 AM, what was the breaking news?" This is RAG that understands temporal context.

1 reply

sergiopaniego

posted an update about 23 hours ago

Post

1493

NEW: @EssentialAI just released Rnj-1, their first 8B model.

You can easily fine-tune it with GRPO using TRL to add reasoning capabilities to a compact mode

Free Colab link: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_rnj_1_instruct.ipynb

More free TRL notebooks: https://huggingface.co/docs/trl/main/en/example_overview#notebooks

mitkox

posted an update about 15 hours ago

Post

245

Got to 1199.8 tokens/sec with Devstral Small -2 on my desktop GPU workstation. vLLM nightly.
Works out of the box with Mistral Vibe. Next is time to test the big one.

2 replies

Juanxi

posted an update 1 day ago

Post

1904

ScalingOpt is continuously evolving! We are steadily expanding the Community section with new content. For our Blog, we've launched by featuring work from Jianlin Su and are actively translating insightful posts from scientific communities into English to share on ScalingOpt (we'll keep curating excellent community blogs and providing English versions alongside the originals).

We operate under the Creative Commons Attribution-NonCommercial principle, sharing knowledge freely and openly. We welcome your ideas, suggestions, and feedback to help shape ScalingOpt's future.

If you find this initiative valuable, please consider following and starring the project to show your support. Thank you!

2 replies

prithivMLmods

posted an update 1 day ago

Post

1367

Introducing the D.Markdown Experimental Models, Proxima and Epsilon OCR models, built on top of Qwen3-VL and Qwen2.5-VL respectively. Proxima is optimized for Markdown generation and is capable of embedding inline programming code snippets and generating rich nodes such as HTML, XML, JSON, and YAML. Epsilon is optimized for reconstructing complex layouts including tables, forms, and mathematical content. 🌌✨

● proxima-ocr-d.markdown-post3.0.l: prithivMLmods/proxima-ocr-d.markdown-post3.0.l
● epsilon-ocr-d.markdown-post3.0.m: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m
● proxima-ocr-d.markdown-post3.0.l-gguf: prithivMLmods/proxima-ocr-d.markdown-post3.0.l-GGUF
● epsilon-ocr-d.markdown-post3.0.m-gguf: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m-GGUF

● Collection: https://huggingface.co/collections/prithivMLmods/dynamic-markdowns
● Multimodal Apps: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

👉 These models are stage progression models, and currently they may contain artifacts.

To know more about it, visit the app page or the respective model page!

codelion

posted an update 2 days ago

Post

2391

Recently, Essential AI released a new 8B base model EssentialAI/rnj-1 they highlighted the importance of data mix for pretraning -

"In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this. "

This resonates with the recent work we did around optimal dataset mixing for pretraining where we saw have the right mix can increase the efficiency of training -
https://huggingface.co/blog/codelion/optimal-dataset-mixing

Recently active users