- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25
Collections including paper arxiv:2501.08313

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 628
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 301
- [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs
  Paper • 2412.05819 • Published
- Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration
  Paper • 2501.05179 • Published

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 628
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 301
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 315
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 210

- MiniMaxAI/MiniMax-Text-01-hf
  Text Generation • 456B • Updated • 10.6k • 8
- MiniMaxAI/MiniMax-M1-80k-hf
  Text Generation • 456B • Updated • 77 • 6
- MiniMaxAI/MiniMax-M1-40k-hf
  Text Generation • 456B • Updated • 70 • 10
- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 1.59k • 651

- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 1.59k • 651
- MiniMaxAI/MiniMax-VL-01
  Image-Text-to-Text • 456B • Updated • 90.8k • 280
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 301
- MiniMaxText01
  💬 117 • Generate responses to text and images in a chat interface

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
  Paper • 2509.19371 • Published
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  Paper • 2505.06708 • Published • 7
- Selective Attention: Enhancing Transformer through Principled Context Control
  Paper • 2411.12892 • Published
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 189

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 301
- Agent-Ark/Toucan-1.5M
  Viewer • Updated • 1.65M • 9.99k • 183
- facebook/natural_reasoning
  Viewer • Updated • 1.15M • 2.11k • 543
- Salesforce/Webscale-RL
  Viewer • Updated • 1.11M • 939 • 81

- Rewnozom/agent-zero-v1-a-01
  Text Generation • 4B • Updated • 8 • 1
- TheBloke/MythoMax-L2-13B-GGUF
  13B • Updated • 125k • 209
- DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
  Text Generation • 18B • Updated • 55.6k • 432
- QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
  Text Generation • 8B • Updated • 15.5k • 125

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 144