ChatBench
ChatBench Datasets and Simulators (same prompt + fine-tuning set-up) from the ChatBench paper.
- microsoft/ChatBench • Preview • Updated Apr 28, 2025 • 298 • 11
- microsoft/chatbench-distilgpt2 • Text Generation • 81.9M • Updated Aug 23, 2025 • 63 • 4
- microsoft/chatbench-llama3-8b • Updated Aug 23, 2025 • 2 • 6
- microsoft/chatbench-mistral-7b • Updated Aug 23, 2025 • 9 • 5

MediPhi
A collection of small language models (SLMs) based on Phi-3.5-mini-instruct and adapted to clinical natural language processing tasks: https://arxiv.org/abs/2505.10717
- A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment • Paper • 2505.10717 • Published May 15, 2025 • 4
- microsoft/MediPhi-Instruct • Text Generation • 4B • Updated Dec 15, 2025 • 3.43k • 58
- microsoft/MediPhi • Text Generation • 4B • Updated Dec 15, 2025 • 555 • 15
- microsoft/MediPhi-PubMed • Text Generation • 4B • Updated Dec 15, 2025 • 412 • 8

NatureLM
- microsoft/NatureLM-8x7B • 47B • Updated Jun 20, 2025 • 73 • 18
- microsoft/NatureLM-8x7B-Inst • 47B • Updated Jun 20, 2025 • 408 • 22

Phi-4
Phi-4 family of small language, multi-modal and reasoning models.
- microsoft/Phi-4-mini-flash-reasoning • Text Generation • 4B • Updated Dec 10, 2025 • 1.11k • 263
- microsoft/Phi-4-mini-reasoning • Text Generation • 4B • Updated Dec 10, 2025 • 9.45k • 213
- microsoft/Phi-4-reasoning • Text Generation • 15B • Updated Nov 24, 2025 • 5.83k • 215
- microsoft/Phi-4-reasoning-plus • Text Generation • 15B • Updated Nov 24, 2025 • 55.7k • 331

Phi-1
Phi-1 family of small language models.
- microsoft/phi-1 • Text Generation • 1B • Updated Nov 24, 2025 • 3.42k • 218
- microsoft/phi-1_5 • Text Generation • 1B • Updated Nov 24, 2025 • 43.9k • 1.35k
- Textbooks Are All You Need • Paper • 2306.11644 • Published Jun 20, 2023 • 153
- Textbooks Are All You Need II: phi-1.5 technical report • Paper • 2309.05463 • Published Sep 11, 2023 • 88

BitNet
🔥 BitNet family of large language models (1-bit LLMs).
- microsoft/bitnet-b1.58-2B-4T • Text Generation • 0.8B • Updated Dec 17, 2025 • 5.95k • 1.26k
- microsoft/bitnet-b1.58-2B-4T-bf16 • Text Generation • 2B • Updated Dec 17, 2025 • 2.3k • 34
- microsoft/bitnet-b1.58-2B-4T-gguf • Text Generation • 2B • Updated Dec 17, 2025 • 4.84k • 224
- BitNet b1.58 2B4T Technical Report • Paper • 2504.12285 • Published Apr 16, 2025 • 81

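The "b1.58" in these model names refers to ternary weights. Each weight takes one of the three values {-1, 0, +1}, which carries at most log2(3) ≈ 1.58 bits. As a rough illustration, the sketch below applies the absmean scheme described for BitNet b1.58 to a small matrix in plain Python: divide by the mean absolute weight, then round and clip to [-1, 1]. It also shows why a matrix-vector product over ternary weights needs only additions and subtractions per weight. The function names and the `eps` default are illustrative assumptions; this is not the code used to train or serve the released checkpoints.

```python
from typing import List, Tuple

Matrix = List[List[float]]
TernaryMatrix = List[List[int]]


def absmean_ternary(weights: Matrix, eps: float = 1e-5) -> Tuple[TernaryMatrix, float]:
    """Quantize weights to {-1, 0, +1} using absmean scaling.

    Returns (ternary, gamma) such that weights ~= gamma * ternary.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    gamma = sum(magnitudes) / len(magnitudes)  # mean absolute weight
    ternary = [
        # RoundClip: round to the nearest integer, then clip into [-1, 1]
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma


def ternary_matvec(ternary: TernaryMatrix, gamma: float, x: List[float]) -> List[float]:
    """Compute (gamma * ternary) @ x with only adds/subtracts per weight."""
    out = []
    for row in ternary:
        acc = 0.0
        for q, xj in zip(row, x):
            if q == 1:
                acc += xj
            elif q == -1:
                acc -= xj
            # q == 0: this input is skipped entirely
        out.append(gamma * acc)  # a single rescale per output element
    return out


if __name__ == "__main__":
    w = [[0.9, -0.05, -1.2], [0.3, 0.0, -0.4]]
    q, g = absmean_ternary(w)
    print(q, g)
    print(ternary_matvec(q, g, [1.0, 2.0, 3.0]))
```

Keeping the scale `gamma` separate from the ternary matrix is what lets the inner loop avoid multiplications; the real models also quantize activations, which this sketch does not model.
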
LLM2CLIP
LLM2CLIP makes state-of-the-art (SOTA) pretrained CLIP models even stronger.
- microsoft/LLM2CLIP-EVA02-L-14-336 • Zero-Shot Image Classification • Updated Nov 22, 2024 • 74 • 59
- microsoft/LLM2CLIP-Openai-L-14-336 • Zero-Shot Classification • 0.6B • Updated Nov 24, 2024 • 2.77k • 43
- microsoft/LLM2CLIP-EVA02-B-16 • Updated Feb 8, 2025 • 62 • 10
- microsoft/LLM2CLIP-Openai-B-16 • Zero-Shot Classification • 0.4B • Updated Nov 24, 2024 • 498 • 18

TAPEX
TAPEX is a family of state-of-the-art table pre-training models that can be used for table-based question answering and table-based fact verification.
- TAPEX: Table Pre-training via Learning a Neural SQL Executor • Paper • 2107.07653 • Published Jul 16, 2021 • 2
- microsoft/tapex-large-finetuned-wtq • Table Question Answering • 0.4B • Updated Jan 12, 2024 • 963 • 77
- microsoft/tapex-base-finetuned-wikisql • Table Question Answering • Updated Jan 24, 2023 • 856k • 22
- microsoft/tapex-large-sql-execution • Table Question Answering • 0.4B • Updated Sep 15, 2023 • 130 • 17

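TAPEX models read a table as flat text placed after the natural-language question. The sketch below reproduces that flattening in plain Python, assuming the `col : … | … row 1 : … | …` layout used by the Hugging Face `TapexTokenizer`. The helper names are hypothetical, and any further normalization or truncation the real tokenizer applies is not modeled.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Flatten a table as 'col : h1 | h2 row 1 : v1 | v2 row 2 : ...'."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):  # rows are numbered from 1
        parts.append(f"row {i} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


def build_tapex_input(
    question: str, header: Sequence[str], rows: Sequence[Sequence[object]]
) -> str:
    """Place the question first, followed by the flattened table."""
    return f"{question} {linearize_table(header, rows)}"


if __name__ == "__main__":
    print(build_tapex_input(
        "how many movies does george clooney have?",
        ["actor", "movies"],
        [["brad pitt", 87], ["george clooney", 69]],
    ))
```

In practice the `TapexTokenizer` performs this step itself when called with a pandas DataFrame as `table=` and a question as `query=`, so a sketch like this is mainly useful for inspecting what the model actually sees.
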
LayoutLM
The LayoutLM series consists of Transformer encoders for document AI tasks such as invoice parsing, document image classification, and DocVQA.
- microsoft/layoutlmv3-base • 0.1B • Updated Apr 10, 2024 • 915k • 473
- microsoft/layoutlmv2-base-uncased • Updated Sep 16, 2022 • 525k • 66
- microsoft/layoutlm-base-uncased • 0.1B • Updated Apr 16, 2024 • 102k • 61
- microsoft/layoutxlm-base • Updated Sep 16, 2022 • 6.57k • 73

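LayoutLM-style encoders take, alongside each word, that word's bounding box on the page. Following the convention described in the Hugging Face LayoutLM documentation, boxes are expressed on a 0-1000 grid that is independent of the image's pixel size. A minimal sketch of that normalization, with a hypothetical helper name and made-up OCR output:

```python
from typing import List, Sequence


def normalize_bbox(box: Sequence[float], width: float, height: float) -> List[int]:
    """Map a pixel box (x0, y0, x1, y1) onto a 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


if __name__ == "__main__":
    # Hypothetical OCR output for a 1000 x 2000 pixel scan.
    words = ["Invoice", "Total:"]
    boxes = [(50, 100, 250, 130), (600, 1800, 720, 1840)]
    for word, box in zip(words, boxes):
        print(word, normalize_bbox(box, width=1000, height=2000))
```

Because every page is mapped to the same grid, the model's 2D position embeddings see comparable coordinates regardless of scan resolution. Treat the integer truncation here as an illustration rather than a drop-in preprocessing step.
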
Orca
The Orca family of LMs developed by Microsoft.
- microsoft/Orca-2-7b • Text Generation • Updated Nov 22, 2023 • 2.15k • 223
- microsoft/Orca-2-13b • Text Generation • Updated Nov 22, 2023 • 4.5k • 666

GIT
GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering.
- GIT: A Generative Image-to-text Transformer for Vision and Language • Paper • 2205.14100 • Published May 27, 2022 • 1
- microsoft/git-base • Image-to-Text • 0.2B • Updated Apr 24, 2023 • 6.09k • 106
- microsoft/git-large • Image-to-Text • Updated Feb 8, 2023 • 386 • 17
- microsoft/git-base-vqav2 • Visual Question Answering • 0.2B • Updated Mar 9, 2024 • 128 • 20

IFMs
Industrial Foundation Models.
- microsoft/LLaMA-2-7b-GTL-Delta • Text Generation • 7B • Updated Aug 12, 2024 • 51 • 9
- microsoft/LLaMA-2-13b-GTL-Delta • Text Generation • 13B • Updated Aug 12, 2024 • 51 • 5

VibeVoice
Frontier Text-to-Speech Models: https://microsoft.github.io/VibeVoice/
- microsoft/VibeVoice-1.5B • Text-to-Speech • 3B • Updated 4 days ago • 327k • 2.19k
- microsoft/VibeVoice-Realtime-0.5B • Text-to-Speech • 1B • Updated Dec 12, 2025 • 290k • 1.07k
- VibeVoice Technical Report • Paper • 2508.19205 • Published Aug 26, 2025 • 143

Dayhoff Atlas
The models and datasets that comprise the Dayhoff Atlas.
- microsoft/Dayhoff • Viewer • Updated Jul 22, 2025 • 1.77B • 3.26k • 7
- microsoft/Dayhoff-170m-UR50 • Text Generation • 0.2B • Updated 9 days ago • 105 • 3
- microsoft/Dayhoff-170m-UR90 • Text Generation • 0.2B • Updated 9 days ago • 86
- microsoft/Dayhoff-170m-GR • Text Generation • 0.2B • Updated 9 days ago • 136

NextCoder
The NextCoder family of code-editing LMs, developed with Selective Knowledge Transfer, together with its training data.
- microsoft/NextCoder-7B • Text Generation • 8B • Updated Jun 12, 2025 • 19k • 30
- microsoft/NextCoder-14B • Text Generation • 15B • Updated Jun 12, 2025 • 185 • 16
- microsoft/NextCoder-32B • Text Generation • 33B • Updated Jun 12, 2025 • 357 • 66
- microsoft/NextCoderDataset • Viewer • Updated Jul 8, 2025 • 381k • 1.21k • 50

Phi-3
Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths.
- microsoft/Phi-3.5-mini-instruct • Text Generation • 4B • Updated Dec 10, 2025 • 417k • 950
- microsoft/Phi-3.5-MoE-instruct • Text Generation • 42B • Updated Dec 10, 2025 • 87.2k • 569
- microsoft/Phi-3.5-vision-instruct • Image-Text-to-Text • 4B • Updated Dec 10, 2025 • 791k • 724
- microsoft/Phi-3-mini-4k-instruct • Text Generation • 4B • Updated Dec 10, 2025 • 1.32M • 1.37k

Controllable Safety Alignment
Artifacts for the paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968).
- Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements • Paper • 2410.08968 • Published Oct 11, 2024 • 13
- microsoft/CoSApien • Viewer • Updated Aug 1, 2025 • 200 • 61 • 2
- microsoft/CoSAlign-Test • Viewer • Updated May 5, 2025 • 3.2k • 45 • 2
- microsoft/CoSAlign-Train • Viewer • Updated Aug 1, 2025 • 125k • 90 • 2

MAI-DS-R1
MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team.
- microsoft/MAI-DS-R1 • Text Generation • 671B • Updated Dec 15, 2025 • 220 • 293
- microsoft/MAI-DS-R1-FP8 • Text Generation • 671B • Updated Dec 15, 2025 • 609 • 24

SpeechT5
The SpeechT5 framework consists of a shared sequence-to-sequence encoder-decoder and six modality-specific (speech/text) pre-/post-nets, which together address a range of audio-related tasks.
- SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing • Paper • 2110.07205 • Published Oct 14, 2021 • 5
- microsoft/speecht5_tts • Text-to-Speech • Updated Nov 8, 2023 • 76.3k • 822
- SpeechT5 Speech Synthesis Demo (Space) • 219
- microsoft/speecht5_vc • Audio-to-Audio • Updated Mar 22, 2023 • 1.65k • 110

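To make the pre-/post-net description concrete, the sketch below routes a few tasks through the framework in plain Python, with strings standing in for the networks. The encoder pre-net matches the input modality, while the decoder pre-net and post-net match the output modality. The labels, the task list, and the `route` helper are hypothetical illustrations, not part of any released API.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for the six modality-specific nets around the shared encoder-decoder.
NETS: Dict[str, Dict[str, str]] = {
    "speech": {
        "encoder_prenet": "speech-encoder pre-net",
        "decoder_prenet": "speech-decoder pre-net",
        "decoder_postnet": "speech-decoder post-net",
    },
    "text": {
        "encoder_prenet": "text-encoder pre-net",
        "decoder_prenet": "text-decoder pre-net",
        "decoder_postnet": "text-decoder post-net",
    },
}

# (input modality, output modality) for a few illustrative tasks.
TASKS: Dict[str, Tuple[str, str]] = {
    "speech recognition": ("speech", "text"),
    "text-to-speech": ("text", "speech"),
    "voice conversion": ("speech", "speech"),
}


def route(task: str) -> List[str]:
    """Return the sequence of components a sample passes through for a task."""
    source, target = TASKS[task]
    return [
        NETS[source]["encoder_prenet"],   # input modality -> hidden states
        "shared encoder",
        NETS[target]["decoder_prenet"],   # previously generated outputs -> hidden states
        "shared decoder",
        NETS[target]["decoder_postnet"],  # hidden states -> output modality
    ]


if __name__ == "__main__":
    for task in TASKS:
        print(f"{task}: " + " -> ".join(route(task)))
```

Only the outer nets change between tasks, which is why a single pre-trained backbone can serve checkpoints such as `speecht5_tts` and `speecht5_vc`. In the `transformers` library these task set-ups correspond to classes such as `SpeechT5ForTextToSpeech` and `SpeechT5ForSpeechToSpeech`.
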
Table Transformer
The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images.
- microsoft/table-transformer-detection • Object Detection • 28.8M • Updated Sep 6, 2023 • 1.5M • 392
- microsoft/table-transformer-structure-recognition • Object Detection • 28.8M • Updated Sep 6, 2023 • 855k • 207
- microsoft/table-transformer-structure-recognition-v1.1-all • Object Detection • 28.8M • Updated Nov 18, 2023 • 191k • 79
- microsoft/table-transformer-structure-recognition-v1.1-fin • Object Detection • 28.8M • Updated Nov 27, 2023 • 341 • 1

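Table Transformer follows the DETR object-detection design, which predicts each box as normalized (center x, center y, width, height). The sketch below shows a typical post-processing step under that assumption: converting boxes to pixel corner coordinates and keeping detections above a confidence threshold. The 0.7 threshold, the helper names, and the example outputs are illustrative assumptions; DETR-style image processors in `transformers` expose `post_process_object_detection` for this step in practice.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Sequence[float], img_w: float, img_h: float) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def keep_detections(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    img_w: float,
    img_h: float,
    threshold: float = 0.7,
) -> List[Tuple[float, Box]]:
    """Keep (score, pixel_box) pairs whose score meets the threshold."""
    return [
        (score, cxcywh_to_xyxy(box, img_w, img_h))
        for box, score in zip(boxes, scores)
        if score >= threshold
    ]


if __name__ == "__main__":
    # Made-up model outputs for an 800 x 600 page image.
    preds = [(0.5, 0.5, 0.5, 0.25), (0.1, 0.1, 0.05, 0.05)]
    confs = [0.93, 0.20]
    print(keep_detections(preds, confs, img_w=800, img_h=600))
```

A common pipeline runs the detection model first to locate tables, then crops each kept box and passes the crop to a structure-recognition model to find rows, columns, and cells.
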
Biomedical
Models for biomedical research applications, such as radiology report generation and biomedical language understanding.
- microsoft/maira-2 • Text Generation • 7B • Updated Aug 14, 2025 • 6.77k • 68
- microsoft/rad-dino-maira-2 • Image Feature Extraction • 86.6M • Updated Aug 22, 2024 • 451 • 19
- microsoft/rad-dino • Image Feature Extraction • 86.6M • Updated Oct 9, 2025 • 59.9k • 69
- microsoft/radedit • Updated Dec 8, 2025 • 27

UDOP
UDOP is a general multimodal model for document AI.
- Unifying Vision, Text, and Layout for Universal Document Processing • Paper • 2212.02623 • Published Dec 5, 2022 • 11
- microsoft/udop-large • Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 18.2k • 121
- microsoft/udop-large-512 • Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 45 • 5
- microsoft/udop-large-512-300k • Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 52 • 33

Florence
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks • Paper • 2311.06242 • Published Nov 10, 2023 • 95
- microsoft/Florence-2-large • Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 707k • 1.74k
- microsoft/Florence-2-base • Image-Text-to-Text • 0.2B • Updated Aug 4, 2025 • 378k • 333
- microsoft/Florence-2-large-ft • Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 21.4k • 378

MoCapAct
Locomotion policies for hundreds of simulated humanoid locomotion clips, and demonstration data for training them.
- microsoft/mocapact-models • Updated Aug 17, 2024 • 9
- microsoft/mocapact-data • Updated Aug 17, 2024 • 33 • 4
- MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control • Paper • 2208.07363 • Published Aug 15, 2022 • 1