Malkesh Dalia

malkesh2911

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published 4 days ago • 59

upvoted a paper 2 days ago

Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published 8 days ago • 20

upvoted a paper 8 days ago

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published 11 days ago • 87

upvoted a collection 10 days ago

Ministral 3

Collection

A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated 24 days ago • 133

upvoted a paper 13 days ago

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Paper • 2512.10430 • Published 15 days ago • 112

upvoted an article 14 days ago

Article

Codex is Open Sourcing AI models

16 days ago

upvoted an article 18 days ago

Article

We Got Claude to Fine-Tune an Open Source LLM

23 days ago • 535

liked a model 18 days ago

ByteDance/BindWeave

Image-to-Video • Updated 28 days ago • 1.78k • 87

updated a collection about 1 month ago

My AI

Collection

4 items • Updated about 1 month ago

upvoted a paper about 1 month ago

Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published Nov 17 • 17

liked 2 models about 1 month ago

google/gemma-7b

Text Generation • 9B • Updated Jun 27, 2024 • 54.9k • 3.25k

baidu/ERNIE-4.5-0.3B-PT

Text Generation • Updated Aug 29 • 10.6k • 99

upvoted 2 papers about 1 month ago

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Paper • 2506.14245 • Published Jun 17 • 45

MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

Paper • 2511.11373 • Published Nov 14 • 12

upvoted a paper about 2 months ago

Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step

Paper • 2509.23924 • Published Sep 28 • 8

liked a dataset about 2 months ago

fka/awesome-chatgpt-prompts

Viewer • Updated about 13 hours ago • 629 • 24.1k • 9.51k

updated a collection about 2 months ago

My AI

Collection

4 items • Updated about 1 month ago

liked a model about 2 months ago

MiniMaxAI/MiniMax-M2

Text Generation • 229B • Updated 3 days ago • 136k • 1.43k

liked 2 models 2 months ago

vandijklab/C2S-Scale-Gemma-2-27B

Text Generation • 28B • Updated Oct 31 • 842 • 154

mistralai/Mistral-7B-Instruct-v0.2

Text Generation • 7B • Updated Jul 24 • 2.49M • 3.04k
