WenYang
James-WYang
Language Imbalance Driven Rewarding
All checkpoints for our work "Language Imbalance Driven Rewarding for Multilingual Self-improving", https://arxiv.org/abs/2410.08964
- James-WYang/LIDR_M0_Meta-Llama-3-8B-Instruct_en_es_ru_de_fr • 8B • Updated • 3
- James-WYang/LIDR_M0_Meta-Llama-3-8B-Instruct_en_th_bn_sw • 8B • Updated • 4
- James-WYang/LIDR_M0_Meta-Llama-3-8B-Instruct_translate_by_system_en_th_bn_sw • 8B • Updated • 5
- James-WYang/LIDR_M0_Qwen2-7B-Instruct_en_es_ru_de_fr • 8B • Updated • 5
Implicit Cross-lingual Reward
All checkpoints for "Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment", https://arxiv.org/abs/2503.04647
Models (60)
- James-WYang/LIDR_M1_Qwen2-7B-Instruct_en_es_ru_de_fr • 8B • Updated • 6
- James-WYang/ICR_M0_Llama-3-Base-8B-SFT-DPO_en_bn_sw_th • 8B • Updated • 2
- James-WYang/LIDR_M0_Llama-2-7b-chat-hf_en_es_ru_de_fr • 7B • Updated • 3
- James-WYang/ICR_M1_Llama-3-Base-8B-SFT-DPO_en_bn_sw_th • 8B • Updated • 2
- James-WYang/ICR_M1_Llama-3-Base-8B-SFT-KTO_en_es_ru_de_fr • 8B • Updated • 3
- James-WYang/ICR_M0_Llama-3-Base-8B-SFT-KTO_en_es_ru_de_fr • 8B • Updated • 2
- James-WYang/ICR_M1_Llama-3-Base-8B-SFT-DPO_en_es_ru_de_fr • 8B • Updated • 2
- James-WYang/ICR_ANALYSIS_M1_Llama-3-Base-8B-SFT-DPO_en_es_ru_de_fr_with_t-1_reference_model • 8B • Updated • 4
- James-WYang/ICR_ANALYSIS_M0_Llama-3-Base-8B-SFT-DPO_en_es_ru_de_fr_wo_length_control • 8B • Updated • 6
- James-WYang/ICR_ANALYSIS_M0_Llama-3-Base-8B-SFT-DPO_en_es_ru_de_fr_each_language_5000_samples • 8B • Updated • 5