Qwen2.5-1.5B-Instruct-KAI

Introduction

Llama3.2-1B-Instruct-KAI, Llama3.2-3B-Instruct-KAI, Qwen2.5-0.5B-Instruct-KAI, Qwen2.5-1.5B-Instruct-KAI, and Qwen2.5-3B-Instruct-KAI are a collection of models fine-tuned from the open Qwen2.5 and Llama3.2 instruct models. They are optimized for Vietnamese language understanding and generation tasks such as reading comprehension, information extraction, question answering, and summarization.

Quickstart

The snippet below demonstrates how to load a model and generate a chat-style response; the Examples section then applies the same pipeline to question-answering and summarization prompts.


from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kiki-ailab/Qwen2.5-1.5B-Instruct-KAI"

# torch_dtype="auto" keeps the checkpoint's native precision (BF16 here);
# device_map="auto" places the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Xin chร o !"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation into a single prompt string using the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens (drop the echoed prompt).
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
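
For the examples in the next section, it is convenient to wrap the steps above in a small helper. This is a minimal sketch under our own naming (generate_response is not part of the transformers API); it reuses the model and tokenizer loaded above:

def generate_response(prompt, system="You are a helpful assistant.", max_new_tokens=512):
    # Build the chat, render it with the model's chat template, and tokenize.
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    # Drop the echoed prompt tokens before decoding.
    generated_ids = [
        out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]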

Examples

Example 1: multi-document question answering. The Vietnamese prompt supplies three documents on climate change and asks how climate change affects agriculture in developing countries, requiring an answer of at most 50 words with a [doc-k] citation.

prompt = """Dฦฐแป›i ฤ‘รขy lร  mแป™t sแป‘ tร i liแป‡u / vฤƒn bแบฃn:

<DOC id="doc-1">
Theo mแป™t nghiรชn cแปฉu gแบงn ฤ‘รขy, biแบฟn ฤ‘แป•i khรญ hแบญu ฤ‘รฃ lร m gia tฤƒng tแบงn suแบฅt vร  cฦฐแปng ฤ‘แป™ cแปงa cรกc hiแป‡n tฦฐแปฃng thแปi tiแบฟt cแปฑc ฤ‘oan, bao gแป“m bรฃo, hแบกn hรกn vร  lลฉ lแปฅt. Cรกc khu vแปฑc ven biแปƒn ฤรดng Nam ร cรณ nguy cฦก cao nhแบฅt do nฦฐแป›c biแปƒn dรขng vร  hiแป‡n tฦฐแปฃng xรขm nhแบญp mแบทn.
</DOC>
<DOC id="doc-2">
Mแป™t bรกo cรกo tแปซ Ngรขn hร ng Thแบฟ giแป›i cho thแบฅy rแบฑng biแบฟn ฤ‘แป•i khรญ hแบญu sแบฝ แบฃnh hฦฐแปŸng nghiรชm trแปng ฤ‘แบฟn sแบฃn xuแบฅt nรดng nghiแป‡p, ฤ‘แบทc biแป‡t lร  แปŸ cรกc nฦฐแป›c ฤ‘ang phรกt triแปƒn, nฦกi nแปn kinh tแบฟ phแปฅ thuแป™c lแป›n vร o nรดng nghiแป‡p. Cแปฅ thแปƒ, nฤƒng suแบฅt cรขy trแป“ng cรณ thแปƒ giแบฃm tแปซ 10% ฤ‘แบฟn 25% trong 30 nฤƒm tแป›i.
</DOC>
<DOC id="doc-3">
Mแป™t sรกng kiแบฟn quแป‘c tแบฟ ฤ‘รฃ ฤ‘ฦฐแปฃc khแปŸi ฤ‘แป™ng nhแบฑm giแบฃm thiแปƒu tรกc ฤ‘แป™ng cแปงa biแบฟn ฤ‘แป•i khรญ hแบญu thรดng qua viแป‡c thรบc ฤ‘แบฉy sแปญ dแปฅng nฤƒng lฦฐแปฃng tรกi tแบกo vร  giแบฃm phรกt thแบฃi carbon. Cรกc nฦฐแป›c phรกt triแปƒn ฤ‘รฃ cam kแบฟt hแป— trแปฃ tร i chรญnh cho cรกc quแป‘c gia dแป… bแป‹ tแป•n thฦฐฦกng nhแบฅt, nhฦฐng viแป‡c triแปƒn khai vแบซn gแบทp nhiแปu thรกch thแปฉc.
</DOC>

TASK: Hรฃy trแบฃ lแปi cรขu hแปi "Biแบฟn ฤ‘แป•i khรญ hแบญu แบฃnh hฦฐแปŸng nhฦฐ thแบฟ nร o ฤ‘แบฟn nรดng nghiแป‡p แปŸ cรกc nฦฐแป›c ฤ‘ang phรกt triแปƒn?"

INSTRUCTION:
1. Cรขu trแบฃ lแปi khรดng quรก 50 tแปซ. 
2. Trรญch dแบซn rรต rร ng tร i liแป‡u nร o chแปฉa thรดng tin liรชn quan, theo format: [doc-k]"""

Example 2: extractive question answering. The Vietnamese prompt provides a news passage about Hurricane Milton and asks how strong the storm was and where it occurred; the answer must either be extracted from the passage or be 'NO ANSWER'.

prompt = """Trแบฃ lแปi cรขu hแปi dแปฑa vร o nแป™i dung ฤ‘oแบกn vฤƒn sau:
====
Bรฃo Milton bแบฏt ฤ‘แบงu ฤ‘แป• bแป™ vร o Siesta Key, bang Florida, Mแปน, vแป›i sแปฉc giรณ 193 km/h, tฦฐฦกng ฤ‘ฦฐฦกng cแบฅp 3 trong thang ฤ‘o bรฃo 5 cแบฅp, vร o khoแบฃng 20h30 ngร y 9/10 (7h30 sรกng 10/10 giแป Hร  Nแป™i). Sau vร i tiแบฟng cร n quรฉt qua Florida, bรฃo Milton hแบก xuแป‘ng cแบฅp 2 vร  tiแบฟp tแปฅc hแบก xuแป‘ng cแบฅp 1 vร o rแบกng sรกng 10/10.

ฤรขy lร  cฦกn bรฃo thแปฉ nฤƒm แปŸ Mแปน vร  cฦกn bรฃo thแปฉ ba tแบฅn cรดng bang Florida trong nฤƒm nay. Trฦฐแป›c khi bรฃo Milton ฤ‘แป• bแป™, Thแป‘ng ฤ‘แป‘c Florida Ron DeSantis cho biแบฟt รญt nhแบฅt 19 cฦกn lแป‘c xoรกy ฤ‘รฃ xuแบฅt hiแป‡n แปŸ Florida vร  116 cแบฃnh bรกo lแป‘c xoรกy ฤ‘ฦฐแปฃc ban bแป‘ khแบฏp bang.

Mฦฐa lแป›n xแบฃy ra แปŸ cรกc khu vแปฑc, nhแบฅt lร  thร nh phแป‘ St. Petersburg khi hแปฉng chแป‹u "trแบญn mฦฐa nghรฌn nฤƒm cรณ mแป™t", vแป›i lฦฐแปฃng mฦฐa trรบt xuแป‘ng thร nh phแป‘ trong ba giแป tฦฐฦกng ฤ‘ฦฐฦกng ba thรกng trong nฤƒm. Cรกc thร nh phแป‘ McKay Creek, Clearwater Beach vร  Temple Terrace cลฉng ghi nhแบญn lฦฐแปฃng mฦฐa lแป›n, lแบงn lฦฐแปฃt lร  371 mm, 355 mm vร  344 mm.
====

Yรชu cแบงu cรขu trแบฃ lแปi hoแบทc lร  ฤ‘ฦฐแปฃc trรญch ra tแปซ ฤ‘oแบกn vฤƒn, hoแบทc lร  'NO ANSWER' nแบฟu nแป™i dung ฤ‘oแบกn vฤƒn khรดng liรชn quan ฤ‘แบฟn cรขu hแปi.

Cรขu hแปi: Bรฃo Milton mแบกnh nhฦฐ thแบฟ nร o ? Diแป…n ra แปŸ ฤ‘รขu ?
Cรขu trแบฃ lแปi:"""

Example 3: title generation and summarization. The Vietnamese prompt reuses the Hurricane Milton passage and asks for a headline plus a 1-2 sentence summary.

prompt = """Cho vฤƒn bแบฃn dฦฐแป›i ฤ‘รขy:
====
Bรฃo Milton bแบฏt ฤ‘แบงu ฤ‘แป• bแป™ vร o Siesta Key, bang Florida, Mแปน, vแป›i sแปฉc giรณ 193 km/h, tฦฐฦกng ฤ‘ฦฐฦกng cแบฅp 3 trong thang ฤ‘o bรฃo 5 cแบฅp, vร o khoแบฃng 20h30 ngร y 9/10 (7h30 sรกng 10/10 giแป Hร  Nแป™i). Sau vร i tiแบฟng cร n quรฉt qua Florida, bรฃo Milton hแบก xuแป‘ng cแบฅp 2 vร  tiแบฟp tแปฅc hแบก xuแป‘ng cแบฅp 1 vร o rแบกng sรกng 10/10.

ฤรขy lร  cฦกn bรฃo thแปฉ nฤƒm แปŸ Mแปน vร  cฦกn bรฃo thแปฉ ba tแบฅn cรดng bang Florida trong nฤƒm nay. Trฦฐแป›c khi bรฃo Milton ฤ‘แป• bแป™, Thแป‘ng ฤ‘แป‘c Florida Ron DeSantis cho biแบฟt รญt nhแบฅt 19 cฦกn lแป‘c xoรกy ฤ‘รฃ xuแบฅt hiแป‡n แปŸ Florida vร  116 cแบฃnh bรกo lแป‘c xoรกy ฤ‘ฦฐแปฃc ban bแป‘ khแบฏp bang.

Mฦฐa lแป›n xแบฃy ra แปŸ cรกc khu vแปฑc, nhแบฅt lร  thร nh phแป‘ St. Petersburg khi hแปฉng chแป‹u "trแบญn mฦฐa nghรฌn nฤƒm cรณ mแป™t", vแป›i lฦฐแปฃng mฦฐa trรบt xuแป‘ng thร nh phแป‘ trong ba giแป tฦฐฦกng ฤ‘ฦฐฦกng ba thรกng trong nฤƒm. Cรกc thร nh phแป‘ McKay Creek, Clearwater Beach vร  Temple Terrace cลฉng ghi nhแบญn lฦฐแปฃng mฦฐa lแป›n, lแบงn lฦฐแปฃt lร  371 mm, 355 mm vร  344 mm.
====

TASK: ฤแบทt tiรชu ฤ‘แป vร  tรณm tแบฏt bร i bรกo trรชn thร nh 1-2 cรขu."""

Benchmarks

VMLU

We evaluate our fine-tuned models on the VMLU benchmark provided by https://vmlu.ai, together with the ViSquad, ViDrop, and ViDialog tasks. Gains over the corresponding base models are shown in parentheses.

Model VMLU ViSquad ViDrop ViDialog
Llama3.2-1B-Instruct 37.6 70.1 29.6 33.9
Llama3.2-3B-Instruct 47.6 90.3 63.5 50.8
Qwen2.5-0.5B-Instruct 39.1 62.5 31.5 28.0
Qwen2.5-1.5B-Instruct 48.6 86.7 54.5 39.8
Qwen2.5-3B-Instruct 52.9 88.3 72.4 54.4

Our fine-tuned models
Llama3.2-1B-Instruct-KAI 50.5 (+12.9) 88.4 (+18.3) 71.1 (+41.5) 50.9 (+17.0)
Llama3.2-3B-Instruct-KAI 58.1 (+10.5) 93.5 (+3.2) 81.4 (+17.9) 67.3 (+16.5)
Qwen2.5-0.5B-Instruct-KAI 49.7 (+10.6) 87.3 (+24.8) 62.3 (+30.8) 39.0 (+11.0)
Qwen2.5-1.5B-Instruct-KAI 57.5 (+8.9) 93.3 (+6.6) 76.0 (+21.5) 54.6 (+14.8)
Qwen2.5-3B-Instruct-KAI 63.5 (+10.6) 94.2 (+5.9) 80.9 (+8.5) 68.5 (+14.1)

Evaluation on ArenaHard (CohereForAI)

We follow the evaluation method outlined in https://github.com/lmarena/arena-hard-auto to assess our fine-tuned models against others on the ArenaHard benchmark: a judge model compares each candidate model's answers with the baseline model's answers, and the win/tie/lose rates against the baseline are reported below.

  • Baseline model: Qwen/Qwen2.5-7B-Instruct
  • Judge: Qwen/Qwen2.5-72B-Instruct
#  model                                      size (B)  win (%)  tie (%)  lose (%)
1  deepseek-ai/DeepSeek-R1-Distill-Qwen-14B   14        59.5     4.6      35.9
2  CohereForAI/aya-expanse-8b                 8         55.0     4.6      40.4
3  Qwen/Qwen2.5-14B-Instruct                  14        48.7     9.1      42.2
4  kiki-ailab/Qwen2.5-3B-Instruct-KAI         3         38.7     4.7      56.6
5  meta-llama/Llama-3.1-8B-Instruct           8         38.6     4.9      56.5
6  CohereForAI/c4ai-command-r7b-12-2024       7         35.1     3.3      61.6
7  kiki-ailab/Llama3.2-3B-Instruct-KAI        3         35.0     4.3      60.7
8  arcee-ai/Arcee-VyLinh                      3         34.8     5.4      59.8
9  kiki-ailab/Qwen2.5-1.5B-Instruct-KAI       1.5       28.9     3.9      67.2
10 deepseek-ai/DeepSeek-R1-Distill-Qwen-7B    7         23.2     2.8      74.0
11 meta-llama/Llama-3.2-3B-Instruct           3         21.2     4.4      74.4
12 Qwen/Qwen2.5-3B-Instruct                   3         18.6     5.8      75.6
13 zaloai/Llama3.2-1B-Instruct-ZAI            1         17.4     3.7      78.9
14 Viet-Mistral/Vistral-7B-Chat               7         17.2     3.2      79.6
15 kiki-ailab/Qwen2.5-0.5B-Instruct-KAI       0.5       10.9     2.0      87.1
16 meta-llama/Llama-3.2-1B-Instruct           1         6.5      1.6      91.9
17 Qwen/Qwen2.5-1.5B-Instruct                 1.5       6.4      3.0      90.6
18 deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B  1.5       3.0      1.5      95.5
19 vinai/PhoGPT-4B-Chat                       4         1.2      2.7      96.1
20 Qwen/Qwen2.5-0.5B-Instruct                 0.5       1.0      1.7      97.3
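
The win/tie/lose columns are percentages of judged prompts and sum to 100 for each row. As a minimal sketch of the final tally step (our own illustration; the actual arena-hard-auto pipeline additionally handles judge prompting, answer-order swapping, and confidence intervals):

from collections import Counter

def tally(verdicts):
    # verdicts: one "win"/"tie"/"lose" outcome per benchmark prompt,
    # comparing a candidate model's answer against the baseline's answer.
    counts = Counter(verdicts)
    total = len(verdicts)
    return {k: round(100.0 * counts[k] / total, 1) for k in ("win", "tie", "lose")}

# Example: a model that wins 595 of 1000 comparisons, ties 46, and loses 359
# reproduces row 1 above.
print(tally(["win"] * 595 + ["tie"] * 46 + ["lose"] * 359))
# -> {'win': 59.5, 'tie': 4.6, 'lose': 35.9}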

Disclaimer

  • The models may still hallucinate on culturally specific content.
  • They are primarily focused on Vietnamese language understanding and generation.
  • They may not perform optimally in specialized technical domains.

Feedback

We welcome any feedback on these public models. Please send your comments to contact@kilm.ai.
