Any evaluation done in thinking mode?

by ablueleaf - opened Oct 29

Oct 29

I see that all evals for Qwen-SEA-LION are in non-thinking mode only. Is there a reason thinking-mode evals were not considered?

Jian-Gang

AI Singapore org Oct 29

Hello,

Thank you for your interest in SEA-LION! Since we have only trained the model on instruct data for now, we have only evaluated Qwen-SEA-LION in non-thinking mode so far. That said, the SEA-HELM leaderboard has scores for the original Qwen3-32B in both non-thinking and thinking modes.

ablueleaf

Oct 31

Interesting. I thought that CPT on SEA-language tokens would also have helped it reason with a better understanding of those languages, leading to a better conclusion at the end of the reasoning. Or is that not how it works?

Jian-Gang

AI Singapore org Nov 5

Hello,

For SEA-LION, we do not only CPT but also post-training, and the data used differs considerably depending on the type of training. This is what I meant when I said that we have only trained the model on instruct data: I was referring specifically to the post-training phase.

Thank you for the question!
