Any evaluation done in thinking mode?
I see that all evals for Qwen-SEA-LION are in non-thinking mode only. Is there a reason thinking-mode evaluation was not considered?
Hello,
Thank you for your interest in SEA-LION! Since we have only trained the model on instruct data for now, we have only evaluated Qwen-SEA-LION in non-thinking mode so far. That said, scores for the original Qwen3-32B in both non-thinking and thinking modes are available on the SEA-HELM leaderboard.
Interesting. I thought that CPT on SEA-language tokens would also have helped the model reason with a better understanding of those languages, leading to a better conclusion at the end of the reasoning. Or is that not how it works?
Hello,
For SEA-LION, we do not only CPT but also post-training. The data used differs considerably depending on the type of training. This is what I meant when I said that we have only trained the model on instruct data: I was referring specifically to the post-training phase.
Thank you for the question!