Upload README.md
README.md (CHANGED)
@@ -108,6 +108,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
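The hunks starting at old lines 108, 280, 445, and 616 all add the same two lines to the Transformers example, decoding the rendered prompt alongside the model response. A minimal self-contained sketch of the snippet after this change is shown below; the model id, chat messages, dtype, and `trust_remote_code` flag are illustrative assumptions rather than values taken from this diff.

```python
# Sketch of the Transformers snippet these hunks modify. Only the decode logic
# mirrors the diff; the model id, messages, and loading flags are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm3-8b-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

messages = [{"role": "user", "content": "Give a one-sentence summary of attention."}]
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(tokenized_chat, max_new_tokens=1024)

# Keep only the newly generated tokens by slicing off the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
]
# The two lines added in this commit: also decode and print the rendered prompt.
prompt = tokenizer.batch_decode(tokenized_chat)[0]
print(prompt)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
```

Printing the decoded prompt makes it easy to verify what the chat template actually produced before reading the response.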
@@ -153,6 +155,10 @@ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.i



+#### Ollama inference
+
+TODO
+
 #### vLLM inference

 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
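Until PR #12037 is merged, the example below is only a sketch of how offline inference might look with a vLLM build that already contains that PR; it assumes the standard `vllm.LLM` offline API, and the model id and sampling settings are illustrative rather than taken from this README.

```python
# Sketch only: assumes a vLLM build that includes PR #12037 so the model loads.
from vllm import LLM, SamplingParams

llm = LLM(model="internlm/internlm3-8b-instruct", trust_remote_code=True)  # assumed repo id
sampling_params = SamplingParams(temperature=0.8, max_tokens=1024)

prompts = ["Explain the difference between a process and a thread."]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

Note that `llm.generate` takes raw prompt strings; for chat-style use, the prompt should first be rendered with the model's chat template.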
@@ -280,6 +286,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=8192)
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -308,6 +316,10 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
 print(response)
 ```

+#### Ollama inference
+
+TODO
+
 #### vLLM inference

 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
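The context line of this hunk, `response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))`, comes from the LMDeploy section of the README. A minimal sketch of that pipeline call is shown below; the model id and message content are illustrative assumptions.

```python
# Sketch of the LMDeploy pipeline call referenced in the hunk header.
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm3-8b-instruct")  # assumed repo id
messages = [{"role": "user", "content": "Summarize the transformer architecture in two sentences."}]
response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
print(response)
```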
@@ -369,7 +381,7 @@ The code is licensed under Apache-2.0, while model weights are fully open for ac
 InternLM3, the third generation of the InternLM (书生·浦语) series, open-sources an 8-billion-parameter instruction model, InternLM3-8B-Instruct, built for general-purpose use and advanced reasoning. The model has the following characteristics:

 - **Higher performance at lower cost**:
-  Achieves state-of-the-art performance among models of the same size on reasoning and knowledge-intensive tasks, surpassing Llama3.1-8B and Qwen2.5-7B
+  Achieves state-of-the-art performance among models of the same size on reasoning and knowledge-intensive tasks, surpassing Llama3.1-8B and Qwen2.5-7B. Notably, InternLM3 is trained on only 4 trillion tokens, reducing training cost by more than 75% compared with models of the same scale.
 - **Deep thinking capability**:
   InternLM3 supports a deep-thinking mode for solving complex reasoning tasks with long chains of thought, as well as a normal response mode for a smoother user experience.

@@ -445,6 +457,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -491,7 +505,12 @@ curl http://localhost:23333/v1/chat/completions \



+##### Ollama inference
+
+TODO
+
 ##### vLLM inference
+
 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. For now, please use the following PR link to install it manually.

 ```python
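The header context of this hunk shows the served model being queried with curl at http://localhost:23333/v1/chat/completions. Because that endpoint is OpenAI-compatible, the same request can be issued from Python; the sketch below assumes the `openai` client package and a server already running on that port, and discovers the model name from the server instead of hard-coding it.

```python
# Sketch: query the OpenAI-compatible endpoint shown in the hunk header.
# Assumes an api_server is already listening on localhost:23333.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="not-needed")
model_name = client.models.list().data[0].id  # ask the server which model it serves

completion = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "What is InternLM3?"}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```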
@@ -616,6 +635,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=8192)
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -644,6 +665,10 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
 print(response)
 ```

+##### Ollama inference
+
+TODO
+
 ##### vLLM inference

 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. For now, please use the following PR link to install it manually.