Upload README.md
README.md (CHANGED)
@@ -108,6 +108,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
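The hunks starting at old lines 108, 280, 445, and 616 all add the same two lines to the Transformers example, decoding the rendered prompt alongside the model response. A minimal self-contained sketch of the snippet after this change is shown below; the model id, chat messages, dtype, and `trust_remote_code` flag are illustrative assumptions rather than values taken from this diff.

```python
# Sketch of the Transformers snippet these hunks modify. Only the decode logic
# mirrors the diff; the model id, messages, and loading flags are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm3-8b-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

messages = [{"role": "user", "content": "Give a one-sentence summary of attention."}]
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(tokenized_chat, max_new_tokens=1024)

# Keep only the newly generated tokens by slicing off the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
]
# The two lines added in this commit: also decode and print the rendered prompt.
prompt = tokenizer.batch_decode(tokenized_chat)[0]
print(prompt)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
```

Printing the decoded prompt makes it easy to verify what the chat template actually produced before reading the response.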
@@ -153,6 +155,10 @@ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.i



+#### Ollama inference
+
+TODO
+
 #### vLLM inference

 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
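Until PR #12037 is merged, the example below is only a sketch of how offline inference might look with a vLLM build that already contains that PR; it assumes the standard `vllm.LLM` offline API, and the model id and sampling settings are illustrative rather than taken from this README.

```python
# Sketch only: assumes a vLLM build that includes PR #12037 so the model loads.
from vllm import LLM, SamplingParams

llm = LLM(model="internlm/internlm3-8b-instruct", trust_remote_code=True)  # assumed repo id
sampling_params = SamplingParams(temperature=0.8, max_tokens=1024)

prompts = ["Explain the difference between a process and a thread."]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

Note that `llm.generate` takes raw prompt strings; for chat-style use, the prompt should first be rendered with the model's chat template.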
@@ -280,6 +286,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=8192)
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -308,6 +316,10 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
 print(response)
 ```

+#### Ollama inference
+
+TODO
+
 #### vLLM inference

 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
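The context line of this hunk, `response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))`, comes from the LMDeploy section of the README. A minimal sketch of that pipeline call is shown below; the model id and message content are illustrative assumptions.

```python
# Sketch of the LMDeploy pipeline call referenced in the hunk header.
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm3-8b-instruct")  # assumed repo id
messages = [{"role": "user", "content": "Summarize the transformer architecture in two sentences."}]
response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
print(response)
```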
@@ -369,7 +381,7 @@ The code is licensed under Apache-2.0, while model weights are fully open for ac
 InternLM3, the third generation of the InternLM (书生·浦语) series, open-sources an 8-billion-parameter instruction model, InternLM3-8B-Instruct, built for general-purpose use and advanced reasoning. The model has the following characteristics:

 - **Higher performance at lower cost**:
-  Achieves state-of-the-art performance among models of the same size on reasoning and knowledge-intensive tasks, surpassing Llama3.1-8B and Qwen2.5-7B
+  Achieves state-of-the-art performance among models of the same size on reasoning and knowledge-intensive tasks, surpassing Llama3.1-8B and Qwen2.5-7B. Notably, InternLM3 is trained on only 4 trillion tokens, reducing training cost by more than 75% compared with models of the same scale.
 - **Deep thinking capability**:
   InternLM3 supports a deep-thinking mode for solving complex reasoning tasks with long chains of thought, as well as a normal response mode for a smoother user experience.

@@ -445,6 +457,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -491,7 +505,12 @@ curl http://localhost:23333/v1/chat/completions \



+##### Ollama inference
+
+TODO
+
 ##### vLLM inference
+
 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. For now, please use the following PR link to install it manually.

 ```python
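The header context of this hunk shows the served model being queried with curl at http://localhost:23333/v1/chat/completions. Because that endpoint is OpenAI-compatible, the same request can be issued from Python; the sketch below assumes the `openai` client package and a server already running on that port, and discovers the model name from the server instead of hard-coding it.

```python
# Sketch: query the OpenAI-compatible endpoint shown in the hunk header.
# Assumes an api_server is already listening on localhost:23333.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="not-needed")
model_name = client.models.list().data[0].id  # ask the server which model it serves

completion = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "What is InternLM3?"}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```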
@@ -616,6 +635,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=8192)
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -644,6 +665,10 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
 print(response)
 ```

+##### Ollama inference
+
+TODO
+
 ##### vLLM inference

 We are still working on merging the PR (https://github.com/vllm-project/vllm/pull/12037) into vLLM. For now, please use the following PR link to install it manually.