continuedev/instinct


Adarsh-Iyer committed on Sep 4

Commit 3f5b6c8 (verified) · 1 parent: 4e23399

Update README.md

Files changed (1)

README.md +1 -1


@@ -14,7 +14,7 @@ This repo contains the model weights for **Instinct**, [Continue](https://contin
 **Ollama**: We've released a [Q4_K_M GGUF quantization of Instinct](https://huggingface.co/continuedev/instinct-GGUF) for efficient local inference. Try it with [Continue's Ollama integration](https://docs.continue.dev/guides/ollama-guide).
-Besides Ollama, there are many ways to plug a local model into Continue; we internally used an endpoint served by [SGLang](https://github.com/sgl-project/sglang), which is one of the options below. Quantizing for faster inference is also an option that worked well for us. Serve the model using either of the below options, then [connect it with Continue](https://docs.continue.dev/guides/how-to-self-host-a-model).
+Besides Ollama, there are many ways to plug a local model into Continue; we internally used an endpoint served by [SGLang](https://github.com/sgl-project/sglang), which is one of the options below. Serve the model using either of the below options, then [connect it with Continue](https://docs.continue.dev/guides/how-to-self-host-a-model).
 SGLang: `python3 -m sglang.launch_server --model-path continuedev/instinct --load-format safetensors`
 <br>vLLM  : `vllm serve continuedev/instinct --served-model-name instinct --load-format safetensors`
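The README text in this diff says to serve the model with SGLang or vLLM and then connect it to Continue. Before wiring up Continue, one quick way to confirm the server is up is to send a single request to its OpenAI-compatible completions route. The sketch below uses only the Python standard library and rests on several assumptions: the servers listen on their default ports (8000 for vLLM, 30000 for SGLang, since neither command passes `--port`), the vLLM model is addressed as `instinct` (from `--served-model-name`), and the SGLang model is addressed by its path `continuedev/instinct`. The helper names are hypothetical, and the prompt is an arbitrary placeholder, not the format Continue actually sends.

```python
import json
import urllib.request

# Assumed default endpoints (no --port was passed in the README commands).
VLLM_BASE_URL = "http://localhost:8000/v1"
SGLANG_BASE_URL = "http://localhost:30000/v1"


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /completions route.

    Hypothetical helper for a smoke test; the prompt is a placeholder only.
    """
    payload = {
        "model": model,          # "instinct" for vLLM, "continuedev/instinct" for SGLang (assumed)
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,      # deterministic output makes repeated checks comparable
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request to a running server and return the first choice's text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the vLLM server running, `complete(build_completion_request(VLLM_BASE_URL, "instinct", "def add(a, b):"))` should return some generated text; an error or timeout points to a serving problem rather than a Continue configuration problem. Building the request separately from sending it lets the payload be inspected without a live server.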