# Xinference

Xorbits Inference (Xinference) is an open-source project for running language models on your own machine. You can use it to serve open-source LLMs, such as Llama-2, locally.

## Preparation <a href="#preparation" id="preparation"></a>

Follow the instructions in [Using Xinference](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html) to set up Xinference and launch the `llama-2-chat` model.

## Configuration <a href="#configuration" id="configuration"></a>

<figure><img src="https://3481753452-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FzYU2cjsYvY3seDzkySsN%2Fuploads%2F2V6AnXcYntIunjAvipw0%2Fimage.png?alt=media&#x26;token=cdd7e2da-6de8-41c5-9c03-d8a576b74905" alt=""><figcaption></figcaption></figure>

* **API Endpoint**: `http://127.0.0.1:9997/v1/chat/completions`
* **API Key**: any random string (Xinference does not require a real key)
* **Model**: `llama-2-chat`

You can find all the available built-in models at <https://inference.readthedocs.io/en/latest/models/builtin/llm/index.html>.
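To verify the configuration above outside the chatbot UI, you can call the endpoint directly. The sketch below, using only the Python standard library, sends an OpenAI-style chat-completions request to a local Xinference server; it assumes the server from the Preparation step is running at `127.0.0.1:9997` with `llama-2-chat` launched, and the `chat` helper name is ours, not part of Xinference.

```python
import json
import urllib.request

# Values from the Configuration section above.
ENDPOINT = "http://127.0.0.1:9997/v1/chat/completions"
MODEL = "llama-2-chat"


def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Responses follow the OpenAI schema: choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example (requires a running Xinference server):
#   print(chat("Say hello in one sentence."))
```

If the request fails with a connection error, check that the server is running and that the port matches the one you started it with.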

## Troubleshooting <a href="#troubleshooting" id="troubleshooting"></a>

* Only models with `chat` in their name are supported.
