Xorbits Inference (Xinference) is an open-source project that lets you run language models on your own machine. You can use it to serve open-source LLMs such as Llama 2 locally.
Follow the instructions at Using Xinference to set up Xinference and run the llama-2-chat model. A minimal launch sketch is shown below for reference.
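The sketch below uses Xinference's Python client to launch llama-2-chat against a locally running server. The format, size, and quantization values are placeholders, not part of this page; pick a variant listed in the Xinference built-in model docs and follow the linked guide for the exact arguments.

```python
from xinference.client import Client

# Connect to a locally running Xinference server (default port 9997).
client = Client("http://127.0.0.1:9997")

# Launch llama-2-chat. The format/size/quantization values are illustrative;
# choose a combination supported by the variant you downloaded.
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="ggmlv3",
    model_size_in_billions=7,
    quantization="q4_0",
)
print(f"Launched model with UID: {model_uid}")
```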
API Endpoint: http://127.0.0.1:9997/v1/chat/completions
API Key: any random string (a local Xinference server does not require a real key)
Model: llama-2-chat
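As a rough illustration of how these three values fit together, the sketch below calls the OpenAI-compatible chat completions endpoint with the requests library. The prompt, temperature, and placeholder API key are assumptions for the example; the model value must match the name or UID under which llama-2-chat was launched.

```python
import requests

# Values from the configuration above; the key can be any placeholder string.
url = "http://127.0.0.1:9997/v1/chat/completions"
headers = {"Authorization": "Bearer sk-anything"}

payload = {
    "model": "llama-2-chat",
    "messages": [
        {"role": "user", "content": "Briefly explain what Xorbits Inference is."}
    ],
    "temperature": 0.7,
}

response = requests.post(url, headers=headers, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```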
You can find all the available models at https://inference.readthedocs.io/en/latest/models/builtin/llm/index.html
Only models with chat in their name are supported.