## Running the Server

PrivateGPT supports running with different LLMs & setups.

### Local models

Both the LLM and the Embeddings model will run locally.

Make sure you have followed the *Local LLM requirements* section before moving on.

This command will start PrivateGPT using the `settings.yaml` (default profile) together with the `settings-local.yaml` configuration file. By default, it will enable both the API and the Gradio UI. Run:

```bash
PGPT_PROFILES=local make run
```

or

```bash
PGPT_PROFILES=local poetry run python -m private_gpt
```

When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API using Swagger UI.

### Using OpenAI

If you cannot run a local model (because you don't have a GPU, for example) or for testing purposes, you may decide to run PrivateGPT using OpenAI as the LLM and Embeddings model.

To do so, create a profile `settings-openai.yaml` with the following contents:

```yaml
llm:
  mode: openai

openai:
  api_base: # Defaults to https://api.openai.com/v1
  api_key: # You could skip this configuration and use the OPENAI_API_KEY env var instead
  model: # Optional model to use. Default is "gpt-3.5-turbo"
         # Note: OpenAI models are listed here: https://platform.openai.com/docs/models
```

And run PrivateGPT loading that profile you just created:

`PGPT_PROFILES=openai make run`

or

`PGPT_PROFILES=openai poetry run python -m private_gpt`

When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
You'll notice the speed and quality of the responses are higher, given that you are using OpenAI's servers for the heavy computations.

### Using OpenAI compatible API

Many tools, including [LocalAI](https://localai.io/) and [vLLM](https://docs.vllm.ai/en/latest/), support serving local models with an OpenAI compatible API. The `openai` mode doesn't allow you to use custom models, even when overriding the `api_base`. Instead, you should use the `openailike` mode:

```yaml
llm:
  mode: openailike
```

This mode uses the same settings as the `openai` mode.

As an example, you can follow the [vLLM quickstart guide](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server) to run an OpenAI compatible server. Then, you can run PrivateGPT using the `settings-vllm.yaml` profile:

`PGPT_PROFILES=vllm make run`

A sketch of what such a profile could contain is included at the end of this section.

### Using AWS Sagemaker

For a fully private & performant setup, you can choose to have both your LLM and Embeddings model deployed using Sagemaker.

Note: how to deploy models on Sagemaker is out of the scope of this documentation.

To do so, create a profile `settings-sagemaker.yaml` with the following contents (remember to update the values of `llm_endpoint_name` and `embedding_endpoint_name` to your own endpoint names):

```yaml
llm:
  mode: sagemaker

sagemaker:
  llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
  embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
```

And run PrivateGPT loading that profile you just created:

`PGPT_PROFILES=sagemaker make run`

or

`PGPT_PROFILES=sagemaker poetry run python -m private_gpt`

When the server is started it will print a log *Application startup complete*.
Navigate to http://localhost:8001 to use the Gradio UI or to http://localhost:8001/docs (API section) to try the API.
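### Example profile for an OpenAI compatible server

The *Using OpenAI compatible API* section above refers to a `settings-vllm.yaml` profile without showing its contents. Below is a minimal sketch of what such a profile could look like. It assumes vLLM's OpenAI-compatible server is running at its default address (`http://localhost:8000/v1`); the `model` value and the placeholder API key are hypothetical and must be adjusted to match whatever your server is actually serving. Treat this as an illustration, not a definitive configuration.

```yaml
# Hypothetical settings-vllm.yaml sketch -- values are placeholders.
llm:
  mode: openailike

openai:
  api_base: http://localhost:8000/v1          # assumed vLLM default address; change if your server differs
  api_key: EMPTY                              # placeholder; most local OpenAI-compatible servers ignore the key
  model: mistralai/Mistral-7B-Instruct-v0.2   # hypothetical; must match the model your server is serving
```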
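### Verifying the server is up

Whichever profile you run, a quick way to confirm the API is reachable is to hit it from the command line once the *Application startup complete* log appears. The exact routes and payloads available are documented in the Swagger UI at http://localhost:8001/docs; the commands below are only a sketch assuming the health and completion routes shown there, so adjust them if your version exposes different paths.

```bash
# Sanity check: assumes a /health route as listed in the Swagger UI.
curl http://localhost:8001/health

# Example completion request: route and payload assumed from the Swagger UI docs;
# verify them at http://localhost:8001/docs before relying on this.
curl -s http://localhost:8001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, what can you do?"}'
```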