* feat: change ollama default model to llama3.1
* chore: bump versions
* feat: Change default model in local mode to llama3.1
* chore: make sure last poetry version is used
* fix: mypy
* fix: do not add BOS (with last llamacpp-python version)
* docs: add troubleshooting
* fix: pass HF token to setup script and prevent to download tokenizer when it is empty
* fix: improve log and disable specific tokenizer by default
* chore: change HF_TOKEN environment to be aligned with default config
* ifx: mypy
* Support for Google Gemini LLMs and Embeddings
Initial support for Gemini, enables usage of Google LLMs and embedding models (see settings-gemini.yaml)
Install via
poetry install --extras "llms-gemini embeddings-gemini"
Notes:
* had to bump llama-index-core to later version that supports Gemini
* poetry --no-update did not work: Gemini/llama_index seem to require more (transient) updates to make it work...
* fix: crash when gemini is not selected
* docs: add gemini llm
---------
Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
* Updated prompt_style to be moved to the main LLM setting since all LLMs from llama_index can utilize this. I also included temperature, context window size, max_tokens, max_new_tokens into the openailike to help ensure the settings are consistent from the other implementations.
* Removed prompt_style from llamacpp entirely
* Fixed settings-local.yaml to include prompt_style in the LLM settings instead of llamacpp.
* Adding Postgres for the doc and index store
* Adding documentation. Rename postgres database local->simple. Postgres storage dependencies
* Update documentation for postgres storage
* Renaming feature to nodestore
* update docstore -> nodestore in doc
* missed some docstore changes in doc
* Updated poetry.lock
* Formatting updates to pass ruff/black checks
* Correction to unreachable code!
* Format adjustment to pass black test
* Adjust extra inclusion name for vector pg
* extra dep change for pg vector
* storage-postgres -> storage-nodestore-postgres
* Hash change on poetry lock
* Extract optional dependencies
* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity
* Support Ollama embeddings
* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine
* Fix vector retriever filters
This mode behaves the same as the openai mode, except that it allows setting custom models not
supported by OpenAI. It can be used with any tool that serves models from an OpenAI compatible API.
Implements #1424
As discussed on Discord, the decision has been made to remove the system prompts by default, to better segregate the API and the UI usages.
A concurrent PR (#1353) is enabling the dynamic setting of a system prompt in the UI.
Therefore, if UI users want to use a custom system prompt, they can specify one directly in the UI.
If the API users want to use a custom prompt, they can pass it directly into their messages that they are passing to the API.
In the highlight of the two use case above, it becomes clear that default system_prompt does not need to exist.
* Fix the parallel ingestion mode, and make it available through conf
Also updated the documentation to show how to configure the ingest mode.
* PR feedback: redirect to documentation
* added max_new_tokens as a configuration option to the llm block in settings
* Update fern/docs/pages/manual/settings.mdx
Co-authored-by: lopagela <lpglm@orange.fr>
* Update private_gpt/settings/settings.py
Add default value for max_new_tokens = 256
Co-authored-by: lopagela <lpglm@orange.fr>
* Addressed location of docs comment
* reformatting from running 'make check'
* remove default config value from settings.yaml
---------
Co-authored-by: lopagela <lpglm@orange.fr>
A file that is ingested will be transformed into several documents (that
are organized into nodes).
This endpoint is deleting documents (bits of a file). These bits can be
retrieved thanks to the endpoint to list all the documents.
* Configure simple builtin logging
Changed the 2 existing `print` in the `private_gpt` code base into actual python logging, stop using loguru (dependency will be dropped in a later commit).
Try to use the `key=value` logging convention in logs (to indicate what dynamic values represents, and what is dynamic vs not).
Using `%s` log style, so that the string formatting is pushed inside the logger, giving the ability to the logger to determine if the string need to be formatted or not (i.e. strings from debug logs might not be formatted if the log level is not debug)
The (basic) builtin log configuration have been placed in `private_gpt/__init__.py` in order to initialize the logging system even before we start to launch any python code in `private_gpt` package (ensuring we get any initialization log formatted as we want to)
Disabled `uvicorn` custom logging format, resulting in having uvicorn logs being outputted in our formatted.
Some more concise format could be used if we want to, especially:
```
COMPACT_LOG_FORMAT = '%(asctime)s.%(msecs)03d [%(levelname)s] %(name)s - %(message)s'
```
Python documentation and cookbook on logging for reference:
* https://docs.python.org/3/library/logging.html
* https://docs.python.org/3/howto/logging.html
* Removing loguru from the dependencies
Result of `poetry remove loguru`
* PR feedback: using `logger` variable name instead of `log`
---------
Co-authored-by: Louis Melchior <louis@jaris.io>
* fix: docker copying extra files
* feat: allow configuring mode through env vars
* feat: Attempt to build and tag a docker image
* fix: run docker on release
* fix: typing in prompt transformation
* chore: remove tutorial comments