feat(settings): Update default model to TheBloke/Mistral-7B-Instruct-v0.2-GGUF (#1415)

* Update LlamaCPP dependency

* Default to TheBloke/Mistral-7B-Instruct-v0.2-GGUF

* Fix API docs
Iván Martínez 2023-12-17 16:11:08 +01:00 committed by GitHub
parent c71ae7cee9
commit 8ec7cf49f4
5 changed files with 1433 additions and 1233 deletions


@@ -1 +1,14 @@
# API Reference
+The API is divided into two logical blocks:
+1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
+    - Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
+      embedding generation and storage.
+    - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
+      engineering and the response generation.
+2. Low-level API, allowing advanced users to implement their own complex pipelines:
+    - Embeddings generation: based on a piece of text.
+    - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
+      documents.
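
To make the two blocks concrete, here is a minimal client sketch against a locally running server. It assumes the default port (8001) and the `/v1/ingest`, `/v1/chat/completions` and `/v1/chunks` routes from the docs of this era; the exact paths, payload fields and response shapes are assumptions, not verified against this revision.

```python
# Minimal client sketch for both API blocks. Assumptions (not verified
# against this revision): server on localhost:8001, /v1/ingest accepting
# a multipart file upload, and the response shapes read below.
import requests

BASE = "http://localhost:8001"

# High-level block: ingest a document, then chat using its context.
with open("manual.pdf", "rb") as f:
    r = requests.post(f"{BASE}/v1/ingest", files={"file": f})
r.raise_for_status()

chat = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize the manual."}],
        "use_context": True,  # ground the answer in ingested documents
    },
)
print(chat.json()["choices"][0]["message"]["content"])

# Low-level block: retrieve the most relevant chunks for a query.
chunks = requests.post(f"{BASE}/v1/chunks", json={"text": "warranty terms"})
for chunk in chunks.json()["data"]:
    print(chunk["score"], chunk["text"][:80])
```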


@@ -32,21 +32,6 @@ The installation guide will help you in the [Installation section](/installation
/>
</Cards>
-## API Organization
-The API is divided into two logical blocks:
-1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
-    - Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
-      embedding generation and storage.
-    - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
-      engineering and the response generation.
-2. Low-level API, allowing advanced users to implement their own complex pipelines:
-    - Embeddings generation: based on a piece of text.
-    - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
-      documents.
<Callout intent = "info">
A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
model download script, ingestion script, documents folder watch, etc.

poetry.lock (generated): 2632 changed lines; diff suppressed because it is too large.


@@ -36,7 +36,7 @@ gradio = "^4.4.1"
[tool.poetry.group.local]
optional = true
[tool.poetry.group.local.dependencies]
-llama-cpp-python = "^0.2.11"
+llama-cpp-python = "^0.2.23"
numpy = "1.26.0"
sentence-transformers = "^2.2.2"
# https://stackoverflow.com/questions/76327419/valueerror-libcublas-so-0-9-not-found-in-the-system-path
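
Under Poetry's caret semantics, `^0.2.23` resolves to `>=0.2.23,<0.3.0`, so the lock step keeps picking up newer 0.2.x releases while excluding 0.3. The snippet below is a standalone smoke test of the bumped dependency, not PrivateGPT's own LlamaCPP wiring, and the model path is hypothetical.

```python
# Standalone check that the bumped llama-cpp-python can load and run the
# new GGUF. Not PrivateGPT code; the model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=4096,  # keep the context window modest for a smoke test
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```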


@@ -48,8 +48,8 @@ qdrant:
local:
prompt_style: "llama2"
-llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
-llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
+llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
+llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5
sagemaker:
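
The two updated keys map directly onto a Hugging Face Hub download: `llm_hf_repo_id` names the repository and `llm_hf_model_file` the GGUF file inside it. The project's bundled setup script normally performs this fetch; the sketch below uses `huggingface_hub` directly and only mirrors the two settings values.

```python
# Standalone sketch mirroring the two updated settings values; the
# project's own setup script normally handles this download.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # llm_hf_repo_id
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",   # llm_hf_model_file
)
print(path)  # local cache path to the downloaded GGUF
```

Note that `prompt_style` stays `"llama2"`: the v0.2 instruct model keeps the same `[INST] ... [/INST]` chat wrapping as v0.1, so only the repo and file names need to change.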