feat(settings): Update default model to TheBloke/Mistral-7B-Instruct-v0.2-GGUF (#1415)
* Update LlamaCPP dependency
* Default to TheBloke/Mistral-7B-Instruct-v0.2-GGUF
* Fix API docs
Parent: c71ae7cee9
Commit: 8ec7cf49f4
```diff
@@ -1 +1,14 @@
 # API Reference
+
+The API is divided in two logical blocks:
+
+1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
+   - Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
+     embedding generation and storage.
+   - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
+     engineering and the response generation.
+
+2. Low-level API, allowing advanced users to implement their own complex pipelines:
+   - Embeddings generation: based on a piece of text.
+   - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
+     documents.
```
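The two blocks described above map onto HTTP endpoints. As a hedged sketch (the endpoint paths, port, and payload shapes below are assumptions based on the docs text in this hunk, not taken verbatim from the commit), a client might build requests like this:

```python
# Sketch of client-side payloads for the two API blocks described above.
# BASE_URL, the endpoint paths, and the payload field names are assumptions.
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # assumed default server address


def chat_payload(question: str, use_context: bool = True) -> dict:
    """High-level API: a chat request grounded in ingested documents."""
    return {
        "messages": [{"role": "user", "content": question}],
        "use_context": use_context,  # let the server retrieve context chunks
    }


def chunks_payload(query: str, limit: int = 4) -> dict:
    """Low-level API: ask for the most relevant chunks for a query."""
    return {"text": query, "limit": limit}


if __name__ == "__main__":
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",  # assumed path
        data=json.dumps(chat_payload("What is RAG?")).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Sending the request requires a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read())
```

The split mirrors the docs: the high-level call hides retrieval and prompt engineering behind one request, while the low-level chunk retrieval exposes them for custom pipelines.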
```diff
@@ -32,21 +32,6 @@ The installation guide will help you in the [Installation section](/installation
   />
 </Cards>
-
-## API Organization
-
-The API is divided in two logical blocks:
-
-1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
-   - Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
-     embedding generation and storage.
-   - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
-     engineering and the response generation.
-
-2. Low-level API, allowing advanced users to implement their own complex pipelines:
-   - Embeddings generation: based on a piece of text.
-   - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
-     documents.
 <Callout intent = "info">
 A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
 model download script, ingestion script, documents folder watch, etc.
```
File diff suppressed because it is too large.
```diff
@@ -36,7 +36,7 @@ gradio = "^4.4.1"
 [tool.poetry.group.local]
 optional = true
 [tool.poetry.group.local.dependencies]
-llama-cpp-python = "^0.2.11"
+llama-cpp-python = "^0.2.23"
 numpy = "1.26.0"
 sentence-transformers = "^2.2.2"
 # https://stackoverflow.com/questions/76327419/valueerror-libcublas-so-0-9-not-found-in-the-system-path
```
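The bump from `^0.2.11` to `^0.2.23` follows Poetry's caret semantics: for a `0.x` version, `^0.2.23` allows `>=0.2.23, <0.3.0`, because the leftmost non-zero segment is held fixed. A minimal sketch of that rule (an illustration, not Poetry's actual resolver code):

```python
# Sketch of Poetry-style caret-constraint matching for simple x.y.z versions.
# For "^0.2.23" this yields: allowed iff version >= 0.2.23 and < 0.3.0.
def satisfies_caret(version: str, constraint: str) -> bool:
    base = tuple(int(p) for p in constraint.lstrip("^").split("."))
    v = tuple(int(p) for p in version.split("."))
    if v < base:
        return False  # below the stated minimum
    # The leftmost non-zero segment of the base (and everything before it)
    # must match exactly; later segments may only increase.
    for i, part in enumerate(base):
        if part != 0:
            return v[: i + 1] == base[: i + 1]
    return v == base
```

So `0.2.30` would still satisfy the new constraint, while `0.2.11` (the old floor) and `0.3.0` would not.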
```diff
@@ -48,8 +48,8 @@ qdrant:
 
 local:
   prompt_style: "llama2"
-  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
+  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
-  llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
+  llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
   embedding_hf_model_name: BAAI/bge-small-en-v1.5
 
 sagemaker:
```
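The `llm_hf_repo_id` and `llm_hf_model_file` pair identifies the GGUF file on the Hugging Face Hub. As a hedged sketch, the direct download URL for the new default model can be derived from those two settings; the `resolve/<revision>` URL pattern below is the usual Hub layout, though the project itself would normally fetch the file via its own download script or the `huggingface_hub` library:

```python
# Sketch: derive the direct Hub download URL from the settings.yaml values.
# The "resolve/<revision>" path pattern is an assumption about Hub URL layout.
def model_url(repo_id: str, filename: str, revision: str = "main") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


url = model_url(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)
```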