feat(settings): Update default model to TheBloke/Mistral-7B-Instruct-v0.2-GGUF (#1415)
* Update LlamaCPP dependency
* Default to TheBloke/Mistral-7B-Instruct-v0.2-GGUF
* Fix API docs
Parent: c71ae7cee9
Commit: 8ec7cf49f4
```diff
@@ -1 +1,14 @@
 # API Reference
+
+The API is divided in two logical blocks:
+
+1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
+   - Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
+     embedding generation and storage.
+   - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
+     engineering and the response generation.
+
+2. Low-level API, allowing advanced users to implement their own complex pipelines:
+   - Embeddings generation: based on a piece of text.
+   - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
+     documents.
```
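The two blocks described above map onto HTTP endpoints. As a hedged sketch (the endpoint paths, port, and payload shapes below are assumptions based on the docs text in this hunk, not taken verbatim from the commit), a client might build requests like this:

```python
# Sketch of client-side payloads for the two API blocks described above.
# BASE_URL, the endpoint paths, and the payload field names are assumptions.
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # assumed default server address


def chat_payload(question: str, use_context: bool = True) -> dict:
    """High-level API: a chat request grounded in ingested documents."""
    return {
        "messages": [{"role": "user", "content": question}],
        "use_context": use_context,  # let the server retrieve context chunks
    }


def chunks_payload(query: str, limit: int = 4) -> dict:
    """Low-level API: ask for the most relevant chunks for a query."""
    return {"text": query, "limit": limit}


if __name__ == "__main__":
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",  # assumed path
        data=json.dumps(chat_payload("What is RAG?")).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Sending the request requires a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read())
```

The split mirrors the docs: the high-level call hides retrieval and prompt engineering behind one request, while the low-level chunk retrieval exposes them for custom pipelines.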
```diff
@@ -32,21 +32,6 @@ The installation guide will help you in the [Installation section](/installation
   />
 </Cards>
-
-## API Organization
-
-The API is divided in two logical blocks:
-
-1. High-level API, abstracting all the complexity of a RAG (Retrieval Augmented Generation) pipeline implementation:
-   - Ingestion of documents: internally managing document parsing, splitting, metadata extraction,
-     embedding generation and storage.
-   - Chat & Completions using context from ingested documents: abstracting the retrieval of context, the prompt
-     engineering and the response generation.
-
-2. Low-level API, allowing advanced users to implement their own complex pipelines:
-   - Embeddings generation: based on a piece of text.
-   - Contextual chunks retrieval: given a query, returns the most relevant chunks of text from the ingested
-     documents.
 <Callout intent = "info">
 A working **Gradio UI client** is provided to test the API, together with a set of useful tools such as bulk
 model download script, ingestion script, documents folder watch, etc.
```
File diff suppressed because it is too large.
```diff
@@ -36,7 +36,7 @@ gradio = "^4.4.1"
 [tool.poetry.group.local]
 optional = true
 [tool.poetry.group.local.dependencies]
-llama-cpp-python = "^0.2.11"
+llama-cpp-python = "^0.2.23"
 numpy = "1.26.0"
 sentence-transformers = "^2.2.2"
 # https://stackoverflow.com/questions/76327419/valueerror-libcublas-so-0-9-not-found-in-the-system-path
```
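The bump from `^0.2.11` to `^0.2.23` follows Poetry's caret semantics: for a `0.x` version, `^0.2.23` allows `>=0.2.23, <0.3.0`, because the leftmost non-zero segment is held fixed. A minimal sketch of that rule (an illustration, not Poetry's actual resolver code):

```python
# Sketch of Poetry-style caret-constraint matching for simple x.y.z versions.
# For "^0.2.23" this yields: allowed iff version >= 0.2.23 and < 0.3.0.
def satisfies_caret(version: str, constraint: str) -> bool:
    base = tuple(int(p) for p in constraint.lstrip("^").split("."))
    v = tuple(int(p) for p in version.split("."))
    if v < base:
        return False  # below the stated minimum
    # The leftmost non-zero segment of the base (and everything before it)
    # must match exactly; later segments may only increase.
    for i, part in enumerate(base):
        if part != 0:
            return v[: i + 1] == base[: i + 1]
    return v == base
```

So `0.2.30` would still satisfy the new constraint, while `0.2.11` (the old floor) and `0.3.0` would not.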
```diff
@@ -48,8 +48,8 @@ qdrant:
 
 local:
   prompt_style: "llama2"
-  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
+  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
-  llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
+  llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
   embedding_hf_model_name: BAAI/bge-small-en-v1.5
 
 sagemaker:
```
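The `llm_hf_repo_id` and `llm_hf_model_file` pair identifies the GGUF file on the Hugging Face Hub. As a hedged sketch, the direct download URL for the new default model can be derived from those two settings; the `resolve/<revision>` URL pattern below is the usual Hub layout, though the project itself would normally fetch the file via its own download script or the `huggingface_hub` library:

```python
# Sketch: derive the direct Hub download URL from the settings.yaml values.
# The "resolve/<revision>" path pattern is an assumption about Hub URL layout.
def model_url(repo_id: str, filename: str, revision: str = "main") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


url = model_url(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)
```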