Iván Martínez
0db5aebf2f
Use chromadb max_batch_size public attribute
2023-09-25 11:42:16 +02:00
Iván Martínez
91163a247b
Batch embeddings to be processed by chromadb
2023-08-31 16:36:19 +02:00
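The two commits above describe splitting embeddings into batches no larger than the chromadb client's `max_batch_size` attribute before handing them to the vector store. A minimal sketch of that batching step follows. The helper name `batched` and its signature are hypothetical and not taken from the commits. On Python 3.12+, `itertools.batched` provides similar behaviour but yields tuples.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each at most max_batch_size long.

    Hypothetical helper: in the commits the limit would come from the
    chromadb client's max_batch_size attribute.
    """
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be a positive integer")
    for start in range(0, len(items), max_batch_size):
        # Slicing never overruns, so the last batch may be shorter.
        yield list(items[start:start + max_batch_size])
```

Each yielded batch would then be sent to the store in its own add call, so no single request exceeds the limit.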
Iván Martínez
2940f987c0
Merge pull request #822 from VaiTon/fix/env-not-existing
Better error message if .env is empty/does not exist.
2023-08-28 17:41:47 +02:00
Iván Martínez
7b294ed31f
Update dependencies. Upgrade chromadb integration.
2023-08-28 17:32:56 +02:00
parampavar
8f369dd2b9
Adding support to ingest files with extensions in uppercase
Files in the source_directory were ignored if their extensions were in uppercase (e.g. *.PDF).
This change supports ingestion of files that match either lowercase or uppercase extensions, such as *.pdf or *.PDF.
This can be extended later to support mixed-case extensions such as *.Pdf; the assumption is that these make up less than 5% of cases.
2023-08-16 16:03:56 -07:00
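The commit above matches each extension in its all-lowercase and all-uppercase form, leaving mixed case for later. A minimal sketch of that matching with `glob` follows. The function name `find_source_files` is hypothetical. Results are collected in a set because on case-insensitive platforms both patterns can match the same file.

```python
import glob
import os
from typing import Iterable, List


def find_source_files(source_dir: str, extensions: Iterable[str]) -> List[str]:
    """Return files under source_dir whose extension is fully lower- or uppercase.

    Hypothetical helper; mixed-case extensions such as .Pdf are not matched
    on case-sensitive filesystems, mirroring the limitation in the commit.
    """
    found = set()
    for ext in extensions:
        for variant in (ext.lower(), ext.upper()):
            pattern = os.path.join(source_dir, "**", f"*{variant}")
            # recursive=True lets "**" descend into subdirectories.
            found.update(glob.glob(pattern, recursive=True))
    # The set drops duplicates when both patterns hit the same file.
    return sorted(found)
```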
VaiTon
28537b6a84
Better error message if .env is empty/does not exist.
2023-07-06 00:16:11 +02:00
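The commit above replaces an unhelpful failure with a clearer message when .env is empty or missing. A minimal sketch of that kind of check follows, assuming the .env file has already been loaded into the process environment (see the "load .env" commit further down). The helper `require_env` and the variable name in the usage are hypothetical.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a required setting or fail with an actionable message.

    Hypothetical helper; assumes .env was loaded into os.environ beforehand.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # A missing key and an empty string are treated the same way.
        raise RuntimeError(
            f"{name} is not set. Make sure a .env file exists and defines {name}."
        )
    return value
```

Calling `require_env("PERSIST_DIRECTORY")` (a hypothetical key) at startup turns a later `None`-related crash into an error that names the missing setting.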
sj
05c7330643
Enhancement better performance for PDF loader
2023-06-07 23:51:05 +08:00
Ravi
e9b31f7dd9
Update ingest.py
Co-authored-by: Bailey Matthews <bailey@hey.com>
2023-05-31 22:42:10 +05:30
Ravindra Prasad
db341e2a40
fixed the csv file reading issue
2023-05-31 00:04:56 +05:30
Iván Martínez
80b9b1d03e
Better logs during ingestion
2023-05-20 12:11:21 +02:00
Iván Martínez
4a0e0d2e70
Use chunk_size variable in logs. Make vectorstore check more flexible
2023-05-20 12:02:40 +02:00
Iván Martínez
7180d4386b
Merge branch 'main' of https://github.com/maozdemir/privateGPT into maozdemir-main
2023-05-20 11:48:29 +02:00
Iván Martínez
20554a7c9d
Merge pull request #292 from jiangzhuo/feature/multiprocessing-for-document-loading
Optimize load_documents function with multiprocessing
2023-05-20 10:57:42 +02:00
MDW
7f918a9fa1
Make scripts executable, add basic pre-commit setup
2023-05-19 23:21:39 +02:00
MDW
4cda348cf8
Fix #294 (tested)
2023-05-19 16:23:09 +02:00
jiangzhuo
ba0dbe8d1c
Add progress bar to load_documents function
Enhanced the load_documents() function with a progress bar using the tqdm library. This gives real-time feedback on document loading, so users can track the operation, especially when loading a large number of documents.
2023-05-19 10:59:38 +09:00
jiangzhuo
81b221bccb
Optimize load_documents function with multiprocessing
2023-05-19 10:58:28 +09:00
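The commits above parallelize load_documents with multiprocessing and add a tqdm progress bar. A minimal, stdlib-only sketch of that fan-out pattern follows. It is not the commits' code. It uses `multiprocessing.pool.ThreadPool`, which shares `Pool.imap_unordered`'s interface, so it runs without a `__main__` guard. A plain `on_progress` callback stands in for tqdm. The signature is hypothetical. For CPU-bound parsing you would swap in `multiprocessing.Pool` with a picklable, module-level loader.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, List, Optional


def load_documents(
    paths: Iterable[str],
    load_one: Callable[[str], List[str]],
    workers: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[str]:
    """Load every path in parallel and flatten the per-file results.

    Hypothetical signature; load_one stands for a per-file loader.
    """
    paths = list(paths)
    results: List[str] = []
    # ThreadPool exposes the same imap_unordered API as multiprocessing.Pool.
    with ThreadPool(workers) as pool:
        # imap_unordered yields each file's result as soon as it is ready,
        # so progress advances per file rather than at the very end.
        for done, docs in enumerate(pool.imap_unordered(load_one, paths), start=1):
            results.extend(docs)
            if on_progress is not None:
                on_progress(done, len(paths))
    return results
```

Because results arrive in completion order, the output order is not guaranteed to match the input order.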
MDW
a862ff2be6
Add fallback for plain elm #294 #290
2023-05-19 01:04:42 +02:00
Iván Martínez
b9f8dc312f
Merge pull request #254 from Fabio3rs/formatOffice97-2003
Add .doc .ppt (Word and PowerPoint 97/2003 formats)
2023-05-18 23:49:40 +02:00
impulsivus
7844553ca1
Implement a way of ingesting more documents
Move environment variables to the global scope
Add a better check for vectorstore existence
Introduced a new function for better readability
Co-authored-by: Pulp <51127079+PulpCattel@users.noreply.github.com>
2023-05-18 17:45:38 +03:00
Fabio Rossini Sluzala
ec126b51d8
Fix loader mapping order
2023-05-17 22:38:30 -03:00
vilaca
79a3c00313
remove duplicate
2023-05-17 23:45:27 +01:00
Fabio Rossini Sluzala
66a9f9cde0
Add .doc .ppt (Word and PowerPoint 97/2003 formats)
2023-05-17 12:04:16 -03:00
Iván Martínez
bf3bddfbb6
More loaders, generic method
- Update the README with extra formats
- Add PowerPoint, requested in #138
- Add ePub requested in #138 comment - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
2023-05-17 00:55:21 +02:00
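The commit above introduces more loaders behind a generic method, and later commits adjust the loader mapping. One common way to do this is a dispatch table from file extension to loader. A minimal sketch of that idea follows. `LOADER_MAPPING`, `load_text` and `load_single_document` are hypothetical names, and only plain-text formats are wired up so the example stays self-contained.

```python
import os
from typing import Callable, Dict, List


def load_text(path: str) -> List[str]:
    """Hypothetical loader: read a whole text file as one document."""
    with open(path, encoding="utf-8") as f:
        return [f.read()]


# Hypothetical table: each extension maps to the loader that handles it.
LOADER_MAPPING: Dict[str, Callable[[str], List[str]]] = {
    ".txt": load_text,
    ".md": load_text,
}


def load_single_document(path: str) -> List[str]:
    """Pick a loader from the file extension and run it."""
    # Lowercasing the extension also accepts .TXT or .Md files.
    ext = os.path.splitext(path)[1].lower()
    loader = LOADER_MAPPING.get(ext)
    if loader is None:
        raise ValueError(f"Unsupported file extension '{ext}'")
    return loader(path)
```

Supporting a new format then means adding one entry to the table rather than another branch in the ingestion code.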
Iván Martínez
23d24c88e9
Update code to use sentence-transformers through huggingfaceembeddings
2023-05-17 00:32:41 +02:00
Andrea Pinto
d0aa57178a
ingest unlimited number of documents
2023-05-12 15:36:20 +02:00
Andrea Pinto
01f55441e7
fix persist db directory at ingestion
2023-05-12 10:37:10 +02:00
Sorin Neacsu
544ddd9631
load .env
2023-05-11 15:34:17 -07:00
alxspiker
f60dbb520e
Merge branch 'main' into main
2023-05-11 14:34:13 -06:00
alxspiker
52ae6c0866
.env + LlamaCpp + PDF/CSV + Ingest All
.env
Added an env file to make configuration easier
LlamaCpp
Added support for LlamaCpp in .env (MODEL_TYPE=LlamaCpp)
PDF/CSV
Added support for PDF and CSV files.
Ingest All
When running ingest, all files in source_documents are automatically stored in the vector store based on their file type; a path argument is no longer needed.
2023-05-11 14:24:39 -06:00
R-Y-M-R
f12ea568e5
Use constants.py file
2023-05-11 10:29:07 -04:00
R-Y-M-R
8c6a81a07f
Fix: Disable Chroma Telemetry
Opts-out of anonymized telemetry being tracked in Chroma.
See: https://docs.trychroma.com/telemetry
2023-05-11 10:17:18 -04:00
Iván Martínez
026b9f895c
Use RecursiveCharacterTextSplitter to avoid the "llama_tokenize: too many tokens" error during ingestion
2023-05-09 00:21:02 +02:00
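The commit above switches to RecursiveCharacterTextSplitter so no chunk is too large for the model's tokenizer. A simplified sketch of the recursive-splitting idea follows. It is not LangChain's implementation. It tries coarse separators first (paragraphs, then lines, then words). It falls back to a finer separator only for pieces that are still longer than `chunk_size`, and as a last resort it makes hard character cuts. It has no chunk overlap, drops separators at chunk boundaries, and measures length in characters rather than tokens. The function name `recursive_split` is hypothetical.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut by character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry it with a finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Preferring coarse separators keeps paragraphs and sentences intact where possible, which tends to give more coherent chunks than fixed-width cuts.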
Iván Martínez
92244a90b4
Use a different text splitter to improve results. Ingest takes an argument pointing to the doc to ingest.
2023-05-05 17:32:31 +02:00
Iván martínez
55338b8f6e
End-to-end working version
2023-05-02 20:32:28 +02:00