Commit Graph

26 Commits

Author SHA1 Message Date
Iván Martínez 80b9b1d03e Better logs during ingestion 2023-05-20 12:11:21 +02:00
Iván Martínez 4a0e0d2e70 Use chunk_size variable in logs. Make vectorstore check more flexible 2023-05-20 12:02:40 +02:00
Iván Martínez 7180d4386b Merge branch 'main' of https://github.com/maozdemir/privateGPT into maozdemir-main 2023-05-20 11:48:29 +02:00
Iván Martínez 20554a7c9d Merge pull request #292 from jiangzhuo/feature/multiprocessing-for-document-loading
Optimize load_documents function with multiprocessing
2023-05-20 10:57:42 +02:00
MDW 7f918a9fa1 Make scripts executable, add basic pre-commit setup 2023-05-19 23:21:39 +02:00
MDW 4cda348cf8 Fix #294 (tested) 2023-05-19 16:23:09 +02:00
jiangzhuo ba0dbe8d1c Add progress bar to load_documents function
Enhanced the load_documents() function by adding a progress bar using the tqdm library. This change improves user experience by providing real-time feedback on the progress of document loading. Now, users can easily track the progress of this operation, especially when loading a large number of documents.
2023-05-19 10:59:38 +09:00
jiangzhuo 81b221bccb Optimize load_documents function with multiprocessing 2023-05-19 10:58:28 +09:00
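Note on the two commits above: the combination of parallel loading and incremental progress reporting can be sketched roughly as follows. This is an illustrative sketch, not the code from these commits. load_single_document, the on_progress callback, and the workers parameter are hypothetical names, a plain callback stands in for the tqdm progress bar, and a thread-backed pool (multiprocessing.pool.ThreadPool) is swapped in for a process pool so the example runs anywhere; it exposes the same imap_unordered method as multiprocessing.Pool.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List


def load_single_document(path: str) -> str:
    """Hypothetical stand-in loader: returns a fake document for a path."""
    return f"contents of {path}"


def load_documents(
    paths: List[str],
    workers: int = 4,
    on_progress: Callable[[int, int], None] = lambda done, total: None,
) -> List[str]:
    """Load all paths in parallel, reporting progress after each result."""
    results: List[str] = []
    with ThreadPool(workers) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so progress can be reported incrementally rather than at the end.
        for doc in pool.imap_unordered(load_single_document, paths):
            results.append(doc)
            on_progress(len(results), len(paths))
    return results
```

Results arrive in completion order, not input order, so a caller that needs the original order would have to sort or use imap instead.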
MDW a862ff2be6 Add fallback for plain elm #294 #290 2023-05-19 01:04:42 +02:00
Iván Martínez b9f8dc312f Merge pull request #254 from Fabio3rs/formatOffice97-2003
Add .doc .ppt (Word and PowerPoint 97/2003 formats)
2023-05-18 23:49:40 +02:00
impulsivus 7844553ca1 Implement a way of ingesting more documents
Move environment variables to the global scope
Add a better check for vectorstore existence
Introduced a new function for better readability
Co-authored-by: Pulp <51127079+PulpCattel@users.noreply.github.com>
2023-05-18 17:45:38 +03:00
Fabio Rossini Sluzala ec126b51d8 Fix loader mapping order 2023-05-17 22:38:30 -03:00
vilaca 79a3c00313 remove duplicate 2023-05-17 23:45:27 +01:00
Fabio Rossini Sluzala 66a9f9cde0 Add .doc .ppt (Word and PowerPoint 97/2003 formats) 2023-05-17 12:04:16 -03:00
Iván Martínez bf3bddfbb6 More loaders, generic method
- Update the README with extra formats
- Add Powerpoint, requested in #138
- Add ePub requested in #138 comment - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
2023-05-17 00:55:21 +02:00
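Note on the "generic method" commit above: one common way to support many file formats through a single code path is a table that maps file extensions to loader functions. The sketch below illustrates that idea only and is not taken from this commit. LOADER_MAPPING, load_text, load_csv, and load_by_extension are hypothetical names, and the loaders use only the Python standard library.

```python
import csv
import os
from typing import Callable, Dict


def load_text(path: str) -> str:
    """Hypothetical loader: read a UTF-8 text file as-is."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def load_csv(path: str) -> str:
    """Hypothetical loader: flatten CSV rows into comma-separated lines."""
    with open(path, newline="", encoding="utf-8") as fh:
        return "\n".join(", ".join(row) for row in csv.reader(fh))


# Extension -> loader table (an assumption for illustration). Supporting a
# new format means adding one entry here, not another if/elif branch.
LOADER_MAPPING: Dict[str, Callable[[str], str]] = {
    ".txt": load_text,
    ".md": load_text,
    ".csv": load_csv,
}


def load_by_extension(path: str) -> str:
    """Pick a loader from the file extension (case-insensitive) and run it."""
    ext = os.path.splitext(path)[1].lower()
    try:
        loader = LOADER_MAPPING[ext]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {ext!r}") from None
    return loader(path)
```

Calling load_by_extension("notes.md") would dispatch to load_text, while an unknown extension raises ValueError instead of being silently skipped.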
Iván Martínez 23d24c88e9 Update code to use sentence-transformers through huggingfaceembeddings 2023-05-17 00:32:41 +02:00
Andrea Pinto d0aa57178a ingest unlimited number of documents 2023-05-12 15:36:20 +02:00
Andrea Pinto 01f55441e7 fix persist db directory at ingestion 2023-05-12 10:37:10 +02:00
Sorin Neacsu 544ddd9631 load .env 2023-05-11 15:34:17 -07:00
alxspiker f60dbb520e Merge branch 'main' into main 2023-05-11 14:34:13 -06:00
alxspiker 52ae6c0866 .env + LlamaCpp + PDF/CSV + Ingest All
.env

Added an env file to make configuration easier

LlamaCpp

Added support for LlamaCpp in .env (MODEL_TYPE=LlamaCpp)

PDF/CSV

Added support for PDF and CSV files.

Ingest All

All files in source_documents are automatically stored in the vector store according to their file type when ingest runs; a path argument is no longer needed.
2023-05-11 14:24:39 -06:00
R-Y-M-R f12ea568e5 Use constants.py file 2023-05-11 10:29:07 -04:00
R-Y-M-R 8c6a81a07f Fix: Disable Chroma Telemetry
Opts-out of anonymized telemetry being tracked in Chroma.

See: https://docs.trychroma.com/telemetry
2023-05-11 10:17:18 -04:00
Iván Martínez 026b9f895c Use RecursiveCharacterTextSplitter to avoid llama_tokenize: too many tokens error during ingestion 2023-05-09 00:21:02 +02:00
Iván Martínez 92244a90b4 Use a different text splitter to improve results. Ingest takes an argument pointing to the doc to ingest. 2023-05-05 17:32:31 +02:00
Iván martínez 55338b8f6e End-to-end working version 2023-05-02 20:32:28 +02:00