Commit Graph

19 Commits

Author SHA1 Message Date
MDW 04f6706bbb Make scripts executable, add basic pre-commit setup 2023-05-20 11:15:58 +02:00
MDW 4cda348cf8 Fix #294 (tested) 2023-05-19 16:23:09 +02:00
MDW a862ff2be6 Add fallback for plain elm #294 #290 2023-05-19 01:04:42 +02:00
Iván Martínez b9f8dc312f Merge pull request #254 from Fabio3rs/formatOffice97-2003
Add .doc .ppt (Word and PowerPoint 97/2003 formats)
2023-05-18 23:49:40 +02:00
Fabio Rossini Sluzala ec126b51d8 Fix loader mapping order 2023-05-17 22:38:30 -03:00
vilaca 79a3c00313 remove duplicate 2023-05-17 23:45:27 +01:00
Fabio Rossini Sluzala 66a9f9cde0 Add .doc .ppt (Word and PowerPoint 97/2003 formats) 2023-05-17 12:04:16 -03:00
Iván Martínez bf3bddfbb6 More loaders, generic method
- Update the README with extra formats
- Add Powerpoint, requested in #138
- Add ePub requested in #138 comment - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
2023-05-17 00:55:21 +02:00
Iván Martínez 23d24c88e9 Update code to use sentence-transformers through huggingfaceembeddings 2023-05-17 00:32:41 +02:00
Andrea Pinto d0aa57178a ingest unlimited number of documents 2023-05-12 15:36:20 +02:00
Andrea Pinto 01f55441e7 fix persist db directory at ingestion 2023-05-12 10:37:10 +02:00
Sorin Neacsu 544ddd9631 load .env 2023-05-11 15:34:17 -07:00
alxspiker f60dbb520e Merge branch 'main' into main 2023-05-11 14:34:13 -06:00
alxspiker 52ae6c0866 .env + LlamaCpp + PDF/CSV + Ingest All
.env

Added an env file to make configuration easier

LlamaCpp

Added support for LlamaCpp in .env (MODEL_TYPE=LlamaCpp)

PDF/CSV

Added support for PDF and CSV files.

Ingest All

All files in source_documents are automatically stored in the vector store based on their file type when running ingest; a path argument is no longer needed.
2023-05-11 14:24:39 -06:00
R-Y-M-R f12ea568e5 Use constants.py file 2023-05-11 10:29:07 -04:00
R-Y-M-R 8c6a81a07f Fix: Disable Chroma Telemetry
Opts-out of anonymized telemetry being tracked in Chroma.

See: https://docs.trychroma.com/telemetry
2023-05-11 10:17:18 -04:00
Iván Martínez 026b9f895c Use RecursiveCharacterTextSplitter to avoid llama_tokenize: too many tokens error during ingestion 2023-05-09 00:21:02 +02:00
Iván Martínez 92244a90b4 Use a different text splitter to improve results. Ingest takes an argument pointing to the doc to ingest. 2023-05-05 17:32:31 +02:00
Iván martínez 55338b8f6e End-to-end working version 2023-05-02 20:32:28 +02:00