jiangzhuo
e3b769d33a
Optimize load_documents function with multiprocessing
2023-05-20 11:16:13 +02:00
MDW
04f6706bbb
Make scripts executable, add basic pre-commit setup
2023-05-20 11:15:58 +02:00
MDW
4cda348cf8
Fix #294 (tested)
2023-05-19 16:23:09 +02:00
MDW
a862ff2be6
Add fallback for plain elm #294 #290
2023-05-19 01:04:42 +02:00
Iván Martínez
b9f8dc312f
Merge pull request #254 from Fabio3rs/formatOffice97-2003
Add .doc .ppt (Word and PowerPoint 97/2003 formats)
2023-05-18 23:49:40 +02:00
Fabio Rossini Sluzala
ec126b51d8
Fix loader mapping order
2023-05-17 22:38:30 -03:00
vilaca
79a3c00313
remove duplicate
2023-05-17 23:45:27 +01:00
Fabio Rossini Sluzala
66a9f9cde0
Add .doc .ppt (Word and PowerPoint 97/2003 formats)
2023-05-17 12:04:16 -03:00
Iván Martínez
bf3bddfbb6
More loaders, generic method
- Update the README with extra formats
- Add Powerpoint, requested in #138
- Add ePub requested in #138 comment - https://github.com/imartinez/privateGPT/pull/138#issuecomment-1549564535
- Update requirements
2023-05-17 00:55:21 +02:00
Iván Martínez
23d24c88e9
Update code to use sentence-transformers through huggingfaceembeddings
2023-05-17 00:32:41 +02:00
Andrea Pinto
d0aa57178a
ingest unlimited number of documents
2023-05-12 15:36:20 +02:00
Andrea Pinto
01f55441e7
fix persist db directory at ingestion
2023-05-12 10:37:10 +02:00
Sorin Neacsu
544ddd9631
load .env
2023-05-11 15:34:17 -07:00
alxspiker
f60dbb520e
Merge branch 'main' into main
2023-05-11 14:34:13 -06:00
alxspiker
52ae6c0866
.env + LlamaCpp + PDF/CSV + Ingest All
- .env: Added an env file to make configuration easier.
- LlamaCpp: Added support for LlamaCpp in .env (MODEL_TYPE=LlamaCpp).
- PDF/CSV: Added support for PDF and CSV files.
- Ingest All: When ingest runs, every file in source_documents is automatically stored in the vector store according to its file type, so a path argument is no longer needed.
2023-05-11 14:24:39 -06:00
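A minimal sketch of the "Ingest All" behaviour described in the commit above: walk a source folder and pass each file to a loader chosen by its extension, skipping unsupported types. The names load_text, load_csv, LOADERS and ingest_all are hypothetical, and the loaders use only the standard library (plain text and CSV); the project's real loaders, including PDF support, are not reproduced here.

```python
import csv
from pathlib import Path


def load_text(path):
    # Hypothetical loader: return the whole file as one string.
    return path.read_text(encoding="utf-8")


def load_csv(path):
    # Hypothetical loader: return the rows of a CSV file as lists of strings.
    with path.open(newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f)]


# Hypothetical extension -> loader mapping; unknown extensions are ignored.
LOADERS = {".txt": load_text, ".csv": load_csv}


def ingest_all(source_dir):
    """Load every supported file under source_dir, in sorted path order."""
    docs = []
    for path in sorted(Path(source_dir).rglob("*")):
        loader = LOADERS.get(path.suffix.lower())
        if path.is_file() and loader is not None:
            docs.append((path.name, loader(path)))
    return docs
```

Dispatching through a dictionary keeps "add a new format" down to registering one more entry, which matches how later commits in this log add formats one loader at a time.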
R-Y-M-R
f12ea568e5
Use constants.py file
2023-05-11 10:29:07 -04:00
R-Y-M-R
8c6a81a07f
Fix: Disable Chroma Telemetry
Opts out of anonymized telemetry tracking in Chroma.
See: https://docs.trychroma.com/telemetry
2023-05-11 10:17:18 -04:00
Iván Martínez
026b9f895c
Use RecursiveCharacterTextSplitter to avoid the "llama_tokenize: too many tokens" error during ingestion
2023-05-09 00:21:02 +02:00
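For context on the commit above, here is a minimal sketch of the recursive character-splitting idea in plain Python. It is not LangChain's RecursiveCharacterTextSplitter: recursive_split is a hypothetical name, and chunk overlap and custom length functions are omitted. The function tries coarse separators first (blank lines, then newlines, then spaces) and falls back to hard cuts, so no chunk exceeds chunk_size characters. Bounding chunk length this way is presumably what keeps any single chunk from overflowing the tokenizer.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        # This separator is useless here; try the next, finer one.
        return recursive_split(text, chunk_size, rest)
    chunks = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Greedily merge pieces while the chunk still fits.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece too long on its own: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc", 7))  # ['aaa bbb', 'ccc']
```

Preferring paragraph and line boundaries keeps related sentences together. The hard-cut fallback only triggers for runs with no whitespace that are longer than chunk_size.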
Iván Martínez
92244a90b4
Use a different text splitter to improve results. Ingest takes an argument pointing to the doc to ingest.
2023-05-05 17:32:31 +02:00
Iván martínez
55338b8f6e
End-to-end working version
2023-05-02 20:32:28 +02:00