Files in the source_directory were ignored if their extensions were in uppercase (e.g. *.PDF).
This change supports ingestion of files that match either lowercase or uppercase extensions like *.pdf or *.PDF.
This can be extended at a later stage to support mixed-case extensions like *.Pdf. The assumption is that such files probably account for less than 5% of cases.
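A minimal sketch of the matching described above, assuming the files are collected with glob. The function name find_source_files and the EXTENSIONS list are hypothetical, chosen for illustration, and are not taken from the actual change.

```python
import glob
import os

# Hypothetical subset of supported extensions, for illustration only.
EXTENSIONS = [".pdf", ".csv", ".txt"]


def find_source_files(source_dir, extensions=EXTENSIONS):
    """Return files whose extension matches in all-lowercase or all-uppercase form."""
    matches = set()  # a set avoids duplicates on case-insensitive filesystems
    for ext in extensions:
        for variant in (ext.lower(), ext.upper()):
            pattern = os.path.join(source_dir, f"**/*{variant}")
            matches.update(glob.glob(pattern, recursive=True))
    return sorted(matches)
```

On a case-sensitive filesystem a mixed-case name such as report.Pdf is still skipped by this approach, which is the remaining gap noted above.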
Enhanced the load_documents() function by adding a progress bar using the tqdm library. This gives users real-time feedback on document loading, which is especially useful when loading a large number of documents.
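A rough sketch of how a tqdm progress bar can wrap a loading loop. The helper load_single_document and the signature of load_documents shown here are assumptions for illustration, and the actual implementation may differ (for example, it may load files in parallel). The ImportError fallback is only there so the sketch runs without tqdm installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback for this sketch only: iterate without a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


def load_single_document(path):
    """Hypothetical stand-in loader: read one file as text."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


def load_documents(paths):
    """Load every path, advancing the progress bar once per document."""
    documents = []
    for path in tqdm(paths, desc="Loading documents", unit="doc"):
        documents.append(load_single_document(path))
    return documents
```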
Move environment variables to the global scope
Add a better check for vectorstore existence
Introduced a new function for better readability
Co-authored-by: Pulp <51127079+PulpCattel@users.noreply.github.com>
.env
Added an env file to make configuration easier
LlamaCpp
Added support for LlamaCpp in .env (MODEL_TYPE=LlamaCpp)
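A small sketch of reading the setting above, assuming the values from .env have already been loaded into the process environment (for instance by a dotenv loader). The function name get_model_type is hypothetical.

```python
import os


def get_model_type(env=os.environ):
    """Return MODEL_TYPE, failing early with a hint if it is missing."""
    model_type = env.get("MODEL_TYPE")
    if not model_type:
        raise ValueError(
            "MODEL_TYPE is not set; add it to .env, e.g. MODEL_TYPE=LlamaCpp"
        )
    return model_type
```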
PDF/CSV
Added support for PDF and CSV files.
Ingest All
All files in source_documents are now automatically stored in the vector store based on their file type when running ingest; a path argument is no longer needed.
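The idea of choosing a loader per file type can be sketched as follows. This is an illustration under stated assumptions, not the actual ingest code: LOADERS, load_text, load_csv, and ingest_all are hypothetical names, only standard-library loaders are shown (PDF would need a third-party library), and the vector store step is left out.

```python
import csv
import os


def load_text(path):
    """Hypothetical loader: return the file contents as a string."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def load_csv(path):
    """Hypothetical loader: return the rows as a list of dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


# Hypothetical mapping from file extension to loader function.
LOADERS = {".txt": load_text, ".csv": load_csv}


def ingest_all(source_dir="source_documents"):
    """Walk source_dir and load every file that has a known extension."""
    documents = []
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            ext = os.path.splitext(name)[1].lower()
            loader = LOADERS.get(ext)
            if loader is None:
                continue  # unsupported file type: skip it
            documents.append(loader(os.path.join(root, name)))
    return documents
```

Because the directory is walked as a whole, the caller no longer passes a path per file; the extension alone decides how each file is loaded.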