File Connector
Access knowledge from Local Files
How it works
The File Connector indexes user-uploaded files. It currently supports .txt, .pdf, .docx, .pptx, .xlsx, .csv, .md, .mdx, .conf, .log, .json, .tsv, .xml, .yml, .yaml, .eml, and .epub files.
You can also upload a .zip containing these files. Any other file types in the zip are ignored.
Each file can also carry an optional metadata line that specifies a link, document owners, and a last-updated time, which Danswer uses for retrieval and AI Answer.
Adding Metadata
The metadata line should be placed at the very top of the file and can take one of two formats:
#DANSWER_METADATA={"link": "<LINK>"}
<!-- DANSWER_METADATA={"link": "<LINK>"} -->
In both formats, DANSWER_METADATA= is followed by a JSON object. The valid keys are:
- link
- primary_owners
- secondary_owners
- doc_updated_at
- file_display_name
You can also include arbitrary key/value pairs, which are treated as “tags”. These tags can then be used as filters in the UI to constrain your search or conversation to documents that have certain tags attached.
For example:
#DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md", "primary_owners": ["yuhong@danswer.ai", "chris@danswer.ai"], "secondary_owners": ["founders@danswer.ai"], "doc_updated_at": "2023-11-30T13:06:08.589616-08:00", "file_display_name": "Desired File Name!", "tag_of_your_choice": "tag_value"}
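To make the two accepted formats concrete, here is a minimal Python sketch that reads a metadata line and separates the documented keys from tags. It is an illustration of the format described above, not Danswer's own parser; the names `parse_metadata_line` and `KNOWN_KEYS` are hypothetical.

```python
import json
import re

# Keys this page lists as having a defined meaning; any other key is treated as a tag.
KNOWN_KEYS = {
    "link",
    "primary_owners",
    "secondary_owners",
    "doc_updated_at",
    "file_display_name",
}

# Matches both '#DANSWER_METADATA={...}' and '<!-- DANSWER_METADATA={...} -->'.
_METADATA_RE = re.compile(
    r"^\s*(?:#|<!--)\s*DANSWER_METADATA=\s*(\{.*\})\s*(?:-->)?\s*$"
)


def parse_metadata_line(first_line):
    """Return (known_fields, tags), or None if the line is not a valid metadata line."""
    match = _METADATA_RE.match(first_line)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    known = {k: v for k, v in data.items() if k in KNOWN_KEYS}
    tags = {k: v for k, v in data.items() if k not in KNOWN_KEYS}
    return known, tags
```

Running a file's first line through a check like this before uploading can catch malformed JSON early.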
Full file example:
#DANSWER_METADATA={"link": "https://www.danswer.com/captcha", "file_display_name": "Captcha Setup"}
How to set up captcha
Follow the example below to set up a captcha
like you saw when you visited this page!
By including a captcha, this page is able to
prevent web scrapers from reading it.
As this example shows, many web pages and internal tools cannot be scraped directly. In addition to handling local file uploads, the File Connector is an option for such tools, which may expose APIs for accessing their contents.
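When content is pulled from such a tool through its API, a small script can prepend the metadata line before the file is uploaded. The sketch below assumes the text has already been fetched; `with_metadata_header` is a hypothetical helper, and any extra keyword arguments are written as tags.

```python
import json


def with_metadata_header(body, link, file_display_name=None, **tags):
    """Return `body` with a '#DANSWER_METADATA=' line placed at the very top."""
    metadata = {"link": link}
    if file_display_name is not None:
        metadata["file_display_name"] = file_display_name
    # Any remaining keyword arguments become arbitrary key/value tags.
    metadata.update(tags)
    # json.dumps emits a single line by default, so the header stays on one line.
    return "#DANSWER_METADATA=" + json.dumps(metadata) + "\n" + body


text = with_metadata_header(
    "How to set up captcha\nFollow the example below to set up a captcha\n",
    link="https://www.danswer.com/captcha",
    file_display_name="Captcha Setup",
)
```

The resulting `text` can be saved as a .txt file and uploaded like any other document.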
Alternatively, if you are uploading a .zip filled with files, you can define a .danswer_metadata.json file and include it in the root of the zip file.
If your zip file structure looks like:
| file1.txt
| file2.txt
| .danswer_metadata.json
then your .danswer_metadata.json file might look like:
[
  {
    "filename": "file1.txt",
    "link": "<LINK_TO_FILE1>",
    "file_display_name": "<WHAT_YOU_WANT_THE_NAME_OF_FILE1_TO_BE_IN_THE_UI>",
    "primary_owners": ["<FILE1_OWNER>"],
    "status": "<SOME_STATUS>"
  },
  {
    "filename": "file2.txt",
    "link": "<LINK_TO_FILE2>",
    "file_display_name": "<WHAT_YOU_WANT_THE_NAME_OF_FILE2_TO_BE_IN_THE_UI>",
    "primary_owners": ["<FILE2_OWNER>"],
    "status": "<SOME_OTHER_STATUS>"
  }
]
Here "status" is an arbitrary tag. It can be any key/value pair and can be used in the UI as a filter if you want to constrain your search or conversation to only documents with this tag attached.
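A zip with this layout can also be assembled programmatically. The following is a minimal sketch using Python's standard `zipfile` module; `build_upload_zip` is a hypothetical helper, and the file contents and links are placeholder values rather than anything Danswer requires.

```python
import io
import json
import zipfile


def build_upload_zip(target, files, metadata):
    """Write `files` (name -> text) plus .danswer_metadata.json into a zip.

    `target` may be a filesystem path or a writable binary file object.
    """
    names = set(files)
    for entry in metadata:
        # Catch typos early: every metadata entry should name a file in the zip.
        if entry["filename"] not in names:
            raise ValueError(f"metadata refers to missing file: {entry['filename']}")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)
        # The metadata file sits in the root of the zip, next to the documents.
        zf.writestr(".danswer_metadata.json", json.dumps(metadata, indent=2))


# Example: build the archive in memory; pass a path such as "upload.zip" to write to disk.
buffer = io.BytesIO()
build_upload_zip(
    buffer,
    files={"file1.txt": "first document", "file2.txt": "second document"},
    metadata=[
        {"filename": "file1.txt", "link": "https://example.com/file1", "status": "draft"},
        {"filename": "file2.txt", "link": "https://example.com/file2", "status": "final"},
    ],
)
```

Generating the metadata with `json.dumps` avoids hand-written syntax slips such as stray braces or comments, which strict JSON does not allow.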
Setting up
Authorization
- No external auth flows required.
- Admins can upload files and make them available to everyone.
- [WIP] Admins or normal users will be able to upload files via personal connectors and make them accessible for just themselves.
Indexing
- Navigate to the Admin Dashboard and select the File Connector.
- Select a .txt file or a .zip file and click