Refer to Model Configs for how to set the environment variables for your particular deployment.

Note: While we support local LLMs, you will get significantly better responses with a more powerful model like GPT-4.

What is GPT4All

GPT4All provides a way to run LLMs (closed and open source) by calling APIs or running the models in memory. For self-hosted models, GPT4All offers models that are quantized or run with reduced float precision. Both are ways of compressing a model so it can run on weaker hardware, at a slight cost in model capability.
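
To make the tradeoff concrete, here is a rough back-of-envelope estimate of how quantization shrinks a model's memory footprint (a sketch with illustrative numbers, not exact file sizes):

# Back-of-envelope memory math for why quantization helps (illustrative;
# real GGUF files carry extra overhead, so actual sizes differ somewhat).
params = 7_000_000_000        # a 7B model such as Mistral 7B
fp16_gb = params * 2 / 1e9    # 16-bit floats: 2 bytes per weight -> ~14 GB
q4_gb = params * 0.5 / 1e9    # 4-bit quantization: ~0.5 bytes per weight -> ~3.5 GB
print(f"fp16: ~{fp16_gb:.0f} GB   4-bit: ~{q4_gb:.1f} GB")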

GPT4All provides a Python wrapper, which Danswer uses to run the models in the same container as the Danswer API Server.
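
You normally won't call this wrapper yourself since Danswer handles it, but for experimentation, direct usage looks roughly like the sketch below (the model name matches the one used in the config later in this guide):

from gpt4all import GPT4All

# Downloads the model file on first use, then loads it into memory.
model = GPT4All("mistral-7b-openorca.Q4_0.gguf")

# CPU-only generation; expect this to be slow without a GPU.
print(model.generate("In one sentence, what is a knowledge base?", max_tokens=64))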

Because GPT4All is not compatible with certain architectures, Danswer does not package it by default. To use it, install it in your deployment by uncommenting gpt4all==2.0.2 in danswer/backend/requirements/default.txt and rebuilding the image so GPT4All is installed.
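
Concretely, the change is just removing the comment marker from that one line, then rebuilding with whatever build command your deployment normally uses (for Docker Compose setups, typically a docker compose ... up -d --build against your compose file):

# In danswer/backend/requirements/default.txt, change the commented line
#     # gpt4all==2.0.2
# to the uncommented form:
gpt4all==2.0.2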

Note: Even though GPT4All offers quantized models, they are still significantly slower than models fully hosted on GPUs. If you are running the models purely on CPU, expect significant delays both in processing the context documents and in generating answers.
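
A quick way to gauge whether CPU performance is workable on your hardware is to time a short generation directly with the wrapper (a minimal sketch; the model name and prompt are just examples):

import time
from gpt4all import GPT4All

model = GPT4All("mistral-7b-openorca.Q4_0.gguf")

start = time.time()
output = model.generate("Explain what quantization does to a model.", max_tokens=64)
elapsed = time.time() - start

# Rough throughput estimate; a real Danswer answer generates far more tokens.
print(f"Generated {len(output.split())} words in {elapsed:.1f}s")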

Set Danswer to use GPT4All

GEN_AI_MODEL_PROVIDER=gpt4all
GEN_AI_MODEL_VERSION=mistral-7b-openorca.Q4_0.gguf  # Or any other GPT4All model

# Let's also make some changes to accommodate the weaker locally hosted LLM
QA_TIMEOUT=120  # Set a longer timeout, running models on CPU can be slow
# Always run search, never skip
DISABLE_LLM_CHOOSE_SEARCH=True
# Don't use LLM for reranking, the prompts aren't properly tuned for these models
DISABLE_LLM_CHUNK_FILTER=True
# Don't try to rephrase the user query, the prompts aren't properly tuned for these models
DISABLE_LLM_QUERY_REPHRASE=True
# Don't use LLM to automatically discover time/source filters
DISABLE_LLM_FILTER_EXTRACTION=True
# Use only 1 section from the documents and do not require quotes
QA_PROMPT_OVERRIDE=weak
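
Where exactly these variables go depends on how you deploy. For the common Docker Compose setup they would typically live in the .env file alongside your compose file (confirm the location against Model Configs for your deployment), after which you recreate the containers so the API Server picks up the new environment.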