XERJ reads one TOML file. Every key has a production-ready default, so the smallest working config is an empty file. The table below is the full surface; the sections below it walk each group with a runnable example.
XERJ looks for the config file in this order of precedence: --config /path/to/xerj.toml on the command line, then /etc/xerj/xerj.toml, then ./xerj.toml in the working directory. If none of these exists, every key takes its built-in default.
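Because every key has a default, a config file only needs the values that differ. A minimal sketch, assuming data_dir is the only override (the path is the one used in the [server] example on this page):

```toml
# xerj.toml: any key not listed here keeps its default
[server]
data_dir = "/var/lib/xerj"
```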
Full key table
| KEY | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| [server] | | | |
| rest_port | u16 | 8080 | Native REST API listener port. All native endpoints (/v1/*) live here. |
| grpc_port | u16 | 8081 | Reserved for a future gRPC API. Not wired in v0.1; leave at default. |
| es_compat_port | u16 | 9200 | Elasticsearch-compatible wire port. Point Kibana, Logstash, or the ES client here. Set to 0 to disable. |
| bind_address | string | "0.0.0.0" | Interface to bind every listener to. Use "127.0.0.1" for dev or a private address in production. |
| data_dir | path | "./data" | Root for indices, WAL files, and segments. Needs fast I/O and enough free space. |
| [auth] | | | |
| enabled | bool | true | Require an API key on every request. Set to false only on a trusted network. |
| admin_api_key | string | "" | Static admin key. Left empty, a 256-bit key is generated on first run and written to <data_dir>/admin.key. |
| [tls] | | | |
| enabled | bool | false | Terminate TLS at the server. Requires cert_path + key_path. If you terminate at a proxy, leave false. |
| cert_path | path | "" | PEM X.509 certificate. Use a CA-signed cert in production. |
| key_path | path | "" | PEM private key. Permissions should be 0600. |
| [storage] | | | |
| wal_sync | enum | "batched" | "sync" · "batched" · "async". Durability vs throughput. Batched is the recommended default. |
| wal_batch_ms | u32 | 100 | Fsync cadence when wal_sync="batched" (ms). Range 1–10000. Lower = smaller loss window. |
| [embedding] | | | |
| default_endpoint | string | "" | OpenAI-compatible embeddings URL. Empty disables auto-embedding on ingest. Example: "https://api.openai.com/v1/embeddings". |
| default_model | string | "" | Model name passed to the endpoint. Example: "text-embedding-3-small" or "nomic-embed-text". |
| batch_size | u32 | 64 | Docs per embedding API call. Range 1–2048. Bigger batches amortise round-trip cost. |
| timeout_ms | u32 | 5000 | HTTP timeout for embedding calls (ms). Ingest fails with a timeout error if exceeded. |
| [limits] | | | |
| max_query_memory_mb | u32 | 512 | Per-query memory cap (MiB). Queries that exceed it are cancelled. |
| max_concurrent_searches | u32 | 64 | Global in-flight search ceiling. Extras are queued. |
| max_fields_per_index | u32 | 500 | Field-explosion protection. ES default is 1000; 500 is intentionally stricter. |
| [indexing] | | | |
| turbo_batch_size | u32 | 1000 | Docs per batch in turbo mode. Range 500–5000. Larger = higher throughput, slightly higher latency. |
| turbo_parallel | bool | true | Parallel tokenisation on Rayon threads. Disable only for debugging. |
| turbo_fast_analyzer | bool | false | Skip stemming and stop-word removal in turbo mode. Trades recall for speed. |
| [cluster] | | | |
| enabled | bool | false | Enable multi-node cluster mode. When true, the Raft state machine and cluster transport are started on port. |
| port | u16 | 9300 | TCP port for intra-cluster Raft and search messages. Must be reachable from every peer. |
| peers | array | [] | Peer list in "node_id=host:port" format. The local node identifies itself from the entry matching bind_address:port. Example: ["a=10.0.0.1:9300","b=10.0.0.2:9300","c=10.0.0.3:9300"]. |
| tick_ms | u64 | 50 | Raft tick interval (ms). Lower = faster leader election at the cost of CPU. |
[server]
Network listeners and the data directory. Most deployments only touch data_dir and the bind address.
[server]
rest_port = 8080 # native /v1/* API
es_compat_port = 9200 # ES wire-compatible API
grpc_port = 8081 # reserved
bind_address = "0.0.0.0" # listen on all interfaces
data_dir = "/var/lib/xerj" # absolute path is strongly recommended
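For local development, the key table suggests binding to loopback. A minimal sketch, assuming the Elasticsearch-compatible listener is not needed on a dev machine (that part is an illustrative choice, not a recommendation from this page):

```toml
[server]
bind_address = "127.0.0.1"   # loopback only; not reachable from other hosts
es_compat_port = 0           # 0 disables the ES-compatible listener
data_dir = "./data"          # the default, relative to the working directory
```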
[auth]
Static API-key authentication. The first-run admin key is written to <data_dir>/admin.key; subsequent starts reuse it. Clients pass Authorization: ApiKey <key> on every request.
[auth]
enabled = true
admin_api_key = "" # blank → auto-generated on first run
# Or provide your own:
# admin_api_key = "ak_live_c8f9a4…"
[tls]
TLS termination at the server. In Kubernetes or behind a load balancer, leave this off and terminate at the proxy instead — one place to rotate certs, one place to log handshakes.
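If TLS is terminated at the server instead, the three keys from the key table are set together. A minimal sketch; the file paths are hypothetical placeholders, not defaults:

```toml
[tls]
enabled = true
cert_path = "/etc/xerj/tls/server.crt"   # hypothetical path; PEM X.509 certificate
key_path = "/etc/xerj/tls/server.key"    # hypothetical path; PEM private key, mode 0600
```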
[storage]
WAL and flush tuning. wal_sync is the durability knob everyone looks for: pick "sync" for financial/compliance workloads, "batched" for everything else, and "async" only in benchmarks.
[storage]
wal_sync = "batched"
wal_batch_ms = 100 # fsync every 100 ms
wal_max_size_mb = 512 # roll WAL every 512 MiB
flush_size_mb = 256 # flush memtable at 256 MiB
flush_interval_secs = 30
[merge]
Segment compaction. size_tiered is the right default: it merges same-size segments, which is cheap and write-optimal. Switch to log_structured if your reads are doing a lot of full-range scans.
[compression]
See Compression for the encoding catalog. This section picks the outer block codec only; the inner per-column encodings are chosen automatically at write time.
[fts]
Default analyzer applied to untyped text fields. Override per-field in the mapping when creating an index. See Analyzers for the built-ins.
[fts]
default_analyzer = "standard" # unicode words + lowercase
[vector]
HNSW index tuning and the default quantization scheme. These defaults are chosen so a 1 M × 768-dim index fits in a few GiB of RAM with ~99% recall. Override per-field in the mapping for exotic cases.
[embedding]
Delegates vector generation to an OpenAI-compatible endpoint. Leave default_endpoint empty if clients provide vectors themselves. Token limits are model-specific; the chunker in the ai crate splits long documents to fit the model's window.
[embedding]
# OpenAI:
default_endpoint = "https://api.openai.com/v1/embeddings"
default_model = "text-embedding-3-small"
batch_size = 64
timeout_ms = 5000
# Or a local Ollama:
# default_endpoint = "http://localhost:11434/v1/embeddings"
# default_model = "nomic-embed-text"
[limits]
Hard caps to protect the server from runaway queries and mapping explosions. Lower these on shared nodes, raise max_query_memory_mb for aggregation-heavy workloads.
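As a sketch, the block below spells out the three defaults from the key table. The commented-out line shows where an aggregation-heavy node might raise the memory cap; 2048 is an illustrative value, not a documented recommendation.

```toml
[limits]
max_query_memory_mb = 512       # per-query cap in MiB; queries over it are cancelled
max_concurrent_searches = 64    # searches beyond this are queued
max_fields_per_index = 500      # stricter than the ES default of 1000

# Aggregation-heavy workload (illustrative value):
# max_query_memory_mb = 2048
```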
[indexing]
Turbo mode knobs. Turbo is opt-in per-request via POST /v1/indices/:name/turbo-ingest or the X-Turbo: true header on _bulk; these settings only apply when turbo is active.
[indexing]
turbo_batch_size = 2000
turbo_parallel = true
turbo_fast_analyzer = false # true only if recall doesn't matter
[cluster]
Multi-node mode. Default is off — single-node doesn't need a consensus layer. When enabled, the embedded Raft implementation replicates metadata only (schemas, shard assignments, node roster). See Clustering for the full story.
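A three-node setup might look like the sketch below for node a, reusing the example addresses from the key table. It assumes each node carries the same peers list and differs only in bind_address, which is how the local node finds its own entry.

```toml
[server]
bind_address = "10.0.0.1"   # must match this node's host in peers

[cluster]
enabled = true
port = 9300                 # must be reachable from every peer
peers = ["a=10.0.0.1:9300", "b=10.0.0.2:9300", "c=10.0.0.3:9300"]
tick_ms = 50                # default Raft tick interval
```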