02 · REFERENCE

Config TOML

XERJ reads one TOML file. Every key has a production-ready default, so the smallest working config is an empty file. The table below is the full surface; the sections below it walk each group with a runnable example.

XERJ looks for the config file in this order of precedence: --config /path/to/xerj.toml on the command line, then /etc/xerj/xerj.toml, then ./xerj.toml in the working directory. If none of these exists, the built-in defaults are used for every key.

Full key table

KEY | TYPE | DEFAULT | DESCRIPTION

[server]
rest_port | u16 | 8080 | Native REST API listener port. All native endpoints (/v1/*) live here.
grpc_port | u16 | 8081 | Reserved for a future gRPC API. Not wired in v0.1; leave at default.
es_compat_port | u16 | 9200 | Elasticsearch-compatible wire port. Point Kibana, Logstash, or the ES client here. Set to 0 to disable.
bind_address | string | "0.0.0.0" | Interface to bind every listener to. Use "127.0.0.1" for dev or a private address in production.
data_dir | path | "./data" | Root for indices, WAL files, and segments. Needs fast I/O and enough free space.

[auth]
enabled | bool | true | Require an API key on every request. Set to false only on a trusted network.
admin_api_key | string | | Static admin key. Left empty, a 256-bit key is generated on first run and written to <data_dir>/admin.key.

[tls]
enabled | bool | false | Terminate TLS at the server. Requires cert_path + key_path. If you terminate at a proxy, leave false.
cert_path | path | | PEM X.509 certificate. Use a CA-signed cert in production.
key_path | path | | PEM private key. Permissions should be 0600.

[storage]
wal_sync | enum | "batched" | "sync" · "batched" · "async". Durability vs throughput. Batched is the recommended default.
wal_batch_ms | u32 | 100 | Fsync cadence when wal_sync="batched" (ms). Range 1–10000. Lower = smaller loss window.
wal_max_size_mb | u32 | 512 | WAL rollover threshold (MiB). Larger means fewer rollovers, longer crash recovery.
flush_size_mb | u32 | 256 | Memtable buffer size that triggers a segment flush (MiB).
flush_interval_secs | u32 | | Maximum wall-clock interval between flushes even if the buffer is not full.

[merge]
strategy | enum | "size_tiered" | "size_tiered" or "log_structured". Size-tiered for write-heavy, log-structured for mixed workloads.
min_segments | u32 | | Minimum live segments before a merge pass is scheduled. Must be ≥ 2.
max_segment_mb | u32 | 5120 | Upper bound on a mergeable segment (MiB). Segments larger than this are never merged.
io_rate_mb_per_sec | u32 | 100 | Merge I/O throttle (MiB/s). Set 0 to disable (not recommended in production).
max_concurrent | | | Concurrent merge workers per index. Bump to 2–4 on fast NVMe.

[compression]
enabled | bool | true | Block-level compression. Disabling increases disk usage and lowers read CPU.
level | enum | "balanced" | "fast" (LZ4), "balanced" (Zstd L3, default), "best" (Zstd L19, cold storage).
block_size_docs | u32 | 128 | Docs per compressed block. Range 16–4096. Larger = better ratio, higher fetch cost.

[fts]
default_analyzer | enum | "standard" | "standard" · "whitespace" · "simple" · "english". Override per-field in the mapping.

[vector]
default_metric | enum | "cosine" | "cosine" · "dot_product" · "euclidean". Cosine for text embeddings is the usual choice.
hnsw_m | u32 | | Bi-directional edges per layer. Higher = better recall, more RAM. Typical 8–64.
hnsw_ef_construction | u32 | 200 | Beam width at index-build time. Must be ≥ hnsw_m. Typical 100–500.
hnsw_ef_search | u32 | 100 | Default query-time beam width. Per-query override available via `ef_search` in the KNN body.
default_quantization | enum | "scalar8" | "none" · "scalar8" · "scalar4" · "binary". scalar8 = 4× RAM saving, 1–2% recall loss.
hnsw_offload_threshold | u32 | | Auto-downgrade new vectors to scalar4 once the index exceeds N vectors. 0 disables.
max_dimensions | u32 | 16384 | Upper bound on vector dimensionality. 4× the Elasticsearch limit of 4096.

[logs]
retention_days | u32 | | Auto-delete log docs older than N days. 0 disables.
time_partition | enum | "1h" | "1m" · "5m" · "15m" · "1h" · "6h" · "1d". Time-slice granularity for retention pruning.

[embedding]
default_endpoint | string | | OpenAI-compatible embeddings URL. Empty disables auto-embedding on ingest. Example: "https://api.openai.com/v1/embeddings".
default_model | string | | Model name passed to the endpoint. Example: "text-embedding-3-small" or "nomic-embed-text".
batch_size | u32 | | Docs per embedding API call. Range 1–2048. Bigger batches amortise round-trip cost.
timeout_ms | u32 | 5000 | HTTP timeout for embedding calls (ms). Ingest fails with a timeout error if exceeded.

[limits]
max_query_memory_mb | u32 | 512 | Per-query memory cap (MiB). Queries that exceed it are cancelled.
max_concurrent_searches | u32 | | Global in-flight search ceiling. Extras are queued.
max_fields_per_index | u32 | 500 | Field-explosion protection. ES default is 1000; 500 is intentionally stricter.

[indexing]
turbo_batch_size | u32 | 1000 | Docs per batch in turbo mode. Range 500–5000. Larger = higher throughput, slightly higher latency.
turbo_parallel | bool | true | Parallel tokenisation on Rayon threads. Disable only for debugging.
turbo_fast_analyzer | bool | false | Skip stemming and stop-word removal in turbo mode. Trades recall for speed.

[cluster]
enabled | bool | false | Enable multi-node cluster mode. When true, the Raft state machine and cluster transport are started on the configured port.
port | u16 | 9300 | TCP port for intra-cluster Raft and search messages. Must be reachable from every peer.
peers | array | [] | Peer list in "node_id=host:port" format. The local node identifies itself from the entry matching bind_address:port. Example: ["a=10.0.0.1:9300","b=10.0.0.2:9300","c=10.0.0.3:9300"].
tick_ms | u64 | | Raft tick interval (ms). Lower = faster leader election at the cost of CPU.

[server]

Network listeners and the data directory. Most deployments only touch data_dir and the bind address.

[server]
rest_port      = 8080            # native /v1/* API
es_compat_port = 9200            # ES wire-compatible API
grpc_port      = 8081            # reserved
bind_address   = "0.0.0.0"       # listen on all interfaces
data_dir       = "/var/lib/xerj" # absolute path is strongly recommended
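
For a laptop or CI run, the key table's guidance (loopback bind for dev, 0 to disable the ES port) might combine as in the sketch below. Whether you want the ES-compatible listener off depends on your tooling; that choice is an assumption here, not a recommendation from the source.

```toml
[server]
rest_port      = 8080            # native /v1/* API stays on
es_compat_port = 0               # 0 disables the ES-compatible listener
bind_address   = "127.0.0.1"     # loopback only; nothing exposed off-host
data_dir       = "./data"        # the documented default; fine for throwaway dev data
```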

[auth]

Static API-key authentication. The first-run admin key is written to <data_dir>/admin.key; subsequent starts reuse it. Clients pass Authorization: ApiKey <key> on every request.

[auth]
enabled       = true
admin_api_key = ""               # blank → auto-generated on first run

# Or provide your own:
# admin_api_key = "ak_live_c8f9a4…"

[tls]

TLS termination at the server. In Kubernetes or behind a load balancer, leave this off and terminate at the proxy instead — one place to rotate certs, one place to log handshakes.

[tls]
enabled   = true
cert_path = "/etc/xerj/certs/server.crt"
key_path  = "/etc/xerj/certs/server.key"
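
When a proxy or load balancer terminates TLS instead, the server side could look roughly like this. The address 10.0.0.11 is a placeholder for a private interface that only the proxy can reach; it is not a XERJ default.

```toml
[tls]
enabled = false                  # TLS handled upstream; cert_path and key_path stay unset

[server]
bind_address = "10.0.0.11"       # placeholder private address, reachable only by the proxy
```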

[storage]

WAL and flush tuning. wal_sync is the durability knob everyone looks for: pick "sync" for financial/compliance workloads, "batched" for everything else, and "async" only in benchmarks.

[storage]
wal_sync            = "batched"
wal_batch_ms        = 100        # fsync every 100 ms
wal_max_size_mb     = 512        # roll WAL every 512 MiB
flush_size_mb       = 256        # flush memtable at 256 MiB
flush_interval_secs = 30
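
As an illustration of the trade-off, a durability-first profile switches the sync mode, while a middle ground keeps batching but narrows the loss window. The 10 ms figure is an arbitrary value inside the documented 1–10000 range, not a tuned recommendation.

```toml
[storage]
wal_sync = "sync"                # strictest durability setting, for financial/compliance data

# Or stay batched with a smaller loss window:
# wal_sync     = "batched"
# wal_batch_ms = 10              # illustrative; roughly 10 ms of writes at risk on a crash
```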

[merge]

Segment compaction. size_tiered is the right default: it merges same-size segments, which is cheap and write-optimal. Switch to log_structured if your reads are doing a lot of full-range scans.

[merge]
strategy           = "size_tiered"
min_segments       = 10
max_segment_mb     = 5120        # 5 GiB cap
io_rate_mb_per_sec = 100         # don't starve queries
max_concurrent     = 2           # bump on NVMe
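
For the scan-heavy case described above, a variant might look like the following sketch. Only strategy and max_concurrent differ from the example; the worker count assumes fast NVMe storage.

```toml
[merge]
strategy           = "log_structured"  # reads dominated by full-range scans
min_segments       = 10
max_segment_mb     = 5120
io_rate_mb_per_sec = 100               # keep the throttle on in production
max_concurrent     = 4                 # upper end of the 2–4 suggested for fast NVMe
```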

[compression]

See Compression for the encoding catalog. This section picks the outer block codec only — the inner per-column encodings are chosen automatically at write time.

[compression]
enabled         = true
level           = "balanced"     # LZ4 / Zstd L3 / Zstd L19
block_size_docs = 128
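
A cold-storage profile, sketched from the options in the key table, trades fetch cost for ratio. The block size of 1024 is an illustrative value within the documented 16–4096 range.

```toml
[compression]
enabled         = true
level           = "best"         # Zstd L19, aimed at cold storage
block_size_docs = 1024           # illustrative; larger blocks compress better but cost more per fetch
```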

[fts]

Default analyzer applied to untyped text fields. Override per-field in the mapping when creating an index. See Analyzers for the built-ins.

[fts]
default_analyzer = "standard"    # unicode words + lowercase

[vector]

HNSW index tuning and the default quantization scheme. These defaults are chosen so a 1 M × 768-dim index fits in a few GiB of RAM with ~99% recall. Override per-field in the mapping for exotic cases.

[vector]
default_metric         = "cosine"
hnsw_m                 = 16
hnsw_ef_construction   = 200
hnsw_ef_search         = 100
default_quantization   = "scalar8"   # 4× RAM saving
hnsw_offload_threshold = 1000000     # auto-scalar4 past 1 M vectors
max_dimensions         = 16384
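
A rough back-of-envelope for the sizing claim, followed by a recall-first variant. The arithmetic assumes unquantized vectors are stored as 32-bit floats (implied by the 4× saving quoted for scalar8, but not stated outright) and ignores the HNSW graph itself. The hnsw_m and hnsw_ef_construction values are illustrative picks from the typical ranges.

```toml
# Raw vector storage for 1,000,000 vectors × 768 dims:
#   float32 ("none"):  1,000,000 × 768 × 4 B = 3,072,000,000 B ≈ 2.86 GiB
#   scalar8:           1,000,000 × 768 × 1 B =   768,000,000 B ≈ 0.72 GiB
# Graph links and other overhead come on top of these figures.

[vector]
default_quantization = "none"    # recall-first: skip quantization, pay about 4× the vector RAM
hnsw_m               = 32        # illustrative; more edges, better recall, more RAM
hnsw_ef_construction = 400       # illustrative; must stay ≥ hnsw_m
```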

[logs]

Time-series retention. Log indices are sliced into partitions of time_partition width so retention prunes are O(partitions), not O(documents).

[logs]
retention_days = 30              # keep 30 days
time_partition = "1h"            # 1-hour partitions
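
To make the O(partitions) point concrete, the comments below count the partitions that exist inside the retention window for two settings. The 7-day, 15-minute combination is an illustrative choice, not a default.

```toml
# Partitions inside the retention window:
#   30 days × 24 one-hour partitions per day       = 720
#    7 days × 96 fifteen-minute partitions per day = 672

[logs]
retention_days = 7               # illustrative short retention
time_partition = "15m"           # finer slices; still only a few hundred partitions
```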

[embedding]

Delegates vector generation to an OpenAI-compatible endpoint. Leave default_endpoint empty if clients provide vectors themselves. Token limits are model-specific; the chunker in the ai crate splits long documents to fit the model's window.

[embedding]
# OpenAI:
default_endpoint = "https://api.openai.com/v1/embeddings"
default_model    = "text-embedding-3-small"
batch_size       = 64
timeout_ms       = 5000

# Or a local Ollama:
# default_endpoint = "http://localhost:11434/v1/embeddings"
# default_model    = "nomic-embed-text"

[limits]

Hard caps to protect the server from runaway queries and mapping explosions. Lower them on shared nodes; raise max_query_memory_mb for aggregation-heavy workloads.

[limits]
max_query_memory_mb     = 512
max_concurrent_searches = 64
max_fields_per_index    = 500
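
One way to raise the per-query cap without raising the worst case is to lower the search ceiling in step. The values are illustrative, and the bound assumes worst-case query memory is simply the cap times the number of in-flight searches.

```toml
# Worst-case query memory, cap × in-flight searches:
#   example above:  512 MiB × 64 = 32,768 MiB
#   this variant:  2048 MiB × 16 = 32,768 MiB

[limits]
max_query_memory_mb     = 2048   # illustrative; more headroom for large aggregations
max_concurrent_searches = 16     # illustrative; fewer in flight keeps the total unchanged
max_fields_per_index    = 500
```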

[indexing]

Turbo mode knobs. Turbo is opt-in per-request via POST /v1/indices/:name/turbo-ingest or the X-Turbo: true header on _bulk; these settings only apply when turbo is active.

[indexing]
turbo_batch_size    = 2000
turbo_parallel      = true
turbo_fast_analyzer = false      # true only if recall doesn't matter

[cluster]

Multi-node mode. Default is off — single-node doesn't need a consensus layer. When enabled, the embedded Raft implementation replicates metadata only (schemas, shard assignments, node roster). See Clustering for the full story.

[cluster]
enabled = true
port    = 9300                   # intra-cluster gRPC + Raft
peers   = [
  "a=10.0.0.11:9300",
  "b=10.0.0.12:9300",
  "c=10.0.0.13:9300",
]
tick_ms = 50
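
Because the local node identifies itself from the peers entry matching bind_address:port, each node's file needs a bind address that appears in the list. A sketch of the file on node "a", assuming the match is a literal comparison of host and port:

```toml
# xerj.toml on node "a" (10.0.0.11)
[server]
bind_address = "10.0.0.11"       # must match this node's host in peers

[cluster]
enabled = true
port    = 9300                   # must match this node's port in peers
peers   = [
  "a=10.0.0.11:9300",            # the entry that resolves to the local node
  "b=10.0.0.12:9300",
  "c=10.0.0.13:9300",
]
```

Under that assumption, the default bind_address of "0.0.0.0" would not match any peer entry, so cluster nodes would each need an explicit address.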

Source · engine/xerj.default.toml · engine/crates/common/src/config.rs
