02 · REFERENCE

Config TOML

XERJ reads one TOML file. Every key has a production-ready default, so the smallest working config is an empty file. The table below is the full surface; the sections below it walk each group with a runnable example.

Paths for the config are, in precedence order: --config /path/to/xerj.toml on the command line, then /etc/xerj/xerj.toml, then ./xerj.toml in the working directory. If none exist, the full default is used.

Full key table

KEY
TYPE
DEFAULT
DESCRIPTION
[server]
rest_port
u16
8080
Native REST API listener port. All native endpoints (/v1/*) live here.
grpc_port
u16
8081
Reserved for a future gRPC API. Not wired in v0.1 — leave at default.
es_compat_port
u16
9200
Elasticsearch-compatible wire port. Point Kibana, Logstash, or the ES client here. Set to 0 to disable.
bind_address
string
"0.0.0.0"
Interface to bind every listener to. Use "127.0.0.1" for dev or a private address in production.
data_dir
path
"./data"
Root for indices, WAL files, and segments. Needs fast I/O and enough free space.
[auth]
enabled
bool
true
Require an API key on every request. Set to false only on a trusted network.
admin_api_key
string
""
Static admin key. Left empty, a 256-bit key is generated on first run and written to /admin.key.
[tls]
enabled
bool
false
Terminate TLS at the server. Requires cert_path + key_path. If you terminate at a proxy, leave false.
cert_path
path
""
PEM X.509 certificate. Use a CA-signed cert in production.
key_path
path
""
PEM private key. Permissions should be 0600.
[storage]
wal_sync
enum
"batched"
"sync" · "batched" · "async". Durability vs throughput. Batched is the recommended default.
wal_batch_ms
u32
100
Fsync cadence when wal_sync="batched" (ms). Range 1–10000. Lower = smaller loss window.
wal_max_size_mb
u32
512
WAL rollover threshold (MiB). Larger means fewer rollovers, longer crash recovery.
flush_size_mb
u32
256
Memtable buffer size that triggers a segment flush (MiB).
flush_interval_secs
u32
30
Maximum wall-clock interval between flushes even if the buffer is not full.
[merge]
strategy
enum
"size_tiered"
"size_tiered" or "log_structured". Size-tiered for write-heavy, log-structured for mixed workloads.
min_segments
u32
10
Minimum live segments before a merge pass is scheduled. Must be ≥ 2.
max_segment_mb
u32
5120
Upper bound on a mergeable segment (MiB). Segments larger than this are never merged.
io_rate_mb_per_sec
u32
100
Merge I/O throttle (MiB/s). Set 0 to disable — not recommended in production.
max_concurrent
u8
1
Concurrent merge workers per index. Bump to 2–4 on fast NVMe.
[compression]
enabled
bool
true
Block-level compression. Disabling increases disk usage and lowers read CPU.
level
enum
"balanced"
"fast" (LZ4), "balanced" (Zstd L3, default), "best" (Zstd L19, cold storage).
block_size_docs
u32
128
Docs per compressed block. Range 16–4096. Larger = better ratio, higher fetch cost.
[fts]
default_analyzer
enum
"standard"
"standard" · "whitespace" · "simple" · "english". Override per-field in the mapping.
[vector]
default_metric
enum
"cosine"
"cosine" · "dot_product" · "euclidean". Cosine for text embeddings is the usual choice.
hnsw_m
u32
16
Bi-directional edges per layer. Higher = better recall, more RAM. Typical 8–64.
hnsw_ef_construction
u32
200
Beam width at index-build time. Must be ≥ hnsw_m. Typical 100–500.
hnsw_ef_search
u32
100
Default query-time beam width. Per-query override available via `ef_search` in the KNN body.
default_quantization
enum
"scalar8"
"none" · "scalar8" · "scalar4" · "binary". scalar8 = 4× RAM saving, 1–2% recall loss.
hnsw_offload_threshold
u32
0
Auto-downgrade new vectors to scalar4 once the index exceeds N vectors. 0 disables.
max_dimensions
u32
16384
Upper bound on vector dimensionality. 4× the Elasticsearch limit of 4096.
[logs]
retention_days
u32
90
Auto-delete log docs older than N days. 0 disables.
time_partition
enum
"1h"
"1m" · "5m" · "15m" · "1h" · "6h" · "1d". Time-slice granularity for retention pruning.
[embedding]
default_endpoint
string
""
OpenAI-compatible embeddings URL. Empty disables auto-embedding on ingest. Example: "https://api.openai.com/v1/embeddings".
default_model
string
""
Model name passed to the endpoint. Example: "text-embedding-3-small" or "nomic-embed-text".
batch_size
u32
64
Docs per embedding API call. Range 1–2048. Bigger batches amortise round-trip cost.
timeout_ms
u32
5000
HTTP timeout for embedding calls (ms). Ingest fails with a timeout error if exceeded.
[limits]
max_query_memory_mb
u32
512
Per-query memory cap (MiB). Queries that exceed it are cancelled.
max_concurrent_searches
u32
64
Global in-flight search ceiling. Extras are queued.
max_fields_per_index
u32
500
Field-explosion protection. ES default is 1000; 500 is intentionally stricter.
[indexing]
turbo_batch_size
u32
1000
Docs per batch in turbo mode. Range 500–5000. Larger = higher throughput, slightly higher latency.
turbo_parallel
bool
true
Parallel tokenisation on Rayon threads. Disable only for debugging.
turbo_fast_analyzer
bool
false
Skip stemming and stop-word removal in turbo mode. Trades recall for speed.
[cluster]
enabled
bool
false
Enable multi-node cluster mode. When true, the Raft state machine and cluster transport are started on port.
port
u16
9300
TCP port for intra-cluster Raft and search messages. Must be reachable from every peer.
peers
array
[]
Peer list in "node_id=host:port" format. The local node identifies itself from the entry matching bind_address:port. Example: ["a=10.0.0.1:9300","b=10.0.0.2:9300","c=10.0.0.3:9300"].
tick_ms
u64
50
Raft tick interval (ms). Lower = faster leader election at the cost of CPU.

[server]

Network listeners and the data directory. Most deployments only touch data_dir and the bind address.

[server]
rest_port      = 8080            # native /v1/* API
es_compat_port = 9200            # ES wire-compatible API
grpc_port      = 8081            # reserved
bind_address   = "0.0.0.0"       # listen on all interfaces
data_dir       = "/var/lib/xerj" # absolute path is strongly recommended

[auth]

Static API-key authentication. The first-run admin key is written to <data_dir>/admin.key; subsequent starts reuse it. Clients pass Authorization: ApiKey <key> on every request.

[auth]
enabled       = true
admin_api_key = ""               # blank → auto-generated on first run

# Or provide your own:
# admin_api_key = "ak_live_c8f9a4…"

[tls]

TLS termination at the server. In Kubernetes or behind a load balancer, leave this off and terminate at the proxy instead — one place to rotate certs, one place to log handshakes.

[tls]
enabled   = true
cert_path = "/etc/xerj/certs/server.crt"
key_path  = "/etc/xerj/certs/server.key"

[storage]

The WAL and flush tuning. wal_sync is the durability knob everyone looks for — pick "sync" for financial/compliance workloads, "batched" for everything else, and "async" only in benchmarks.

[storage]
wal_sync            = "batched"
wal_batch_ms        = 100        # fsync every 100 ms
wal_max_size_mb     = 512        # roll WAL every 512 MiB
flush_size_mb       = 256        # flush memtable at 256 MiB
flush_interval_secs = 30

[merge]

Segment compaction. size_tiered is the right default: it merges same-size segments, which is cheap and write-optimal. Switch to log_structured if your reads are doing a lot of full-range scans.

[merge]
strategy           = "size_tiered"
min_segments       = 10
max_segment_mb     = 5120        # 5 GiB cap
io_rate_mb_per_sec = 100         # don't starve queries
max_concurrent     = 2           # bump on NVMe

[compression]

See Compression for the encoding catalog. This section picks the outer block codec only — the inner per-column encodings are chosen automatically at write time.

[compression]
enabled         = true
level           = "balanced"     # LZ4 / Zstd L3 / Zstd L19
block_size_docs = 128

[fts]

Default analyzer applied to untyped text fields. Override per-field in the mapping when creating an index. See Analyzers for the built-ins.

[fts]
default_analyzer = "standard"    # unicode words + lowercase

[vector]

HNSW index tuning and the default quantization scheme. These defaults are chosen so a 1 M × 768-dim index fits in a few GiB of RAM with ~99% recall. Override per-field in the mapping for exotic cases.

[vector]
default_metric         = "cosine"
hnsw_m                 = 16
hnsw_ef_construction   = 200
hnsw_ef_search         = 100
default_quantization   = "scalar8"   # 4× RAM saving
hnsw_offload_threshold = 1000000     # auto-scalar4 past 1 M vectors
max_dimensions         = 16384

[logs]

Time-series retention. Log indices are sliced into partitions of time_partition width so retention prunes are O(partitions), not O(documents).

[logs]
retention_days = 30              # keep 30 days
time_partition = "1h"            # 1-hour partitions

[embedding]

Delegates vector generation to an OpenAI-compatible endpoint. Leave default_endpoint empty if clients provide vectors themselves. Token limits are model-specific; the chunker in the ai crate splits long documents to fit the model's window.

[embedding]
# OpenAI:
default_endpoint = "https://api.openai.com/v1/embeddings"
default_model    = "text-embedding-3-small"
batch_size       = 64
timeout_ms       = 5000

# Or a local Ollama:
# default_endpoint = "http://localhost:11434/v1/embeddings"
# default_model    = "nomic-embed-text"

[limits]

Hard caps to protect the server from runaway queries and mapping explosions. Lower these on shared nodes, raise max_query_memory_mb for aggregation-heavy workloads.

[limits]
max_query_memory_mb     = 512
max_concurrent_searches = 64
max_fields_per_index    = 500

[indexing]

Turbo mode knobs. Turbo is opt-in per-request via POST /v1/indices/:name/turbo-ingest or the X-Turbo: true header on _bulk; these settings only apply when turbo is active.

[indexing]
turbo_batch_size    = 2000
turbo_parallel      = true
turbo_fast_analyzer = false      # true only if recall doesn't matter

[cluster]

Multi-node mode. Default is off — single-node doesn't need a consensus layer. When enabled, the embedded Raft implementation replicates metadata only (schemas, shard assignments, node roster). See Clustering for the full story.

[cluster]
enabled = true
port    = 9300                   # intra-cluster gRPC + Raft
peers   = [
  "a=10.0.0.11:9300",
  "b=10.0.0.12:9300",
  "c=10.0.0.13:9300",
]
tick_ms = 50

Source · engine/xerj.default.toml · engine/crates/common/src/config.rs