Introduction
OpenZIM MCP is a modern, secure, and high-performance MCP (Model Context Protocol) server that enables AI models to access and search ZIM format knowledge bases offline.
8-tool advanced surface. Phase F (v2.0.0) consolidated the prior 22-tool advanced surface into 8 tools: zim_query, zim_search, zim_get, zim_get_section, zim_browse, zim_metadata, zim_links, and zim_health. The advanced-mode wire footprint drops from ~36KB to ~23.5KB, clearing the MCP Tax pain band (25–50KB schema) for small-model dispatch. Simple mode (zim_query only) is unchanged. New in v2.1: native libzim archive validation via zim_health(zim_file_path=...), plus identity / index introspection in zim_metadata. What’s new →
Still running v1.x? Highlights for v1.2.0 and v1.0.0 remain documented in the changelog. v1.x is in maintenance mode (security, data-corruption, and pre-v2.0.0 crash fixes accepted; no new features) until whichever comes first: v2.5.0 ships or 2026-11-27.
What Makes OpenZIM MCP Different
Built for LLM Intelligence
OpenZIM MCP provides the intelligent, structured access that LLMs need:
- Dual-mode surface: Simple mode (default) exposes a single natural-language tool (zim_query) for smaller LLMs; Advanced mode exposes the 8 specialized tools for hosts that can pick from a richer surface.
- Smart navigation: browse by namespace (articles, metadata, media) instead of blind searching. zim_browse(mode="walk") does deterministic cursor-paginated iteration.
- Multi-archive search: zim_search(cross_file=True) queries every ZIM file at once; zim_search(mode="title") resolves titles directly without full-text scoring.
- Smart retrieval: automatic fallback from direct path access to search-derived term resolution, with archive-scoped path-mapping cache and bounded redirect-chain following.
- Batch retrieval: zim_get(entry_paths=[...]) fetches up to 50 entries in one call with per-entry success/error reporting.
- MCP prompts: pre-built workflows (/research, /summarize, /explore) orchestrate multi-step ZIM operations.
- MCP resources: zim://files, zim://{name}, and zim://{name}/entry/{path} integrate with MCP-aware client browsers and @-mention pickers; subscribe for live update notifications.
- Binary content: zim_get(binary=True) extracts PDFs, images, and other embedded media for multi-agent workflows.
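As a rough illustration of how a client might address the tools named above, the sketch below builds MCP tools/call requests (JSON-RPC 2.0) and splits a long path list into batches that respect the 50-entry limit on zim_get. The argument names mode, cross_file, and entry_paths come from the list above. The query argument, the helper names, and the example paths are assumptions made for illustration, so check the tool schemas the server actually advertises before relying on them.

```python
import json
from itertools import count
from typing import Any, Iterator

MAX_BATCH = 50  # per-call entry limit stated for zim_get(entry_paths=[...])
_ids = count(1)  # monotonically increasing JSON-RPC request ids


def tool_call(name: str, **arguments: Any) -> dict[str, Any]:
    """Build an MCP tools/call JSON-RPC 2.0 request for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def batched_get(entry_paths: list[str]) -> Iterator[dict[str, Any]]:
    """Yield zim_get requests carrying at most MAX_BATCH paths each."""
    for start in range(0, len(entry_paths), MAX_BATCH):
        chunk = entry_paths[start:start + MAX_BATCH]
        yield tool_call("zim_get", entry_paths=chunk)


# Hypothetical query text and entry paths, for illustration only.
# "query" is an assumed argument name; it does not appear in the text above.
search = tool_call("zim_search", query="photosynthesis", cross_file=True)
walk = tool_call("zim_browse", mode="walk")
paths = [f"A/Article_{i}" for i in range(120)]
requests = list(batched_get(paths))

print(json.dumps(search, indent=2))
print([len(r["params"]["arguments"]["entry_paths"]) for r in requests])
```

With 120 paths the helper produces three requests of 50, 50, and 20 entries. Batching on the client side keeps each call within the documented limit while still benefiting from per-entry success/error reporting.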
Operations & Security
- Streamable HTTP transport with bearer-token auth, CORS allow-list, and /healthz and /readyz probes.
- Safe-default startup check refuses to bind a non-localhost host without an auth token.
- Path and PID redaction in error responses and diagnostics: rejected traversals no longer leak the canonical allowed-directory layout.
- Atomic rate limiting: global + per-operation token-bucket acquire is single-pass; no transient over-consumption.
- Multi-arch Docker image: ghcr.io/cameronrye/openzim-mcp, builds for linux/amd64 and linux/arm64, runs as non-root with a built-in healthcheck.
Use Cases
- Research & knowledge management — offline Wikipedia / Wiktionary / academic archives behind an MCP-aware assistant.
- Knowledge chatbots — give a small/local LLM real reference material instead of relying on weights.
- Compliance / air-gapped environments — offline knowledge access without internet egress.
- Multi-agent workflows — extract binary entries (PDFs, images) and pass to specialized processors.
Project Status
- Version: 2.1.1
- License: MIT
- Python: 3.12+
- Test Coverage: 80%+
- Container: ghcr.io/cameronrye/openzim-mcp:2.1.1 and :latest (linux/amd64, linux/arm64)
Quick Links
- GitHub Repository
- CHANGELOG — version history
- v1 → v2 migration table — the mechanical 22→8 mapping
- Issues & Bug Reports
- Discussions
- Releases
Need help? Start with the Quick start; for HTTP/Docker deployment see HTTP and Docker Deployment; for failure modes see Troubleshooting.