Introduction

OpenZIM MCP is a modern, secure, and high-performance MCP (Model Context Protocol) server that enables AI models to access and search ZIM format knowledge bases offline.

8-tool advanced surface. Phase F (v2.0.0) consolidated the prior 22-tool advanced surface into 8 tools: zim_query, zim_search, zim_get, zim_get_section, zim_browse, zim_metadata, zim_links, and zim_health. The advanced-mode wire footprint dropped from ~36KB to ~23.5KB, which puts it below the "MCP Tax" pain band (25–50KB of schema) that hurts small-model dispatch. Simple mode (zim_query only) is unchanged. New in v2.1: native libzim archive validation via zim_health(zim_file_path=...), plus identity and index introspection in zim_metadata.
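As a rough illustration of how a client reaches one of these tools, the sketch below builds an MCP `tools/call` request as a JSON-RPC 2.0 message for zim_health. The `build_tool_call` helper and the archive path are hypothetical; only the tool name and the `zim_file_path` argument name come from the text above, and the response shape is not shown because this page does not describe it.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical archive path, used purely for illustration.
request = build_tool_call(1, "zim_health", {"zim_file_path": "/data/zim/example.zim"})
print(request)
```

In practice an MCP client library would normally build and send this message for you; the sketch only shows what travels over the wire.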

Still running v1.x? Highlights for v1.2.0 and v1.0.0 remain documented in the changelog. v1.x is in maintenance mode: security, data-corruption, and pre-v2.0.0 crash fixes are accepted, but no new features. Maintenance ends when v2.5.0 ships or on 2026-11-27, whichever comes first.

What Makes OpenZIM MCP Different

Built for LLM Intelligence

OpenZIM MCP provides the structured access that LLMs need:

  • Dual-mode surface — Simple mode (default) exposes a single natural-language tool (zim_query) for smaller LLMs; Advanced mode exposes the 8 specialized tools for hosts that can pick from a richer surface.
  • Smart navigation — browse by namespace (articles, metadata, media) instead of blind searching. zim_browse(mode="walk") does deterministic cursor-paginated iteration.
  • Multi-archive search: zim_search(cross_file=True) queries every ZIM file at once; zim_search(mode="title") resolves titles directly without full-text scoring.
  • Smart retrieval — automatic fallback from direct path access to search-derived term resolution, with archive-scoped path-mapping cache and bounded redirect-chain following.
  • Batch retrieval: zim_get(entry_paths=[...]) fetches up to 50 entries in one call with per-entry success/error reporting.
  • MCP prompts — pre-built workflows (/research, /summarize, /explore) orchestrate multi-step ZIM operations.
  • MCP resources: zim://files, zim://{name}, and zim://{name}/entry/{path} integrate with MCP-aware client browsers and @-mention pickers; subscribe for live update notifications.
  • Binary content: zim_get(binary=True) extracts PDFs, images, and other embedded media for multi-agent workflows.

Operations & Security

  • Streamable HTTP transport with bearer-token auth, a CORS allow-list, and /healthz and /readyz probes.
  • Safe-default startup check refuses to bind a non-localhost host without an auth token.
  • Path and PID redaction in error responses and diagnostics — rejected traversals no longer leak the canonical allowed-directory layout.
  • Atomic rate limiting — global + per-operation token-bucket acquire is single-pass; no transient over-consumption.
  • Multi-arch Docker image: ghcr.io/cameronrye/openzim-mcp is built for linux/amd64 and linux/arm64 and runs as non-root with a built-in healthcheck.
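To make the "atomic rate limiting" point above concrete, here is a minimal sketch of a single-pass acquire across a global bucket and a per-operation bucket. It assumes a simple lock-protected design; the `TokenBucket` and `RateLimiter` classes, their parameters, and the limits used are hypothetical and do not reflect the server's actual code. The key idea is that every bucket is checked before any token is deducted, so a rejected request never consumes from the global bucket.

```python
import threading
import time
from collections.abc import Callable


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = now

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now


class RateLimiter:
    """Global plus per-operation buckets, acquired in one pass under one lock."""

    def __init__(
        self,
        global_limit: tuple[float, float],
        per_operation: dict[str, tuple[float, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        now = clock()
        self._global = TokenBucket(*global_limit, now)
        self._per_op = {
            name: TokenBucket(*limit, now) for name, limit in per_operation.items()
        }

    def try_acquire(self, operation: str, cost: float = 1.0) -> bool:
        with self._lock:
            now = self._clock()
            buckets = [self._global]
            op_bucket = self._per_op.get(operation)
            if op_bucket is not None:
                buckets.append(op_bucket)
            for bucket in buckets:
                bucket.refill(now)
            # Check every bucket first; deduct only if all of them can pay.
            if any(bucket.tokens < cost for bucket in buckets):
                return False
            for bucket in buckets:
                bucket.tokens -= cost
            return True


# Fixed clock so the demo is deterministic; limits are (capacity, refill rate).
limiter = RateLimiter((3, 1.0), {"search": (1, 1.0)}, clock=lambda: 0.0)
print(limiter.try_acquire("search"))  # True
print(limiter.try_acquire("search"))  # False, and the global bucket is untouched
print(limiter.try_acquire("get"))     # True, no per-operation bucket for "get"
```

A two-step design that deducts from the global bucket and then refunds on a per-operation failure would briefly over-consume under concurrency; checking first and deducting second inside one critical section avoids that window.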

Use Cases

  • Research & knowledge management — offline Wikipedia / Wiktionary / academic archives behind an MCP-aware assistant.
  • Knowledge chatbots — give a small/local LLM real reference material instead of relying on weights.
  • Compliance / air-gapped environments — offline knowledge access without internet egress.
  • Multi-agent workflows — extract binary entries (PDFs, images) and pass to specialized processors.

Project Status

  • Version: 2.1.1
  • License: MIT
  • Python: 3.12+
  • Test Coverage: 80%+
  • Container: ghcr.io/cameronrye/openzim-mcp:2.1.1 + :latest (linux/amd64, linux/arm64)

Need help? Start with the Quick start; for HTTP/Docker deployment see HTTP and Docker Deployment; for failure modes see Troubleshooting.
