Built for LLM Intelligence

Transform Static ZIM Archives into Dynamic Knowledge Engines

OpenZIM MCP is a modern, secure MCP server that enables AI models to access and search ZIM format knowledge bases offline with intelligent, structured access patterns. Choose between Simple mode (default) or Advanced mode to match your needs.

Latest Release: v0.8.2
Test Coverage: 80%+
Known Vulnerabilities: 0
Modes: 2 (Simple and Advanced)

MCP Configuration
{
  "openzim-mcp": {
    "command": "uv",
    "args": [
      "run", "openzim-mcp",
      "/path/to/zim/files"
    ]
  }
}

Why LLMs Love OpenZIM MCP

Unlike basic file readers, OpenZIM MCP provides intelligent, structured access that LLMs need to effectively navigate and understand vast knowledge repositories.

Dual Mode Support

Choose Simple mode (the default, with one natural-language tool) or Advanced mode (18 specialized tools) to match your LLM's capabilities.

NEW

Article Summaries

Extract concise summaries from article opening paragraphs. Quick content overview without loading full articles.
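
As a rough illustration of the idea, the sketch below pulls the first non-empty paragraph out of an article's HTML and trims it to a length limit. It uses only Python's standard library; the names `FirstParagraphParser`, `summarize`, and `max_chars` are hypothetical and are not part of the OpenZIM MCP API.

```python
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collects the text of the first non-empty <p> element."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._chunks = []
        self.summary = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting while no summary has been found yet.
        if tag == "p" and self.summary is None:
            self._in_p = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.summary = text


def summarize(html, max_chars=200):
    """Return the opening paragraph, truncated at a word boundary."""
    parser = FirstParagraphParser()
    parser.feed(html)
    text = parser.summary or ""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rsplit(" ", 1)[0] + "..."
```

Empty paragraphs are skipped, so layout-only `<p>` tags at the top of an article do not produce a blank summary.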

NEW

Table of Contents

Build hierarchical TOC from article headings (h1-h6). Navigate directly to specific sections with anchor links.
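
One way to picture the hierarchy: collect the headings in document order, then nest each heading under the nearest preceding heading of a lower level. The sketch below does this with a stack. `HeadingParser` and `build_toc` are hypothetical names, and the output shape is an assumption rather than the server's actual response format.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingParser(HTMLParser):
    """Collects headings as a flat list of {"level", "text", "id"} dicts."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = {
                "level": int(tag[1]),
                "text": "",
                "id": dict(attrs).get("id"),
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.headings.append(self._current)
            self._current = None


def build_toc(html):
    """Nest headings by level; each node gets an anchor link if it has an id."""
    parser = HeadingParser()
    parser.feed(html)
    root = {"level": 0, "children": []}
    stack = [root]
    for h in parser.headings:
        anchor = "#" + h["id"] if h["id"] else None
        node = {**h, "anchor": anchor, "children": []}
        # Pop until the top of the stack is a strictly higher-level heading.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```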

NEW

Binary Content Retrieval

Extract PDFs, images, videos, and other embedded media from ZIM archives. Perfect for multi-agent workflows with specialized processors.

Smart Navigation

Browse by namespace (articles, metadata, media) instead of blind searching. Get structured access to content organization.

Context-Aware Discovery

Get article structure, relationships, and metadata for deeper understanding. Extract links and content connections.

Intelligent Search

Advanced filtering, auto-complete suggestions, and relevance-ranked results with namespace and content type filters.

High Performance

LRU cache with TTL, intelligent eviction policies, and optimized ZIM operations, backed by a test suite with 80%+ coverage.

Relationship Mapping

Extract internal/external links to understand content connections. Build knowledge graphs from ZIM content.
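
A minimal sketch of the internal/external split, assuming that links with an `http`, `https`, or `mailto` scheme (or a protocol-relative `//` prefix) point outside the archive and everything else is an entry path. `LinkParser` and `extract_links` are illustrative names, not OpenZIM MCP functions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

EXTERNAL_SCHEMES = {"http", "https", "mailto"}


class LinkParser(HTMLParser):
    """Sorts <a href> targets into internal and external lists."""

    def __init__(self):
        super().__init__()
        self.internal = []
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs and same-page fragment links.
        if not href or href.startswith("#"):
            return
        if href.startswith("//") or urlparse(href).scheme in EXTERNAL_SCHEMES:
            self.external.append(href)
        else:
            self.internal.append(href)


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return {"internal": parser.internal, "external": parser.external}
```

The internal list can then serve as the edge list of a knowledge graph, with each article path as a node.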

Security First

Comprehensive input validation, path traversal protection, and secure resource management with type safety.

Offline-First Architecture

Complete knowledge access without internet dependency. Perfect for air-gapped environments and privacy-sensitive deployments.

Multi-Archive Intelligence

Query across multiple ZIM archives simultaneously. Combine Wikipedia, Wiktionary, and specialized encyclopedias into a unified knowledge layer.

Smart Retrieval System

Entry retrieval with automatic fallback and path mapping for reliable access to ZIM content.

Direct Access First

Attempts to retrieve entries using the exact path provided, optimizing for speed and accuracy.

Automatic Fallback

When direct access fails, automatically searches using various search terms and path variations.

Path Mapping Cache

Caches successful path mappings to improve performance for repeated access patterns.

Enhanced Error Guidance

Provides clear, actionable guidance when entries cannot be found, designed for LLM users.

How It Works

# The system automatically handles path encoding differences:
#  Direct access: "A/Machine_Learning"
#  Fallback search: "Machine Learning", "machine learning"
#  Cached mapping: Future requests use cached path

# No more manual search-first methodology needed!
get_zim_entry(zim_file, "A/Machine_Learning")
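
The direct-access, fallback, and cache steps described above can be sketched as follows. This is a simplified stand-in under stated assumptions: the archive is modeled as a plain dict, and the fallback tries spelling variations instead of running a real search. `candidate_paths` and `EntryResolver` are hypothetical names.

```python
def candidate_paths(path):
    """Yield plausible spellings of an entry path, most specific first."""
    _, sep, name = path.partition("/")
    if not sep:
        name = path  # no namespace prefix was given
    spaced = name.replace("_", " ")
    seen = set()
    for candidate in (path, name, spaced, spaced.lower()):
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


class EntryResolver:
    def __init__(self, archive):
        self.archive = archive  # assumption: any mapping of path -> content
        self.path_cache = {}    # requested path -> path that actually worked

    def get(self, path):
        # 1. Reuse a mapping that worked before.
        cached = self.path_cache.get(path)
        if cached is not None:
            return self.archive[cached]
        # 2. Try the exact path first, then the fallback variations.
        for candidate in candidate_paths(path):
            if candidate in self.archive:
                self.path_cache[path] = candidate
                return self.archive[candidate]
        # 3. Fail with a hint the caller can act on.
        hint = path.rpartition("/")[2].replace("_", " ")
        raise KeyError(f"Entry not found: {path}. Try searching for: {hint}")
```

After the first successful lookup, repeated requests for the same path skip the fallback loop entirely.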

Enterprise-Grade Security

Comprehensive security measures designed to protect against vulnerabilities and ensure safe operation in production environments.

Path Traversal Protection

Advanced path validation prevents directory traversal attacks using secure path checking with Python 3.9+ features.

CRITICAL Fixed in v0.2.0
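
A common way to implement this kind of check in Python 3.9+ is to resolve the requested path and confirm it still sits under the allowed directory using `Path.is_relative_to`. The sketch below shows that general technique; `resolve_inside` is a hypothetical helper, and the source does not confirm that the server's own check looks like this.

```python
from pathlib import Path


def resolve_inside(allowed_root, requested):
    """Resolve `requested` under `allowed_root`, rejecting any escape."""
    root = Path(allowed_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / requested).resolve()
    # Path.is_relative_to was added in Python 3.9.
    if not target.is_relative_to(root):
        raise ValueError("path escapes the allowed directory")
    return target
```

Resolving before comparing matters: a plain string-prefix check would accept `/srv/zim/../secret`, and an absolute `requested` value replaces the root entirely when joined.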

Input Validation & Sanitization

Comprehensive input validation with length limits, character filtering, and sanitization to prevent injection attacks.

HIGH Implemented
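
For illustration only, a validator combining a length limit with character filtering might look like the sketch below. The limit of 1000 characters, the choice to strip ASCII control characters, and the name `sanitize_query` are all assumptions, not documented OpenZIM MCP behavior.

```python
import re

MAX_QUERY_LENGTH = 1000  # hypothetical limit, not taken from the project
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_query(query, max_length=MAX_QUERY_LENGTH):
    """Strip control characters and enforce type and length constraints."""
    if not isinstance(query, str):
        raise TypeError("query must be a string")
    cleaned = _CONTROL_CHARS.sub("", query).strip()
    if not cleaned:
        raise ValueError("query is empty")
    if len(cleaned) > max_length:
        raise ValueError(f"query exceeds {max_length} characters")
    return cleaned
```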

Type Safety & Validation

Full type annotations with Pydantic validation ensure data integrity and prevent type-related vulnerabilities.

MEDIUM Complete

Secure Error Handling

Sanitized error messages prevent information disclosure while providing helpful guidance for legitimate users.

MEDIUM Enhanced

Known Vulnerabilities: 0
Test Coverage: 80%+
Type Annotated: 100%

Advanced Enterprise Features

Production-ready capabilities for enterprise deployments, monitoring, and multi-instance environments.

Multi-Instance Management

Automatic instance tracking and conflict detection ensure reliable operation when multiple server instances are running.

  • Automatic instance registration with unique process IDs
  • Configuration hash validation for compatibility
  • Stale instance cleanup and orphaned file detection
  • Real-time conflict detection and resolution
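
Configuration hash validation can be pictured as hashing a canonical serialization of each instance's settings and comparing the digests. The sketch below assumes instances are described by dicts with `pid` and `config` keys; that shape, and the names `config_hash` and `find_conflicts`, are hypothetical.

```python
import hashlib
import json


def config_hash(config):
    """Hash a config dict so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_conflicts(own, others):
    """Return the PIDs of instances whose config hash differs from ours."""
    own_hash = config_hash(own["config"])
    return [o["pid"] for o in others if config_hash(o["config"]) != own_hash]
```

Comparing short digests instead of full configurations keeps the per-instance tracking record small.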

Health Monitoring & Diagnostics

Comprehensive health checks and diagnostic tools provide deep insights into server performance and status.

  • Built-in health check endpoints
  • Cache performance metrics and statistics
  • Instance tracking status and recommendations
  • Configuration validation and diagnostics

Intelligent Caching System

Advanced LRU cache with TTL support and intelligent eviction policies optimizes performance for large-scale deployments.

  • LRU (Least Recently Used) eviction strategy
  • Configurable TTL (Time To Live) for entries
  • Automatic expired entry cleanup
  • Path mapping cache for retrieval optimization
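
The combination of LRU eviction and per-entry TTL listed above can be sketched with an `OrderedDict`. This is a generic illustration, not the project's cache class: `TTLCache` and its parameters are hypothetical, and expired entries are removed lazily on access instead of by a background cleanup.

```python
import time
from collections import OrderedDict


class TTLCache:
    def __init__(self, max_size=100, ttl_seconds=3600, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._data = OrderedDict()   # key -> (expires_at, value), LRU first

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]      # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        # Evict least recently used entries until the size limit holds.
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)
```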

Modern Architecture

Modular design with dependency injection, full type safety, and comprehensive configuration management.

  • Dependency injection for testability
  • 100% type annotations with mypy validation
  • Pydantic-based configuration with validation
  • Structured logging with configurable levels

Developer Experience

Modern development workflow with automated releases, comprehensive tooling, and enterprise-grade CI/CD.

Automated Release System

Release-please integration with semantic versioning, automated changelog generation, and PyPI deployment.

Release Please | Semantic Versioning | Auto PyPI

Enhanced Makefile Workflow

Comprehensive development workflow with categorized help, security scanning, and cross-platform compatibility.

Make Targets | Security Scanning | Quality Checks

Comprehensive Testing

80%+ test coverage with pytest, benchmarking, integration tests, and automated quality assurance.

80%+ Coverage | Benchmarks | Integration Tests

Code Quality Tools

Black formatting, flake8 linting, mypy type checking, bandit security scanning, and pre-commit hooks.

Black | MyPy | Bandit

Development Workflow

# Complete development setup in one command
make install

# Run all quality checks
make check

# Run tests with coverage
make test

# Security scanning
make security

# Build and publish
make build && make publish

Quick Installation

Get up and running with OpenZIM MCP in just a few minutes.

1. Install with uv

# Install OpenZIM MCP with uv (recommended)
uv add openzim-mcp

# Or install globally with uv
uv tool install openzim-mcp

2. Prepare ZIM Files

# Create directory for ZIM files
mkdir -p ~/zim-files

# Download ZIM files from Kiwix Library
# https://browse.library.kiwix.org/

3. Run the Server

# Start the MCP server
uv run openzim-mcp /path/to/zim/files

# Or if installed globally
openzim-mcp /path/to/zim/files

Development Installation

For contributors and developers who want to work with the source code or need the latest features:

# Clone the repository
git clone https://github.com/cameronrye/openzim-mcp.git
cd openzim-mcp

# Install dependencies
uv sync

# Run from source
uv run python -m openzim_mcp /path/to/zim/files

Usage Examples

See OpenZIM MCP in action with real-world examples and API calls.

Browse Namespaces

{
  "name": "browse_namespace",
  "arguments": {
    "zim_file_path": "wikipedia_en_100_2025-08.zim",
    "namespace": "C",
    "limit": 10,
    "offset": 0
  }
}
Response:
{
  "namespace": "C",
  "total_in_namespace": 80000,
  "offset": 0,
  "limit": 10,
  "returned_count": 10,
  "has_more": true,
  "entries": [
    {
      "path": "C/Biology",
      "title": "Biology",
      "content_type": "text/html",
      "preview": "Biology is the scientific study of life..."
    }
  ]
}
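
The pagination fields in this response relate to each other in a simple way, sketched below over an in-memory list. `paginate` is a hypothetical helper; only the field names are taken from the example response above.

```python
def paginate(entries, limit=10, offset=0):
    """Slice `entries` and report the same paging fields as the response."""
    page = entries[offset:offset + limit]
    return {
        "total_in_namespace": len(entries),
        "offset": offset,
        "limit": limit,
        "returned_count": len(page),
        # More results exist if this page ends before the full list does.
        "has_more": offset + len(page) < len(entries),
        "entries": page,
    }
```

A client can keep requesting pages with `offset += limit` until `has_more` is false.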

Get Article Structure

{
  "name": "get_article_structure",
  "arguments": {
    "zim_file_path": "wikipedia_en_100_2025-08.zim",
    "entry_path": "C/Evolution"
  }
}
Response:
{
  "title": "Evolution",
  "path": "C/Evolution",
  "content_type": "text/html",
  "headings": [
    {"level": 1, "text": "Evolution", "id": "evolution"},
    {"level": 2, "text": "History", "id": "history"},
    {"level": 2, "text": "Mechanisms", "id": "mechanisms"}
  ],
  "sections": [
    {
      "title": "Evolution",
      "level": 1,
      "content_preview": "Evolution is the change in heritable traits...",
      "word_count": 150
    }
  ],
  "word_count": 5000
}

MCP Client Configuration - Simple Mode (Default)

{
  "mcpServers": {
    "openzim-mcp": {
      "command": "uv",
      "args": [
        "run",
        "openzim-mcp",
        "/path/to/zim/files"
      ]
    }
  }
}
Advanced Mode (18 specialized tools):
{
  "mcpServers": {
    "openzim-mcp-advanced": {
      "command": "openzim-mcp",
      "args": [
        "--mode", "advanced",
        "/path/to/zim/files"
      ]
    }
  }
}
Environment Variables (Optional):
# Tool mode (default: simple)
export OPENZIM_MCP_TOOL_MODE=simple

# Cache configuration
export OPENZIM_MCP_CACHE__ENABLED=true
export OPENZIM_MCP_CACHE__MAX_SIZE=200
export OPENZIM_MCP_CACHE__TTL_SECONDS=7200

# Content configuration
export OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH=200000
export OPENZIM_MCP_CONTENT__SNIPPET_LENGTH=2000

# Logging configuration
export OPENZIM_MCP_LOGGING__LEVEL=INFO
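
The double underscore in names such as `OPENZIM_MCP_CACHE__MAX_SIZE` appears to separate a settings group from a field, a nested-delimiter convention that Pydantic-based settings commonly use. The sketch below shows how such names could map onto a nested structure; `load_settings` is a hypothetical function, and values are left as strings where a real settings class would validate and convert them.

```python
def load_settings(environ, prefix="OPENZIM_MCP_", delimiter="__"):
    """Turn PREFIX_GROUP__FIELD variables into a nested dict of strings."""
    settings = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        keys = name[len(prefix):].lower().split(delimiter)
        node = settings
        # Walk (or create) one nested dict per group segment.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = raw
    return settings
```

In practice this would be called as `load_settings(os.environ)` after importing `os`.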

Documentation & Resources

Comprehensive guides, API references, and community resources to help you get the most out of OpenZIM MCP.

API Reference

Complete documentation of all available MCP tools, parameters, and response formats.


Quick Start Guide

Step-by-step tutorial to get OpenZIM MCP running in your environment quickly.


Configuration Guide

Advanced configuration options, environment variables, and performance tuning.


Troubleshooting

Common issues, solutions, and debugging tips for OpenZIM MCP deployment.


Architecture Overview

Deep dive into the system architecture, components, and design decisions.


Contributing

Guidelines for contributing code, reporting issues, and joining the community.
