Structured Wikivoyage Exports

Small utility to convert the wikitext data from the Wikivoyage dumps into a structured format. The goal is to make it easier to work with the data and extract useful information programmatically.

Usage

Docker

This script is intended to be run with Docker. A Docker image is available from the GitHub Container Registry. For example, to run it with the filesystem handler:

docker run -e HANDLER=filesystem -e HANDLER_FILESYSTEM_OUTPUT_DIR=/output -v ./output:/output --ulimit nofile=65536:65536 ghcr.io/bcye/structured-wikivoyage-exports

For all available options, refer to the docs.
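Once a run with the filesystem handler has finished, the exported files can be read back from the mounted directory. The following is a minimal Python sketch, assuming the handler writes .json files somewhere under ./output. The exact directory layout and the JSON schema are not described here, so the code makes no assumption about either. iter_exports is a hypothetical helper and not part of this project.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_exports(output_dir: str) -> Iterator[tuple[Path, Any]]:
    """Yield (path, parsed JSON) for every .json file under output_dir."""
    root = Path(output_dir)
    # Assumption: the exported articles are stored as .json files
    # somewhere below the output directory (nesting depth unknown).
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            yield path, json.load(fh)


if __name__ == "__main__":
    # "./output" matches the host side of the -v mount in the docker command.
    for path, doc in iter_exports("./output"):
        # Only the top-level JSON type is printed; the schema is not assumed.
        print(path, type(doc).__name__)
```

Searching recursively keeps the sketch independent of how the handler nests its files. For typed access to the fields, the TypeScript types described in the next section are the documented reference.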

Types

TypeScript types for consuming the JSON output are available in the @bcye/structured-wikivoyage-types npm package. Refer to the docstrings in types/index.d.ts for details.

Documentation

See docs for more information on how to use this utility.

Testing

Run PYTHONPATH=. pytest from inside the virtual environment (venv).

License

Code

(c) 2025 bcye and moll-re

Unless otherwise stated, all code and documentation is licensed under the AGPLv3 license. Refer to LICENSE for the full license text.

Examples

Files in docs/example and tests/fixtures are copies (.txt) or derivatives (.json) of the Boston article on Wikivoyage and are licensed under CC BY-SA 4.0. A list of contributors is available on the original article.
