mirror of
https://github.com/bcye/structured-wikivoyage-exports.git
synced 2025-05-14 03:50:19 +00:00
Documentation
Overview
The tool performs three main tasks:
- It downloads the latest Wikivoyage dump from Wikimedia.
- It parses the dump and produces structured data in JSON format.
- It outputs the structured data to a specified target.
Configuration
Configuration is handled through environment variables. The following variables are available:
- General setup
  - DEBUG: Increases the verbosity of the output if set. If unset, the program runs in normal mode.
  - MAX_CONCURRENT: The maximum number of concurrent operations to perform. This is useful for limiting the number of concurrent requests to the various APIs. By default, this is set to 0, which means no limit.
- Output handler setup
  - HANDLER: The output handler to use. The available handlers are defined in the output_handler module. Use their file name as the value (currently implemented: filesystem or bunny_storage).
  - Different handlers may have different configuration options. Specify them through HANDLER_<handler_name>_<option>:
    - HANDLER_FILESYSTEM_OUTPUT_DIR: The directory to output the structured data to.
    - HANDLER_FILESYSTEM_FAIL_ON_ERROR: By default, the handler fails if a particular write operation fails. If this is set to false, the handler skips the erroneous write and continues with the next one.
    - HANDLER_BUNNY_STORAGE_API_KEY: The API key for Bunny Storage.
    - HANDLER_BUNNY_STORAGE_ENDPOINT: The endpoint for Bunny Storage.
    - HANDLER_BUNNY_STORAGE_BASE_PATH: The base path to output the structured data to.
    - HANDLER_BUNNY_STORAGE_FAIL_ON_ERROR: By default, the handler fails if a particular write operation fails. If this is set to false, the handler skips the erroneous write and continues with the next one.
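To illustrate the MAX_CONCURRENT semantics described above (0 meaning no limit), here is a minimal sketch of how such a cap could be applied with asyncio. The names make_limiter and run_all are hypothetical and not taken from the project; the actual implementation may work differently.

```python
import asyncio
import os


class _NoLimit:
    """Async context manager that never blocks (used when the limit is 0)."""

    async def __aenter__(self):
        return None

    async def __aexit__(self, *exc_info):
        return False


def make_limiter(max_concurrent: int):
    """Return an async context manager capping concurrency; 0 means no limit."""
    if max_concurrent <= 0:
        return _NoLimit()
    return asyncio.Semaphore(max_concurrent)


async def run_all(jobs, max_concurrent: int):
    """Run zero-argument coroutine functions, at most max_concurrent at once."""
    # The limiter is created inside the running event loop on purpose.
    limiter = make_limiter(max_concurrent)

    async def guarded(job):
        async with limiter:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


def read_max_concurrent(env=os.environ) -> int:
    """Read MAX_CONCURRENT, falling back to the documented default of 0."""
    return int(env.get("MAX_CONCURRENT", "0"))
```

A caller would pass a list of coroutine functions, for example `asyncio.run(run_all(jobs, read_max_concurrent()))`.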
Environment variables can be specified through an .env file. Sample files are provided: see filesystem.env and bunny_storage.env.
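Once the variables are in the environment, the HANDLER_<handler_name>_<option> naming scheme lends itself to a simple prefix lookup. The sketch below shows one way the options for the selected handler could be gathered; handler_options and parse_bool are hypothetical helpers, and treating only the literal value false (case-insensitive) as false is an assumption.

```python
import os
from typing import Dict, Mapping, Optional, Tuple


def handler_options(env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    """Return the selected handler name and its options, keyed in lower case."""
    name = env["HANDLER"]
    prefix = f"HANDLER_{name.upper()}_"
    options = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
    return name, options


def parse_bool(value: Optional[str], default: bool = True) -> bool:
    """Interpret an option such as FAIL_ON_ERROR; unset keeps the default."""
    if value is None:
        return default
    # Assumption: only the literal "false" disables the option.
    return value.strip().lower() != "false"
```

With HANDLER=filesystem, only the HANDLER_FILESYSTEM_* variables are picked up; variables for other handlers are ignored.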
Fetching
TBD
Parsing
The result of parsing is a JSON object; an example is provided under example.
Output
Depending on the output handler, the structured data is written to a file or uploaded to a storage service. The handlers are kept modular, and we encourage you to implement your own; contributions are welcome. The only design constraint is that outputs are written to individual files.
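As a rough idea of what a custom handler might look like, the sketch below writes each entry to its own JSON file and honours a fail-on-error switch. The OutputHandler base class, its method names, and DirectoryHandler are all hypothetical; the real interface in the output_handler module may differ, so check the existing handlers before building on this.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class OutputHandler(ABC):
    """Hypothetical base class: one write call per entry."""

    def __init__(self, fail_on_error: bool = True):
        self.fail_on_error = fail_on_error

    @abstractmethod
    def _write(self, entry_id: str, data: dict) -> None:
        """Persist a single entry; raise OSError on failure."""

    def write(self, entry_id: str, data: dict) -> bool:
        """Write one entry; return False if it was skipped after an error."""
        try:
            self._write(entry_id, data)
            return True
        except OSError:
            if self.fail_on_error:
                raise
            return False


class DirectoryHandler(OutputHandler):
    """Example handler that stores each entry as <entry_id>.json."""

    def __init__(self, output_dir: str, fail_on_error: bool = True):
        super().__init__(fail_on_error)
        self.output_dir = Path(output_dir)

    def _write(self, entry_id: str, data: dict) -> None:
        self.output_dir.mkdir(parents=True, exist_ok=True)
        path = self.output_dir / f"{entry_id}.json"
        path.write_text(json.dumps(data), encoding="utf-8")
```

Keeping the error policy in the base class means each new handler only has to implement the single-entry write.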