From 54aaaddea5e7016c80db75daaa15344ce5524dd8 Mon Sep 17 00:00:00 2001
From: Remy Moll
Date: Wed, 23 Apr 2025 15:10:39 +0200
Subject: [PATCH] Add some preliminary documentation

---
 README.md                             | 10 ++++++-
 docs/README.md                        | 40 +++++++++++++++++++++++++++
 docs/bunny_storage.env                |  8 ++++++
 {example => docs/example}/input.txt   |  0
 {example => docs/example}/output.json |  0
 docs/filesystem.env                   |  6 ++++
 6 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 docs/README.md
 create mode 100644 docs/bunny_storage.env
 rename {example => docs/example}/input.txt (100%)
 rename {example => docs/example}/output.json (100%)
 create mode 100644 docs/filesystem.env

diff --git a/README.md b/README.md
index 5cde5c3..d73e166 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,9 @@
-# Structured Wikivoyage Exports
\ No newline at end of file
+# Structured Wikivoyage Exports
+
+Small utility to convert the wikitext data from the Wikivoyage dumps into a structured format. The goal is to make it easier to work with the data and extract useful information programmatically.
+
+## Installation
+
+
+## Documentation
+See [docs](docs) for more information on how to use this utility.
\ No newline at end of file
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..3eecd01
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,40 @@
+## Documentation
+
+### Overview
+The tool performs three main tasks:
+1. It downloads the latest Wikivoyage dump from Wikimedia.
+2. It parses the dump and produces structured data in JSON format.
+3. It outputs the structured data to a specified target.
+
+
+### Configuration
+Configuration is handled through environment variables. The following variables are available:
+
+- general setup
+  - `DEBUG`: Increases the verbosity of the output if set. If unset, the program will run in normal mode.
+  - `MAX_CONCURRENT`: The maximum number of concurrent operations to perform. This is useful for limiting the number of concurrent requests to the various APIs. By default, this is set to 0, which means no limit.
+
+- output handler setup
+  - `HANDLER`: The output handler to use. The available handlers are defined in the `output_handler` module. Use their file name as the value (currently implemented: `filesystem` or `bunny_storage`).
+  - Different handlers may have different configuration options. Specify them through `HANDLER__