
COSS_ARCHIVING

A utility to

  • fetch article requests from slack
  • generate pdfs for them
  • compress them
  • send them via slack + email
  • upload them to the COSS NAS

... fully automatically. Run it now, thank me later.


Running - Docker compose

The included docker-compose file is required for easy orchestration of the various services.

All relevant passthroughs and mounts are specified through the env-file, for which I configured 4 versions:

  • production
  • debug (development in general)
  • upload
  • check

These files will have to be adapted to your individual setup but won't change significantly once set up.
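As an illustration, a debug env-file might look along these lines; every variable name here is an assumption and must be matched to what the docker-compose.yaml actually reads:

```shell
# Hypothetical env/debug -- all names are illustrative, not the real config.
DEBUG=true                          # failsafe flag used by debug mode
CODE_DIR=./news_fetch               # live-mounted code directory
DB_PATH=/data/debug.db              # sandbox database
SLACK_CHANNEL=coss-archiving-test   # sandbox slack channel
```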

Overview of the modes

The production mode performs all automatic actions and therefore does not require any manual intervention. It queries the slack workspace, adds the new requests to the database, downloads all files and metadata, uploads the urls to archive.org and sends out the downloaded articles. As a last step the newly created file is synced to the COSS-NAS.
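The production pipeline can be pictured as a per-request sequence of stages; every function and field name in this sketch is hypothetical and merely stands in for a stage of the real service:

```python
# Hedged sketch of production mode: each assignment stands in for one stage
# (download, archive.org upload, delivery, NAS sync). Names are illustrative.
def process_request(url: str) -> dict:
    article = {"url": url, "pdf": url.rsplit("/", 1)[-1] + ".pdf"}  # download
    article["archive_url"] = "https://web.archive.org/web/" + url   # archive.org
    article["delivered"] = True   # sent via slack + email
    article["synced"] = True      # synced to the COSS-NAS
    return article

result = process_request("https://example.com/some-article")
```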

The debug mode is more sophisticated and allows for big code changes without the need to recompile. It mounts the code directory directly into the container. As a failsafe, the environment variable DEBUG=true is set. The whole utility then runs in a sandbox environment (slack channel, database, email) so that Dirk is not affected by any mishaps.
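The sandbox switch can be sketched as a small selector on the DEBUG variable; the channel and database names below are assumptions for illustration, not the project's actual configuration:

```python
# Hypothetical sketch: how a DEBUG flag could pick sandbox vs. production
# targets. None of these names are the project's real settings.
def select_targets(env: dict) -> dict:
    debug = env.get("DEBUG", "false").lower() == "true"
    return {
        "slack_channel": "coss-archiving-test" if debug else "coss-archiving",
        "database": "debug.db" if debug else "production.db",
        "send_email": not debug,  # never mail real recipients from the sandbox
    }

targets = select_targets({"DEBUG": "true"})
```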

The check mode is less sophisticated but shows the downloaded articles to the host for visual verification. This requires passthroughs for X11.

Upload mode is much simpler: it goes over the existing database and retries the upload for those articles whose upload to archive.org has not yet occurred (archive.org is slow, and the other operations usually finish before the upload queue is consumed).
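The "not yet uploaded" pass can be sketched against an in-memory database; the table and column names are illustrative, not the project's actual schema:

```python
import sqlite3

# Hypothetical sketch of the upload-mode pass: find rows whose archive.org
# upload never completed, so they can be retried.
def pending_uploads(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT id, url FROM articles WHERE archive_url IS NULL"
    ).fetchall()

# Demo on an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, url TEXT, archive_url TEXT)")
conn.execute("INSERT INTO articles VALUES (1, 'https://a.example', NULL)")
conn.execute("INSERT INTO articles VALUES (2, 'https://b.example', 'https://web.archive.org/b')")
rows = pending_uploads(conn)   # only the un-archived row comes back
```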

  • For normal production mode run:

    docker compose --env-file env/production run news_fetch

  • For debug mode run:

    docker compose --env-file env/debug run news_fetch

    which drops you into an interactive shell (ctrl+d to exit the container shell).

    Note: the live-mounted code is under /code, and the DEBUG=true environment variable is still set. If you want to test things on production, run export DEBUG=false. Running python runner.py will then execute the newly written code, but with the production database and storage.

  • For check mode, some env-variables are also changed and interactivity is still required. The geckodriver service, however, is not needed. The simplest way is to run

    docker compose --env-file env/check run --no-deps --rm news_fetch

  • Finally, upload mode requires neither interactivity nor additional services. Simply run:

    docker compose --env-file env/upload run --no-deps --rm news_fetch

Stopping

Run

docker compose --env-file env/production down

which terminates all containers associated with the docker-compose.yaml.

Building

The software stack (firefox, selenium, python) changes frequently. For non-breaking changes it is useful to do a clean build of the docker image regularly. A rebuild is also required to pick up changes to the code itself.

To do so, run

docker compose --env-file env/production build

Roadmap:

[_] handle paywalled sites like faz, spiegel, ... through their dedicated sites (see nexisuni.com for instance), available through the ETH network

Manual Sync to NAS:

I use rsync. Mounting the NAS locally, I navigate to the location of the local folder and run (note the trailing slash on the source):

rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress --log-file=rsync.log <local folder>/ "<remote>"

where <remote> is the location where the NAS is mounted. Options: R preserves relative paths, a is archive mode (bundles several preservation flags), z compresses during transfer, q is quiet. The --no-* flags skip most of the metadata, and --log-file keeps a log of the transfers.
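If you prefer to drive the sync from a script, the same invocation can be assembled as an argument list for subprocess.run; the paths below are placeholders:

```python
# Sketch: build the rsync command above as an argument list. The paths are
# placeholders; the flags mirror the manual-sync command.
def rsync_command(local: str, remote: str) -> list:
    return [
        "rsync",
        "-Razq",                     # relative paths, archive, compress, quiet
        "--no-perms", "--no-owner", "--no-group",
        "--temp-dir=/tmp",
        "--progress",
        "--log-file=rsync.log",
        local.rstrip("/") + "/",     # trailing slash: copy contents, not the dir
        remote,
    ]

cmd = rsync_command("./articles", "/mnt/coss-nas")
```

Passing cmd to subprocess.run(cmd, check=True) executes the transfer.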
