Working, refactored news_fetch, better documentation for launch
... fully automatically. Run it now, thank me later.
---

## Running - through launch file

The included `docker-compose` file is necessary for easy orchestration of the various services. The `launch` script issues the necessary `docker compose` commands for you.

> Prerequisite: make `launch` executable:
>
> `chmod +x launch`

Execute the file by running `./launch`. This won't do anything by itself: you need to specify a mode and then a command:

`./launch <mode> <command> <command options>`

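
The mode/command dispatch described above can be sketched as follows. This is a hypothetical reconstruction, not the repository's actual `launch` script: it assumes that each mode `<m>` maps to an env-file `env/<m>`, that every command is a docker-compose service started with `docker compose run`, and that the extra modes `build` and `down` use `env/production`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a `launch` wrapper; the real script may differ.
# Assumptions: mode <m> -> env-file env/<m>; command -> docker-compose service.
set -euo pipefail

usage() {
    echo "usage: launch <mode> <command> [command options]" >&2
    echo "       launch build | launch down" >&2
}

launch() {
    if [[ $# -lt 1 ]]; then
        usage
        return 1
    fi
    local mode="$1"
    shift
    case "$mode" in
        # The two extra 'modes' act on the whole compose project
        # (using env/production for them is an assumption).
        build) docker compose --env-file env/production build ;;
        down)  docker compose --env-file env/production down ;;
        *)
            if [[ $# -lt 1 ]]; then
                usage
                return 1
            fi
            # "$@" is the service name plus its options, e.g. `news_fetch upload`.
            docker compose --env-file "env/$mode" run "$@"
            ;;
    esac
}

# Dispatch only when executed directly with arguments, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    launch "$@"
fi
```

Under these assumptions, `./launch production news_fetch upload` expands to `docker compose --env-file env/production run news_fetch upload`, the same command listed for manual use in the docker compose section.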
### Overview of the modes

The production mode performs all automatic actions and therefore does not require …

The debug mode is more sophisticated and allows for big code changes without the need to rebuild the container. It mounts the code directory directly into the container. As a failsafe, the environment variable `DEBUG=true` is set. The whole utility then runs in a sandbox environment (Slack channel, database, email) so that Dirk is not affected by any mishaps.

> Note:
> The live-mounted code is under `/code`. The `DEBUG=true` environment variable is still set. If you want to test things on production, run `export DEBUG=false`. Running `python runner.py` will then run the newly written code, but with the production database and storage.

The check mode is less sophisticated but shows the downloaded articles to the host for visual verification. This requires passthroughs for X11.

Upload mode is much simpler: it goes over the existing database and retries the upload of all articles whose upload to archive.org has not yet occurred (archive.org is slow, and the other operations usually finish before its queue is consumed).

Two additional 'modes' are `build` and `down`. `build` rebuilds the container, which is necessary after code changes. `down` ensures a clean shutdown of *all* containers. Usually the launch script handles this already, but it sometimes fails, in which case `down` needs to be called again.

### Overview of the commands

In essence, a command is simply a service from docker-compose, which is run in an interactive environment. As such, all services defined in `docker-compose.yaml` can be called as commands. Only two of them will be of real use:

`news_fetch` does the majority of the actions mentioned above. By default, that is without any options, it runs a metadata fetch, download, compression, and upload to archive.org. The upload is usually the slowest step, which is why articles that are processed but don't yet have an archive.org URL tend to pile up. You can therefore specify the option `upload`, which only starts the upload for the affected articles, as a catch-up if you will.

Example usage:

```bash
./launch production news_fetch # full mode
./launch production news_fetch upload # upload mode (lighter resource usage)
./launch debug news_fetch # debug mode, which drops you inside a new shell (ctrl+d to exit)

./launch production news_check
```

`news_check` starts a webapp, accessible under [http://localhost:8080](http://localhost:8080), which allows you to easily check the downloaded articles.

## (Running - Docker compose)

> I strongly recommend sticking to the usage of `./launch`.

Instead of using the launch file, you can manually issue `docker compose` commands, for example to check the logs.

All relevant mounts and env-variables are most easily specified through the env-file, for which I configured 2 versions:

* production
* debug (development in general)

These files will have to be adapted to your individual setup but won't change significantly once set up.

Example usage:

```bash
docker compose --env-file env/production run news_fetch # full mode
docker compose --env-file env/production run news_fetch upload # upload mode (lighter resource usage)
docker compose --env-file env/debug run news_fetch # debug mode, which drops you inside a new shell

docker compose --env-file env/production run news_check

# Misc:
docker compose --env-file env/production up # starts all services and shows their combined logs
docker compose --env-file env/production logs -f news_fetch # follows along with the logs of only one service
docker compose --env-file env/production down # stops and removes the containers of this compose project
```

## Building

> The software (firefox, selenium, python) changes frequently. To pick up non-breaking updates, it is useful to regularly rebuild the docker image! This is also crucial to update the code itself.

In docker compose, run

`docker compose --env-file env/production build`

Or simpler, just run

`./launch build`

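
If a plain rebuild still reuses stale cached layers, a clean rebuild can be sketched as below. `clean_build` is a hypothetical helper name and `env/production` is assumed as the env-file; `--pull` and `--no-cache` are standard `docker compose build` flags.

```bash
#!/usr/bin/env bash
# Sketch of a "clean" rebuild; clean_build is a hypothetical helper name.
set -euo pipefail

clean_build() {
    # --pull:     always attempt to pull newer versions of the base images
    # --no-cache: do not use the build cache, so every build step runs again
    # "$@" optionally restricts the build to specific services, e.g. news_fetch
    docker compose --env-file env/production build --pull --no-cache "$@"
}
```

A clean build is noticeably slower than a cached one, so it is worth reserving for the periodic refresh rather than every code change.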
## Manual Sync to NAS

Manual sync is sadly still necessary, as the lsync client sometimes gets overwhelmed by quick writes.

I use `rsync`. With the NAS mounted locally, I navigate to the location of the local folder and run (notice the trailing slash):

`rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress --log-file=rsync.log <local folder>/ "<remote>"`

where `<remote>` is the location where the NAS is mounted. (Options: `R` - relative paths, `a` - archive mode (shorthand for `-rlptgoD`), `z` - compress file data during the transfer, `q` - quiet. We also don't copy most of the metadata (`--no-perms --no-owner --no-group`), and we keep a log of the transfers.)
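
To avoid retyping the flags, the command above can be wrapped in a small function that forwards extra options to `rsync`, for instance `--dry-run` (`-n`), which performs a trial run without making any changes. `nas_sync` is a hypothetical helper name; the rsync options are copied verbatim from the command above.

```bash
#!/usr/bin/env bash
# Sketch: the manual NAS sync wrapped in a function (nas_sync is hypothetical).
set -euo pipefail

nas_sync() {
    if [[ $# -lt 2 ]]; then
        echo "usage: nas_sync <local folder> <remote> [extra rsync options]" >&2
        return 1
    fi
    local local_dir="$1" remote="$2"
    shift 2
    # Same options as the manual command; "$@" adds e.g. --dry-run.
    # The source keeps exactly one trailing slash, as in the manual command.
    rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress \
        --log-file=rsync.log "$@" "${local_dir%/}/" "$remote"
}
```

For example, `nas_sync <local folder> "<remote>" --dry-run` checks the invocation without copying anything; dropping `--dry-run` performs the real sync.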

You can also use your OS's native copy option and select *do not overwrite*. This should only copy the missing files, significantly speeding up the operation.
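
On the command line, the same *do not overwrite* copy can be approximated with `cp -n` (no-clobber). `copy_missing` is a hypothetical helper name. Some recent GNU coreutils versions may print a warning for `-n` or exit non-zero when files are skipped, which is why this sketch ignores the exit status.

```bash
#!/usr/bin/env bash
# Sketch of a "do not overwrite" copy; copy_missing is a hypothetical helper.
set -euo pipefail

copy_missing() {
    local src="$1" dst="$2"
    mkdir -p "$dst"
    # "src/." copies the folder's contents rather than the folder itself;
    # -n skips every file that already exists in the destination.
    # The exit status is ignored because some cp versions report skips as errors.
    cp -R -n "$src/." "$dst/" || true
}
```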