# COSS_ARCHIVING

A utility to

* fetch article requests from Slack
* generate PDFs for them
* compress them
* send them via Slack + email
* upload them to the COSS NAS

... fully automatically. Run it now, thank me later.

---
## Running - through the launch file

> Prerequisite: make `launch` executable:
>
> `chmod +x launch`

Execute the file by running `./launch`. This won't do anything in itself. You need to specify a mode and then a command:

`./launch <mode> <command> <command options>`
### Overview of the modes

The production mode performs all automatic actions and therefore does not require any manual intervention. It queries the Slack workspace, adds the new requests to the database, downloads all files and metadata, uploads the URLs to archive.org and sends out the downloaded article. As a last step the newly created file is synced to the COSS-NAS.

The debug mode is more sophisticated and allows for big code changes without the need to rebuild the container. It directly mounts the code directory into the container. As a failsafe the environment variable `DEBUG=true` is set. The whole utility is then run in a sandbox environment (Slack channel, database, email) so that Dirk is not affected by any mishaps.

Two additional 'modes' are `build` and `down`. `build` rebuilds the container, which is necessary after code changes. `down` ensures a clean shutdown of *all* containers. Usually the launch script handles this already, but it sometimes fails, in which case `down` needs to be called again.
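
For example, a typical cycle after code changes could look like this (using the commands above; `news_fetch` is described in the next section):

```bash
./launch build                   # rebuild the container after code changes
./launch production news_fetch   # run the main pipeline
./launch down                    # clean shutdown of all containers, if the script didn't already do it
```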
### Overview of the commands

In essence a command is simply a service from docker-compose, which is run in an interactive environment. As such, all services defined in `docker-compose.yaml` can be called as commands. Only two of them will be of real use:

`news_fetch` does the majority of the actions mentioned above. By default, that is without any options, it runs a metadata fetch, download, and upload to archive.org. The upload is usually the slowest step, which is why articles that are processed but don't yet have an archive.org URL tend to pile up. You can therefore specify the option `upload`, which only starts the upload for the concerned articles, as a catch-up if you will.

Example usage:

```bash
./launch production news_fetch          # full mode
./launch production news_fetch upload   # upload mode (lighter resource usage)
./launch debug news_fetch               # debug mode, which drops you inside a new shell

./launch production news_check
```

`news_check` starts a webapp, accessible at [http://localhost:8080](http://localhost:8080), which allows you to easily check the downloaded articles.
## (Running - Docker compose)

> I strongly recommend sticking to the usage of `./launch`.

Instead of using the launch file you can manually issue `docker compose` commands, for example to check the logs.

All relevant mounts and environment variables are most easily specified through an env file, of which I configured two versions:

* production
* debug (development in general)

These files will have to be adapted to your individual setup but won't change significantly once set up.
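
A rough sketch of what such a file can look like is shown below. The variable names are placeholders for illustration only; the real keys are whatever `docker-compose.yaml` actually references.

```bash
# env/production (illustrative sketch; variable names are placeholders,
# check docker-compose.yaml for the keys that are actually used)

# the debug env file would set DEBUG=true instead
DEBUG=false

# hypothetical host paths that get mounted into the containers
ARTICLE_STORAGE_PATH=/path/to/local/article/storage
FIREFOX_PROFILE_PATH=./dependencies/news_fetch.profile
```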
Example usage:

```bash
docker compose --env-file env/production run news_fetch          # full mode
docker compose --env-file env/production run news_fetch upload   # upload mode (lighter resource usage)
docker compose --env-file env/debug run news_fetch               # debug mode, which drops you inside a new shell

docker compose --env-file env/production run news_check

# Misc:
docker compose --env-file env/production up                      # starts all services and shows their combined logs
docker compose --env-file env/production logs -f news_fetch      # follows along with the logs of only one service
docker compose --env-file env/production down
```
### First run:

> The program relies on a functioning Firefox profile!

For the first run ever, run

`./launch edit_profile`

This will generate a new Firefox profile under `coss_archiving/dependencies/news_fetch.profile`.

You can then go to [http://localhost:7900](http://localhost:7900) in your browser. Check the profile (under firefox://profile-internals).

Now install two add-ons: I don't care about cookies and Bypass Paywalls Clean (from firefox://extensions). They ensure that most sites just work out of the box. You can additionally install ad blockers such as uBlock Origin.

You can then use this profile to further tweak various sites. The state of the sites (namely their cookies) will be used by `news_fetch`.

> Whenever you need to make changes to the profile, for instance to log back in to websites, just rerun `./launch edit_profile`.

Exit the mode by closing the Firefox window. You can then run `./launch down` and then proceed normally.
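
Putting the first run together, the sequence looks roughly like this:

```bash
./launch edit_profile            # create/tweak the Firefox profile via http://localhost:7900
./launch down                    # clean shutdown after closing the Firefox window
./launch production news_fetch   # proceed normally
```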
## Building

> The software **will** change. Because the images referenced in docker compose are usually the `latest` ones, it is sufficient to update the containers.

In docker compose, run

`docker compose --env-file env/production build`

Or simpler, just run

`./launch build` (should issues occur you can also run `./launch build --no-cache`)
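
In practice, after pulling in code changes, a rebuild followed by a normal run is usually all that is needed:

```bash
./launch build                   # or: ./launch build --no-cache, should issues occur
./launch production news_fetch
```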
## Roadmap:

* [ ] handle paywalled sites like FAZ, Spiegel, ... through their dedicated sites (see nexisuni.com for instance), available through the ETH network
## Manual Sync to NAS:

Manual sync is sadly still necessary, as the lsync client sometimes gets overwhelmed by quick writes.

I use `rsync`. Mounting the NAS locally, I navigate to the location of the local folder (note the trailing slash in the command below). Then run

`rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress --log-file=rsync.log <local folder>/ "<remote>"`

where `<remote>` is the location where the NAS is mounted. (Options: `R` - relative paths, `a` - archive mode (multiple actions), `z` - compress during transfer, `q` - quiet. We also don't copy most of the metadata and we keep a log of the transfers.)
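
As a concrete example (with hypothetical paths; adjust the local folder and the NAS mount point to your setup):

```bash
# local articles live in ./files, NAS is mounted under /mnt/coss_nas
rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress \
      --log-file=rsync.log ./files/ "/mnt/coss_nas/COSS_archiving"
```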
You can also use your OS's native copy option and select *do not overwrite*. This should only copy the missing files, significantly speeding up the operation.