Working, refactored news_fetch, better documentation for launch
Parent: 713406dc67 · Commit: afead44d6c

README.md · 81 changes
@@ -11,18 +11,15 @@ A utility to
 ... fully automatically. Run it now, thank me later.

 ---

-## Running - Docker compose
-
-The included `docker-compose` file is now necessary for easy orchestration of the various services.
-
-All relevant passthroughs and mounts are specified through the env-file, for which I configured 4 versions:
-
-* production
-* debug (development in general)
-* upload
-* check
-
-These files will have to be adapted to your individual setup but won't change significantly once set up.
+## Running - through launch file
+
+> Prerequisite: make `launch` executable:
+>
+> `chmod +x launch`
+
+Execute the file by running `./launch`. This won't do anything by itself: you need to specify a mode and then a command.
+
+`./launch <mode> <command> <command options>`

 ### Overview of the modes
@@ -30,47 +27,67 @@ The production mode performs all automatic actions and therefore does not require

 The debug mode is more sophisticated and allows for big code changes without the need to recompile. It directly mounts the code-directory into the container. As a failsafe the environment-variable `DEBUG=true` is set. The whole utility is then run on a sandbox environment (slack-channel, database, email) so that Dirk is not affected by any mishaps.

-The check mode is less sophisticated but shows the downloaded articles to the host for visual verification. This requires passthroughs for X11.
-
-Upload mode is much simpler: it goes over the existing database and operates on the articles where the upload to archive.org has not yet occurred (archive.org is slow and the other operations usually finish before the queue was consumed). It retries their upload.
-
-* For normal `production` mode run:
-
-`docker compose --env-file env/production run news_fetch`
-
-* For `debug` mode run:
-
-`docker compose --env-file env/debug run news_fetch`
-
-which drops you into an interactive shell (`ctrl+d` to exit the container shell).
-
-> Note:
-> The live-mounted code is now under `/code`. Note that the `DEBUG=true` environment variable is still set. If you want to test things on production, run `export DEBUG=false`. Running `python runner.py` will now run the newly written code, but with the production database and storage.
-
-* For `check` mode, some env-variables are also changed and you still require interactivity. You don't need the geckodriver service however. The simplest way is to run
-
-`docker compose --env-file env/check run --no-deps --rm news_fetch`
-
-* Finally, for `upload` mode no interactivity is required and no additional services are required. Simply run:
-
-`docker compose --env-file env/upload run --no-deps --rm news_fetch`
-
-### Stopping
-
-Run
-
-`docker compose --env-file env/production down`
-
-which terminates all containers associated with the `docker-compose.yaml`.
+Two additional 'modes' are `build` and `down`. Build rebuilds the container, which is necessary after code changes. Down ensures a clean shutdown of *all* containers. Usually the launch-script handles this already, but it sometimes fails, in which case `down` needs to be called again.
+
+### Overview of the commands
+
+In essence a command is simply a service from docker-compose, which is run in an interactive environment. As such all services defined in `docker-compose.yaml` can be called as commands. Only two of them will be of real use:
+
+`news_fetch` does the majority of the actions mentioned above. By default, that is without any options, it runs a metadata-fetch, download, compression, and upload to archive.org. The upload is usually the slowest, which is why articles that are processed but don't yet have an archive.org url tend to pile up. You can therefore specify the option `upload`, which only starts the upload for the concerned articles, as a catch-up if you will.
+
+Example usage:
+
+```bash
+./launch production news_fetch # full mode
+./launch production news_fetch upload # upload mode (lighter resource usage)
+./launch debug news_fetch # debug mode, which drops you inside a new shell
+./launch production news_check
+```
+
+`news_check` starts a webapp, accessible under [http://localhost:8080](http://localhost:8080), and allows you to easily check the downloaded articles.
+
+## (Running - Docker compose)
+
+> I strongly recommend sticking to the usage of `./launch`.
+
+Instead of using the launch file you can manually issue `docker compose` commands. Example: check for logs.
+
+All relevant mounts and env-variables are easiest specified through the env-file, for which I configured 2 versions:
+
+* production
+* debug (development in general)
+
+These files will have to be adapted to your individual setup but won't change significantly once set up.
+
+Example usage:
+
+```bash
+docker compose --env-file env/production run news_fetch # full mode
+docker compose --env-file env/production run news_fetch upload # upload mode (lighter resource usage)
+docker compose --env-file env/debug run news_fetch # debug mode, which drops you inside a new shell
+docker compose --env-file env/production run news_check
+
+# Misc:
+docker compose --env-file env/production up # starts all services and shows their combined logs
+docker compose --env-file env/production logs -f news_fetch # follows along with the logs of only one service
+docker compose --env-file env/production down
+```

 ## Building

-> The software (firefox, selenium, python) changes frequently. For non-breaking changes it is useful to regularly clean build the docker image! This is also crucial to update the code itself.
+> The software (firefox, selenium, python) changes frequently. For non-breaking changes it is useful to regularly rebuild the docker image! This is also crucial to update the code itself.

 In docker compose, run

 `docker compose --env-file env/production build`

+Or simpler, just run
+
+`./launch build`
@@ -80,6 +97,10 @@ In docker compose, run

 ## Manual Sync to NAS:

+Manual sync is sadly still necessary, as the lsync client sometimes gets overwhelmed by quick writes.
+
 I use `rsync`. Mounting the NAS locally, I navigate to the location of the local folder (notice the trailing slash). Then run

 `rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress --log-file=rsync.log <local folder>/ "<remote>"`

 where `<remote>` is the location where the NAS is mounted. (options: `R` - relative paths, `a` - archive mode (multiple actions), `z` - compress during transfer, `q` - quiet. We also don't copy most of the metadata and we keep a log of the transfers.)

+You can also use your OS' native copy option and select *do not overwrite*. This should only copy the missing files, significantly speeding up the operation.
@@ -38,6 +38,7 @@ services:
     environment:
       - START_VNC=${HEADFULL-false} # as opposed to headless, used when requiring supervision (eg. for websites that crash)
      - START_XVFB=${HEADFULL-false}
+      - SE_VNC_NO_PASSWORD=1
     expose: ["4444"] # exposed to other docker-compose services only
     ports:
      - 7900:7900 # port for webvnc
env/check (vendored) · 15 lines removed

@@ -1,15 +0,0 @@
-# Does not run any downloads but displays the previously downloaded but not yet checked files. Requires display-acces via xauth
-
-CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
-
-XAUTHORTIY=$XAUTHORTIY
-XSOCK=/tmp/.X11-unix
-
-DEBUG=false
-CHECK=true
-HEADLESS=true
-UPLOAD=false
-REDUCEDFETCH=false
-
-# ENTRYPOINT="/bin/bash"
-INTERACTIVE=true
env/debug (vendored) · 18 changes

@@ -1,14 +1,10 @@
 # Runs in a debugging mode, does not launch anything at all but starts a bash process

-CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
-
-CODE=./
-DEBUG=true
-CHECK=false
-UPLOAD=false
-HEADLESS=false
-REDUCEDFETCH=false
-
-ENTRYPOINT="/bin/bash"
-INTERACTIVE=true
+export CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
+export UNAME=remy
+
+export GECKODRIVER_IMG=selenium/standalone-firefox:104.0
+export DEBUG=true
+export HEADFULL=true
+export CODE=./
+export ENTRYPOINT=/bin/bash
env/production (vendored) · 9 changes

@@ -2,9 +2,6 @@

 CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving

-CONTAINERS_TO_RUN=nas_sync, geckodriver
-DEBUG=false
-CHECK=false
-UPLOAD=false
-HEADLESS=true
-REDUCEDFETCH=true
+export UNAME=remy
+export GECKODRIVER_IMG=selenium/standalone-firefox:104.0
+export DEBUG=false
env/upload (vendored) · 10 lines removed

@@ -1,10 +0,0 @@
-# Does not run any other workers and only upploads to archive the urls that weren't previously uploaded
-
-CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
-
-NEWS_FETCH_DEPENDS_ON="[]"
-DEBUG=false
-CHECK=false
-UPLOAD=true
-HEADLESS=true
-REDUCEDFETCH=false
launch · 2 changes

@@ -9,7 +9,7 @@ echo "Bash script launching COSS_ARCHIVING..."
 export CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
 export UNAME=remy
 # CHANGE ME WHEN UPDATING FIREFOX
-export GECKODRIVER_IMG=selenium/standalone-firefox:103.0
+export GECKODRIVER_IMG=selenium/standalone-firefox:104.0
 # version must be >= the one on the host or firefox will not start (because of mismatched config)

 if [[ $1 == "debug" ]]
@@ -25,7 +25,7 @@
 <td>{ item.name }</td>
 <!-- <td>Quality Control Specialist</td> -->
 {#if item.value != ""}
-<td class='bg-emerald-200' style="white-space: normal">{ item.value }</td>
+<td class='bg-emerald-200' style="white-space: normal; width:70%">{ item.value }</td>
 {:else}
 <td class='bg-red-200'>{ item.value }</td>
 {/if}
@@ -2,6 +2,8 @@ FROM python:latest

 ENV TZ Europe/Zurich

+RUN apt-get update && apt-get install -y ghostscript
+# for compression of pdfs

 RUN useradd --create-home --shell /bin/bash --uid 1001 autonews
 # id mapped to local user
@@ -1,7 +1,8 @@
 import os
-import shutil
 import configparser
 import logging
+import time
+import shutil
 from datetime import datetime
 from peewee import SqliteDatabase, PostgresqlDatabase
 from rich.logging import RichHandler
@@ -41,6 +42,7 @@ if os.getenv("DEBUG", "false") == "true":
 else:
     logger.warning("Found 'DEBUG=false' and running on production databases, I hope you know what you're doing...")

+    time.sleep(10) # wait for the vpn to connect (can't use a healthcheck because there is no depends_on)
     cred = db_config["DATABASE"]
     download_db = PostgresqlDatabase(
         cred["db_name"], user=cred["user_name"], password=cred["password"], host="vpn", port=5432
@@ -3,125 +3,91 @@ import configuration
 models = configuration.models
 from threading import Thread
 import logging
-import sys
 logger = logging.getLogger(__name__)
+import sys
+from collections import OrderedDict

-from utils_mail import runner as mail_runner
-from utils_slack import runner as slack_runner
+from utils_mail import runner as MailRunner
+from utils_slack import runner as SlackRunner
 from utils_worker.workers import CompressWorker, DownloadWorker, FetchWorker, UploadWorker


 class ArticleWatcher:
     """Wrapper for a newly created article object. Notifies the coordinator upon change/completition"""
-    def __init__(self, article, **kwargs) -> None:
-        self.article_id = article.id # in case article becomes None at any point, we can still track the article
+    def __init__(self, article, workers_in, workers_out) -> None:
         self.article = article

-        self.completition_notifier = kwargs.get("notifier")
-        self.fetch = kwargs.get("worker_fetch", None)
-        self.download = kwargs.get("worker_download", None)
-        self.compress = kwargs.get("worker_compress", None)
-        self.upload = kwargs.get("worker_upload", None)
+        self.workers_in = workers_in
+        self.workers_out = workers_out

         self.completition_notified = False
-        # self._download_called = self._compression_called = False
-        self._fetch_completed = self._download_completed = self._compression_completed = self._upload_completed = False

-        # first step: gather metadata
-        if self.fetch and self.upload:
-            self.fetch.process(self) # this will call the update_status method
-            self.upload.process(self) # idependent from the rest
-        else: # the full kwargs were not provided, only do a manual run
-            # overwrite update_status() because calls from the workers will result in erros
-            self.update_status = lambda completed: logger.info(f"Completed action {completed}")
-            for w in kwargs.get("workers_manual"):
-                w.process(self)
+        for w_dict in self.workers_in:
+            worker = self.get_next_worker(w_dict) # gets the first worker of each dict (they get processed independently)
+            worker.process(self)


-    def update_status(self, completed_action):
-        """Checks and notifies internal completition-status.
-        Article download is complete iff fetch and download were successfull and compression was run
-        """
-        # if self.completition_notified and self._compression_completed and self._fetch_completed and self._download_completed and self._upload_completed, we are done
-        if completed_action == "fetch":
-            self.download.process(self)
-        elif completed_action == "download":
-            self.compress.process(self)
-        elif completed_action == "compress": # last step
-            self.completition_notifier(self.article)
-            # triggers action in Coordinator
-        elif completed_action == "upload":
-            # this case occurs when upload was faster than compression
-            pass
-        else:
-            logger.warning(f"update_status called with unusual configuration: {completed_action}")
+    def get_next_worker(self, worker_dict, worker_name=""):
+        """Returns the worker coming after the one with key worker_name"""
+        if worker_name == "": # first one
+            return worker_dict[list(worker_dict.keys())[0]]
+        # for i,w_dict in enumerate(workers_list):
+        keys = list(worker_dict.keys())
+        next_key_ind = keys.index(worker_name) + 1
+        try:
+            key = keys[next_key_ind]
+            return worker_dict[key]
+        except IndexError:
+            return None


-    # ====== Attributes to be modified by the util workers
-    @property
-    def fetch_completed(self):
-        return self._fetch_completed
-
-    @fetch_completed.setter
-    def fetch_completed(self, value: bool):
-        self._fetch_completed = value
-        self.update_status("fetch")
-
-    @property
-    def download_completed(self):
-        return self._download_completed
-
-    @download_completed.setter
-    def download_completed(self, value: bool):
-        self._download_completed = value
-        self.update_status("download")
-
-    @property
-    def compression_completed(self):
-        return self._compression_completed
-
-    @compression_completed.setter
-    def compression_completed(self, value: bool):
-        self._compression_completed = value
-        self.update_status("compress")
-
-    @property
-    def upload_completed(self):
-        return self._upload_completed
-
-    @upload_completed.setter
-    def upload_completed(self, value: bool):
-        self._upload_completed = value
-        self.update_status("upload")
+    def update(self, worker_name):
+        """Called by the workers to notify the watcher of a completed step"""
+        for w_dict in self.workers_in:
+            if worker_name in w_dict.keys():
+                next_worker = self.get_next_worker(w_dict, worker_name)
+                if next_worker:
+                    if next_worker == "out":
+                        self.completion_notifier()
+                    else: # it's just another in-worker
+                        next_worker.process(self)
+                else: # no next worker, we are done
+                    logger.info(f"No worker after {worker_name}")
+
+
+    def completion_notifier(self):
+        """Triggers the out-workers to process the article, that is to send out a message"""
+        for w_dict in self.workers_out:
+            worker = self.get_next_worker(w_dict)
+            worker.send(self.article)
+            self.article.sent = True
+            self.article.save()


     def __str__(self) -> str:
-        return f"Article with id {self.article_id}"
+        return f"ArticleWatcher with id {self.article_id}"


-class Coordinator(Thread):
-    def __init__(self, **kwargs) -> None:
-        """Launcher calls this Coordinator as the main thread to handle connections between the other workers (threaded)."""
-        super().__init__(target = self.launch, daemon=True)
-
-    def add_workers(self, **kwargs):
-        self.worker_slack = kwargs.pop("worker_slack", None)
-        self.worker_mail = kwargs.pop("worker_mail", None)
-        # the two above won't be needed in the Watcher
-        self.worker_download = kwargs.get("worker_download", None)
-        self.worker_fetch = kwargs.get("worker_fetch", None)
-        self.worker_compress = kwargs.get("worker_compress", None)
-        self.worker_upload = kwargs.get("worker_upload", None)
-
-        self.kwargs = kwargs
+class Dispatcher(Thread):
+    def __init__(self) -> None:
+        """Thread to handle handle incoming requests and control the workers"""
+        self.workers_in = []
+        self.workers_out = []
+        super().__init__(target = self.launch)

     def launch(self) -> None:
-        for w in [self.worker_download, self.worker_fetch, self.worker_upload, self.worker_compress]:
-            if not w is None: # for reduced operations such as upload, some workers are set to None
+        # start workers (each worker is a thread)
+        for w_dict in self.workers_in: # for reduced operations such as upload, some workers are not set
+            for w in w_dict.values():
+                if isinstance(w, Thread):
                     w.start()

-        # if past messages have not been sent, they must be reevaluated
-        unsent = models.ArticleDownload.filter(sent = False)
-        # .objects.filter(sent = False)
+        # get all articles not fully processed
+        unsent = models.ArticleDownload.filter(sent = False) # if past messages have not been sent, they must be reevaluated
        for a in unsent:
            self.incoming_request(article=a)
@@ -136,82 +102,82 @@ class Coordinator(Thread):
             return
         article, is_new = models.ArticleDownload.get_or_create(article_url=url)
         article.slack_ts = message.ts # either update the timestamp (to the last reference to the article) or set it for the first time
+        article.save()
     elif article is not None:
         is_new = False
         logger.info(f"Received article {article} in incoming_request")
     else:
-        logger.error("Coordinator.incoming_request called with no arguments")
+        logger.error("Dispatcher.incoming_request called with no arguments")
         return

-    self.kwargs.update({"notifier" : self.article_complete_notifier})
-
     if is_new or (article.file_name == "" and article.verified == 0):
         # check for models that were created but were abandonned. This means they have missing information, most importantly no associated file
         # this overwrites previously set information, but that should not be too important
         ArticleWatcher(
             article,
-            **self.kwargs
+            workers_in=self.workers_in,
+            workers_out=self.workers_out,
         )
-
-        # All workers are implemented as a threaded queue. But the individual model requires a specific processing order:
-        # fetch -> download -> compress -> complete
-        # the watcher orchestrates the procedure and notifies upon completition
-        # the watcher will notify once it is sufficiently populated
     else: # manually trigger notification immediatly
         logger.info(f"Found existing article {article}. Now sending")
         self.article_complete_notifier(article)


-    def manual_processing(self, articles, workers):
-        for w in workers:
-            w.start()
-
-        for article in articles:
-            notifier = lambda article: logger.info(f"Completed manual actions for {article}")
-            ArticleWatcher(article, workers_manual = workers, notifier = notifier) # Article watcher wants a thread to link article to TODO: handle threads as a kwarg
-
-    def article_complete_notifier(self, article):
-        if self.worker_slack is None:
-            logger.warning("Skipping slack notification because worker is None")
-        else:
-            self.worker_slack.bot_worker.respond_channel_message(article)
-        if self.worker_mail is None:
-            logger.warning("Skipping mail notification because worker is None")
-        else:
-            self.worker_mail.send(article)
-
-        article.sent = True
-        article.save()
+    # def manual_processing(self, articles, workers):
+    #     for w in workers:
+    #         w.start()
+
+    #     for article in articles:
+    #         notifier = lambda article: logger.info(f"Completed manual actions for {article}")
+    #         ArticleWatcher(article, workers_manual = workers, notifier = notifier) # Article watcher wants a thread to link article to TODO: handle threads as a kwarg


 if __name__ == "__main__":
-    coordinator = Coordinator()
+    dispatcher = Dispatcher()

     if "upload" in sys.argv:
+        class PrintWorker:
+            def send(self, article):
+                print(f"Uploaded article {article}")
+
         articles = models.ArticleDownload.select().where(models.ArticleDownload.archive_url == "" or models.ArticleDownload.archive_url == "TODO:UPLOAD").execute()
         logger.info(f"Launching upload to archive for {len(articles)} articles.")
-        coordinator.manual_processing(articles, [UploadWorker()])
+        dispatcher.workers_in = [{"UploadWorker": UploadWorker()}]
+        dispatcher.workers_out = [{"PrintWorker": PrintWorker()}]
+        dispatcher.start()

     else: # launch with full action
-        slack_runner = slack_runner.BotRunner(coordinator.incoming_request)
-        kwargs = {
-            "worker_download" : DownloadWorker(),
-            "worker_fetch" : FetchWorker(),
-            "worker_upload" : UploadWorker(),
-            "worker_compress" : CompressWorker(),
-            "worker_slack" : slack_runner,
-            "worker_mail" : mail_runner,
-        }
         try:
-            coordinator.add_workers(**kwargs)
-            coordinator.start()
+            slack_runner = SlackRunner.BotRunner(dispatcher.incoming_request)
+            # All workers are implemented as a threaded queue. But the individual model requires a specific processing order:
+            # fetch -> download -> compress -> complete
+            # This is reflected in the following list of workers:
+            workers_in = [
+                OrderedDict({"FetchWorker": FetchWorker(), "DownloadWorker": DownloadWorker(), "CompressWorker": CompressWorker(), "NotifyRunner": "out"}),
+                OrderedDict({"UploadWorker": UploadWorker()})
+            ]
+            # The two dicts are processed independently. First element of first dict is called at the same time as the first element of the second dict
+            # Inside a dict, the order of the keys gives the order of execution (only when the first element is done, the second is called, etc...)
+
+            workers_out = [{"SlackRunner": slack_runner},{"MailRunner": MailRunner}]
+
+            dispatcher.workers_in = workers_in
+            dispatcher.workers_out = workers_out
+
+            dispatcher.start() # starts the thread, (ie. runs launch())
             slack_runner.start() # last one to start, inside the main thread
         except KeyboardInterrupt:
-            logger.info("Keyboard interrupt. Stopping Slack and Coordinator")
+            logger.info("Keyboard interrupt. Stopping Slack and dispatcher")
             slack_runner.stop()
-            logger.info("BYE!")
-            # coordinator was set as a daemon thread, so it will be stopped automatically
+            dispatcher.join()
+            for w_dict in workers_in:
+                for w in w_dict.values():
+                    if isinstance(w, Thread):
+                        w.stop()
+
+            # All threads are launched as a daemon thread, meaning that any 'leftover' should exit along with the sys call
             sys.exit(0)
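To make the refactored flow above easier to follow, here is a small, self-contained sketch of the chaining logic that `ArticleWatcher.get_next_worker` and `update` implement. It is a simplified stand-in, not the actual module: the dummy workers run synchronously (the real ones are threaded queues keyed by their class names), `MiniWatcher` and the worker names are invented for illustration, and the real watcher additionally persists `article.sent` through the out-workers.

```python
from collections import OrderedDict

class MiniWatcher:
    """Simplified stand-in for ArticleWatcher: walks each worker chain in order."""
    def __init__(self, article, workers_in, workers_out):
        self.article = article
        self.workers_in = workers_in
        self.workers_out = workers_out
        for w_dict in self.workers_in:               # each dict is an independent chain
            self.get_next_worker(w_dict).process(self)

    def get_next_worker(self, worker_dict, worker_name=""):
        keys = list(worker_dict.keys())
        if worker_name == "":                        # first worker of the chain
            return worker_dict[keys[0]]
        try:
            return worker_dict[keys[keys.index(worker_name) + 1]]
        except IndexError:
            return None

    def update(self, worker_name):
        for w_dict in self.workers_in:
            if worker_name in w_dict:
                next_worker = self.get_next_worker(w_dict, worker_name)
                if next_worker == "out":             # sentinel: hand over to the out-workers
                    for out_dict in self.workers_out:
                        self.get_next_worker(out_dict).send(self.article)
                elif next_worker is not None:
                    next_worker.process(self)

class DummyWorker:
    """Synchronous fake worker; the real ones are threaded queues."""
    def __init__(self, name):
        self.name = name
    def process(self, watcher):
        print(f"{self.name} processed {watcher.article}")
        watcher.update(self.name)                    # same callback the real workers use
    def send(self, article):
        print(f"notified about {article}")

workers_in = [
    OrderedDict({"Fetch": DummyWorker("Fetch"),
                 "Download": DummyWorker("Download"),
                 "Compress": DummyWorker("Compress"),
                 "Notify": "out"}),
    OrderedDict({"Upload": DummyWorker("Upload")}),
]
workers_out = [{"Printer": DummyWorker("Printer")}]

MiniWatcher("article-42", workers_in, workers_out)
# Fetch, Download and Compress run in sequence, Upload runs as its own chain,
# and the "out" sentinel triggers the notification at the end.
```

Running this prints the fetch → download → compress chain, the independent upload chain, and the final notification, which mirrors the `workers_in`/`workers_out` wiring in `__main__` above.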
@@ -157,7 +157,12 @@ class BotApp(App):
         if say is None:
             say = self.say_substitute
         answers = article.slack_info
-        for a in answers:
+        if article.slack_ts == 0:
+            self.logger.error(f"{article} has no slack_ts")
+        else:
+            self.logger.info("Skipping slack reply because it is broken")
+        for a in []:
+        # for a in answers:
             if a["file_path"]:
                 try:
                     self.client.files_upload(

@@ -166,20 +171,20 @@ class BotApp(App):
                         file = a["file_path"],
                         thread_ts = article.slack_ts_full
                     )
-                    status = True
+                    # status = True
                 except SlackApiError as e: # upload resulted in an error
                     say(
                         "File {} could not be uploaded.".format(a),
                         thread_ts = article.slack_ts_full
                     )
-                    status = False
+                    # status = False
                     self.logger.error(f"File upload failed: {e}")
             else: # anticipated that there is no file!
                 say(
                     f"{a['reply_text']}",
                     thread_ts = article.slack_ts_full
                 )
-                status = True
+                # status = True

     def startup_status(self):

@@ -230,6 +235,9 @@ class BotRunner():
         self.logger.info("Closed Slack-Socketmodehandler")

+    def send(self, article):
+        """Proxy function to send a message to the slack channel, Called by ArticleWatcher once the Article is ready"""
+        self.bot_worker.respond_channel_message(article)
@@ -7,12 +7,10 @@ class TemplateWorker(Thread):
     """Parent class for any subsequent worker of the article-download pipeline. They should all run in parallel, thus the Thread subclassing"""
     logger = logging.getLogger(__name__)

-    def __init__(self, *args, **kwargs) -> None:
+    def __init__(self, **kwargs) -> None:
         target = self._queue_processor # will be executed on Worker.start()
-        group = kwargs.get("group", None)
-        name = kwargs.get("name", None)
-
-        super().__init__(group=group, target=target, name=name)
+        self.keep_running = True
+        super().__init__(target=target, daemon=True)
         self._article_queue = []
         self.logger.info(f"Worker thread {self.__class__.__name__} initialized successfully")

@@ -23,7 +21,7 @@ class TemplateWorker(Thread):

     def _queue_processor(self):
         """This method is launched by thread.run() and idles when self._article_queue is empty. When an external caller appends to the queue it jumps into action"""
-        while True: # PLEASE tell me if I'm missing an obvious better way of doing this!
+        while self.keep_running: # PLEASE tell me if I'm missing an obvious better way of doing this!
             if len(self._article_queue) == 0:
                 time.sleep(5)
             else:

@@ -39,3 +37,10 @@ class TemplateWorker(Thread):
         article = article_watcher.article
         article = action(article) # action updates the article object but does not save the change
         article.save()
+        article_watcher.update(self.__class__.__name__)
+
+
+    def stop(self):
+        self.logger.info(f"Stopping worker {self.__class__.__name__} whith {len(self._article_queue)} articles left in queue")
+        self.keep_running = False
+        self.join()
@@ -25,7 +25,7 @@ class DownloadWorker(TemplateWorker):
         action = self.dl_runner

         super()._handle_article(article_watcher, action)
-        article_watcher.download_completed = True
+        # article_watcher.download_completed = True

@@ -36,7 +36,7 @@ class FetchWorker(TemplateWorker):
     def _handle_article(self, article_watcher):
         action = get_description # function
         super()._handle_article(article_watcher, action)
-        article_watcher.fetch_completed = True
+        # article_watcher.fetch_completed = True

@@ -52,7 +52,7 @@ class UploadWorker(TemplateWorker):
             return run_upload(*args, **kwargs)

         super()._handle_article(article_watcher, action)
-        article_watcher.upload_completed = True
+        # article_watcher.upload_completed = True

@@ -63,4 +63,4 @@ class CompressWorker(TemplateWorker):
     def _handle_article(self, article_watcher):
         action = shrink_pdf
         super()._handle_article(article_watcher, action)
-        article_watcher.compression_completed = True
+        # article_watcher.compression_completed = True