Working, refactored news_fetch, better documentation for launch

This commit is contained in:
Remy Moll 2022-09-08 16:19:15 +02:00
parent 713406dc67
commit afead44d6c
14 changed files with 220 additions and 247 deletions

View File

@ -11,18 +11,15 @@ A utility to
... fully automatically. Run it now, thank me later.
---
## Running - Docker compose
The included `docker-compose` file is now necessary for easy orchestration of the various services.
## Running - through launch file
> Prerequisite: make `launch` executable:
>
> `chmod +x launch`
All relevant passthroughs and mounts are specified through the env-file, for which I configured 4 versions:
Execute the file by running `./launch`. This won't do anything in itself. You need to specify a mode and then a command.
* production
* debug (development in general)
* upload
* check
These files will have to be adapted to your individual setup but won't change significantly once set up.
`./launch <mode> <command> <command options>`
### Overview of the modes
@ -30,47 +27,67 @@ The production mode performs all automatic actions and therefore does not require
The debug mode is more sophisticated and allows for big code changes without the need to recompile. It directly mounts the code-directory into the container. As a failsafe the environment-variable `DEBUG=true` is set. The whole utility is then run in a sandbox environment (slack-channel, database, email) so that Dirk is not affected by any mishaps.
The check mode is less sophisticated but shows the downloaded articles to the host for visual verification. This requires passthroughs for X11.
Upload mode is much simpler: it goes over the existing database and operates on the articles for which the upload to archive.org has not yet occurred (archive.org is slow, and the other operations usually finish before the queue is consumed). It retries their upload.
* For normal `production` mode run:
`docker compose --env-file env/production run news_fetch`
Two additional 'modes' are `build` and `down`. Build rebuilds the container, which is necessary after code changes. Down ensures a clean shutdown of *all* containers. Usually the launch-script handles this already, but it sometimes fails, in which case `down` needs to be called manually, as sketched below.
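For reference, these are invoked like any other mode; a minimal sketch (assuming `down` takes no further arguments, just like `build`):
```bash
./launch build   # rebuild the docker image, e.g. after code changes
./launch down    # cleanly shut down all containers of the compose file
```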
* For `debug` mode run:
### Overview of the commands
`docker compose --env-file env/debug run news_fetch`
In essence, a command is simply a docker-compose service that is run in an interactive environment. As such, all services defined in `docker-compose.yaml` can be called as commands. Only two of them are of real use:
which drops you into an interactive shell (`ctrl+d` to exit the container shell).
`news_fetch` does the majority of the actions mentioned above. By default, that is without any options, it runs a metadata-fetch, download, compression, and upload to archive.org. The upload is usually the slowest step, which is why articles that are processed but don't yet have an archive.org url tend to pile up. You can therefore specify the option `upload`, which only starts the upload for the articles concerned, as a catch-up if you will.
> Note:
> The live-mounted code is now under `/code`. Note that the `DEBUG=true` environment variable is still set. If you want to test things on production, run `export DEBUG=false`. Running `python runner.py` will now run the newly written code, but with the production database and storage.
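A minimal sketch of that debug workflow inside the container shell (the paths follow the description above and may differ in your setup):
```bash
cd /code              # live-mounted source tree (skip if the shell already starts there)
export DEBUG=false    # optional: switch to the production database and storage
python runner.py      # runs the freshly edited code
```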
Example usage:
* For `check` mode, some env-variables are also changed and you still require interactivity. You don't need the geckodriver service however. The simplest way is to run
```bash
./launch production news_fetch # full mode
./launch production news_fetch upload # upload mode (lighter resource usage)
./launch debug news_fetch # debug mode, which drops you inside a new shell
`docker compose --env-file env/check run --no-deps --rm news_fetch`
./launch production news_check
```
* Finally, for `upload` mode no interactivity and no additional services are required. Simply run:
`news_check` starts a webapp, accessible at [http://localhost:8080](http://localhost:8080), that allows you to easily check the downloaded articles.
`docker compose --env-file env/upload run --no-deps --rm news_fetch`
### Stopping
Run
## (Running - Docker compose)
> I strongly recommend sticking to the usage of `./launch`.
`docker compose --env-file env/production down`
Instead of using the launch file you can manually issue `docker compose` commands, for example to check the logs.
All relevant mounts and env-variables are most easily specified through the env-file, for which I configured 2 versions:
* production
* debug (development in general)
These files will have to be adapted to your individual setup but won't change significantly once set up.
Example usage:
```bash
docker compose --env-file env/production run news_fetch # full mode
docker compose --env-file env/production run news_fetch upload # upload mode (lighter resource usage)
docker compose --env-file env/debug run news_fetch # debug mode, which drops you inside a new shell
docker compose --env-file env/production run news_check
# Misc:
docker compose --env-file env/production up # starts all services and shows their combined logs
docker compose --env-file env/production logs -f news_fetch # follows along with the logs of only one service
docker compose --env-file env/production down
```
which terminates all containers associated with the `docker-compose.yaml`.
## Building
> The software (firefox, selenium, python) changes frequently. For non-breaking changes it is useful to regularly clean build the docker image! This is also crucial to update the code itself.
> The software (firefox, selenium, python) changes frequently. For non-breaking changes it is useful to regularly rebuild the docker image! This is also crucial to update the code itself.
In docker compose, run
`docker compose --env-file env/production build`
Or simpler, just run
`./launch build`
@ -80,6 +97,10 @@ In docker compose, run
## Manual Sync to NAS:
Manual sync is sadly still necessary, as the lsync client sometimes gets overwhelmed by quick writes.
I use `rsync`. Mounting the NAS locally, I navigate to the location of the local folder (notice the trailing slash). Then run
`rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress --log-file=rsync.log <local folder>/ "<remote>"`
where `<remote>` is the location where the NAS is mounted. (Options: `R` - relative paths, `a` - archive mode (a shorthand for several preservation flags), `z` - compress data during transfer, `q` - quiet. We also skip most of the ownership metadata and keep a log of the transfers.)
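A concrete invocation might look like this; both paths are placeholders for illustration and need to be adapted to your local folder and NAS mount point:
```bash
rsync -Razq --no-perms --no-owner --no-group --temp-dir=/tmp --progress \
    --log-file=rsync.log coss_archiving/ "/mnt/nas/coss_archiving"
```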
You can also use your OS' native copy option and select *do not overwrite*. This should only copy the missing files, significantly speeding up the operation.

View File

@ -38,6 +38,7 @@ services:
environment:
- START_VNC=${HEADFULL-false} # as opposed to headless, used when requiring supervision (eg. for websites that crash)
- START_XVFB=${HEADFULL-false}
- SE_VNC_NO_PASSWORD=1
expose: ["4444"] # exposed to other docker-compose services only
ports:
- 7900:7900 # port for webvnc

15
env/check vendored
View File

@ -1,15 +0,0 @@
# Does not run any downloads but displays the previously downloaded but not yet checked files. Requires display-access via xauth
CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
XAUTHORTIY=$XAUTHORTIY
XSOCK=/tmp/.X11-unix
DEBUG=false
CHECK=true
HEADLESS=true
UPLOAD=false
REDUCEDFETCH=false
# ENTRYPOINT="/bin/bash"
INTERACTIVE=true

18
env/debug vendored
View File

@ -1,14 +1,10 @@
# Runs in a debugging mode, does not launch anything at all but starts a bash process
CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
export CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
export UNAME=remy
CODE=./
DEBUG=true
CHECK=false
UPLOAD=false
HEADLESS=false
REDUCEDFETCH=false
ENTRYPOINT="/bin/bash"
INTERACTIVE=true
export GECKODRIVER_IMG=selenium/standalone-firefox:104.0
export DEBUG=true
export HEADFULL=true
export CODE=./
export ENTRYPOINT=/bin/bash

9
env/production vendored
View File

@ -2,9 +2,6 @@
CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
CONTAINERS_TO_RUN=nas_sync, geckodriver
DEBUG=false
CHECK=false
UPLOAD=false
HEADLESS=true
REDUCEDFETCH=true
export UNAME=remy
export GECKODRIVER_IMG=selenium/standalone-firefox:104.0
export DEBUG=false

10
env/upload vendored
View File

@ -1,10 +0,0 @@
# Does not run any other workers and only uploads to archive.org the urls that weren't previously uploaded
CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
NEWS_FETCH_DEPENDS_ON="[]"
DEBUG=false
CHECK=false
UPLOAD=true
HEADLESS=true
REDUCEDFETCH=false

2
launch
View File

@ -9,7 +9,7 @@ echo "Bash script launching COSS_ARCHIVING..."
export CONTAINER_DATA=~/Bulk/COSS/Downloads/coss_archiving
export UNAME=remy
# CHANGE ME WHEN UPDATING FIREFOX
export GECKODRIVER_IMG=selenium/standalone-firefox:103.0
export GECKODRIVER_IMG=selenium/standalone-firefox:104.0
# version must be >= the one on the host or firefox will not start (because of mismatched config)
if [[ $1 == "debug" ]]

View File

@ -25,7 +25,7 @@
<td>{ item.name }</td>
<!-- <td>Quality Control Specialist</td> -->
{#if item.value != ""}
<td class='bg-emerald-200' style="white-space: normal">{ item.value }</td>
<td class='bg-emerald-200' style="white-space: normal; width:70%">{ item.value }</td>
{:else}
<td class='bg-red-200'>{ item.value }</td>
{/if}

View File

@ -2,6 +2,8 @@ FROM python:latest
ENV TZ Europe/Zurich
RUN apt-get update && apt-get install -y ghostscript
# for compression of pdfs
RUN useradd --create-home --shell /bin/bash --uid 1001 autonews
# id mapped to local user

View File

@ -1,7 +1,8 @@
import os
import shutil
import configparser
import logging
import time
import shutil
from datetime import datetime
from peewee import SqliteDatabase, PostgresqlDatabase
from rich.logging import RichHandler
@ -41,6 +42,7 @@ if os.getenv("DEBUG", "false") == "true":
else:
logger.warning("Found 'DEBUG=false' and running on production databases, I hope you know what you're doing...")
time.sleep(10) # wait for the vpn to connect (can't use a healthcheck because there is no depends_on)
cred = db_config["DATABASE"]
download_db = PostgresqlDatabase(
cred["db_name"], user=cred["user_name"], password=cred["password"], host="vpn", port=5432

View File

@ -3,125 +3,91 @@ import configuration
models = configuration.models
from threading import Thread
import logging
import sys
logger = logging.getLogger(__name__)
import sys
from collections import OrderedDict
from utils_mail import runner as mail_runner
from utils_slack import runner as slack_runner
from utils_mail import runner as MailRunner
from utils_slack import runner as SlackRunner
from utils_worker.workers import CompressWorker, DownloadWorker, FetchWorker, UploadWorker
class ArticleWatcher:
"""Wrapper for a newly created article object. Notifies the coordinator upon change/completition"""
def __init__(self, article, **kwargs) -> None:
self.article_id = article.id # in case article becomes None at any point, we can still track the article
def __init__(self, article, workers_in, workers_out) -> None:
self.article = article
self.completition_notifier = kwargs.get("notifier")
self.fetch = kwargs.get("worker_fetch", None)
self.download = kwargs.get("worker_download", None)
self.compress = kwargs.get("worker_compress", None)
self.upload = kwargs.get("worker_upload", None)
self.workers_in = workers_in
self.workers_out = workers_out
self.completition_notified = False
# self._download_called = self._compression_called = False
self._fetch_completed = self._download_completed = self._compression_completed = self._upload_completed = False
# first step: gather metadata
if self.fetch and self.upload:
self.fetch.process(self) # this will call the update_status method
self.upload.process(self) # idependent from the rest
else: # the full kwargs were not provided, only do a manual run
# overwrite update_status() because calls from the workers will result in erros
self.update_status = lambda completed: logger.info(f"Completed action {completed}")
for w in kwargs.get("workers_manual"):
w.process(self)
for w_dict in self.workers_in:
worker = self.get_next_worker(w_dict) # gets the first worker of each dict (they get processed independently)
worker.process(self)
def update_status(self, completed_action):
"""Checks and notifies internal completition-status.
Article download is complete iff fetch and download were successfull and compression was run
"""
# if self.completition_notified and self._compression_completed and self._fetch_completed and self._download_completed and self._upload_completed, we are done
if completed_action == "fetch":
self.download.process(self)
elif completed_action == "download":
self.compress.process(self)
elif completed_action == "compress": # last step
self.completition_notifier(self.article)
# triggers action in Coordinator
elif completed_action == "upload":
# this case occurs when upload was faster than compression
pass
else:
logger.warning(f"update_status called with unusual configuration: {completed_action}")
def get_next_worker(self, worker_dict, worker_name=""):
"""Returns the worker coming after the one with key worker_name"""
if worker_name == "": # first one
return worker_dict[list(worker_dict.keys())[0]]
# for i,w_dict in enumerate(workers_list):
keys = list(worker_dict.keys())
next_key_ind = keys.index(worker_name) + 1
try:
key = keys[next_key_ind]
return worker_dict[key]
except IndexError:
return None
# ====== Attributes to be modified by the util workers
@property
def fetch_completed(self):
return self._fetch_completed
def update(self, worker_name):
"""Called by the workers to notify the watcher of a completed step"""
for w_dict in self.workers_in:
if worker_name in w_dict.keys():
next_worker = self.get_next_worker(w_dict, worker_name)
if next_worker:
if next_worker == "out":
self.completion_notifier()
else: # it's just another in-worker
next_worker.process(self)
else: # no next worker, we are done
logger.info(f"No worker after {worker_name}")
@fetch_completed.setter
def fetch_completed(self, value: bool):
self._fetch_completed = value
self.update_status("fetch")
@property
def download_completed(self):
return self._download_completed
def completion_notifier(self):
"""Triggers the out-workers to process the article, that is to send out a message"""
for w_dict in self.workers_out:
worker = self.get_next_worker(w_dict)
worker.send(self.article)
self.article.sent = True
self.article.save()
@download_completed.setter
def download_completed(self, value: bool):
self._download_completed = value
self.update_status("download")
@property
def compression_completed(self):
return self._compression_completed
@compression_completed.setter
def compression_completed(self, value: bool):
self._compression_completed = value
self.update_status("compress")
@property
def upload_completed(self):
return self._upload_completed
@upload_completed.setter
def upload_completed(self, value: bool):
self._upload_completed = value
self.update_status("upload")
def __str__(self) -> str:
return f"Article with id {self.article_id}"
return f"ArticleWatcher with id {self.article_id}"
class Coordinator(Thread):
def __init__(self, **kwargs) -> None:
"""Launcher calls this Coordinator as the main thread to handle connections between the other workers (threaded)."""
super().__init__(target = self.launch, daemon=True)
def add_workers(self, **kwargs):
self.worker_slack = kwargs.pop("worker_slack", None)
self.worker_mail = kwargs.pop("worker_mail", None)
# the two above won't be needed in the Watcher
self.worker_download = kwargs.get("worker_download", None)
self.worker_fetch = kwargs.get("worker_fetch", None)
self.worker_compress = kwargs.get("worker_compress", None)
self.worker_upload = kwargs.get("worker_upload", None)
class Dispatcher(Thread):
def __init__(self) -> None:
"""Thread to handle handle incoming requests and control the workers"""
self.workers_in = []
self.workers_out = []
super().__init__(target = self.launch)
self.kwargs = kwargs
def launch(self) -> None:
for w in [self.worker_download, self.worker_fetch, self.worker_upload, self.worker_compress]:
if not w is None: # for reduced operations such as upload, some workers are set to None
w.start()
# start workers (each worker is a thread)
for w_dict in self.workers_in: # for reduced operations such as upload, some workers are not set
for w in w_dict.values():
if isinstance(w, Thread):
w.start()
# if past messages have not been sent, they must be reevaluated
unsent = models.ArticleDownload.filter(sent = False)
# .objects.filter(sent = False)
# get all articles not fully processed
unsent = models.ArticleDownload.filter(sent = False) # if past messages have not been sent, they must be reevaluated
for a in unsent:
self.incoming_request(article=a)
@ -136,82 +102,82 @@ class Coordinator(Thread):
return
article, is_new = models.ArticleDownload.get_or_create(article_url=url)
article.slack_ts = message.ts # either update the timestamp (to the last reference to the article) or set it for the first time
article.save()
elif article is not None:
is_new = False
logger.info(f"Received article {article} in incoming_request")
else:
logger.error("Coordinator.incoming_request called with no arguments")
logger.error("Dispatcher.incoming_request called with no arguments")
return
self.kwargs.update({"notifier" : self.article_complete_notifier})
if is_new or (article.file_name == "" and article.verified == 0):
# check for models that were created but were abandoned. This means they have missing information, most importantly no associated file
# this overwrites previously set information, but that should not be too important
ArticleWatcher(
article,
**self.kwargs
workers_in=self.workers_in,
workers_out=self.workers_out,
)
# All workers are implemented as a threaded queue. But the individual model requires a specific processing order:
# fetch -> download -> compress -> complete
# the watcher orchestrates the procedure and notifies upon completition
# the watcher will notify once it is sufficiently populated
else: # manually trigger notification immediately
logger.info(f"Found existing article {article}. Now sending")
self.article_complete_notifier(article)
def manual_processing(self, articles, workers):
for w in workers:
w.start()
# def manual_processing(self, articles, workers):
# for w in workers:
# w.start()
for article in articles:
notifier = lambda article: logger.info(f"Completed manual actions for {article}")
ArticleWatcher(article, workers_manual = workers, notifier = notifier) # Article watcher wants a thread to link article to TODO: handle threads as a kwarg
def article_complete_notifier(self, article):
if self.worker_slack is None:
logger.warning("Skipping slack notification because worker is None")
else:
self.worker_slack.bot_worker.respond_channel_message(article)
if self.worker_mail is None:
logger.warning("Skipping mail notification because worker is None")
else:
self.worker_mail.send(article)
article.sent = True
article.save()
# for article in articles:
# notifier = lambda article: logger.info(f"Completed manual actions for {article}")
# ArticleWatcher(article, workers_manual = workers, notifier = notifier) # Article watcher wants a thread to link article to TODO: handle threads as a kwarg
if __name__ == "__main__":
coordinator = Coordinator()
dispatcher = Dispatcher()
if "upload" in sys.argv:
class PrintWorker:
def send(self, article):
print(f"Uploaded article {article}")
articles = models.ArticleDownload.select().where(models.ArticleDownload.archive_url == "" or models.ArticleDownload.archive_url == "TODO:UPLOAD").execute()
logger.info(f"Launching upload to archive for {len(articles)} articles.")
coordinator.manual_processing(articles, [UploadWorker()])
dispatcher.workers_in = [{"UploadWorker": UploadWorker()}]
dispatcher.workers_out = [{"PrintWorker": PrintWorker()}]
dispatcher.start()
else: # launch with full action
slack_runner = slack_runner.BotRunner(coordinator.incoming_request)
kwargs = {
"worker_download" : DownloadWorker(),
"worker_fetch" : FetchWorker(),
"worker_upload" : UploadWorker(),
"worker_compress" : CompressWorker(),
"worker_slack" : slack_runner,
"worker_mail" : mail_runner,
}
try:
coordinator.add_workers(**kwargs)
coordinator.start()
slack_runner = SlackRunner.BotRunner(dispatcher.incoming_request)
# All workers are implemented as a threaded queue. But the individual model requires a specific processing order:
# fetch -> download -> compress -> complete
# This is reflected in the following list of workers:
workers_in = [
OrderedDict({"FetchWorker": FetchWorker(), "DownloadWorker": DownloadWorker(), "CompressWorker": CompressWorker(), "NotifyRunner": "out"}),
OrderedDict({"UploadWorker": UploadWorker()})
]
# The two dicts are processed independently. First element of first dict is called at the same time as the first element of the second dict
# Inside a dict, the order of the keys gives the order of execution (only when the first element is done, the second is called, etc...)
workers_out = [{"SlackRunner": slack_runner},{"MailRunner": MailRunner}]
dispatcher.workers_in = workers_in
dispatcher.workers_out = workers_out
dispatcher.start() # starts the thread, (ie. runs launch())
slack_runner.start() # last one to start, inside the main thread
except KeyboardInterrupt:
logger.info("Keyboard interrupt. Stopping Slack and Coordinator")
logger.info("Keyboard interrupt. Stopping Slack and dispatcher")
slack_runner.stop()
logger.info("BYE!")
# coordinator was set as a daemon thread, so it will be stopped automatically
dispatcher.join()
for w_dict in workers_in:
for w in w_dict.values():
if isinstance(w, Thread):
w.stop()
# All threads are launched as a daemon thread, meaning that any 'leftover' should exit along with the sys call
sys.exit(0)

View File

@ -157,29 +157,34 @@ class BotApp(App):
if say is None:
say = self.say_substitute
answers = article.slack_info
for a in answers:
if a["file_path"]:
try:
self.client.files_upload(
channels = config["archive_id"],
initial_comment = f"{a['reply_text']}",
file = a["file_path"],
thread_ts = article.slack_ts_full
)
status = True
except SlackApiError as e: # upload resulted in an error
if article.slack_ts == 0:
self.logger.error(f"{article} has no slack_ts")
else:
self.logger.info("Skipping slack reply because it is broken")
for a in []:
# for a in answers:
if a["file_path"]:
try:
self.client.files_upload(
channels = config["archive_id"],
initial_comment = f"{a['reply_text']}",
file = a["file_path"],
thread_ts = article.slack_ts_full
)
# status = True
except SlackApiError as e: # upload resulted in an error
say(
"File {} could not be uploaded.".format(a),
thread_ts = article.slack_ts_full
)
# status = False
self.logger.error(f"File upload failed: {e}")
else: # anticipated that there is no file!
say(
"File {} could not be uploaded.".format(a),
f"{a['reply_text']}",
thread_ts = article.slack_ts_full
)
status = False
self.logger.error(f"File upload failed: {e}")
else: # anticipated that there is no file!
say(
f"{a['reply_text']}",
thread_ts = article.slack_ts_full
)
status = True
# status = True
def startup_status(self):
@ -230,6 +235,9 @@ class BotRunner():
self.logger.info("Closed Slack-Socketmodehandler")
def send(self, article):
"""Proxy function to send a message to the slack channel, Called by ArticleWatcher once the Article is ready"""
self.bot_worker.respond_channel_message(article)

View File

@ -7,12 +7,10 @@ class TemplateWorker(Thread):
"""Parent class for any subsequent worker of the article-download pipeline. They should all run in parallel, thus the Thread subclassing"""
logger = logging.getLogger(__name__)
def __init__(self, *args, **kwargs) -> None:
def __init__(self, **kwargs) -> None:
target = self._queue_processor # will be executed on Worker.start()
group = kwargs.get("group", None)
name = kwargs.get("name", None)
super().__init__(group=group, target=target, name=name)
self.keep_running = True
super().__init__(target=target, daemon=True)
self._article_queue = []
self.logger.info(f"Worker thread {self.__class__.__name__} initialized successfully")
@ -23,7 +21,7 @@ class TemplateWorker(Thread):
def _queue_processor(self):
"""This method is launched by thread.run() and idles when self._article_queue is empty. When an external caller appends to the queue it jumps into action"""
while True: # PLEASE tell me if I'm missing an obvious better way of doing this!
while self.keep_running: # PLEASE tell me if I'm missing an obvious better way of doing this!
if len(self._article_queue) == 0:
time.sleep(5)
else:
@ -39,3 +37,10 @@ class TemplateWorker(Thread):
article = article_watcher.article
article = action(article) # action updates the article object but does not save the change
article.save()
article_watcher.update(self.__class__.__name__)
def stop(self):
self.logger.info(f"Stopping worker {self.__class__.__name__} whith {len(self._article_queue)} articles left in queue")
self.keep_running = False
self.join()

View File

@ -25,7 +25,7 @@ class DownloadWorker(TemplateWorker):
action = self.dl_runner
super()._handle_article(article_watcher, action)
article_watcher.download_completed = True
# article_watcher.download_completed = True
@ -36,7 +36,7 @@ class FetchWorker(TemplateWorker):
def _handle_article(self, article_watcher):
action = get_description # function
super()._handle_article(article_watcher, action)
article_watcher.fetch_completed = True
# article_watcher.fetch_completed = True
@ -52,7 +52,7 @@ class UploadWorker(TemplateWorker):
return run_upload(*args, **kwargs)
super()._handle_article(article_watcher, action)
article_watcher.upload_completed = True
# article_watcher.upload_completed = True
@ -63,4 +63,4 @@ class CompressWorker(TemplateWorker):
def _handle_article(self, article_watcher):
action = shrink_pdf
super()._handle_article(article_watcher, action)
article_watcher.compression_completed = True
# article_watcher.compression_completed = True