Switched to docker compose and wasted hours trying to get standalone firefox working
@@ -1 +1,2 @@
.dev/
__pycache__/

Dockerfile
@@ -1,6 +1,8 @@
FROM python:latest

ENV TZ Europe/Zurich


RUN echo "deb http://deb.debian.org/debian/ unstable main contrib non-free" >> /etc/apt/sources.list
RUN apt-get update && apt-get install -y \
evince \
@@ -16,7 +18,6 @@ RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.31.0/geckod
RUN tar -x geckodriver -zf geckodriver-v0.31.0-linux64.tar.gz -O > /usr/bin/geckodriver
RUN chmod +x /usr/bin/geckodriver
RUN rm geckodriver-v0.31.0-linux64.tar.gz
RUN echo "127.0.0.1 localhost" >> /etc/hosts


RUN useradd --create-home --shell /bin/bash --uid 1001 autonews
@@ -24,15 +25,12 @@ RUN useradd --create-home --shell /bin/bash --uid 1001 autonews
# home directory needed for pip package installation
RUN mkdir -p /app/auto_news
RUN chown -R autonews:autonews /app


USER autonews
ENV PATH=/home/autonews/.local/bin:$PATH


COPY requirements.txt /app/
RUN python3 -m pip install -r /app/requirements.txt
COPY app /app/auto_news
WORKDIR /app/auto_news

RUN python3 -m pip install -r requirements.txt

ENTRYPOINT ["python3", "runner.py"]

README.md
@@ -3,7 +3,8 @@
A utility to fetch article requests from slack and generate pdfs for them, fully automatically.


## Running
## Running - Pure docker
> I recommend running with docker compose instead
### How to run - auto archiving mode
In this mode the program is launched as a docker container, in a headless mode. For persistence purposes a local storage volume is required, but that's it!

@@ -15,6 +16,12 @@ You can specify additional parameters:

`docker run -it -v <your storage>:/app/file_storage/ auto_news upload` catches up on incomplete uploads to the archive.

`docker run -it -v <your storage>:/app/file_storage/ auto_news reducedfetch` makes assumptions about the state of the slack chat and greatly reduces the number of API calls (faster startup).

These parameters can be combined (mostly for testing, I guess).

Finally, for manual file verification:

`docker run -it -v <your storage>:/app/file_storage/ -e DISPLAY=":0" --network host -v $XAUTHORITY:/root/.Xauthority auto_news check` lets you visually verify the downloaded files. The additional parameters are required in order to open GUIs on the host.


@@ -24,33 +31,51 @@ In this mode, a docker container is launched with an additional volume, the loca
`docker run -it -v <your storage>:/app/file_storage/ -v <your code>:/code/ --entrypoint /bin/bash auto_news`
You are dropped into a bash shell, in which you can navigate to the `/code` directory and then test live.

### Cheat-sheet Remy:

`docker run -it -v /mnt/Data/COSS/Downloads/auto_news.container/:/app/file_storage/ auto_news`

`docker run -it -v /mnt/Data/COSS/Downloads/auto_news.container/:/app/file_storage/ -v /mnt/Data/COSS/Development/auto_news/app:/code --entrypoint /bin/bash auto_news`


`docker run -it -v /mnt/Data/COSS/Downloads/auto_news.container/:/app/file_storage/ -e DISPLAY=":0" --network host -v $XAUTHORITY:/root/.Xauthority auto_news check`


## Running - Docker compose

I also wrote a rudimentary docker compose file which makes running much simpler. Just run

`docker compose --env-file <desired mode> up`

All relevant passthroughs and mounts are specified through the env-file, for which I configured 4 versions: production, debug (development in general), upload and check. These files will have to be adapted to your individual setup, but can then be reused easily.

> Note:
>
> The `debug` mode requires additional input. Once `docker compose up` is running, in a new session run `docker compose --env-file env/debug exec auto_news bash`. The live-mounted code is then under `/code`. Note that the `DEBUG=true` environment variable is still set. If you want to test things on production, run `export DEBUG=false`.


## Building

### Things to keep in mind
The software (firefox, selenium, python) changes frequently. For non-breaking changes it is useful to regularly clean build the docker image! This is also crucial to update the code itself.
> The software (firefox, selenium, python) changes frequently. For non-breaking changes it is useful to regularly clean build the docker image! This is also crucial to update the code itself.

In docker, simply run:

`docker build -t auto_news --no-cache .`

where the `Dockerfile` has to be in the working directory.

In docker compose, run the usual command, but append

`docker compose ... up --build`


## Cheat-sheet Remy:

`docker run -it -v /mnt/Data/COSS/CONTAINERDATA/:/app/file_storage/ auto_news`

`docker run -it -v /mnt/Data/COSS/CONTAINERDATA/:/app/file_storage/ -v /mnt/Data/COSS/auto_news/app:/code --entrypoint /bin/bash auto_news`


`docker run -it -v /mnt/Data/COSS/CONTAINERDATA/:/app/file_storage/ -e DISPLAY=":0" --network host -v $XAUTHORITY:/root/.Xauthority auto_news check`



## Roadmap:

[ ] automatically upload files to NAS
[ ] handle paywalled sites like faz, spiegel, .. through their dedicated edu-sites

[ ] handle paywalled sites like faz, spiegel, .. through their dedicated edu-friendly sites
...

@@ -1,5 +1,4 @@
import os
import sys
import configparser
import logging
from peewee import SqliteDatabase
@@ -19,18 +18,18 @@ logger = logging.getLogger(__name__)
parsed = configparser.ConfigParser()
parsed.read("/app/file_storage/config.ini")

if "debug" in sys.argv:
    logger.warning("Running in debugging mode because launched with argument 'debug'")
    # parsed.read("/code/config.ini")
if os.getenv("DEBUG", "false") == "true":
    logger.warning("Found 'DEBUG=true', setting up dummy databases")

    db_base_path = parsed["DATABASE"]["db_path_dev"]
    parsed["SLACK"]["archive_id"] = parsed["SLACK"]["debug_id"]
    parsed["MAIL"]["recipient"] = parsed["MAIL"]["sender"]
else:
    logger.warning("Using production values, I hope you know what you're doing...")
    logger.warning("Found 'DEBUG=false' and running on production databases, I hope you know what you're doing...")

    db_base_path = parsed["DATABASE"]["db_path_prod"]


from utils_storage import models

# Set up the database
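
Throughout this commit, command-line switches such as `debug`, `upload` and `reducedfetch` are replaced by environment variables compared against the literal string `"true"`. A minimal sketch of that check as a reusable helper, assuming the same convention (only the exact lowercase string `true` enables a flag; anything else, including an unset variable, counts as off). `env_flag` and its `default` parameter are hypothetical and not part of the project:

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Return True only if the variable is set to exactly "true".

    Mirrors the os.getenv(name, "false") == "true" checks in the diff;
    env_flag itself is a hypothetical helper, not project code.
    """
    fallback = "true" if default else "false"
    return os.getenv(name, fallback) == "true"


if __name__ == "__main__":
    os.environ["DEBUG"] = "true"
    os.environ["HEADLESS"] = "True"  # capitalised, so NOT treated as enabled
    os.environ.pop("UPLOAD", None)
    print(env_flag("DEBUG"))     # True
    print(env_flag("HEADLESS"))  # False
    print(env_flag("UPLOAD"))    # False
```

Because `docker-compose.yaml` forwards each flag as `VAR=${VAR}`, a flag missing from the chosen env file (and not exported by the shell) reaches the container as an empty string rather than being absent; the exact-match comparison treats that as false as well.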
@@ -1,9 +1,9 @@
"""Main coordination of other util classes. Handles inbound and outbound calls"""
import configuration
models = configuration.models
import sys
from threading import Thread
import logging
import os
logger = logging.getLogger(__name__)

from utils_mail import runner as mail_runner
@@ -172,12 +172,12 @@ if __name__ == "__main__":
    coordinator = Coordinator()


    if "upload" in sys.argv:
    if os.getenv("UPLOAD", "false") == "true":
        articles = models.ArticleDownload.select().where(models.ArticleDownload.archive_url == "").execute()
        logger.info(f"Launching upload to archive for {len(articles)} articles.")
        coordinator.manual_processing(articles, [UploadWorker()])

    elif "check" in sys.argv:
    elif os.getenv("CHECK", "false") == "true":
        from utils_check import runner as check_runner
        check_runner.verify_unchecked()
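
In the runner, `UPLOAD` is tested before `CHECK`, so if an env file enabled both, the upload branch would win. A small sketch of that precedence as a pure function over a mapping, which makes it testable with a plain dict instead of the real environment. `select_mode` and the `"default"` label are hypothetical names; what the runner does when neither flag is set lies outside the hunk shown:

```python
import os
from typing import Mapping


def select_mode(env: Mapping[str, str]) -> str:
    """Return the mode picked by the runner's if/elif chain.

    UPLOAD is checked before CHECK, so it wins if both are "true".
    "default" is a hypothetical label for the remaining branch(es),
    which are not part of the hunk above.
    """
    if env.get("UPLOAD", "false") == "true":
        return "upload"
    if env.get("CHECK", "false") == "true":
        return "check"
    return "default"


if __name__ == "__main__":
    print(select_mode(os.environ))
    print(select_mode({"UPLOAD": "true", "CHECK": "true"}))  # upload
    print(select_mode({"CHECK": "true", "UPLOAD": "false"}))  # check
    print(select_mode({}))                                    # default
```

Taking a mapping instead of reading `os.environ` directly is only a testability choice for this sketch; the runner itself calls `os.getenv`.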
@@ -3,7 +3,6 @@ import configuration
import requests
import os
import time
import sys
from threading import Thread
from slack_sdk.errors import SlackApiError

@@ -30,10 +29,10 @@ def init(client) -> None:
    t = Thread(target = fetch_missed_channel_reactions) # threaded, runs in background (usually takes a long time)
    t.start()

    if "reducedfetch" in sys.argv:
        logger.warning("Only fetching empty threads for bot messages because of argument 'reducedfetch'")
    if os.getenv("REDUCEDFETCH", "false") == "true":
        logger.warning("Only fetching empty threads for bot messages because 'REDUCEDFETCH=true'")
        fetch_missed_thread_messages(reduced=True)
    else:    # perform these two asynchronously
    else: # perform both asynchronously
        fetch_missed_thread_messages()
@@ -2,7 +2,6 @@ import time
import datetime
import logging
import os
import sys
import base64
import requests
from selenium import webdriver
@@ -20,28 +19,34 @@ class PDFDownloader:
    running = False

    def start(self):
        options=Options()
        try:
            self.finish()
        except:
            self.logger.info("gecko driver not yet running")
        options = webdriver.FirefoxOptions()
        options.profile = config["browser_profile_path"]
        if "notheadless" in sys.argv:
            self.logger.warning("Opening browser GUI because of Argument 'notheadless'")
        else:
        # should be options.set_preference("profile", config["browser_profile_path"]) as of selenium 4 but that doesn't work

        if os.getenv("HEADLESS", "false") == "true":
            options.add_argument('--headless')
        else:
            self.logger.warning("Opening browser GUI because 'HEADLESS=false'")

        # Print to pdf
        options.set_preference("print_printer", "Mozilla Save to PDF")
        options.set_preference("print.always_print_silent", True)
        options.set_preference("print.show_print_progress", False)
        options.set_preference('print.save_as_pdf.links.enabled', True)

        # Just save if the filetype is pdf already, does not work!

        options.set_preference("print.printer_Mozilla_Save_to_PDF.print_to_file", True)
        options.set_preference("browser.download.folderList", 2)
        # options.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
        # options.set_preference("pdfjs.disabled", True)
        options.set_preference("browser.download.dir", config["default_download_path"])

        self.logger.info("Now Starting gecko driver")
        self.driver = webdriver.Firefox(options=options)
        self.logger.info("Starting gecko driver")
        self.driver = webdriver.Firefox(
            options = options,
            service = webdriver.firefox.service.Service(
                log_path = f'{config["local_storage_path"]}/geckodriver.log'
        ))

        residues = os.listdir(config["default_download_path"])
        for res in residues:
@@ -54,6 +59,7 @@ class PDFDownloader:
            self.start() # relaunch the dl util

    def finish(self):
        self.logger.info("Exiting gecko driver")
        self.driver.quit()
        self.running = False

docker-compose.yaml
@@ -0,0 +1,36 @@
# docker compose --env-file env/debug up


version: "3.9"
services:
  auto_news:
    build: .
    volumes:
      - ${CONTAINER_DATA}:/app/file_storage
      - ${HOSTS_FILE}:/etc/hosts

      - ${CODE:-/dev/null}:/code # not set in prod, defaults to /dev/null
      - ${XAUTHORITY-/dev/null}:/home/autonews/.Xauthority
    network_mode: host
    environment:
      - DISPLAY=$DISPLAY
      - DEBUG=${DEBUG}
      - CHECK=${CHECK}
      - UPLOAD=${UPLOAD}
      - HEADLESS=${HEADLESS}
      - REDUCEDFETCH=${REDUCEDFETCH}

    entrypoint: ${ENTRYPOINT:-"python3 runner.py"} # by default launch workers as defined in the Dockerfile

  # geckodriver:
  #   image: selenium/standalone-firefox:100.0
  #   volumes:
  #
  #     - ${CONTAINER_DATA-/dev/null}:/app/file_storage
  #     - ${FIREFOX_PROFILE}:/auto_news.profile
  #     - ${HOSTS_FILE}:/etc/hosts
  #   environment:
  #     - DISPLAY=$DISPLAY
  #     - START_XVFB=false

  #   network_mode: host
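
The compose file above uses two different default forms: `${CODE:-/dev/null}` and `${XAUTHORITY-/dev/null}`. Under Compose's interpolation rules, `:-` falls back to the default when the variable is unset or empty, while `-` falls back only when it is unset. A simplified sketch of just those rules in Python (it handles `${VAR}`, `$VAR`, `:-` and `-` only, and is not Docker Compose's actual implementation); `interpolate` is a hypothetical helper:

```python
import re
from typing import Mapping

# Simplified subset of Compose variable interpolation: ${NAME}, $NAME,
# ${NAME:-default} and ${NAME-default}. It does not handle ${NAME:?err},
# ${NAME:+alt}, $$ escaping or nested defaults.
_VAR = re.compile(
    r"\$\{(?P<braced>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:?-)(?P<default>[^}]*))?\}"
    r"|\$(?P<bare>[A-Za-z_][A-Za-z0-9_]*)"
)


def interpolate(value: str, env: Mapping[str, str]) -> str:
    def substitute(match: "re.Match[str]") -> str:
        name = match.group("braced") or match.group("bare")
        current = env.get(name)
        op = match.group("op")
        if op == ":-":
            # default is used when the variable is unset OR empty
            return current if current else match.group("default")
        if op == "-":
            # default is used only when the variable is unset
            return current if current is not None else match.group("default")
        # no default given: an unset variable becomes an empty string
        return current if current is not None else ""

    return _VAR.sub(substitute, value)


if __name__ == "__main__":
    env = {"CODE": "", "DISPLAY": ":0"}
    print(interpolate("${CODE:-/dev/null}:/code", env))  # /dev/null:/code
    print(interpolate("${CODE-/dev/null}:/code", env))   # :/code
    print(interpolate("${XAUTHORITY-/dev/null}", env))   # /dev/null
    print(interpolate("DISPLAY=$DISPLAY", env))          # DISPLAY=:0
```

With `CODE` left empty, `${CODE:-/dev/null}` still resolves to `/dev/null`, whereas `${XAUTHORITY-/dev/null}` with an empty `XAUTHORITY` would leave the host side of the `.Xauthority` mount empty.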

env/check
@@ -0,0 +1,12 @@
# Does not run any downloads but displays the previously downloaded but not yet checked files. Requires display access via xauth

CONTAINER_DATA=/mnt/Data/COSS/Downloads/auto_news.container
HOSTS_FILE=/mnt/Data/COSS/Downloads/auto_news.container/dependencies/hosts

XAUTHORITY=$XAUTHORITY

DEBUG=false
CHECK=true
HEADLESS=true
UPLOAD=false
REDUCEDFETCH=false

env/debug
@@ -0,0 +1,15 @@
# Runs in a debugging mode: does not launch anything at all, the container just sleeps so that a bash session can be opened in it

CONTAINER_DATA=/mnt/Data/COSS/Downloads/auto_news.container
HOSTS_FILE=/mnt/Data/COSS/Downloads/auto_news.container/dependencies/hosts

CODE=./
XAUTHORITY=$XAUTHORITY

DEBUG=true
CHECK=false
UPLOAD=false
HEADLESS=false
REDUCEDFETCH=false

ENTRYPOINT="sleep infinity"

env/production
@@ -0,0 +1,10 @@
# Runs on the main slack channel with the full worker setup. If nothing funky has occurred, reducedfetch is a speedup

CONTAINER_DATA=/mnt/Data/Downloads/auto_news.container
HOSTS_FILE=/mnt/Data/COSS/Downloads/auto_news.container/dependencies/hosts

DEBUG=false
CHECK=false
UPLOAD=false
HEADLESS=true
REDUCEDFETCH=true

env/upload
@@ -0,0 +1,11 @@
# Does not run any other workers and only uploads the urls that weren't previously uploaded to the archive

CONTAINER_DATA=/mnt/Data/COSS/Downloads/auto_news.container
HOSTS_FILE=/mnt/Data/COSS/Downloads/auto_news.container/dependencies/hosts


DEBUG=false
CHECK=false
UPLOAD=true
HEADLESS=true
REDUCEDFETCH=false
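
Because the compose file reads its volume paths and flags from whichever `--env-file` is passed, a misspelled key (for example `XAUTHORTIY` instead of `XAUTHORITY`) is not reported as an error: Compose takes the value from the shell instead, or substitutes an empty string with a warning. A rough consistency check, assuming the simple `KEY=VALUE` format used by the files in `env/` (no quoting, no `export` prefix); `parse_env_file`, `referenced_variables` and `missing_variables` are hypothetical helpers:

```python
import re
from typing import Dict, Set

# $NAME or ${NAME...}; only the variable name is captured.
_REF = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Simplified on purpose: no quote handling, no `export` prefix and no
    interpolation of values.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored
            values[key.strip()] = value.strip()
    return values


def referenced_variables(compose_text: str) -> Set[str]:
    """Collect variable names used in the compose file, ignoring comment lines."""
    names: Set[str] = set()
    for line in compose_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # e.g. the commented-out geckodriver service
        names.update(_REF.findall(line))
    return names


def missing_variables(compose_text: str, env_text: str) -> Set[str]:
    """Variables the compose file uses that the env file does not define."""
    return referenced_variables(compose_text) - set(parse_env_file(env_text))


if __name__ == "__main__":
    compose = (
        "volumes:\n"
        "  - ${CODE:-/dev/null}:/code\n"
        "  - ${XAUTHORITY-/dev/null}:/x\n"
        "  # - ${FIREFOX_PROFILE}:/auto_news.profile\n"
        "environment:\n"
        "  - DISPLAY=$DISPLAY\n"
    )
    env = "# debug\nCODE=./\nXAUTHORTIY=$XAUTHORTIY\n"
    print(sorted(missing_variables(compose, env)))  # ['DISPLAY', 'XAUTHORITY']
```

Anything reported here has to come from the shell environment instead, which is intended for `DISPLAY` but most likely not for a mistyped key.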

Remy Moll