Airflow logging

To Nha Notes | Nov. 19, 2022, 11:02 a.m.

Log locations

Airflow writes logs to the base_log_folder configured in airflow.cfg; by default this is the logs folder inside your $AIRFLOW_HOME directory.
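
Like any Airflow setting, this can also be overridden with an environment variable. As a minimal sketch, using the same Dockerfile ENV style as the examples later in these notes:

# equivalent to setting base_log_folder under [logging] in airflow.cfg
ENV AIRFLOW__LOGGING__BASE_LOG_FOLDER=/usr/local/airflow/logs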

Local Airflow environment

If you run Airflow locally, logging information is accessible in the following locations:

  • Scheduler: Logs are printed to the console and accessible in $AIRFLOW_HOME/logs/scheduler.
  • Webserver and Triggerer: Logs are printed to the console.
  • Task: Logs can be viewed in the Airflow UI or at $AIRFLOW_HOME/logs/ (see the sketch after this list).
  • Metadata database: Logs are handled differently depending on which database you use.
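
For example, here is a minimal sketch of writing to the task log from inside a DAG (Airflow 2.4+ TaskFlow syntax assumed; the DAG and task names are made up). Anything sent to the airflow.task logger ends up in the task log locations above:

import logging

import pendulum
from airflow.decorators import dag, task

# the "airflow.task" logger feeds the task log files shown in the UI
task_logger = logging.getLogger("airflow.task")

@dag(start_date=pendulum.datetime(2022, 11, 1), schedule=None, catchup=False)
def logging_example():
    @task
    def log_something():
        task_logger.info("This message lands in the task log.")

    log_something()

logging_example()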

Remote logging

As your Airflow environment scales, it can produce more logs than it is able to store. In that case you need reliable, resilient, and auto-scaling storage. The easiest solution is remote logging to an external service, which the following community-managed providers already support:

  • Alibaba: OSSTaskHandler (oss://)
  • Amazon: S3TaskHandler (s3://), CloudwatchTaskHandler (cloudwatch://)
  • Elasticsearch: ElasticsearchTaskHandler (further configured with elasticsearch in airflow.cfg)
  • Google: GCSTaskHandler (gs://), StackdriverTaskHandler (stackdriver://)
  • Microsoft Azure: WasbTaskHandler (wasb)

Remote logging example: Send task logs to Amazon S3

The following Dockerfile environment variables enable remote logging and point Airflow at an S3 bucket:

# enable remote logging and provide the ID of an Airflow connection
# that can write to your S3 bucket
ENV AIRFLOW__LOGGING__REMOTE_LOGGING=True
ENV AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=${AMAZONS3_CON_ID}

# specify the location of your remote logs using your bucket name
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://${S3BUCKET_NAME}/logs

# optional: server-side encryption for S3 logs
ENV AIRFLOW__LOGGING__ENCRYPT_S3_LOGS=True

These environment variables configure remote logging to a single S3 bucket (S3BUCKET_NAME). Behind the scenes, Airflow uses this configuration to create an S3TaskHandler, which replaces the default FileTaskHandler.
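
The connection referenced by REMOTE_LOG_CONN_ID must already exist in Airflow, created through the UI, the CLI, or Airflow's AIRFLOW_CONN_ environment-variable convention. A hypothetical sketch of the last option, assuming the connection ID is amazons3_con_id and using placeholder credentials (in practice, prefer a secrets backend over baking keys into an image):

# hypothetical: defines an Airflow connection named "amazons3_con_id";
# <key-id> and <secret> are URL-encoded placeholder AWS credentials
ENV AIRFLOW_CONN_AMAZONS3_CON_ID=aws://<key-id>:<secret>@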

Advanced configuration example: Add multiple handlers to the same logger

To send task logs to a second S3 bucket in addition to the first, create and modify a deepcopy of DEFAULT_LOGGING_CONFIG within log_config.py as follows:

from copy import deepcopy
import os

# import the default logging configuration
from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

# add an additional handler
LOGGING_CONFIG['handlers']['secondary_s3_task_handler'] = {
    # you can import your own custom handler here
    'class': 'airflow.providers.amazon.aws.log.s3_task_handler.S3TaskHandler',
    # you can add a custom formatter here
    'formatter': 'airflow',
    # the following env variables were set in the dockerfile
    'base_log_folder': os.environ['BASE_LOG_FOLDER'],
    's3_log_folder': os.environ['AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER_2'],
    # provide a custom structure for the log directory and filename;
    # the Jinja fields below are reconstructed assumptions, adjust as needed
    'filename_template':
        "{{ ti.dag_id }}/{{ ti.task_id }}_{{ ts }}_{{ try_number }}.log",
    # if needed, custom filters can be added here
    'filters': ['mask_secrets'],
}

# attach "secondary_s3_task_handler" to the airflow.task logger
# alongside the default "task" handler
LOGGING_CONFIG['loggers']['airflow.task']['handlers'] = [
    "task",
    "secondary_s3_task_handler",
]

With this configuration each task attempt is logged twice: by the default handler to the first bucket and by secondary_s3_task_handler to the second. Then, in your Dockerfile, set the environment variables the handlers read, copy log_config.py into the image, and point Airflow at the new logging configuration:

# define the base log folder
ENV BASE_LOG_FOLDER=/usr/local/airflow/logs

# make the config importable: put it on the PYTHONPATH as package "config"
ENV PYTHONPATH=/usr/local/airflow
RUN mkdir $PYTHONPATH/config
COPY include/log_config.py $PYTHONPATH/config/
RUN touch $PYTHONPATH/config/__init__.py

# enable remote logging and provide the ID of your existing S3 connection
ENV AIRFLOW__LOGGING__REMOTE_LOGGING=True
ENV AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=hook_tutorial_s3_conn

# specify the locations of your remote logs using your two bucket names
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://${S3BUCKET_NAME}/logs
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER_2=s3://${S3BUCKET_NAME_2}/logs

# set the new logging configuration as logging config class
ENV AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS=config.log_config.LOGGING_CONFIG

# optional: server-side encryption for S3 logs
ENV AIRFLOW__LOGGING__ENCRYPT_S3_LOGS=True
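
After rebuilding the image, one quick way to confirm the second handler is attached (a hedged check, not part of the original tutorial) is to inspect the airflow.task logger from a Python shell inside the container:

import logging

# importing airflow applies the configured LOGGING_CONFIG_CLASS
import airflow

# expect two handlers: the default remote task handler ("task")
# and the secondary_s3_task_handler added in log_config.py
print(logging.getLogger("airflow.task").handlers)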

References

https://docs.astronomer.io/learn/logging

https://blog.beachgeek.co.uk/monitoring-and-logging-with-amazon-managed-workflows-for-apache-airflow/