To Nha Notes | Nov. 19, 2022, 11:02 a.m.
By default, Airflow outputs logs to the base_log_folder configured in airflow.cfg, which is located in your $AIRFLOW_HOME directory.
If you run Airflow locally, logging information is accessible on disk under this base_log_folder (by default, $AIRFLOW_HOME/logs).
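To confirm where your local environment writes logs, you can read the live configuration with Airflow's conf object. This is just a quick sketch assuming Airflow 2.x; the section and key names come from the [logging] section of airflow.cfg:

```python
# inspect the effective logging settings of the current Airflow environment
from airflow.configuration import conf

print(conf.get("logging", "base_log_folder"))   # e.g. /usr/local/airflow/logs
print(conf.get("logging", "remote_logging"))    # "False" until remote logging is enabled
```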
When scaling your Airflow environment, you might produce more logs than it can store locally. In this case, you need reliable, resilient, and auto-scaling storage. The easiest solution is remote logging to an external service, which several community-managed providers already support. For Amazon S3, you can enable remote logging by adding the following environment variables, for example in your Dockerfile:
# allow remote logging and provide a connection ID (see step 2)
ENV AIRFLOW__LOGGING__REMOTE_LOGGING=True
ENV AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=${AMAZONS3_CON_ID}
# specify the location of your remote logs using your bucket name
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://${S3BUCKET_NAME}/logs
# optional: server-side encryption for S3 logs
ENV AIRFLOW__LOGGING__ENCRYPT_S3_LOGS=True
These environment variables configure remote logging to a single S3 bucket (S3BUCKET_NAME). Behind the scenes, Airflow uses this configuration to create an S3TaskHandler, which overrides the default FileTaskHandler.
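The connection ID you pass to AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID must exist in your Airflow environment and have permission to write to the bucket. As a sketch (the connection ID and credentials below are placeholders), you can define such a connection as an environment variable by generating its URI:

```python
# generate a connection URI that can be exported as an environment variable;
# Airflow picks up any variable named AIRFLOW_CONN_<CONN_ID>
from airflow.models.connection import Connection

conn = Connection(
    conn_id="my_aws_conn",               # placeholder: use the ID referenced in REMOTE_LOG_CONN_ID
    conn_type="aws",
    login="<aws_access_key_id>",          # placeholder credentials
    password="<aws_secret_access_key>",
)
print(f"AIRFLOW_CONN_{conn.conn_id.upper()}={conn.get_uri()}")
```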
#### Remote logging to multiple S3 buckets
To send task logs to a second S3 bucket, you can add an extra task handler to Airflow's logging configuration. Within a custom log_config.py, create and modify a deepcopy of DEFAULT_LOGGING_CONFIG as follows:
from copy import deepcopy
import os
# import the default logging configuration
from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG
LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
# add an additional handler
LOGGING_CONFIG['handlers']['secondary_s3_task_handler'] = {
    # you can import your own custom handler class here
    'class': 'airflow.providers.amazon.aws.log.s3_task_handler.S3TaskHandler',
    # you can reference a custom formatter here
    'formatter': 'airflow',
    # the following environment variables are set in the Dockerfile (see below)
    'base_log_folder': os.environ['BASE_LOG_FOLDER'],
    's3_log_folder': os.environ['AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER_2'],
    # custom structure for the log directory and filename
    'filename_template': '{{ ti.dag_id }}/{{ ti.task_id }}_{{ ts }}_{{ try_number }}.log',
    # if needed, custom filters can be added here
    'filters': ['mask_secrets'],
}
# add "secondary_s3_task_handler" as an additional handler for the airflow.task logger
LOGGING_CONFIG['loggers']['airflow.task']['handlers'] = [
    'task',
    'secondary_s3_task_handler',
]
Then add the following environment variables and commands to your Dockerfile:

# define the base log folder
ENV BASE_LOG_FOLDER=/usr/local/airflow/logs
# create a directory for your custom log_config.py file and copy it
ENV PYTHONPATH=/usr/local/airflow
RUN mkdir $PYTHONPATH/config
COPY include/log_config.py $PYTHONPATH/config/
RUN touch $PYTHONPATH/config/__init__.py
# allow remote logging and provide a connection ID (the one you specified in step 2)
ENV AIRFLOW__LOGGING__REMOTE_LOGGING=True
ENV AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=hook_tutorial_s3_conn
# specify the locations of your remote logs using your bucket names
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://${S3BUCKET_NAME}/logs
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER_2=s3://${S3BUCKET_NAME_2}/logs
# set the new logging configuration as logging config class
ENV AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS=config.log_config.LOGGING_CONFIG
# optional: server-side encryption for S3 logs
ENV AIRFLOW__LOGGING__ENCRYPT_S3_LOGS=True
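With the image rebuilt and the connection in place, any task log should now appear in both buckets, organized by the filename_template above. A minimal DAG to generate a test log line might look like this (the DAG and task names are placeholders; `schedule=None` assumes Airflow 2.4+):

```python
# minimal DAG whose only task writes one log line, so you can check both S3 buckets
import logging

from pendulum import datetime
from airflow.decorators import dag, task

task_logger = logging.getLogger("airflow.task")

@dag(start_date=datetime(2022, 11, 1), schedule=None, catchup=False)
def remote_logging_check():
    @task
    def write_test_log():
        task_logger.info("If remote logging is set up correctly, this line lands in both S3 buckets.")

    write_test_log()

remote_logging_check()
```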