Support for PythonOperatorPerArtifact flavor in Airflow DAG file
#758
yoonspark
started this conversation in
Show and tell
Overview
Our team has been working on a new graph refactor for pipeline file generation, in which duplicate code blocks are factored out to avoid redundant (and potentially expensive) computations. Our previous implementation (relevant discussion here) was limited in that it produced an Airflow DAG file with operators (i.e., tasks) at the session level. Each session may contain multiple artifacts, and we often want control and observability for individual artifacts, which calls for operators/tasks at the artifact level. This is what we recently worked on, and we've merged a PR implementing the PythonOperatorPerArtifact Airflow DAG flavor (previously, only PythonOperatorPerSession was supported).

For instance, this is an example of an Airflow DAG file under the PythonOperatorPerSession flavor:

As you can see, all artifacts are "wrapped" into a single operator, which limits tracking and control over each artifact. Contrast this with the following equivalent version generated under the newly available PythonOperatorPerArtifact flavor:

It's longer code, but it opens room for engineers to peek and control at the artifact level, which allows for customization.
Note that this update is not yet visible to the user since the new graph refactor project has not yet been exposed to the public API. We are wrapping up remaining tasks for the final "replacement" though, so stay tuned!