airscooter package
Submodules
airscooter.cli module
airscooter.depositor module
class airscooter.depositor.Depositor(name, filename, output, dummy=False)
Bases: object
A Depositor is (ontologically) a data dumping task that takes data from somewhere and places it onto the local machine.
as_airflow_string()
Returns this object’s Airflow string, ready to be written to a Python DAG file. The orchestration layer uses this method when instantiating the Airflow DAG. The string is incomplete on its own because it does not set any task dependencies; those are handled separately in orchestration.
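As a rough illustration of what an operator string without dependencies could look like, the sketch below uses a hypothetical stand-in class, `SketchDepositor`, rather than airscooter's own implementation. The choice of `BashOperator` and `DummyOperator`, the `python` command, and the exact string layout are all assumptions; this reference does not show the string airscooter actually emits.

```python
class SketchDepositor:
    """Hypothetical stand-in for a depositor task; not airscooter's class."""

    def __init__(self, name, filename, output, dummy=False):
        self.name = name
        self.filename = filename
        self.output = output
        self.dummy = dummy

    def as_airflow_string(self):
        # Assumption: dummy tasks become no-op operators, real tasks shell out.
        if self.dummy:
            return f"DummyOperator(task_id={self.name!r}, dag=dag)"
        command = f"python {self.filename}"
        return f"BashOperator(bash_command={command!r}, task_id={self.name!r}, dag=dag)"


task = SketchDepositor("fetch", "fetch.py", ["data.csv"])
operator_string = task.as_airflow_string()
print(operator_string)
```

Note that the string names a `dag` variable and a task, but says nothing about what runs before or after it. That ordering has to be written elsewhere in the DAG file.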
airscooter.orchestration module
Orchestration layer.
airscooter.orchestration.configure(localize=True, local_folder='.airflow', init=False)
Configures Airflow for use within Airscooter.
- localize: bool, default True
By default, overwrite the AIRFLOW_HOME environment variable and the values set in the airflow.cfg file to point at a local “.airflow” folder. This is the desirable behavior for maintaining several separate DAGs on one machine, with each DAG corresponding to a single directory, and thus to a single project or git repository.
If localize is set to False, airscooter will inherit the current global Airflow settings. This is the vanilla behavior, and may be preferable in advanced circumstances (which ones TBD).
- local_folder: str, default “.airflow”
- The name of the local folder that the DAG gets written to. Airscooter configures Airflow to work against this folder.
- init: bool, default False
- Whether or not to initialize the database.
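The localization idea can be sketched with the standard library alone. The helper below, `localize_airflow_home`, is hypothetical and not part of airscooter; it covers only the AIRFLOW_HOME environment variable, and leaves out both the airflow.cfg rewrite and the database initialization that `configure` is described as handling.

```python
import os


def localize_airflow_home(local_folder=".airflow"):
    """Point AIRFLOW_HOME at a folder inside the current directory.

    Hypothetical helper for illustration; not airscooter's implementation.
    """
    home = os.path.abspath(local_folder)
    # Airflow reads AIRFLOW_HOME to locate its configuration and DAG folder.
    os.environ["AIRFLOW_HOME"] = home
    return home


airflow_home = localize_airflow_home()
print(airflow_home)
```

Because the path is resolved against the current working directory, running this from two different project directories yields two separate Airflow homes, which is the separation the localize option is meant to provide.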
airscooter.orchestration.create_airflow_string(tasks)
Given a task graph (as a list of tasks), generates an Airflow DAG file.
- tasks: list of {Transform, Depositor} objects, required
- The tasks constituting the task graph to be written as a DAG.
Returns: The Airflow DAG as a string, ready to be written to a file.
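A minimal sketch of how a DAG file might be assembled from a task list follows. The function `sketch_airflow_dag_string`, the tuple layout of each task, the DAG name, and the start date are assumptions made for illustration; the import path in the generated header is the Airflow 1.x one. The point it shows is the two-pass structure: operators first, dependencies second.

```python
def sketch_airflow_dag_string(tasks):
    """Build DAG file text from (name, operator_string, requirements) tuples.

    Hypothetical helper; the header and DAG arguments are assumptions.
    """
    lines = [
        "from airflow import DAG",
        "from airflow.operators.bash_operator import BashOperator",  # Airflow 1.x path
        "from datetime import datetime",
        "",
        "dag = DAG('sketch_dag', start_date=datetime(2017, 1, 1))",
        "",
    ]
    # First pass: one operator per task, bound to a variable named after it.
    for name, operator_string, _ in tasks:
        lines.append(f"{name} = {operator_string}")
    lines.append("")
    # Second pass: dependencies, which the per-task strings leave out.
    for name, _, requirements in tasks:
        for requirement in requirements:
            lines.append(f"{name}.set_upstream({requirement})")
    return "\n".join(lines) + "\n"


tasks = [
    ("fetch", "BashOperator(bash_command='python fetch.py', task_id='fetch', dag=dag)", []),
    ("clean", "BashOperator(bash_command='python clean.py', task_id='clean', dag=dag)", ["fetch"]),
]
dag_text = sketch_airflow_dag_string(tasks)
print(dag_text)
```

The generated text is only built here, not executed, so the sketch runs without Airflow installed.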
airscooter.orchestration.deserialize(yml_data)
Given a task graph YAML serialization, returns the list of airscooter objects (Transform and Depositor objects) making up this task graph.
- yml_data: str, required
- The YAML representation being deserialized.
Returns: The resultant airscooter task list.
airscooter.orchestration.deserialize_from_file(yml_filename)
Given the name of a file containing a task graph YAML serialization, returns the constituent list of airscooter task graph objects. I/O wrapper for deserialize.
- yml_filename: str, required
- The name of the file the data will be read in from.
Returns: The resultant airscooter task list.
airscooter.orchestration.run()
Runs a DAG.
- pin: str or None, default None
- If this parameter is not None, run the task with this name in the mode set by the run_mode parameter.
- run_mode: {‘forward’, ‘backwards’, ‘all’}, default None
- Ignored if pin is None. Otherwise: if set to ‘forward’, run the pinned task and all tasks that come after it, stubbing out any unfulfilled requirements with dummy operators. If set to ‘backwards’, run the pinned task and any tasks that come before it, again with stubs. If set to ‘all’, run all dependent and expectant tasks, again with stubs. If pin is set and run_mode is left as None, a ValueError is raised.
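The three run modes amount to graph reachability from the pinned task. The sketch below is one possible reading, not airscooter's implementation: `select_tasks` and the dict-based graph are hypothetical, "all" is assumed to mean the union of the forward and backwards sets, and running every task when pin is None is likewise an assumption. Tasks left out of the returned set are the ones that would be stubbed.

```python
def select_tasks(requirements, pin=None, run_mode=None):
    """Return the names of tasks to run for real; all others would be stubbed.

    `requirements` maps each task name to the names it depends on.
    Hypothetical helper for illustration; not airscooter's implementation.
    """
    if pin is None:
        return set(requirements)  # assumption: with no pin, every task runs
    if run_mode not in ("forward", "backwards", "all"):
        raise ValueError("run_mode must be 'forward', 'backwards', or 'all' when pin is set")

    def reach(start, neighbours):
        # Depth-first walk collecting every task reachable from `start`.
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbours.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Invert the requirement edges to find what comes after each task.
    dependents = {}
    for task, needs in requirements.items():
        for need in needs:
            dependents.setdefault(need, []).append(task)

    selected = set()
    if run_mode in ("forward", "all"):
        selected |= reach(pin, dependents)
    if run_mode in ("backwards", "all"):
        selected |= reach(pin, requirements)
    return selected


graph = {"fetch": [], "clean": ["fetch"], "report": ["clean"]}
forward = select_tasks(graph, pin="clean", run_mode="forward")
backwards = select_tasks(graph, pin="clean", run_mode="backwards")
everything = select_tasks(graph, pin="clean", run_mode="all")
print(sorted(forward), sorted(backwards), sorted(everything))
```

In this three-task chain, pinning "clean" with the forward mode selects "clean" and "report", so "fetch" is the requirement that would be replaced by a stub.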
airscooter.orchestration.serialize_tasks(tasks)
Transforms a list of tasks into a YAML serialization thereof.
- tasks: list, required
- A list of tasks.
Returns:
- yml_repr: str
- The YAML serialization of the task list.
airscooter.orchestration.serialize_to_file(tasks, yml_filename)
Given a list of tasks, writes a simplified YAML serialization thereof to a file. This method enables task graph persistence: at the CLI level, commands that add tasks to the graph read from and write to this file to keep the graph in a consistent state.
- tasks: list, required
- A list of tasks.
- yml_filename: str, required
- The filename to which the YAML representation will be written.
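The read, append, write cycle behind task graph persistence can be sketched as follows. The helper `add_task_to_file` and the record layout are hypothetical, and JSON from the standard library stands in for YAML purely so the sketch runs without a third-party YAML library; airscooter itself is documented as writing YAML.

```python
import json
import os
import tempfile


def add_task_to_file(task, filename):
    """Append a task record to the persisted graph, creating the file if needed.

    Hypothetical helper; JSON is a stand-in for the YAML format described above.
    """
    tasks = []
    if os.path.exists(filename):
        # Read the existing graph first so earlier tasks are not lost.
        with open(filename) as f:
            tasks = json.load(f)
    tasks.append(task)
    with open(filename, "w") as f:
        json.dump(tasks, f, indent=2)
    return tasks


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tasks.json")
    add_task_to_file({"name": "fetch", "filename": "fetch.py", "output": ["data.csv"]}, path)
    persisted = add_task_to_file(
        {"name": "clean", "filename": "clean.py", "input": ["data.csv"], "output": ["clean.csv"]},
        path,
    )

print([task["name"] for task in persisted])
```

Each call sees the tasks written by the previous one, which is what lets separate command invocations build up a single graph over time.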
airscooter.orchestration.write_airflow_string(tasks, filename)
Writes the Airflow DAG file for the given tasks. I/O wrapper for create_airflow_string.
- tasks: list of {Transform, Depositor} objects, required
- The tasks constituting the task graph to be written as a DAG.
- filename: str, required
- The filename the DAG will be written to.
airscooter.transform module
class airscooter.transform.Transform(name, filename, input, output, requirements=None, dummy=False)
Bases: object
A Transform is a data manipulation task that “transforms” input data into output data.
as_airflow_string()
Returns this object’s Airflow string, ready to be written to a Python DAG file. The orchestration layer uses this method when instantiating the Airflow DAG. The string is incomplete on its own because it does not set any task dependencies; those are handled separately in orchestration.