airscooter package

Submodules

airscooter.cli module

airscooter.depositor module

class airscooter.depositor.Depositor(name, filename, output, dummy=False)[source]

Bases: object

A Depositor is (ontologically) a data dumping task that takes data from somewhere and places it onto the local machine.

as_airflow_string()[source]

Returns this object’s Airflow string, ready to be written to a Python DAG file. This method is used at the orchestration layer for instantiating the Airflow DAG. However, it is incomplete, because it does not set any task dependencies, which are handled separately in orchestration.

datafy()[source]

Returns this object’s simplified JSON representation. This is used for serialization to YAML, which is used in turn to allow object persistence.
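
A minimal usage sketch, assuming a shell script download.sh that fetches data and writes data.csv; the filenames, the task name, and the treatment of output as a single filename are illustrative assumptions, not part of the documented API:

    from airscooter.depositor import Depositor

    # Hypothetical task: a script that downloads data.csv onto the local machine.
    dep = Depositor(name='download_data',
                    filename='download.sh',
                    output='data.csv')

    print(dep.as_airflow_string())  # Airflow operator definition, without dependencies
    print(dep.datafy())             # simplified representation used for YAML serialization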

airscooter.orchestration module

Orchestration layer.

airscooter.orchestration.configure(localize=True, local_folder='.airflow', init=False)[source]

Configures Airflow for use within Airscooter.

localize: bool, default True

By default, overwrite the AIRFLOW_HOME environment variable and the values set in the airflow.cfg file to point at a local “.airflow” folder. This is desirable behavior for maintaining several separable DAGs on one machine, with each DAG corresponding to a single directory, and thus to a single project or git repository.

If localize is set to False, airscooter will inherit the current global Airflow settings. This is the vanilla behavior, and may be preferable in advanced circumstances (which ones TBD).

local_folder: str, default “.airflow”
The name of the local folder that the DAG gets written to. Airscooter configures Airflow to work against this folder.
init: bool, default False
Whether or not to initialize the database.
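
A minimal sketch of project setup using the defaults described above; init=True is passed here on the assumption that the database has not yet been initialized:

    from airscooter.orchestration import configure

    # Point Airflow at a project-local .airflow folder and initialize its database.
    configure(localize=True, local_folder='.airflow', init=True)
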
airscooter.orchestration.create_airflow_string(tasks)[source]

Given a task graph (as a list of tasks), generates an Airflow DAG file.

tasks: list of {Transform, Depositor} objects, required
The tasks constituting the task graph to be written as a DAG.

The Airflow DAG as a string, ready to be written to the file.
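
A sketch of generating the DAG source for a one-task graph, reusing the illustrative Depositor from the example above:

    from airscooter.depositor import Depositor
    from airscooter.orchestration import create_airflow_string

    dep = Depositor('download_data', 'download.sh', 'data.csv')
    dag_code = create_airflow_string([dep])  # Python source for the Airflow DAG, as a string
    print(dag_code)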

airscooter.orchestration.deserialize(yml_data)[source]

Given a task graph YAML serialization, returns the list of airscooter objects (Transform and Depositor objects) making up this task graph.

yml_data: str, required
The YAML representation being deserialized.

The resultant airscooter task list.
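
A sketch of a serialization round trip, assuming yml_data was produced by serialize_tasks (documented below):

    from airscooter.depositor import Depositor
    from airscooter.orchestration import deserialize, serialize_tasks

    dep = Depositor('download_data', 'download.sh', 'data.csv')
    yml_data = serialize_tasks([dep])
    tasks = deserialize(yml_data)  # back to a list of Depositor/Transform objects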

airscooter.orchestration.deserialize_from_file(yml_filename)[source]

Given a task graph YAML serialization, returns the constituent list of airscooter task graph objects. I/O wrapper for deserialize.

yml_filename: str, required
The name of the file the data will be read in from.

The resultant airscooter task list.
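
A sketch, assuming the task graph was previously persisted with serialize_to_file; the filename .airscooter.yml is illustrative, not a documented convention:

    from airscooter.orchestration import deserialize_from_file

    tasks = deserialize_from_file('.airscooter.yml')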

airscooter.orchestration.run()[source]

Runs a DAG.

pin: str or None, default None
If this parameter is not None, run the task with this name in the mode set by the run_mode parameter.
run_mode: {‘forward’, ‘backwards’, ‘all’} or None, default None
Ignored if pin is None. Otherwise: ‘forward’ runs the pinned task and all tasks that come after it, stubbing out any unfulfilled requirements with dummy operators; ‘backwards’ runs the pinned task and all tasks that come before it, again with stubs; ‘all’ runs all dependent and expectant tasks, again with stubs. If pin is given and this parameter is left as None, a ValueError is raised.
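
A minimal sketch. Note that the signature shown above takes no arguments, even though the parameter notes describe pin and run_mode; the call below therefore passes nothing and relies on configure having been run first:

    from airscooter import orchestration

    orchestration.configure()  # ensure Airflow points at the local .airflow folder
    orchestration.run()        # execute the configured DAG
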
airscooter.orchestration.serialize_tasks(tasks)[source]

Transforms a list of tasks into a YAML serialization thereof.

tasks: list, required
A list of tasks.
yml_repr: str
A YAML serialization thereof.
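
A sketch, again with an illustrative one-task graph:

    from airscooter.depositor import Depositor
    from airscooter.orchestration import serialize_tasks

    dep = Depositor('download_data', 'download.sh', 'data.csv')
    yml_repr = serialize_tasks([dep])  # YAML string describing the task graph
    print(yml_repr)
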
airscooter.orchestration.serialize_to_file(tasks, yml_filename)[source]

Given a list of tasks, writes a simplified YAML serialization thereof to a file. This method enables task graph persistence: at the CLI level, when additional tasks are added to the graph, this data is checked and rewritten to maintain a consistent state.

tasks: list, required
A list of tasks.
yml_filename: str, required
The filename to which the YAML representation will be written.
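
A sketch of persisting the task graph; the target filename is an illustrative choice:

    from airscooter.depositor import Depositor
    from airscooter.orchestration import serialize_to_file

    dep = Depositor('download_data', 'download.sh', 'data.csv')
    serialize_to_file([dep], '.airscooter.yml')  # write the YAML serialization to disk
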
airscooter.orchestration.write_airflow_string(tasks, filename)[source]

Writes the Airflow DAG file for the given tasks. I/O wrapper for create_airflow_string.

tasks: list of {Transform, Depositor} objects, required
The tasks constituting the task graph to be written as a DAG.
filename: str, required
The filename the DAG will be written to.
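
A sketch of emitting the DAG file; the output filename is illustrative:

    from airscooter.depositor import Depositor
    from airscooter.orchestration import write_airflow_string

    dep = Depositor('download_data', 'download.sh', 'data.csv')
    write_airflow_string([dep], 'airscooter_dag.py')  # generate and write the DAG file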

airscooter.transform module

class airscooter.transform.Transform(name, filename, input, output, requirements=None, dummy=False)[source]

Bases: object

A Transform is a data manipulation task that “transforms” input data into output data.

as_airflow_string()[source]

Returns this object’s Airflow string, ready to be written to a Python DAG file. This method is used at the orchestration layer for instantiating the Airflow DAG. However, it is incomplete, because it does not set any task dependencies, which are handled separately in orchestration.

datafy()[source]

Returns this object’s simplified JSON representation. This is used for serialization to YAML, which is used in turn to allow object persistence.
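
A minimal usage sketch, chaining a Transform onto the illustrative Depositor from earlier; the filenames, the single-filename treatment of input and output, and passing the depositor via requirements are assumptions for illustration:

    from airscooter.depositor import Depositor
    from airscooter.transform import Transform

    dep = Depositor('download_data', 'download.sh', 'data.csv')

    # Hypothetical cleaning step that turns data.csv into clean.csv.
    tr = Transform(name='clean_data',
                   filename='clean.py',
                   input='data.csv',
                   output='clean.csv',
                   requirements=[dep])

    print(tr.as_airflow_string())  # Airflow operator definition, without dependencies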

Module contents