fahr package

Submodules

fahr.cli module

fahr.fahr module

class fahr.fahr.TrainJob(filepath=None, job_name=None, train_image='default-cpu', train_driver='sagemaker', envfile=None, overwrite=False, config=None)[source]

Bases: object

build()[source]

Builds the model training image. This is the first step in the model training workflow.

fetch(local_path, extract=True)[source]

Extracts the model artifacts generated by the training job to local_path. This is the last step in the model training workflow.

This method is a convenience wrapper over the fetch static method. Use this method to fetch model artifacts generated by executing train on the current TrainJob instance. Use the static method to fetch model artifacts from other runs.

Parameters:
  • local_path (str or pathlib.Path) – Directory to write the model artifact to.
  • extract (bool, default True) – Whether or not to untar the data on arrival.
Raises:
  • ValueError – Raised if you attempt to fetch without first running fit, or if the job has not finished running yet.
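Because fetch raises ValueError when the job has not finished, a caller may prefer to check status first. The following is a minimal sketch under that assumption, not part of fahr itself: fetch_if_complete and FakeJob are hypothetical names, and the sketch assumes status returns the plain string 'complete' for a finished job.

```python
def fetch_if_complete(job, local_path, extract=True):
    """Call job.fetch only when job.status() reports 'complete'.

    Returns True if fetch was called, False otherwise.
    Hypothetical helper; assumes status() returns a plain string.
    """
    if job.status() != "complete":
        return False
    job.fetch(local_path, extract=extract)
    return True


class FakeJob:
    """Hypothetical stand-in with the documented method names; not fahr's TrainJob."""

    def __init__(self, state):
        self.state = state
        self.fetched_to = None

    def status(self):
        return self.state

    def fetch(self, local_path, extract=True):
        self.fetched_to = local_path


running = FakeJob("submitted")
finished = FakeJob("complete")
running_result = fetch_if_complete(running, "./artifacts")
finished_result = fetch_if_complete(finished, "./artifacts")
```

With a real job object, the same helper would be called with the TrainJob instance in place of FakeJob.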
fit()[source]

Executes a model training job, generating a model training artifact in a local repository.

This is a convenience method that combines the first three steps of the model training workflow: building the model training image, pushing it to a remote repository, and executing it.

Parameters:
  • path (str or pathlib.Path) – Directory to write the model artifact to.
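As described above, fit combines the build, push, and train steps. The sketch below spells out that sequence by hand against any object exposing the documented method names. run_training_steps and RecordingJob are hypothetical names introduced for illustration; the status strings returned by the stand-in are assumed, and the real fit may do more than this.

```python
def run_training_steps(job):
    """Run the first three workflow steps in order and return the job status."""
    job.build()  # step 1: build the model training image
    job.push()   # step 2: push the image to the remote repository
    job.train()  # step 3: launch the remote training job
    return job.status()


class RecordingJob:
    """Hypothetical stand-in that records which methods were called."""

    def __init__(self):
        self.calls = []

    def build(self):
        self.calls.append("build")

    def push(self):
        self.calls.append("push")

    def train(self):
        self.calls.append("train")

    def status(self):
        # Assumed behavior: a job counts as submitted once train has been called.
        return "submitted" if "train" in self.calls else "unlaunched"


demo = RecordingJob()
final_status = run_training_steps(demo)
```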
classmethod from_model_definition(filepath, train_image='default-cpu', train_driver='sagemaker', envfile=None, overwrite=False, config=None)[source]
classmethod from_training_job(job_name, train_driver='sagemaker', config=None)[source]
push()[source]

Pushes the model training image to a remote image repository. This is the second step in the model training workflow and a necessary prerequisite to executing the model training image. The repository used depends on the train_driver:

  • ‘sagemaker’ – Pushes the image to Amazon ECR.
  • ‘kaggle’ – Does nothing. Only the default runtime environment is available, so no registry is necessary.
status()[source]

Returns the status of this job instance. May be one of the following:

  • unlaunched – This job has not been launched (sent to compute using fit) yet.
  • submitted – This job has been submitted to cloud compute. It has not terminated yet.
  • complete – This job has finished running, and you may call fetch to get your model.
  • failed – This job has failed. Use the training driver’s web tools to determine why.
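The status values above lend themselves to a simple polling loop. The following is a sketch only: wait_for_job and ScriptedJob are hypothetical names, the default poll interval and timeout are arbitrary, and the sketch assumes status returns the state names as plain strings.

```python
import time

# Assumed: status() returns one of the documented state names as a string.
TERMINAL_STATES = {"complete", "failed"}


def wait_for_job(job, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll job.status() until it reports 'complete' or 'failed'."""
    waited = 0.0
    while True:
        state = job.status()
        if state in TERMINAL_STATES:
            return state
        if state == "unlaunched":
            raise RuntimeError("job has not been launched yet")
        if waited >= timeout_seconds:
            raise TimeoutError("job did not finish within the timeout")
        sleep(poll_seconds)
        waited += poll_seconds


class ScriptedJob:
    """Hypothetical stand-in that replays a fixed sequence of states."""

    def __init__(self, states):
        self._states = list(states)

    def status(self):
        # Return states in order, then keep repeating the last one.
        if len(self._states) > 1:
            return self._states.pop(0)
        return self._states[0]


scripted = ScriptedJob(["submitted", "submitted", "complete"])
result = wait_for_job(scripted, poll_seconds=0, sleep=lambda seconds: None)
```

The sleep parameter is injectable so the loop can be exercised without waiting; with a real job the defaults would apply.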

train()[source]

Launches a remote training job. This is the third and most critical step in the model training workflow. Where the job is launched depends on the train_driver:

  • sagemaker – The job is run on an EC2 machine via the AWS SageMaker API.
  • kaggle – The job is run in a Kaggle Kernel via the Kaggle API.
fahr.fahr.copy_resources(src, dest, overwrite=True, training_artifact=None)[source]

Copies training file resources from src to dest.

Parameters:
  • src (str) – The source directory.
  • dest (str) – The target directory.
  • overwrite (bool, default True) – Whether or not to overwrite existing resources in the target directory.
  • training_artifact (bool, default False) – Whether or not to include the training artifact in the list of files copied.
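To illustrate what an overwrite flag of this kind typically means, here is a small standalone sketch using only the standard library. It is not fahr's implementation: copy_files_sketch is a hypothetical name, it copies only top-level regular files, and it leaves out the training_artifact option entirely.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files_sketch(src, dest, overwrite=True):
    """Copy regular files from src to dest and return the names copied.

    When overwrite is False, files that already exist in dest are skipped.
    """
    src, dest = Path(src), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue
        target = dest / item.name
        if target.exists() and not overwrite:
            continue  # keep the existing file untouched
        shutil.copy2(item, target)
        copied.append(item.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src_dir, dest_dir = Path(tmp, "src"), Path(tmp, "dest")
    src_dir.mkdir()
    dest_dir.mkdir()
    (src_dir / "train.py").write_text("new")
    (src_dir / "requirements.txt").write_text("numpy")
    (dest_dir / "train.py").write_text("old")

    # First pass keeps the existing train.py; second pass replaces it.
    kept = copy_files_sketch(src_dir, dest_dir, overwrite=False)
    kept_content = (dest_dir / "train.py").read_text()
    replaced = copy_files_sketch(src_dir, dest_dir, overwrite=True)
    replaced_content = (dest_dir / "train.py").read_text()
```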

Module contents

class fahr.TrainJob(filepath=None, job_name=None, train_image='default-cpu', train_driver='sagemaker', envfile=None, overwrite=False, config=None)[source]

Bases: object

build()[source]

Builds the model training image. This is the first step in the model training workflow.

fetch(local_path, extract=True)[source]

Extracts the model artifacts generated by the training job to local_path. This is the last step in the model training workflow.

This method is a convenience wrapper over the fetch static method. Use this method to fetch model artifacts generated by executing train on the current TrainJob instance. Use the static method to fetch model artifacts from other runs.

Parameters:
  • local_path (str or pathlib.Path) – Directory to write the model artifact to.
  • extract (bool, default True) – Whether or not to untar the data on arrival.
Raises:
  • ValueError – Raised if you attempt to fetch without first running fit, or if the job has not finished running yet.
fit()[source]

Executes a model training job, generating a model training artifact in a local repository.

This is a convenience method that combines the first three steps of the model training workflow: building the model training image, pushing it to a remote repository, and executing it.

Parameters:
  • path (str or pathlib.Path) – Directory to write the model artifact to.
classmethod from_model_definition(filepath, train_image='default-cpu', train_driver='sagemaker', envfile=None, overwrite=False, config=None)[source]
classmethod from_training_job(job_name, train_driver='sagemaker', config=None)[source]
push()[source]

Pushes the model training image to a remote image repository. This is the second step in the model training workflow and a necessary prerequisite to executing the model training image. The repository used depends on the train_driver:

  • ‘sagemaker’ – Pushes the image to Amazon ECR.
  • ‘kaggle’ – Does nothing. Only the default runtime environment is available, so no registry is necessary.
status()[source]

Returns the status of this job instance. May be one of the following:

  • unlaunched – This job has not been launched (sent to compute using fit) yet.
  • submitted – This job has been submitted to cloud compute. It has not terminated yet.
  • complete – This job has finished running, and you may call fetch to get your model.
  • failed – This job has failed. Use the training driver’s web tools to determine why.

train()[source]

Launches a remote training job. This is the third and most critical step in the model training workflow. Where the job is launched depends on the train_driver:

  • sagemaker – The job is run on an EC2 machine via the AWS SageMaker API.
  • kaggle – The job is run in a Kaggle Kernel via the Kaggle API.
fahr.copy_resources(src, dest, overwrite=True, training_artifact=None)[source]

Copies training file resources from src to dest.

Parameters:
  • src (str) – The source directory.
  • dest (str) – The target directory.
  • overwrite (bool, default True) – Whether or not to overwrite existing resources in the target directory.
  • training_artifact (bool, default False) – Whether or not to include the training artifact in the list of files copied.