Running

Train

class farm.train.EarlyStopping(head=0, metric='loss', save_dir=None, mode='min', patience=0, min_delta=0.001, min_evals=0)[source]

Bases: object

Can be used to control early stopping with a Trainer class. Any object that implements the method check_stopping and provides the attribute save_dir can be used instead.

__init__(head=0, metric='loss', save_dir=None, mode='min', patience=0, min_delta=0.001, min_evals=0)[source]
Parameters
  • head – the prediction head referenced by the metric.

  • save_dir – the directory where the final best model is saved; if None, the model is not saved.

  • metric – name of the dev set metric to monitor (default: loss), extracted from the 0th head, or a function that extracts a value from the trainer’s dev evaluation result. NOTE: this is different from the metric specified for the processor, which defines how one or more evaluation metric values are calculated from prediction/target sets; here you specify the name of one particular such metric value, or a method that calculates that value from the result returned by a processor metric.

  • mode – “min” or “max”

  • patience – how many evaluations to wait after the best evaluation before stopping

  • min_delta – minimum difference to a previous best value to count as an improvement.

  • min_evals – minimum number of evaluations to wait before using the evaluation value

check_stopping(eval_result)[source]

Provide the evaluation result for the current evaluation. Determines whether stopping should occur and saves the model if necessary.

Parameters

eval_result – the current evaluation result

Returns

a tuple (stopprocessing, savemodel, evalvalue) indicating whether processing should be stopped, whether the current model should be saved, and the evaluation value that was used.
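
Example usage (a minimal sketch; the save_dir path is a hypothetical placeholder, and the second variant assumes the processor computes an “f1” value for the first head):

from farm.train import EarlyStopping

# Monitor the dev loss of head 0 and stop after 3 evaluations without an
# improvement of at least 0.001, keeping the best model on disk.
earlystopping = EarlyStopping(
    metric="loss", mode="min",
    save_dir="saved_models/best_model",  # hypothetical path
    patience=3, min_delta=0.001,
)

# Alternatively, pass a function that extracts one value from the dev
# evaluation result (assuming the first head reports an "f1" metric).
earlystopping_f1 = EarlyStopping(
    metric=lambda eval_result: eval_result[0]["f1"],
    mode="max", patience=5,
)

Either object is then passed to the Trainer via its early_stopping argument.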

class farm.train.Trainer(model, optimizer, data_silo, epochs, n_gpu, device, lr_schedule=None, evaluate_every=100, eval_report=True, use_amp=None, grad_acc_steps=1, local_rank=-1, early_stopping=None, log_learning_rate=False, log_loss_every=10, checkpoint_on_sigterm=False, checkpoint_every=None, checkpoint_root_dir=None, checkpoints_to_keep=3, from_epoch=0, from_step=0, global_step=0, evaluator_test=True, disable_tqdm=False, max_grad_norm=1.0)[source]

Bases: object

Handles the main model training procedure. This includes performing evaluation on the dev set at regular intervals during training as well as evaluation on the test set at the end of training.

__init__(model, optimizer, data_silo, epochs, n_gpu, device, lr_schedule=None, evaluate_every=100, eval_report=True, use_amp=None, grad_acc_steps=1, local_rank=-1, early_stopping=None, log_learning_rate=False, log_loss_every=10, checkpoint_on_sigterm=False, checkpoint_every=None, checkpoint_root_dir=None, checkpoints_to_keep=3, from_epoch=0, from_step=0, global_step=0, evaluator_test=True, disable_tqdm=False, max_grad_norm=1.0)[source]
Parameters
  • optimizer – An optimizer object that determines the learning strategy to be used during training

  • data_silo (DataSilo) – A DataSilo object that will contain the train, dev and test datasets as PyTorch DataLoaders

  • epochs (int) – How many times the training procedure will loop through the train dataset

  • n_gpu (int) – The number of gpus available for training and evaluation.

  • device – The device on which the train, dev and test tensors should be hosted. Choose from “cpu” and “cuda”.

  • lr_schedule – An optional scheduler object that can regulate the learning rate of the optimizer

  • evaluate_every (int) – Perform dev set evaluation after this many steps of training.

  • eval_report (bool) – If evaluate_every is not 0, specifies if an eval report should be generated when evaluating

  • use_amp (str) – Whether to use automatic mixed precision with Apex. One of the optimization levels must be chosen. “O1” is recommended in almost all cases.

  • grad_acc_steps (int) – Number of training steps for which the gradients should be accumulated. Useful to achieve larger effective batch sizes that would not fit in GPU memory.

  • local_rank (int) – Local rank of process when distributed training via DDP is used.

  • early_stopping (EarlyStopping) – an initialized EarlyStopping object to control early stopping and saving of best models.

  • log_learning_rate (bool) – Whether to log learning rate to Mlflow

  • log_loss_every (int) – Log current train loss after this many train steps.

  • checkpoint_on_sigterm (bool) – save a checkpoint for the Trainer when a SIGTERM signal is sent. The checkpoint can be used to resume training. This is useful in frameworks like AWS SageMaker with Spot instances, where a SIGTERM signals that the training state should be saved before the instance is terminated.

  • checkpoint_every (int) – save a train checkpoint after this many steps of training.

  • checkpoint_root_dir (Path) – the Path of directory where all train checkpoints are saved. For each individual checkpoint, a subdirectory with the name epoch_{epoch_num}_step_{step_num} is created.

  • checkpoints_to_keep (int) – maximum number of train checkpoints to save.

  • from_epoch (int) – the epoch number to start the training from. In the case when training resumes from a saved checkpoint, it is used to fast-forward training to the last epoch in the checkpoint.

  • from_step (int) – the step number to start the training from. In the case when training resumes from a saved checkpoint, it is used to fast-forward training to the last step in the checkpoint.

  • global_step (int) – the global step number across the training epochs.

  • evaluator_test (bool) – whether to perform evaluation on the test set

  • disable_tqdm (bool) – Disable tqdm progress bar (helps to reduce verbosity in some environments)

  • max_grad_norm (float) – Max gradient norm for clipping, default 1.0, set to None to disable

train()[source]

Perform the training procedure.

The training is visualized by a progress bar. Epochs are counted in a zero-based manner; for example, when you specify epochs=20 the count runs from 0 to 19.

If the Trainer evaluates the model on a test set, the result of that evaluation is stored in test_result.

Returns

Returns the model after training. When early stopping is used with a save_dir, the best model is loaded and returned.
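
Example usage (a minimal sketch; model, optimizer, lr_schedule, data_silo and device are assumed to come from the usual FARM setup and are placeholders here):

from farm.train import Trainer

trainer = Trainer(
    model=model,
    optimizer=optimizer,
    data_silo=data_silo,
    epochs=2,
    n_gpu=1,
    device=device,
    lr_schedule=lr_schedule,
    evaluate_every=100,
)
model = trainer.train()  # returns the trained model (or the best one when early stopping is used)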

backward_propagate(loss, step)[source]
adjust_loss(loss)[source]
log_params()[source]
classmethod create_or_load_checkpoint(data_silo, checkpoint_root_dir, model, optimizer, local_rank=-1, resume_from_checkpoint='latest', **kwargs)[source]

Tries to load a saved Trainer checkpoint. If no checkpoint is found, a new Trainer instance is created.

Parameters
  • data_silo (DataSilo) – A DataSilo object that will contain the train, dev and test datasets as PyTorch DataLoaders

  • checkpoint_root_dir (Path) – Path of the directory where all train checkpoints are saved. Each individual checkpoint is stored in a sub-directory under it.

  • resume_from_checkpoint (str) – the checkpoint name to start training from, e.g., “epoch_1_step_4532”. It defaults to “latest”, using the checkpoint with the highest train steps.
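
Example usage (a sketch; model, optimizer, data_silo and device are placeholders from the usual setup, the checkpoint directory is hypothetical, and the extra keyword arguments are only used when a new Trainer has to be created):

from pathlib import Path
from farm.train import Trainer

trainer = Trainer.create_or_load_checkpoint(
    data_silo=data_silo,
    checkpoint_root_dir=Path("saved_models/checkpoints"),  # hypothetical path
    model=model,
    optimizer=optimizer,
    resume_from_checkpoint="latest",
    epochs=2, n_gpu=1, device=device,  # forwarded to Trainer if no checkpoint is found
)
model = trainer.train()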

Eval

class farm.eval.Evaluator(data_loader, tasks, device, report=True)[source]

Bases: object

Handles evaluation of a given model over a specified dataset.

__init__(data_loader, tasks, device, report=True)[source]
Parameters
  • data_loader (DataLoader) – The PyTorch DataLoader that will return batches of data from the evaluation dataset

  • label_maps

  • device – The device on which the tensors should be processed. Choose from “cpu” and “cuda”.

  • metrics (list) – The list of metrics which need to be computed, one for each prediction head.

  • report (bool) – Whether an eval report should be generated (e.g. classification report per class).

eval(model, return_preds_and_labels=False)[source]

Performs evaluation on a given model.

Parameters
  • model (AdaptiveModel) – The model on which to perform evaluation

  • return_preds_and_labels (bool) – Whether to include the predictions and labels in the returned result dicts

Returns

A list of dictionaries, one for each prediction head. Each dictionary contains the metrics and reports generated during evaluation.

Return type

list of dicts
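
Example usage (a minimal sketch; model, data_silo and device are placeholders from the usual FARM setup, and the dev DataLoader and tasks are taken from the DataSilo and its Processor):

from farm.eval import Evaluator

dev_data_loader = data_silo.get_data_loader("dev")
evaluator = Evaluator(
    data_loader=dev_data_loader,
    tasks=data_silo.processor.tasks,
    device=device,
)
results = evaluator.eval(model)
Evaluator.log_results(results, dataset_name="dev", steps=len(dev_data_loader))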

static log_results(results, dataset_name, steps, logging=True, print=True, num_fold=None)[source]

Infer

class farm.infer.Inferencer(model, processor, task_type, batch_size=4, gpu=False, name=None, return_class_probs=False, extraction_strategy=None, extraction_layer=None, s3e_stats=None, num_processes=None, disable_tqdm=False, benchmarking=False, dummy_ph=False)[source]

Bases: object

Loads a saved AdaptiveModel/ONNXAdaptiveModel from disk and runs it in inference mode. Can be used for a model with prediction head (down-stream predictions) and without (using LM as embedder).

Example usage:

# down-stream inference
basic_texts = [
    {"text": "Schartau sagte dem Tagesspiegel, dass Fischer ein Idiot sei"},
    {"text": "Martin Müller spielt Handball in Berlin"},
]
model = Inferencer.load(your_model_dir)
model.inference_from_dicts(dicts=basic_texts)
# LM embeddings
model = Inferencer.load(your_model_dir, extraction_strategy="cls_token", extraction_layer=-1)
model.inference_from_dicts(dicts=basic_texts)
__init__(model, processor, task_type, batch_size=4, gpu=False, name=None, return_class_probs=False, extraction_strategy=None, extraction_layer=None, s3e_stats=None, num_processes=None, disable_tqdm=False, benchmarking=False, dummy_ph=False)[source]

Initializes Inferencer from an AdaptiveModel and a Processor instance.

Parameters
  • model (AdaptiveModel) – AdaptiveModel to run in inference mode

  • processor (Processor) – A dataset specific Processor object which will turn input (file or dict) into a Pytorch Dataset.

  • task_type (str) – Type of task the model should be used for. Currently supported: “embeddings”, “question_answering”, “text_classification”, “ner”. More coming soon…

  • batch_size (int) – Number of samples to process per batch

  • gpu (bool) – Whether to use a GPU

  • name (string) – Name for the current Inferencer model, displayed in the REST API

  • return_class_probs (bool) – Whether to return the probability distribution over all labels instead of only the probability of the associated label

  • extraction_strategy (str) – Strategy to extract vectors. Choices: ‘cls_token’ (sentence vector), ‘reduce_mean’ (sentence vector), reduce_max (sentence vector), ‘per_token’ (individual token vectors), ‘s3e’ (sentence vector via S3E pooling, see https://arxiv.org/abs/2002.09620)

  • extraction_layer (int) – Index of the layer from which the embeddings are extracted. Default: -1 (the last layer).

  • s3e_stats (dict) – Stats of a fitted S3E model as returned by fit_s3e_on_corpus() (only needed for task_type=”embeddings” and extraction_strategy = “s3e”)

  • num_processes (int) – the number of processes for multiprocessing.Pool. Set to value of 1 (or 0) to disable multiprocessing. Set to None to let Inferencer use all CPU cores minus one. If you want to debug the Language Model, you might need to disable multiprocessing! Warning! If you use multiprocessing you have to close the multiprocessing.Pool again! To do so call close_multiprocessing_pool() after you are done using this class. The garbage collector will not do this for you!

  • disable_tqdm (bool) – Whether to disable tqdm logging (can get very verbose in multiprocessing)

  • dummy_ph (bool) – If True, methods of the prediction head will be replaced with a dummy method. This is used to isolate the language model run time from the prediction head run time.

  • benchmarking (bool) – If True, a benchmarking object will be initialised within the class and certain parts of the code will be timed for benchmarking. Should be kept False if not benchmarking since these timing checkpoints require synchronization of the asynchronous Pytorch operations and may slow down the model.

Returns

An instance of the Inferencer.

classmethod load(model_name_or_path, revision=None, batch_size=4, gpu=False, task_type=None, return_class_probs=False, strict=True, max_seq_len=256, doc_stride=128, extraction_layer=None, extraction_strategy=None, s3e_stats=None, num_processes=None, disable_tqdm=False, tokenizer_class=None, use_fast=True, tokenizer_args=None, multithreading_rust=True, dummy_ph=False, benchmarking=False)[source]

Load an Inferencer incl. all relevant components (model, tokenizer, processor …) either by

  1. specifying a public name from transformers’ model hub (https://huggingface.co/models)

  2. or pointing to a local directory it is saved in.

Parameters
  • model_name_or_path (str) – Local directory or public name of the model to load.

  • revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.

  • batch_size (int) – Number of samples to process per batch

  • gpu (bool) – Whether to use a GPU

  • task_type (str) – Type of task the model should be used for. Currently supported: “embeddings”, “question_answering”, “text_classification”, “ner”. More coming soon…

  • strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.module.load_state_dict()). Set to False for backwards compatibility with PHs saved with an older version of FARM.

  • max_seq_len (int) – maximum length of one text sample

  • doc_stride (int) – Only QA: When input text is longer than max_seq_len it gets split into parts, strided by doc_stride

  • extraction_strategy (str) – Strategy to extract vectors. Choices: ‘cls_token’ (sentence vector), ‘reduce_mean’ (sentence vector), reduce_max (sentence vector), ‘per_token’ (individual token vectors)

  • extraction_layer (int) – Index of the layer from which the embeddings are extracted. Default: -1 (the last layer).

  • s3e_stats (dict) – Stats of a fitted S3E model as returned by fit_s3e_on_corpus() (only needed for task_type=”embeddings” and extraction_strategy = “s3e”)

  • num_processes (int) – the number of processes for multiprocessing.Pool. Set to value of 0 to disable multiprocessing. Set to None to let Inferencer use all CPU cores minus one. If you want to debug the Language Model, you might need to disable multiprocessing! Warning! If you use multiprocessing you have to close the multiprocessing.Pool again! To do so call close_multiprocessing_pool() after you are done using this class. The garbage collector will not do this for you!

  • disable_tqdm (bool) – Whether to disable tqdm logging (can get very verbose in multiprocessing)

  • tokenizer_class (str) – (Optional) Name of the tokenizer class to load (e.g. BertTokenizer)

  • use_fast (bool) – (Optional, True by default) Indicate if FARM should try to load the fast version of the tokenizer (True) or use the Python one (False).

  • tokenizer_args (dict) – (Optional) Will be passed to the Tokenizer __init__ method. See https://huggingface.co/transformers/main_classes/tokenizer.html and detailed tokenizer documentation on Hugging Face Transformers.

  • multithreading_rust (bool) – Whether to allow multithreading in Rust, e.g. for FastTokenizers. Note: Enabling multithreading in Rust AND multiprocessing in python might cause deadlocks.

  • dummy_ph (bool) – If True, methods of the prediction head will be replaced with a dummy method. This is used to isolate the language model run time from the prediction head run time.

  • benchmarking (bool) – If True, a benchmarking object will be initialised within the class and certain parts of the code will be timed for benchmarking. Should be kept False if not benchmarking since these timing checkpoints require synchronization of the asynchronous Pytorch operations and may slow down the model.

Returns

An instance of the Inferencer.
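
Example usage (a sketch; the model hub name is purely illustrative, and num_processes=0 disables multiprocessing for simple scripts):

from farm.infer import Inferencer

nlp = Inferencer.load(
    "deepset/roberta-base-squad2",  # public model hub name (illustrative) or a local directory
    task_type="question_answering",
    batch_size=8,
    gpu=True,
    num_processes=0,
)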

close_multiprocessing_pool(join=False)[source]

Close the multiprocessing.Pool again.

If you use multiprocessing you have to close the multiprocessing.Pool again! To do so call this function after you are done using this class. The garbage collector will not do this for you!

Parameters

join (bool) – wait for the worker processes to exit

save(path)[source]
inference_from_file(file, multiprocessing_chunksize=None, streaming=False, return_json=True)[source]

Run down-stream inference on samples created from an input file. The file should be in the same format as the ones used during training (e.g. SQuAD-style for QA, TSV for doc classification …), as the same Processor will be used for conversion.

Parameters
  • file (str) – path of the input file for Inference

  • multiprocessing_chunksize (int) – number of dicts to put together in one chunk and feed to one process

  • streaming (bool) – return a Python generator object that yields results as they are computed, instead of blocking for all the results. To use streaming, the dicts parameter must be a generator and the num_processes argument must be set. This mode can be useful for implementing large scale non-blocking inference pipelines.

Returns

an iterator (list or generator) of predictions

Return type

iter
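
Example usage (a sketch; the model directory and file path are placeholders, and the file format must match the one expected by the loaded Processor, e.g. SQuAD-style JSON for QA):

from farm.infer import Inferencer

nlp = Inferencer.load(your_model_dir, task_type="question_answering", gpu=True)
result = nlp.inference_from_file(file="data/dev-v2.0.json")  # hypothetical path
nlp.close_multiprocessing_pool()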

inference_from_dicts(dicts, return_json=True, multiprocessing_chunksize=None, streaming=False)[source]

Runs down-stream inference on samples created from input dictionaries. The format of the input dicts depends on the task:

  • QA (FARM style): [{“questions”: [“What is X?”], “text”: “Some context containing the answer”}]

  • Classification / NER / embeddings: [{“text”: “Some input text”}]

Inferencer has a high-performance, non-blocking streaming mode for large scale inference use cases. In this mode, the dicts parameter can optionally be a Python generator object that yields dicts, thus avoiding loading all dicts into memory. The inference_from_dicts() method then returns a generator that yields predictions. To use streaming, set the streaming param to True and determine the optimal multiprocessing_chunksize by performing speed benchmarks.

Parameters
  • dicts (iter(dict)) – Samples to run inference on provided as a list(or a generator object) of dicts. One dict per sample.

  • return_json (bool) – Whether the output should be in a JSON-appropriate format. If False, the prediction object is returned where applicable; otherwise PredObj.to_json() is returned.

  • multiprocessing_chunksize (int) – number of dicts to put together in one chunk and feed to one process (only relevant if you do multiprocessing)

  • streaming (bool) – return a Python generator object that yields results as they are computed, instead of blocking for all the results. To use streaming, the dicts parameter must be a generator and the num_processes argument must be set. This mode can be useful for implementing large scale non-blocking inference pipelines.

Returns

an iterator (list or generator) of predictions

Return type

iter
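
Example of the streaming mode (a sketch; your_model_dir is a placeholder as in the class example above, and num_processes and the chunk size are illustrative):

from farm.infer import Inferencer

def dict_generator():
    # could read documents lazily from disk or a database instead
    for i in range(100000):
        yield {"text": "Some input text number %d" % i}

nlp = Inferencer.load(your_model_dir, task_type="text_classification", gpu=True, num_processes=4)
predictions = nlp.inference_from_dicts(
    dicts=dict_generator(), streaming=True, multiprocessing_chunksize=20
)
for prediction in predictions:  # results are yielded as they are computed
    print(prediction)
nlp.close_multiprocessing_pool()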

extract_vectors(dicts, extraction_strategy='cls_token', extraction_layer=-1)[source]

Converts a text into vector(s) using the language model only (no prediction head involved).

Example:

basic_texts = [{"text": "Some text we want to embed"}, {"text": "And a second one"}]
result = inferencer.extract_vectors(dicts=basic_texts)

Parameters
  • dicts ([dict]) – Samples to run inference on provided as a list of dicts. One dict per sample.

  • extraction_strategy (str) – Strategy to extract vectors. Choices: ‘cls_token’ (sentence vector), ‘reduce_mean’ (sentence vector), reduce_max (sentence vector), ‘per_token’ (individual token vectors)

  • extraction_layer (int) – Index of the layer from which the embeddings are extracted. Default: -1 (the last layer).

Returns

dict of predictions

class farm.infer.QAInferencer(*args, **kwargs)[source]

Bases: farm.infer.Inferencer

__init__(*args, **kwargs)[source]

Initializes Inferencer from an AdaptiveModel and a Processor instance.

Parameters
  • model (AdaptiveModel) – AdaptiveModel to run in inference mode

  • processor (Processor) – A dataset specific Processor object which will turn input (file or dict) into a Pytorch Dataset.

  • task_type (str) – Type of task the model should be used for. Currently supported: “embeddings”, “question_answering”, “text_classification”, “ner”. More coming soon…

  • batch_size (int) – Number of samples to process per batch

  • gpu (bool) – Whether to use a GPU

  • name (string) – Name for the current Inferencer model, displayed in the REST API

  • return_class_probs (bool) – Whether to return the probability distribution over all labels instead of only the probability of the associated label

  • extraction_strategy (str) – Strategy to extract vectors. Choices: ‘cls_token’ (sentence vector), ‘reduce_mean’ (sentence vector), reduce_max (sentence vector), ‘per_token’ (individual token vectors), ‘s3e’ (sentence vector via S3E pooling, see https://arxiv.org/abs/2002.09620)

  • extraction_layer (int) – Index of the layer from which the embeddings are extracted. Default: -1 (the last layer).

  • s3e_stats (dict) – Stats of a fitted S3E model as returned by fit_s3e_on_corpus() (only needed for task_type=”embeddings” and extraction_strategy = “s3e”)

  • num_processes (int) – the number of processes for multiprocessing.Pool. Set to value of 1 (or 0) to disable multiprocessing. Set to None to let Inferencer use all CPU cores minus one. If you want to debug the Language Model, you might need to disable multiprocessing! Warning! If you use multiprocessing you have to close the multiprocessing.Pool again! To do so call close_multiprocessing_pool() after you are done using this class. The garbage collector will not do this for you!

  • disable_tqdm (bool) – Whether to disable tqdm logging (can get very verbose in multiprocessing)

  • dummy_ph (bool) – If True, methods of the prediction head will be replaced with a dummy method. This is used to isolate the language model run time from the prediction head run time.

  • benchmarking (bool) – If True, a benchmarking object will be initialised within the class and certain parts of the code will be timed for benchmarking. Should be kept False if not benchmarking since these timing checkpoints require synchronization of the asynchronous Pytorch operations and may slow down the model.

Returns

An instance of the Inferencer.

inference_from_dicts(dicts, return_json=True, multiprocessing_chunksize=None, streaming=False) → Union[List[farm.modeling.predictions.QAPred], Generator[farm.modeling.predictions.QAPred, None, None]][source]

Runs down-stream inference on samples created from input dictionaries. The format of the input dicts depends on the task:

  • QA (FARM style): [{“questions”: [“What is X?”], “text”: “Some context containing the answer”}]

  • Classification / NER / embeddings: [{“text”: “Some input text”}]

Inferencer has a high-performance, non-blocking streaming mode for large scale inference use cases. In this mode, the dicts parameter can optionally be a Python generator object that yields dicts, thus avoiding loading all dicts into memory. The inference_from_dicts() method then returns a generator that yields predictions. To use streaming, set the streaming param to True and determine the optimal multiprocessing_chunksize by performing speed benchmarks.

Parameters
  • dicts (iter(dict)) – Samples to run inference on provided as a list(or a generator object) of dicts. One dict per sample.

  • return_json (bool) – Whether the output should be in a JSON-appropriate format. If False, the prediction object is returned where applicable; otherwise PredObj.to_json() is returned.

  • multiprocessing_chunksize (int) – number of dicts to put together in one chunk and feed to one process (only relevant if you do multiprocessing)

  • streaming (bool) – return a Python generator object that yields results as they are computed, instead of blocking for all the results. To use streaming, the dicts parameter must be a generator and the num_processes argument must be set. This mode can be useful for implementing large scale non-blocking inference pipelines.

Returns

an iterator (list or generator) of predictions

Return type

iter
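
Example usage (a sketch using the FARM-style QA dict format shown above; your_model_dir is a placeholder):

from farm.infer import QAInferencer

nlp = QAInferencer.load(your_model_dir, task_type="question_answering", gpu=True)
qa_dicts = [{"questions": ["What is X?"], "text": "Some context containing the answer"}]
result = nlp.inference_from_dicts(dicts=qa_dicts)
nlp.close_multiprocessing_pool()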

inference_from_file(file, multiprocessing_chunksize=None, streaming=False, return_json=True) → Union[List[farm.modeling.predictions.QAPred], Generator[farm.modeling.predictions.QAPred, None, None]][source]

Run down-stream inference on samples created from an input file. The file should be in the same format as the ones used during training (e.g. SQuAD-style for QA, TSV for doc classification …), as the same Processor will be used for conversion.

Parameters
  • file (str) – path of the input file for Inference

  • multiprocessing_chunksize (int) – number of dicts to put together in one chunk and feed to one process

  • streaming (bool) – return a Python generator object that yields results as they are computed, instead of blocking for all the results. To use streaming, the dicts parameter must be a generator and the num_processes argument must be set. This mode can be useful for implementing large scale non-blocking inference pipelines.

Returns

an iterator (list or generator) of predictions

Return type

iter

inference_from_objects(objects: List[farm.data_handler.inputs.QAInput], return_json=True, multiprocessing_chunksize=None, streaming=False) → Union[List[farm.modeling.predictions.QAPred], Generator[farm.modeling.predictions.QAPred, None, None]][source]
class farm.infer.FasttextInferencer(model, name=None)[source]

Bases: object

__init__(model, name=None)[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(load_dir, batch_size=4, gpu=False)[source]
extract_vectors(dicts, extraction_strategy='reduce_mean')[source]

Converts a text into vector(s) using the language model only (no prediction head involved).

Parameters
  • dicts ([dict]) – Samples to run inference on provided as a list of dicts. One dict per sample.

  • extraction_strategy (str) – Strategy to extract vectors. Choices: ‘reduce_mean’ (mean sentence vector), ‘reduce_max’ (max per embedding dim), ‘CLS’

Returns

dict of predictions

Experiment

farm.experiment.load_experiments(file)[source]
farm.experiment.run_experiment(args)[source]
farm.experiment.get_adaptive_model(lm_output_type, prediction_heads, layer_dims, model, device, embeds_dropout_prob, class_weights=None)[source]
farm.experiment.validate_args(args)[source]
farm.experiment.save_model()[source]
farm.experiment.load_model()[source]

Metrics

farm.evaluation.metrics.register_metrics(name, implementation)[source]
farm.evaluation.metrics.register_report(name, implementation)[source]

Register a custom reporting function to be used during eval.

This can be useful:
  • if you want to overwrite a report for an existing output type of a prediction head (e.g. “per_token”)
  • if you have a new type of prediction head and want to add a custom report for it

Parameters
  • name (str) – This must match the ph_output_type attribute of the PredictionHead for which the report should be used. (e.g. TokenPredictionHead => per_token, YourCustomHead => some_new_type).

  • implementation (function) – Function to be executed. It must take lists of y_true and y_pred as input and return a printable object (e.g. a string or dict). See sklearn.metrics.classification_report for an example.
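
A sketch of registering a custom report (the name “per_sequence” assumes a prediction head with that ph_output_type, e.g. a text classification head; the report function itself is illustrative):

from farm.evaluation.metrics import register_report

def my_report(y_true, y_pred):
    # must return a printable object, e.g. a small dict
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {"n_samples": len(y_true), "n_correct": n_correct}

register_report(name="per_sequence", implementation=my_report)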

farm.evaluation.metrics.simple_accuracy(preds, labels)[source]
farm.evaluation.metrics.acc_and_f1(preds, labels)[source]
farm.evaluation.metrics.f1_macro(preds, labels)[source]
farm.evaluation.metrics.pearson_and_spearman(preds, labels)[source]
farm.evaluation.metrics.compute_metrics(metric, preds, labels)[source]
farm.evaluation.metrics.compute_report_metrics(head, preds, labels)[source]
farm.evaluation.metrics.squad_EM(preds, labels)[source]
farm.evaluation.metrics.squad_f1(preds, labels)[source]
farm.evaluation.metrics.squad_f1_single(pred, label, pred_idx=0)[source]
farm.evaluation.metrics.squad_base(preds, labels)[source]
farm.evaluation.metrics.squad(preds, labels)[source]

This method calculates SQuAD evaluation metrics a) overall, b) for questions with a text answer, and c) for questions with no answer.

farm.evaluation.metrics.top_n_accuracy(preds, labels)[source]

This method calculates the percentage of documents for which the model makes top n accurate predictions. A top n accurate prediction is defined as follows: for any given question-document pair, there can be multiple predictions from the model and multiple labels; if any of those predictions overlaps at all with any of the labels, the predictions are considered top n accurate.

farm.evaluation.metrics.text_similarity_acc_and_f1(preds, labels)[source]

Returns accuracy and F1 scores for the top-1 (highest) ranked sequence (context/passage) for each sample/query

Parameters
  • preds (List of numpy array containing similarity scores for each sequence in batch) – list of numpy arrays of dimension n1 x n2 containing n2 predicted ranks for n1 sequences/queries

  • labels (List of list containing values(0/1)) – list of arrays of dimension n1 x n2 where each array contains n2 labels(0/1) indicating whether the sequence/passage is a positive(1) passage or hard_negative(0) passage

Returns

accuracy and F1 scores of the top-1 ranked sequence/passage for each query

farm.evaluation.metrics.text_similarity_avg_ranks(preds, labels)[source]

Calculates average predicted rank of positive sequence(context/passage) for each sample/query

Parameters
  • preds (List of numpy array containing similarity scores for each sequence in batch) – list of numpy arrays of dimension n1 x n2 containing n2 predicted ranks for n1 sequences/queries

  • labels (List of list containing values(0/1)) – list of arrays of dimension n1 x n2 where each array contains n2 labels(0/1) indicating whether the sequence/passage is a positive(1) passage or hard_negative(0) passage

Returns

average predicted ranks of positive sequence/passage for each sample/query

farm.evaluation.metrics.text_similarity_metric(preds, labels)[source]

Returns accuracy, F1 scores and average rank scores for text similarity task

Parameters
  • preds (List of numpy array containing similarity scores for each sequence in batch) – list of numpy arrays of dimension n1 x n2 containing n2 predicted ranks for n1 sequences/queries

  • labels (List of list containing values(0/1)) – list of arrays of dimension n1 x n2 where each array contains n2 labels(0/1) indicating whether the sequence/passage is a positive(1) passage or hard_negative(0) passage

Returns

metrics (accuracy, F1, average rank) for the text similarity task

File utils

Utilities for working with the local dataset cache. This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp Copyright by the AllenNLP authors.

farm.file_utils.url_to_filename(url, etag=None)[source]

Convert url into a hashed filename in a repeatable way. If etag is specified, append its hash to the url’s, delimited by a period.

farm.file_utils.filename_to_url(filename, cache_dir=None)[source]

Return the url and etag (which may be None) stored for filename. Raise EnvironmentError if filename or its stored metadata do not exist.

farm.file_utils.download_from_s3(s3_url: str, cache_dir: str = None, access_key: str = None, secret_access_key: str = None, region_name: str = None)[source]

Download a “folder” from s3 to local. Skips already existing files. Useful for downloading all files of one model. The default and recommended authentication follows boto3’s trajectory of checking for ENV variables, .aws/credentials etc. (see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html). However, there’s also the option to pass access_key, secret_access_key and region_name directly, as this is needed in some enterprise environments with local s3 deployments.

Parameters
  • s3_url – Url of the “folder” in s3 (e.g. s3://mybucket/my_modelname)

  • cache_dir – Optional local directory where the files shall be stored. If not supplied, we’ll use a subfolder in torch’s cache dir (~/.cache/torch/farm)

  • access_key – Optional S3 Access Key

  • secret_access_key – Optional S3 Secret Access Key

  • region_name – Optional Region Name

Returns

local path of the folder
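
Example usage (a sketch; the bucket URL matches the illustration above and the cache directory is a hypothetical path):

from farm.file_utils import download_from_s3

local_path = download_from_s3(
    "s3://mybucket/my_modelname",   # "folder" in s3
    cache_dir="/tmp/farm_models",   # optional; defaults to a subfolder of torch's cache dir
)
print(local_path)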

farm.file_utils.split_s3_path(url)[source]

Split a full s3 path into the bucket name and path.

farm.file_utils.s3_request(func)[source]

Wrapper function for s3 requests in order to create more helpful error messages.

farm.file_utils.s3_etag(url)[source]

Check ETag on S3 object.

farm.file_utils.s3_get(url, temp_file)[source]

Pull a file directly from S3.

farm.file_utils.http_get(url, temp_file, proxies=None)[source]
farm.file_utils.fetch_archive_from_http(url, output_dir, proxies=None)[source]

Fetch an archive (zip or tar.gz) from a url via http and extract content to an output directory.

Parameters
  • url (str) – http address

  • output_dir (str) – local path

  • proxies (dict) – proxies details as required by requests library

Returns

bool indicating whether anything was fetched
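
Example usage (a sketch; the URL and output directory are hypothetical):

from farm.file_utils import fetch_archive_from_http

fetched = fetch_archive_from_http(
    url="https://example.com/dataset.tar.gz",  # hypothetical archive URL
    output_dir="data/my_dataset",
)
print(fetched)  # bool indicating whether anything was fetched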

farm.file_utils.load_from_cache(pretrained_model_name_or_path, s3_dict, **kwargs)[source]
farm.file_utils.read_set_from_file(filename)[source]

Extract a de-duped collection (set) of text from a file. Expected file format is one item per line.

farm.file_utils.get_file_extension(path, dot=True, lower=True)[source]
farm.file_utils.read_config(path)[source]
farm.file_utils.unnestConfig(config)[source]

This function creates a list of config files for evaluating parameters with different values. If a config parameter is of type list, this list is iterated over and config objects without lists are returned. Can handle lists inside any number of parameters.

Can handle nested (one level) configs