Modeling

Adaptive Model

class farm.modeling.adaptive_model.AdaptiveModel(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device)[source]

Bases: torch.nn.modules.module.Module

Contains all the modelling needed for your NLP task. Combines a language model and a prediction head. Allows for gradient flow back to the language model component.

__init__(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device)[source]
Parameters
  • language_model (LanguageModel) – Any model that turns token ids into vector representations

  • prediction_heads (list) – A list of models that take embeddings and return logits for a given task

  • embeds_dropout_prob (float) – The probability that a value in the embeddings returned by the language model will be zeroed.

  • lm_output_types (list or str) – How to extract the embeddings from the final layer of the language model. When set to “per_token”, one embedding will be extracted per input token. If set to “per_sequence”, a single embedding will be extracted to represent the full input sequence. Can either be a single string, or a list of strings, one for each prediction head.

  • device – The device on which this model will operate. Either “cpu” or “cuda”.
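
For orientation, a minimal usage sketch (the model name, head dimensions and dropout value are illustrative placeholders, not prescribed values):

import torch
from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import TextClassificationHead

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Language model that turns token ids into vector representations
language_model = LanguageModel.load("bert-base-cased")

# A binary classification head on top of the 768-dim BERT output
prediction_head = TextClassificationHead(layer_dims=[768, 2])

model = AdaptiveModel(
    language_model=language_model,
    prediction_heads=[prediction_head],
    embeds_dropout_prob=0.1,
    lm_output_types=["per_sequence"],
    device=device,
)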

save(save_dir)[source]

Saves the language model and prediction heads. This will generate a config file and model weights for each.

Parameters

save_dir (str) – path to save to

classmethod load(load_dir, device, strict=True)[source]

Loads an AdaptiveModel from a directory. The directory must contain:

  • language_model.bin

  • language_model_config.json

  • prediction_head_X.bin (multiple prediction heads possible)

  • prediction_head_X_config.json

  • processor_config.json (config for transforming the input)

  • vocab.txt (vocabulary file for the language model, turning text into WordPiece tokens)

Parameters
  • load_dir (str) – location where adaptive model is stored

  • device (torch.device) – The device to which the model should be sent, either cpu or cuda

  • strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.Module.load_state_dict()). Set to False for backwards compatibility with PHs saved with older versions of FARM.
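
A save/load round trip might look like this (a sketch; the path is a placeholder, and processor_config.json / vocab.txt are typically written by the data processor and tokenizer rather than by AdaptiveModel.save):

import torch
from farm.modeling.adaptive_model import AdaptiveModel

# `model` is an AdaptiveModel, e.g. the one built in the sketch above
save_dir = "saved_models/my_model"  # placeholder path
model.save(save_dir)                # writes weights + config for the LM and each prediction head

# later / in another process
device = torch.device("cpu")
model = AdaptiveModel.load(load_dir=save_dir, device=device)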

logits_to_loss_per_head(logits, **kwargs)[source]

Collect losses from each prediction head.

Parameters

logits (object) – logits, can vary in shape and type, depending on task.

Returns

The per-sample, per-prediction-head loss whose first two dimensions have length n_pred_heads and batch_size

logits_to_loss(logits, **kwargs)[source]

Get losses from all prediction heads & reduce to single loss per sample.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • kwargs (object) – placeholder for passing generic parameters

Return loss

torch.tensor that is the per sample loss (len: batch_size)

logits_to_preds(logits, **kwargs)[source]

Get predictions from all prediction heads.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • label_maps (dict) – Maps from label encoding to label string

Returns

A list of all predictions from all prediction heads

prepare_labels(**kwargs)[source]

Label conversion to original label space, per prediction head.

Parameters

label_maps (dict[int:str]) – dictionary for mapping ids to label strings

Returns

labels in the right format

formatted_preds(logits, **kwargs)[source]

Format predictions for inference.

Parameters
  • logits (torch.tensor) – model logits

  • label_maps (dict[int:str]) – dictionary for mapping ids to label strings

  • kwargs (object) – placeholder for passing generic parameters

Returns

predictions in the right format

forward(**kwargs)[source]

Push data through the whole model and return the logits. The data will propagate through the language model and each of the attached prediction heads.

Parameters

kwargs – Holds all arguments that need to be passed to the language model and prediction head(s).

Returns

all logits as torch.tensor or multiple tensors.

connect_heads_with_processor(tasks, require_labels=True)[source]

Populates prediction head with information coming from tasks.

Parameters
  • tasks – A dictionary where the keys are the names of the tasks and the values are the details of the task (e.g. label_list, metric, tensor name)

  • require_labels – If True, an error will be thrown when a task is not supplied with labels

Returns

log_params()[source]

Logs parameters to the generic logger MlLogger

verify_vocab_size(vocab_size)[source]

Verifies that the model fits the tokenizer vocabulary. The two can diverge if a custom vocabulary was added via tokenizer.add_tokens()

Language Model

Acknowledgements: Many of the modeling parts here come from the great transformers repository: https://github.com/huggingface/transformers. Thanks for the great work!

class farm.modeling.language_model.LanguageModel[source]

Bases: torch.nn.modules.module.Module

The parent class for any kind of model that can embed language into a semantic vector space. Practically speaking, these models read in tokenized sentences and return vectors that capture the meaning of sentences or of tokens.

subclasses = {'Bert': <class 'farm.modeling.language_model.Bert'>, 'Roberta': <class 'farm.modeling.language_model.Roberta'>, 'XLNet': <class 'farm.modeling.language_model.XLNet'>}
forward(input_ids, padding_mask, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod load(pretrained_model_name_or_path, n_added_tokens=0, **kwargs)[source]

Load a pretrained language model either by

  1. specifying its name and downloading it

  2. or pointing to the directory it is saved in.

Available remote models:

  • bert-base-uncased

  • bert-large-uncased

  • bert-base-cased

  • bert-large-cased

  • bert-base-multilingual-uncased

  • bert-base-multilingual-cased

  • bert-base-chinese

  • bert-base-german-cased

  • roberta-base

  • roberta-large

  • xlnet-base-cased

  • xlnet-large-cased

Parameters

pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.
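
For example (a sketch; any of the names listed above, or a local directory, works):

from farm.modeling.language_model import LanguageModel

# download a remote model by name ...
language_model = LanguageModel.load("bert-base-german-cased")

# ... or point to a local directory containing language_model.bin and its config
# language_model = LanguageModel.load("saved_models/my_model")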

freeze(layers)[source]

To be implemented

unfreeze()[source]

To be implemented

save_config(save_dir)[source]

To be implemented

save(save_dir)[source]

Save the model state_dict and its config file so that it can be loaded again.

Parameters

save_dir (str) – The directory in which the model should be saved.

formatted_preds(input_ids, samples, extraction_strategy='pooled', extraction_layer=-1, ignore_first_token=True, padding_mask=None, **kwargs)[source]
class farm.modeling.language_model.Bert[source]

Bases: farm.modeling.language_model.LanguageModel

A BERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1810.04805

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a pretrained model by supplying

  • the name of a remote model on s3 (“bert-base-cased” …)

  • OR a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • OR a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters

pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the BERT model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.
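
The expected shapes can be illustrated with dummy tensors (a sketch with random token ids, batch_size=2 and max_seq_len=8; not meaningful input, and the output format follows the forward() description above):

import torch
from farm.modeling.language_model import Bert

bert = Bert.load("bert-base-cased")

batch_size, max_seq_len = 2, 8
input_ids = torch.randint(low=0, high=100, size=(batch_size, max_seq_len))   # token ids
segment_ids = torch.zeros(batch_size, max_seq_len, dtype=torch.long)         # single-segment input
padding_mask = torch.ones(batch_size, max_seq_len, dtype=torch.long)         # 1 = real token, 0 = padding

output = bert(input_ids=input_ids, segment_ids=segment_ids, padding_mask=padding_mask)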

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
save_config(save_dir)[source]

To be implemented

class farm.modeling.language_model.Roberta[source]

Bases: farm.modeling.language_model.LanguageModel

A RoBERTa model that wraps HuggingFace's implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1907.11692

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • the name of a remote model on s3 (“roberta-base” …)

  • or a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • or a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the Roberta model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
save_config(save_dir)[source]

To be implemented

class farm.modeling.language_model.XLNet[source]

Bases: farm.modeling.language_model.LanguageModel

An XLNet model that wraps HuggingFace's implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1906.08237

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • the name of a remote model on s3 (“xlnet-base-cased” …)

  • or a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • or a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the XLNet model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
save_config(save_dir)[source]

To be implemented

Prediction Head

class farm.modeling.prediction_head.PredictionHead[source]

Bases: torch.nn.modules.module.Module

Takes word embeddings from a language model and generates logits for a given task. Can also convert logits to loss and logits to predictions.

classmethod create(prediction_head_name, layer_dims, class_weights=None)[source]

Create subclass of Prediction Head.

Parameters
  • prediction_head_name (str) – Classname (exact string!) of prediction head we want to create

  • layer_dims (List[Int]) – Dimensions describing the feed forward block structure, e.g. [768, 2]

  • class_weights (list[Float]) – The loss weighting to be assigned to certain label classes during training. Used to correct cases where there is a strong class imbalance.

Returns

Prediction Head of class prediction_head_name
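
For example (a sketch; the class name string must match an existing prediction head class exactly):

from farm.modeling.prediction_head import PredictionHead

# a feed forward block of 768 -> 2 on top of the language model output
head = PredictionHead.create(
    prediction_head_name="TextClassificationHead",
    layer_dims=[768, 2],
    class_weights=None,
)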

save_config(save_dir, head_num=0)[source]

Saves the config as a json file.

Parameters
  • save_dir (str) – Path to save config to

  • head_num (int) – Which head to save

save(save_dir, head_num=0)[source]

Saves the prediction head state dict.

Parameters
  • save_dir (str) – path to save prediction head to

  • head_num (int) – which head to save

generate_config()[source]

Generates config file from Class parameters (only for sensible config parameters).

classmethod load(config_file, strict=True)[source]

Loads a Prediction Head. Infers the class of prediction head from config_file.

Parameters
  • config_file (str) – location where corresponding config is stored

  • strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.Module.load_state_dict()). Set to False for backwards compatibility with PHs saved with older versions of FARM.

Returns

PredictionHead

Return type

PredictionHead[T]

logits_to_loss(logits, labels)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

class farm.modeling.prediction_head.RegressionHead(layer_dims, task_name='regression', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims, task_name='regression', **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

formatted_preds(logits, samples, **kwargs)[source]
class farm.modeling.prediction_head.TextClassificationHead(layer_dims, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_probs(logits, return_class_probs, **kwargs)[source]
logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

formatted_preds(logits, samples, return_class_probs=False, **kwargs)[source]
class farm.modeling.prediction_head.MultiLabelTextClassificationHead(layer_dims, class_weights=None, loss_reduction='none', task_name='text_classification', pred_threshold=0.5, **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims, class_weights=None, loss_reduction='none', task_name='text_classification', pred_threshold=0.5, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_probs(logits, **kwargs)[source]
logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

formatted_preds(logits, samples, **kwargs)[source]
class farm.modeling.prediction_head.TokenClassificationHead(layer_dims, task_name='ner', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims, task_name='ner', **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, initial_mask, padding_mask=None, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits, initial_mask, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

logits_to_probs(logits, initial_mask, return_class_probs, **kwargs)[source]
prepare_labels(initial_mask, **kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

static initial_token_only(seq, initial_mask)[source]
formatted_preds(logits, initial_mask, samples, return_class_probs=False, **kwargs)[source]
class farm.modeling.prediction_head.BertLMHead(hidden_size, vocab_size, hidden_act='gelu', task_name='lm', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(hidden_size, vocab_size, hidden_act='gelu', task_name='lm', **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(pretrained_model_name_or_path, n_added_tokens=0)[source]

Loads a Prediction Head. Infers the class of prediction head from config_file.

Parameters
  • config_file (str) – location where corresponding config is stored

  • strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.Module.load_state_dict()). Set to False for backwards compatibility with PHs saved with older versions of FARM.

Returns

PredictionHead

Return type

PredictionHead[T]

set_shared_weights(shared_embedding_weights)[source]
forward(hidden_states)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

class farm.modeling.prediction_head.NextSentenceHead(layer_dims, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]

Bases: farm.modeling.prediction_head.TextClassificationHead

Almost identical to a TextClassificationHead. Only difference: we can load the weights from a pretrained language model that was saved in the pytorch-transformers style (all in one model).

classmethod load(pretrained_model_name_or_path)[source]

Loads a Prediction Head. Infers the class of prediction head from config_file.

Parameters
  • config_file (str) – location where corresponding config is stored

  • strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.Module.load_state_dict()). Set to False for backwards compatibility with PHs saved with older versions of FARM.

Returns

PredictionHead

Return type

PredictionHead[T]

class farm.modeling.prediction_head.FeedForwardBlock(layer_dims, **kwargs)[source]

Bases: torch.nn.modules.module.Module

A feed forward neural network of variable depth and width.

__init__(layer_dims, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class farm.modeling.prediction_head.QuestionAnsweringHead(layer_dims, task_name='question_answering', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

A question answering head predicts the start and end of the answer on token level.

__init__(layer_dims, task_name='question_answering', **kwargs)[source]
Parameters
  • layer_dims (List[Int]) – dimensions of the Feed Forward block, e.g. [768, 2], for adjusting to the BERT embedding size. The output dimension should always be 2.

  • kwargs (object) – placeholder for passing generic parameters
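
For example (a sketch; 768 matches the hidden size of a BERT base model, and the output dimension is always 2, one unit each for the start and end logits):

from farm.modeling.prediction_head import QuestionAnsweringHead

qa_head = QuestionAnsweringHead(layer_dims=[768, 2])
# typically combined with an AdaptiveModel using lm_output_types="per_token",
# since answers are predicted on token level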

classmethod load(pretrained_model_name_or_path)[source]
Almost identical to a QuestionAnsweringHead. Only difference: we can load the weights from a pretrained language model that was saved in the pytorch-transformers style (all in one model).

forward(X)[source]

One forward pass through the prediction head model, starting with language model output on token level

logits_to_loss(logits, labels, **kwargs)[source]

Combine predictions and labels to a per sample loss.

logits_to_preds(logits, padding_mask, start_of_word, seq_2_start_t, max_answer_length=1000, **kwargs)[source]

Get the predicted index of start and end token of the answer. Note that the output is at token level and not word level. Note also that these logits correspond to the tokens of a sample (i.e. special tokens, question tokens, passage_tokens)

get_top_candidates(sorted_candidates, start_end_matrix, n_non_padding, max_answer_length, seq_2_start_t, n_best=5)[source]

Returns top candidate answers. Operates on a matrix of summed start and end logits. This matrix corresponds to a single sample (includes special tokens, question tokens, passage tokens). This method always returns a list of len n_best + 1 (it is comprised of the n_best positive answers along with the one no_answer)

static valid_answer_idxs(start_idx, end_idx, n_non_padding, max_answer_length, seq_2_start_t)[source]

Returns True if the supplied index span is a valid prediction. The indices being provided should be on sample/passage level (special tokens + question_tokens + passage_tokens) and not document level

formatted_preds(logits, baskets, rest_api_schema=False)[source]

Takes a list of logits, each corresponding to one sample, and converts them into document level predictions. Leverages information in the SampleBaskets. Assumes that we are being passed logits from ALL samples in the one SampleBasket i.e. all passages of a document.

stringify(top_preds, baskets)[source]

Turn prediction spans into strings

to_rest_api_schema(formatted_preds, no_ans_gaps, baskets)[source]
answer_for_api(top_preds, basket)[source]
create_context(ans_start_ch, ans_end_ch, clear_text, window_size_ch=100)[source]
static span_to_string(start_t, end_t, token_offsets, clear_text)[source]
has_no_answer_idxs(sample_top_n)[source]
aggregate_preds(preds, passage_start_t, ids, seq_2_start_t=None, labels=None)[source]

Aggregate passage level predictions to create document level predictions. This method assumes that all passages of each document are contained in preds i.e. that there are no incomplete documents. The output of this step are prediction spans. No answer is represented by a (-1, -1) span on the document level

static reduce_labels(labels)[source]

Removes repeat answers. Represents a no answer label as (-1,-1)

reduce_preds(preds, n_best=5)[source]

This function contains the logic for choosing the best answers from each passage. In the end, it returns the n_best predictions on the document level.

static get_no_answer_score(preds)[source]
static pred_to_doc_idxs(pred, passage_start_t)[source]

Converts the passage level predictions to document level predictions. Note that on the doc level we don’t have special tokens or question tokens. This means that a no answer cannot be represented by a (0, 0) span but will instead be represented by (-1, -1)

static label_to_doc_idxs(label, passage_start_t)[source]

Converts the passage level labels to document level labels. Note that on the doc level we don’t have special tokens or question tokens. This means that a no answer cannot be represented by a (0, 0) span but will instead be represented by (-1, -1)

prepare_labels(labels, start_of_word, **kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

Optimization

PyTorch optimization for BERT model.

class farm.modeling.optimization.ConstantLR(warmup=0.002, t_total=-1, **kw)[source]

Bases: farm.modeling.optimization._LRSchedule

get_lr_(progress)[source]
Parameters

progress – value between 0 and 1 (unless going beyond t_total steps) specifying training progress

Returns

learning rate multiplier for current update

class farm.modeling.optimization.WarmupCosineSchedule(warmup=0.002, t_total=-1, cycles=0.5, **kw)[source]

Bases: farm.modeling.optimization._LRSchedule

Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Decreases learning rate from 1. to 0. over remaining 1 - warmup steps following a cosine curve. If cycles (default=0.5) is different from default, learning rate follows cosine function after warmup.

warn_t_total = True
__init__(warmup=0.002, t_total=-1, cycles=0.5, **kw)[source]
Parameters
  • warmup – see LRSchedule

  • t_total – see LRSchedule

  • cycles – number of cycles. Default: 0.5, corresponding to cosine decay from 1. at progress==warmup and 0 at progress==1.

  • kw

get_lr_(progress)[source]
Parameters

progress – value between 0 and 1 (unless going beyond t_total steps) specifying training progress

Returns

learning rate multiplier for current update

class farm.modeling.optimization.WarmupCosineWithHardRestartsSchedule(warmup=0.002, t_total=-1, cycles=1.0, **kw)[source]

Bases: farm.modeling.optimization.WarmupCosineSchedule

Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. If cycles (default=1.) is different from default, learning rate follows cycles times a cosine decaying learning rate (with hard restarts).

__init__(warmup=0.002, t_total=-1, cycles=1.0, **kw)[source]
Parameters
  • warmup – see LRSchedule

  • t_total – see LRSchedule

  • cycles – number of cycles. Default for this schedule: 1.0 (see the class description above for how cycles shapes the schedule)

  • kw

get_lr_(progress)[source]
Parameters

progress – value between 0 and 1 (unless going beyond t_total steps) specifying training progress

Returns

learning rate multiplier for current update

class farm.modeling.optimization.WarmupCosineWithWarmupRestartsSchedule(warmup=0.002, t_total=-1, cycles=1.0, **kw)[source]

Bases: farm.modeling.optimization.WarmupCosineWithHardRestartsSchedule

All training progress is divided into cycles (default=1.) parts of equal length. Every part follows a schedule with the first warmup fraction of the training steps linearly increasing from 0. to 1., followed by a learning rate decreasing from 1. to 0. following a cosine curve.

__init__(warmup=0.002, t_total=-1, cycles=1.0, **kw)[source]
Parameters
  • warmup – see LRSchedule

  • t_total – see LRSchedule

  • cycles – number of cycles. Default for this schedule: 1.0 (see the class description above for how cycles shapes the schedule)

  • kw

get_lr_(progress)[source]
Parameters

progress – value between 0 and 1 (unless going beyond t_total steps) specifying training progress

Returns

learning rate multiplier for current update

class farm.modeling.optimization.WarmupConstantSchedule(warmup=0.002, t_total=-1, **kw)[source]

Bases: farm.modeling.optimization._LRSchedule

Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Keeps learning rate equal to 1. after warmup.

get_lr_(progress)[source]
Parameters

progress – value between 0 and 1 (unless going beyond t_total steps) specifying training progress

Returns

learning rate multiplier for current update

class farm.modeling.optimization.WarmupLinearSchedule(warmup=0.002, t_total=-1, **kw)[source]

Bases: farm.modeling.optimization._LRSchedule

Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Linearly decreases learning rate from 1. to 0. over remaining 1 - warmup steps.

warn_t_total = True
get_lr_(progress)[source]
Parameters

progress – value between 0 and 1 (unless going beyond t_total steps) specifying training progress

Returns

learning rate multiplier for current update
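
The resulting multiplier for this schedule can be reproduced with a few lines of plain Python (a sketch of the schedule's math, not the internal implementation):

def warmup_linear_multiplier(progress, warmup=0.002):
    """Learning rate multiplier for a given training progress in [0, 1]."""
    if progress < warmup:
        # linear increase from 0 to 1 during the warmup fraction
        return progress / warmup
    # linear decrease from 1 back to 0 over the remaining steps
    return max((progress - 1.0) / (warmup - 1.0), 0.0)

# e.g. with 10% warmup: rises until progress == 0.1, then decays
print(warmup_linear_multiplier(0.05, warmup=0.1))  # 0.5 (halfway through warmup)
print(warmup_linear_multiplier(0.55, warmup=0.1))  # 0.5 (halfway through the decay)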

class farm.modeling.optimization.BertAdam(params, lr=<required parameter>, warmup=-1, t_total=-1, schedule='warmup_linear', b1=0.9, b2=0.999, e=1e-06, weight_decay=0.01, max_grad_norm=1.0, log_learning_rate=False, **kwargs)[source]

Bases: torch.optim.optimizer.Optimizer

Implements BERT version of Adam algorithm with weight decay fix.

__init__(params, lr=<required parameter>, warmup=-1, t_total=-1, schedule='warmup_linear', b1=0.9, b2=0.999, e=1e-06, weight_decay=0.01, max_grad_norm=1.0, log_learning_rate=False, **kwargs)[source]
Parameters
  • params

  • lr – learning rate

  • warmup – portion of t_total for the warmup, -1 means no warmup. Default: -1

  • t_total – total number of training steps for the learning rate schedule, -1 means constant learning rate of 1. (no warmup regardless of warmup setting). Default: -1

  • schedule – schedule to use for the warmup (see above). Can be ‘warmup_linear’, ‘warmup_constant’, ‘warmup_cosine’, ‘none’, None or a _LRSchedule object (see below). If None or ‘none’, learning rate is always kept constant. Default : ‘warmup_linear’

  • b1 – Adams b1. Default: 0.9

  • b2 – Adams b2. Default: 0.999

  • e – Adams epsilon. Default: 1e-6

  • weight_decay – Weight decay. Default: 0.01

  • max_grad_norm – Maximum norm for the gradients (-1 means no clipping). Default: 1.0
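
A basic construction might look like this (a sketch; `model` is any torch.nn.Module, e.g. an AdaptiveModel built earlier, and the step count is a placeholder):

from farm.modeling.optimization import BertAdam

n_train_steps = 1000  # placeholder: roughly n_batches * n_epochs / grad_acc_steps

optimizer = BertAdam(
    model.parameters(),
    lr=2e-5,
    warmup=0.1,                # first 10% of the steps are linear warmup
    t_total=n_train_steps,     # total steps, needed for the schedule
    schedule="warmup_linear",
    weight_decay=0.01,
)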

get_lr()[source]
step(closure=None)[source]

Performs a single optimization step.

Arguments:
closure (callable, optional): A closure that reevaluates the model and returns the loss.

farm.modeling.optimization.initialize_optimizer(model, n_batches, n_epochs, warmup_proportion=0.1, learning_rate=2e-05, fp16=False, loss_scale=0, grad_acc_steps=1, local_rank=-1, log_learning_rate=False)[source]
farm.modeling.optimization.calculate_optimization_steps(n_batches, grad_acc_steps, n_epochs, local_rank)[source]

Tokenization

Tokenization classes.

class farm.modeling.tokenization.Tokenizer[source]

Bases: object

Simple Wrapper for Tokenizers from the transformers package. Enables loading of different Tokenizer classes with a uniform interface.

classmethod load(pretrained_model_name_or_path, tokenizer_class=None, **kwargs)[source]

Enables loading of different Tokenizer classes with a uniform interface. Either infer the class from pretrained_model_name_or_path or define it manually via tokenizer_class.

Parameters
  • pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name (e.g. bert-base-uncased)

  • tokenizer_class (str) – (Optional) Name of the tokenizer class to load (e.g. BertTokenizer)

  • kwargs

Returns

Tokenizer
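
For example (a sketch):

from farm.modeling.tokenization import Tokenizer

# infer the tokenizer class from the model name ...
tokenizer = Tokenizer.load(pretrained_model_name_or_path="bert-base-uncased")

# ... or pin the class explicitly
# tokenizer = Tokenizer.load("saved_models/my_model", tokenizer_class="BertTokenizer")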

farm.modeling.tokenization.tokenize_with_metadata(text, tokenizer)[source]

Performs tokenization while storing some important metadata for each token:

  • offsets: (int) Character index where the token begins in the original text

  • start_of_word: (bool) If the token is the start of a word. Particularly helpful for NER and QA tasks.

We do this by first doing whitespace tokenization and then applying the model-specific tokenizer to each “word”.

Note

Exact whitespace is not preserved in the tokens: tabs, newlines, multiple spaces etc. all resolve to a single " ". This doesn't make a difference for BERT or XLNet, but it does for RoBERTa. For RoBERTa it has the positive effect of a shorter sequence length, but some information about the whitespace type is lost, which might be helpful for certain NLP tasks (e.g. tabs for tables).

Parameters
  • text (str) – Text to tokenize

  • tokenizer – Tokenizer (e.g. from Tokenizer.load())

Returns

Dictionary with “tokens”, “offsets” and “start_of_word”

Return type

dict
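
For example (a sketch; the exact tokens depend on the tokenizer):

from farm.modeling.tokenization import Tokenizer, tokenize_with_metadata

tokenizer = Tokenizer.load("bert-base-uncased")
result = tokenize_with_metadata("Berlin is a city", tokenizer)

# result is a dict of parallel lists, roughly:
# {"tokens": [...], "offsets": [...], "start_of_word": [...]}
print(result["tokens"])
print(result["offsets"])        # character index where each token starts
print(result["start_of_word"])  # True for tokens that begin a whitespace-separated word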

farm.modeling.tokenization.truncate_sequences(seq_a, seq_b, tokenizer, max_seq_len, truncation_strategy='longest_first', with_special_tokens=True, stride=0)[source]

Reduces a single sequence or a pair of sequences to a maximum sequence length. The sequences can contain tokens or any other elements (offsets, masks …). If with_special_tokens is enabled, it’ll remove some additional tokens to have exactly enough space for later adding special tokens (CLS, SEP etc.)

Supported truncation strategies:

  • longest_first: (default) Iteratively reduces the input sequences until they fit under max_seq_len, removing a token from the longest sequence at each step (when there is a pair of input sequences). Overflowing tokens only contain overflow from the first sequence.

  • only_first: Only truncates the first sequence. Raises an error if the first sequence is shorter than or equal to num_tokens_to_remove.

  • only_second: Only truncate the second sequence

  • do_not_truncate: Does not truncate (raises an error if the input sequence is longer than max_seq_len)

Parameters
  • seq_a (list) – First sequence of tokens/offsets/…

  • seq_b (None or list) – Optional second sequence of tokens/offsets/…

  • tokenizer – Tokenizer (e.g. from Tokenizer.load())

  • max_seq_len (int) – Maximum length of the resulting sequence(s) after truncation

  • truncation_strategy (str) – how the sequence(s) should be truncated down. Default: “longest_first” (see above for other options).

  • with_special_tokens (bool) – If true, it’ll remove some additional tokens to have exactly enough space for later adding special tokens (CLS, SEP etc.)

  • stride (int) – optional stride of the window during truncation

Returns

truncated seq_a, truncated seq_b, overflowing tokens
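
For example (a sketch truncating a question/passage pair to 64 positions):

from farm.modeling.tokenization import Tokenizer, tokenize_with_metadata, truncate_sequences

tokenizer = Tokenizer.load("bert-base-uncased")
tokens_a = tokenize_with_metadata("What is the capital of Germany?", tokenizer)["tokens"]
tokens_b = tokenize_with_metadata("Berlin is the capital and largest city of Germany.", tokenizer)["tokens"]

seq_a, seq_b, overflow = truncate_sequences(
    seq_a=tokens_a,
    seq_b=tokens_b,
    tokenizer=tokenizer,
    max_seq_len=64,
    truncation_strategy="longest_first",
)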

farm.modeling.tokenization.insert_at_special_tokens_pos(seq, special_tokens_mask, insert_element)[source]

Adds elements to a sequence at the positions that align with special tokens. This is useful for expanding label ids or masks, so that they align with corresponding tokens (incl. the special tokens)

Example:

>>> # Tokens: ["CLS", "some", "words", "SEP"]
>>> special_tokens_mask = [1, 0, 0, 1]
>>> lm_label_ids = [12, 200]
>>> insert_at_special_tokens_pos(lm_label_ids, special_tokens_mask, insert_element=-1)
[-1, 12, 200, -1]
Parameters
  • seq (list) – List where you want to insert new elements

  • special_tokens_mask (list) – a list with “1” at the positions of special tokens

  • insert_element – the value you want to insert

Returns

list