Modeling

Adaptive Model

class farm.modeling.adaptive_model.BaseAdaptiveModel(prediction_heads)[source]

Bases: object

Base Class for implementing AdaptiveModel with frameworks like PyTorch and ONNX.

subclasses = {'AdaptiveModel': <class 'farm.modeling.adaptive_model.AdaptiveModel'>, 'ONNXAdaptiveModel': <class 'farm.modeling.adaptive_model.ONNXAdaptiveModel'>, 'ONNXWrapper': <class 'farm.modeling.adaptive_model.ONNXWrapper'>}
__init__(prediction_heads)[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(**kwargs)[source]

Load the corresponding AdaptiveModel class (AdaptiveModel or ONNXAdaptiveModel), based on the files in load_dir.

Parameters

kwargs – arguments to pass for loading the model.

Returns

instance of a model

logits_to_preds(logits, **kwargs)[source]

Get predictions from all prediction heads.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • label_maps (dict) – Maps from label encoding to label string

Returns

A list of all predictions from all prediction heads

formatted_preds(logits, **kwargs)[source]

Format predictions for inference.

Parameters
  • logits (torch.tensor) – model logits

  • kwargs (object) – placeholder for passing generic parameters

Returns

predictions in the right format

connect_heads_with_processor(tasks, require_labels=True)[source]

Populates prediction head with information coming from tasks.

Parameters
  • tasks – A dictionary where the keys are the names of the tasks and the values are the details of the task (e.g. label_list, metric, tensor name)

  • require_labels – If True, an error will be thrown when a task is not supplied with labels.

Returns

farm.modeling.adaptive_model.loss_per_head_sum(loss_per_head, global_step=None, batch=None)[source]

Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor).
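
For illustration, a minimal sketch of a custom aggregation function with the same signature, which could be passed to AdaptiveModel via loss_aggregation_fn; the weighting scheme below is an assumption, not a FARM default.

    # Hedged sketch: a weighted sum over the per-head losses, matching the
    # documented signature (loss_per_head, global_step, batch) -> tensor.
    def weighted_loss_sum(loss_per_head, global_step=None, batch=None):
        weights = [1.0, 0.5]  # one weight per prediction head (assumed example)
        return sum(w * loss for w, loss in zip(weights, loss_per_head))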

class farm.modeling.adaptive_model.AdaptiveModel(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device, loss_aggregation_fn=None)[source]

Bases: torch.nn.modules.module.Module, farm.modeling.adaptive_model.BaseAdaptiveModel

PyTorch implementation containing all the modelling needed for your NLP task. Combines a language model and a prediction head. Allows for gradient flow back to the language model component.

__init__(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device, loss_aggregation_fn=None)[source]
Parameters
  • language_model (LanguageModel) – Any model that turns token ids into vector representations

  • prediction_heads (list) – A list of models that take embeddings and return logits for a given task

  • embeds_dropout_prob (float) – The probability that a value in the embeddings returned by the language model will be zeroed.

  • lm_output_types (list or str) – How to extract the embeddings from the final layer of the language model. When set to “per_token”, one embedding will be extracted per input token. If set to “per_sequence”, a single embedding will be extracted to represent the full input sequence. Can either be a single string, or a list of strings, one for each prediction head.

  • device – The device on which this model will operate. Either “cpu” or “cuda”.

  • loss_aggregation_fn (function) – Function to aggregate the loss of multiple prediction heads. Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor). Default is a simple sum: lambda loss_per_head, global_step=None, batch=None: sum(loss_per_head). However, you can pass more complex functions that depend on the current step (e.g. for round-robin style multitask learning) or on the actual content of the batch (e.g. certain labels). Note: the loss at this stage is per sample, i.e. one tensor of shape (batch_size) per prediction head.
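
For illustration, a minimal construction sketch based on the parameters above; the model name, the two-class head, and the device are placeholders, not prescribed values.

    from farm.modeling.adaptive_model import AdaptiveModel
    from farm.modeling.language_model import LanguageModel
    from farm.modeling.prediction_head import TextClassificationHead

    # Placeholder language model and a two-class classification head (assumed setup).
    language_model = LanguageModel.load("bert-base-cased")
    prediction_head = TextClassificationHead(num_labels=2)

    model = AdaptiveModel(
        language_model=language_model,
        prediction_heads=[prediction_head],
        embeds_dropout_prob=0.1,
        lm_output_types=["per_sequence"],  # one entry per prediction head
        device="cuda",                     # or "cpu"
    )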

fit_heads_to_lm()[source]

Iterates over each prediction head and ensures that its input dimensionality matches the output dimensionality of the language model. If it does not, the prediction head is resized to fit.

bypass_ph()[source]

Replaces methods in the prediction heads with dummy functions. Used for benchmarking, where we want to isolate the language model run time from the prediction head run time.

save(save_dir)[source]

Saves the language model and prediction heads. This will generate a config file and model weights for each.

Parameters

save_dir (Path) – path to save to

classmethod load(load_dir, device, strict=True, lm_name=None, processor=None)[source]

Loads an AdaptiveModel from a directory. The directory must contain:

  • language_model.bin

  • language_model_config.json

  • prediction_head_X.bin – multiple prediction heads are possible

  • prediction_head_X_config.json

  • processor_config.json – config for transforming input

  • vocab.txt – vocab file for the language model, turning text into WordPiece tokens

Parameters
  • load_dir (Path) – location where adaptive model is stored

  • device (torch.device) – the device to which we want to send the model, either cpu or cuda

  • lm_name (str) – the name to assign to the loaded language model

  • strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.module.load_state_dict()). Set to False for backwards compatibility with PHs saved with an older version of FARM.

  • processor (Processor) – populates prediction head with information coming from tasks
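
For illustration, a hedged save/load round trip; the directory is a placeholder, and `model` is assumed to be an existing AdaptiveModel instance (e.g. from the constructor sketch above).

    from pathlib import Path
    import torch
    from farm.modeling.adaptive_model import AdaptiveModel

    save_dir = Path("saved_models/my_adaptive_model")  # placeholder directory
    model.save(save_dir)  # writes language model, prediction heads, and their configs

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    reloaded = AdaptiveModel.load(load_dir=save_dir, device=device)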

logits_to_loss_per_head(logits, **kwargs)[source]

Collect losses from each prediction head.

Parameters

logits (object) – logits, can vary in shape and type, depending on task.

Returns

The per sample, per prediction head loss whose first two dimensions have lengths n_pred_heads and batch_size

logits_to_loss(logits, global_step=None, **kwargs)[source]

Get losses from all prediction heads & reduce to single loss per sample.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • global_step (int) – number of current training step

  • kwargs (object) – placeholder for passing generic parameters. Note: Contains the batch (as dict of tensors), when called from Trainer.train().

Return loss

torch.tensor that is the per sample loss (len: batch_size)
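
As a rough sketch of how forward() and logits_to_loss() interact inside a training step; model, data_loader, optimizer, and device are assumed to exist, and batches are assumed to be dicts of tensors as produced by FARM's data handling. The mean reduction is an illustrative choice.

    for step, batch in enumerate(data_loader):
        batch = {key: tensor.to(device) for key, tensor in batch.items()}
        logits = model.forward(**batch)
        # per-sample losses, aggregated across heads by loss_aggregation_fn
        per_sample_loss = model.logits_to_loss(logits=logits, global_step=step, **batch)
        loss = per_sample_loss.mean()  # assumed reduction to a scalar
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()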

prepare_labels(**kwargs)[source]

Label conversion to original label space, per prediction head.

Parameters

label_maps (dict[int:str]) – dictionary for mapping ids to label strings

Returns

labels in the right format

forward(**kwargs)[source]

Push data through the whole model and return logits. The data will propagate through the language model and each of the attached prediction heads.

Parameters

kwargs – Holds all arguments that need to be passed to the language model and prediction head(s).

Returns

all logits as torch.tensor or multiple tensors.

forward_lm(**kwargs)[source]

Forward pass for the language model

Parameters

kwargs

Returns

log_params()[source]

Logs parameters to the generic logger MlLogger.

verify_vocab_size(vocab_size)[source]

Verifies that the model fits the tokenizer vocabulary. They could diverge in case of a custom vocabulary added via tokenizer.add_tokens().

get_language()[source]
convert_to_transformers()[source]

Convert an adaptive model to huggingface’s transformers format. Returns a list containing one model for each prediction head.

Returns

List of huggingface transformers models.

classmethod convert_from_transformers(model_name_or_path, device, revision=None, task_type=None, processor=None)[source]
Load a (downstream) model from huggingface’s transformers format. Use cases:
  • continue training in FARM (e.g. take a SQuAD QA model and fine-tune on your own data)

  • compare models without switching frameworks

  • use model directly for inference

Parameters
  • model_name_or_path

    local path of a saved model or name of a public one. Exemplary public names: distilbert-base-uncased-distilled-squad, deepset/bert-large-uncased-whole-word-masking-squad2

    See https://huggingface.co/models for full list

  • revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.

  • device – “cpu” or “cuda”

  • task_type – One of: ‘question_answering’, ‘text_classification’, ‘embeddings’. More tasks coming soon …

  • processor (Processor) – populates prediction head with information coming from tasks

Returns

AdaptiveModel
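
For illustration, a minimal conversion sketch using one of the public names listed above; the device and task type are assumed choices.

    from farm.modeling.adaptive_model import AdaptiveModel

    model = AdaptiveModel.convert_from_transformers(
        model_name_or_path="deepset/bert-large-uncased-whole-word-masking-squad2",
        device="cpu",
        task_type="question_answering",
    )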

classmethod convert_to_onnx(model_name, output_path, task_type, convert_to_float16=False, quantize=False, opset_version=11)[source]

Convert a PyTorch model from transformers hub to an ONNX Model.

Parameters
  • model_name (str) – transformers model name

  • output_path (Path) – output path to write the converted model to

  • task_type – Type of task for the model. Available options: “embeddings”, “question_answering”, “text_classification”, “ner”.

  • convert_to_float16 (bool) – By default, the model uses float32 precision. With float16 half precision, inference should be faster on NVIDIA GPUs with Tensor Cores, such as the T4 or V100. On older GPUs, float32 might be more performant.

  • quantize (bool) – convert floating point numbers to integers

  • opset_version (int) – ONNX opset version

Returns
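
For illustration, a hedged export-and-reload sketch; the model name and output directory are placeholders.

    from pathlib import Path
    from farm.modeling.adaptive_model import AdaptiveModel, ONNXAdaptiveModel

    onnx_dir = Path("onnx-export")  # placeholder output directory
    AdaptiveModel.convert_to_onnx(
        model_name="deepset/bert-base-cased-squad2",  # placeholder public QA model
        output_path=onnx_dir,
        task_type="question_answering",
        convert_to_float16=False,
        quantize=False,
        opset_version=11,
    )

    # The exported model can then be loaded for inference (see ONNXAdaptiveModel below).
    onnx_model = ONNXAdaptiveModel.load(load_dir=onnx_dir, device="cpu")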

class farm.modeling.adaptive_model.ONNXAdaptiveModel(onnx_session, language_model_class, language, prediction_heads, device)[source]

Bases: farm.modeling.adaptive_model.BaseAdaptiveModel

Implementation of ONNX Runtime for Inference of ONNX Models.

An existing PyTorch-based FARM AdaptiveModel can be converted to ONNX format using AdaptiveModel.convert_to_onnx(). The conversion is currently only implemented for question answering models.

For inference, this class is compatible with the FARM Inferencer.

__init__(onnx_session, language_model_class, language, prediction_heads, device)[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(load_dir, device, **kwargs)[source]

Load the corresponding AdaptiveModel class (AdaptiveModel or ONNXAdaptiveModel), based on the files in load_dir.

Parameters

kwargs – arguments to pass for loading the model.

Returns

instance of a model

forward(**kwargs)[source]

Perform forward pass on the model and return the logits.

Parameters

kwargs – all arguments that need to be passed on to the model

Returns

all logits as torch.tensor or multiple tensors.

eval()[source]

Stub to make ONNXAdaptiveModel compatible with the PyTorch AdaptiveModel.

get_language()[source]

Get the language(s) the model was trained for.

Returns

str

class farm.modeling.adaptive_model.ONNXWrapper(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device, loss_aggregation_fn=None)[source]

Bases: farm.modeling.adaptive_model.AdaptiveModel

Wrapper Class for converting PyTorch models to ONNX.

As of torch v1.4.0, torch.onnx.export only supports passing positional arguments to the forward pass of the model. However, the AdaptiveModel’s forward takes keyword arguments. This class circumvents the issue by converting positional arguments to keyword arguments.

classmethod load_from_adaptive_model(adaptive_model)[source]
forward(*batch)[source]

Push data through the whole model and return logits. The data will propagate through the language model and each of the attached prediction heads.

Parameters

kwargs – Holds all arguments that need to be passed to the language model and prediction head(s).

Returns

all logits as torch.tensor or multiple tensors.

BiAdaptive Model

class farm.modeling.biadaptive_model.BaseBiAdaptiveModel(prediction_heads)[source]

Bases: object

Base Class for implementing AdaptiveModel with frameworks like PyTorch and ONNX.

subclasses = {'BiAdaptiveModel': <class 'farm.modeling.biadaptive_model.BiAdaptiveModel'>}
__init__(prediction_heads)[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod load(**kwargs)[source]

Load the corresponding AdaptiveModel class (AdaptiveModel or ONNXAdaptiveModel), based on the files in load_dir.

Parameters

kwargs – arguments to pass for loading the model.

Returns

instance of a model

logits_to_preds(logits, **kwargs)[source]

Get predictions from all prediction heads.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • label_maps (dict) – Maps from label encoding to label string

Returns

A list of all predictions from all prediction heads

formatted_preds(logits, language_model1, language_model2, **kwargs)[source]

Format predictions to strings for inference output

Parameters
  • logits (torch.tensor) – model logits

  • kwargs (object) – placeholder for passing generic parameters

Returns

predictions in the right format

connect_heads_with_processor(tasks, require_labels=True)[source]

Populates prediction head with information coming from tasks.

Parameters
  • tasks – A dictionary where the keys are the names of the tasks and the values are the details of the task (e.g. label_list, metric, tensor name)

  • require_labels – If True, an error will be thrown when a task is not supplied with labels.

Returns

farm.modeling.biadaptive_model.loss_per_head_sum(loss_per_head, global_step=None, batch=None)[source]

Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor).

class farm.modeling.biadaptive_model.BiAdaptiveModel(language_model1, language_model2, prediction_heads, embeds_dropout_prob=0.1, device='cuda', lm1_output_types=['per_sequence'], lm2_output_types=['per_sequence'], loss_aggregation_fn=None)[source]

Bases: torch.nn.modules.module.Module, farm.modeling.biadaptive_model.BaseBiAdaptiveModel

PyTorch implementation containing all the modelling needed for your NLP task. Combines 2 language models for representation of 2 sequences and a prediction head. Allows for gradient flow back to the 2 language model components.

__init__(language_model1, language_model2, prediction_heads, embeds_dropout_prob=0.1, device='cuda', lm1_output_types=['per_sequence'], lm2_output_types=['per_sequence'], loss_aggregation_fn=None)[source]
Parameters
  • language_model1 (LanguageModel) – Any model that turns token ids into vector representations

  • language_model2 (LanguageModel) – Any model that turns token ids into vector representations

  • prediction_heads (list) – A list of models that take 2 sequence embeddings and return logits for a given task

  • embeds_dropout_prob (float) – The probability that a value in the embeddings returned by either of the 2 language models will be zeroed.

  • lm1_output_types (list or str) – How to extract the embeddings from the final layer of the first language model. When set to “per_token”, one embedding will be extracted per input token. If set to “per_sequence”, a single embedding will be extracted to represent the full input sequence. Can either be a single string, or a list of strings, one for each prediction head.

  • lm2_output_types (list or str) – How to extract the embeddings from the final layer of the second language model. When set to “per_token”, one embedding will be extracted per input token. If set to “per_sequence”, a single embedding will be extracted to represent the full input sequence. Can either be a single string, or a list of strings, one for each prediction head.

  • device – The device on which this model will operate. Either “cpu” or “cuda”.

  • loss_aggregation_fn (function) – Function to aggregate the loss of multiple prediction heads. Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor). Default is a simple sum: lambda loss_per_head, global_step=None, batch=None: sum(loss_per_head). However, you can pass more complex functions that depend on the current step (e.g. for round-robin style multitask learning) or on the actual content of the batch (e.g. certain labels). Note: the loss at this stage is per sample, i.e. one tensor of shape (batch_size) per prediction head.
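
For illustration, a hedged construction sketch for a DPR-style bi-encoder. TextSimilarityHead is not documented in this section and its exact signature is an assumption; the model names and device are placeholders.

    from farm.modeling.biadaptive_model import BiAdaptiveModel
    from farm.modeling.language_model import LanguageModel
    from farm.modeling.prediction_head import TextSimilarityHead  # assumed head for text similarity

    question_encoder = LanguageModel.load("facebook/dpr-question_encoder-single-nq-base")
    passage_encoder = LanguageModel.load("facebook/dpr-ctx_encoder-single-nq-base")

    model = BiAdaptiveModel(
        language_model1=question_encoder,
        language_model2=passage_encoder,
        prediction_heads=[TextSimilarityHead(similarity_function="dot_product")],  # assumed
        embeds_dropout_prob=0.1,
        lm1_output_types=["per_sequence"],
        lm2_output_types=["per_sequence"],
        device="cuda",  # or "cpu"
    )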

save(save_dir, lm1_name='lm1', lm2_name='lm2')[source]

Saves the 2 language model weights and respective config_files in directories lm1 and lm2 within save_dir.

Parameters

save_dir (Path) – path to save to

classmethod load(load_dir, device, strict=False, lm1_name='lm1', lm2_name='lm2', processor=None)[source]

Loads a BiAdaptiveModel from a directory. The directory must contain:

  • directory “lm1_name” with the following files: language_model.bin, language_model_config.json

  • directory “lm2_name” with the following files: language_model.bin, language_model_config.json

  • prediction_head_X.bin – multiple prediction heads are possible

  • prediction_head_X_config.json

  • processor_config.json – config for transforming input

  • vocab.txt – vocab file for the language model, turning text into WordPiece tokens

  • special_tokens_map.json

Parameters
  • load_dir (Path) – location where adaptive model is stored

  • device (torch.device) – the device to which we want to send the model, either cpu or cuda

  • lm1_name (str) – the name to assign to the first loaded language model (for encoding queries)

  • lm2_name (str) – the name to assign to the second loaded language model (for encoding context/passages)

  • strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.module.load_state_dict()). Set to False for backwards compatibility with PHs saved with an older version of FARM.

  • processor (Processor) – populates prediction head with information coming from tasks

logits_to_loss_per_head(logits, **kwargs)[source]

Collect losses from each prediction head.

Parameters

logits (object) – logits, can vary in shape and type, depending on task.

Returns

The per sample, per prediction head loss whose first two dimensions have lengths n_pred_heads and batch_size

logits_to_loss(logits, global_step=None, **kwargs)[source]

Get losses from all prediction heads & reduce to single loss per sample.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • global_step (int) – number of current training step

  • kwargs (object) – placeholder for passing generic parameters. Note: Contains the batch (as dict of tensors), when called from Trainer.train().

Return loss

torch.tensor that is the per sample loss (len: batch_size)

prepare_labels(**kwargs)[source]

Label conversion to original label space, per prediction head.

Parameters

label_maps (dict[int:str]) – dictionary for mapping ids to label strings

Returns

labels in the right format

forward(**kwargs)[source]

Push data through the whole model and return logits. The data will propagate through the first and second language model, based on the tensor names, and both encodings will then pass through each of the attached prediction heads.

Parameters

kwargs – Holds all arguments that need to be passed to both the language models and prediction head(s).

Returns

all logits as torch.tensor or multiple tensors.

forward_lm(**kwargs)[source]

Forward pass for the BiAdaptive model.

Parameters

kwargs

Returns

2 tensors of pooled_output from the 2 language models

log_params()[source]

Logs parameters to the generic logger MlLogger.

verify_vocab_size(vocab_size1, vocab_size2)[source]

Verifies that the model fits the tokenizer vocabulary. They could diverge in case of a custom vocabulary added via tokenizer.add_tokens().

get_language()[source]
convert_to_transformers()[source]
classmethod convert_from_transformers(model_name_or_path1, model_name_or_path2, device, task_type, processor=None, similarity_function='dot_product')[source]
Load a (downstream) model from huggingface’s transformers format. Use cases:
  • continue training in FARM (e.g. take a SQuAD QA model and fine-tune on your own data)

  • compare models without switching frameworks

  • use model directly for inference

Parameters
  • model_name_or_path1 – local path of a saved model or name of a public one for the question encoder. Exemplary public names: facebook/dpr-question_encoder-single-nq-base, deepset/bert-large-uncased-whole-word-masking-squad2

  • model_name_or_path2 – local path of a saved model or name of a public one for the context/passage encoder. Exemplary public names: facebook/dpr-ctx_encoder-single-nq-base, deepset/bert-large-uncased-whole-word-masking-squad2

  • device – “cpu” or “cuda”

  • task_type – ‘text_similarity’. More tasks coming soon …

  • processor (Processor) – populates prediction head with information coming from tasks

Returns

BiAdaptiveModel
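
For illustration, a minimal conversion sketch using the public DPR encoders named above; the device is a placeholder.

    from farm.modeling.biadaptive_model import BiAdaptiveModel

    model = BiAdaptiveModel.convert_from_transformers(
        model_name_or_path1="facebook/dpr-question_encoder-single-nq-base",
        model_name_or_path2="facebook/dpr-ctx_encoder-single-nq-base",
        device="cpu",
        task_type="text_similarity",
    )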

Language Model

Acknowledgements: Many of the modeling parts here come from the great transformers repository: https://github.com/huggingface/transformers. Thanks for the great work!

class farm.modeling.language_model.LanguageModel[source]

Bases: torch.nn.modules.module.Module

The parent class for any kind of model that can embed language into a semantic vector space. Practically speaking, these models read in tokenized sentences and return vectors that capture the meaning of sentences or of tokens.

subclasses = {'Albert': <class 'farm.modeling.language_model.Albert'>, 'Bert': <class 'farm.modeling.language_model.Bert'>, 'Camembert': <class 'farm.modeling.language_model.Camembert'>, 'DPRContextEncoder': <class 'farm.modeling.language_model.DPRContextEncoder'>, 'DPRQuestionEncoder': <class 'farm.modeling.language_model.DPRQuestionEncoder'>, 'DistilBert': <class 'farm.modeling.language_model.DistilBert'>, 'Electra': <class 'farm.modeling.language_model.Electra'>, 'Roberta': <class 'farm.modeling.language_model.Roberta'>, 'WordEmbedding_LM': <class 'farm.modeling.language_model.WordEmbedding_LM'>, 'XLMRoberta': <class 'farm.modeling.language_model.XLMRoberta'>, 'XLNet': <class 'farm.modeling.language_model.XLNet'>}
forward(input_ids, padding_mask, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_scratch(model_type, vocab_size)[source]
classmethod load(pretrained_model_name_or_path, revision=None, n_added_tokens=0, language_model_class=None, **kwargs)[source]

Load a pretrained language model either by

  1. specifying its name and downloading it

  2. or pointing to the directory it is saved in.

Available remote models:

  • bert-base-uncased

  • bert-large-uncased

  • bert-base-cased

  • bert-large-cased

  • bert-base-multilingual-uncased

  • bert-base-multilingual-cased

  • bert-base-chinese

  • bert-base-german-cased

  • roberta-base

  • roberta-large

  • xlnet-base-cased

  • xlnet-large-cased

  • xlm-roberta-base

  • xlm-roberta-large

  • albert-base-v2

  • albert-large-v2

  • distilbert-base-german-cased

  • distilbert-base-multilingual-cased

  • google/electra-small-discriminator

  • google/electra-base-discriminator

  • google/electra-large-discriminator

  • facebook/dpr-question_encoder-single-nq-base

  • facebook/dpr-ctx_encoder-single-nq-base

See all supported model variations here: https://huggingface.co/models

The appropriate language model class is inferred automatically from model config or can be manually supplied via language_model_class.

Parameters
  • pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.

  • revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.

  • language_model_class (str) – (Optional) Name of the language model class to load (e.g. Bert)
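
For illustration, two hedged loading sketches: by public name and from a local FARM directory (the local path is a placeholder).

    from farm.modeling.language_model import LanguageModel

    # By public name: the class (here Bert) is inferred from the model config.
    lm = LanguageModel.load("bert-base-cased")

    # From a local directory, optionally forcing the class explicitly.
    lm_local = LanguageModel.load("some_dir/farm_model", language_model_class="Bert")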

static get_language_model_class(model_name_or_path)[source]
get_output_dims()[source]
freeze(layers)[source]

To be implemented

unfreeze()[source]

To be implemented

save_config(save_dir)[source]
save(save_dir)[source]

Save the model state_dict and its config file so that it can be loaded again.

Parameters

save_dir (str) – The directory in which the model should be saved.

formatted_preds(logits, samples, ignore_first_token=True, padding_mask=None, input_ids=None, **kwargs)[source]

Extract vectors from the language model (e.g. for extracting sentence embeddings). Different pooling strategies and layers are available and will be determined from the object attributes extraction_layer and extraction_strategy. Both should be set via the Inferencer, e.g. Inferencer(extraction_strategy=’cls_token’, extraction_layer=-1).

Parameters
  • logits – Tuple of (sequence_output, pooled_output) from the language model. Sequence_output: one vector per token, pooled_output: one vector for whole sequence

  • samples – For each item in logits we need additional meta information to format the prediction (e.g. input text). This is created by the Processor and passed in here from the Inferencer.

  • ignore_first_token – Whether to include the first token in pooling operations (e.g. reduce_mean). Many models have a special token here, like [CLS], that you don’t want to include in your average of token embeddings.

  • padding_mask – Mask for the padding tokens. Those will also not be included in the pooling operations to prevent a bias by the number of padding tokens.

  • input_ids – ids of the tokens in the vocab

  • kwargs – kwargs

Returns

list of dicts containing preds, e.g. [{“context”: “some text”, “vec”: [-0.01, 0.5 …]}]
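
For illustration, a hedged embedding-extraction sketch via the Inferencer, following the extraction_strategy/extraction_layer hint above; the exact Inferencer.load arguments are assumptions based on that hint, and the model name is a placeholder.

    from farm.infer import Inferencer

    inferencer = Inferencer.load(
        "bert-base-cased",              # placeholder model name
        task_type="embeddings",
        extraction_strategy="cls_token",
        extraction_layer=-1,
        gpu=False,
    )
    result = inferencer.inference_from_dicts(dicts=[{"text": "FARM embeds sentences."}])
    # result: list of dicts such as [{"context": ..., "vec": [...]}]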

class farm.modeling.language_model.Bert[source]

Bases: farm.modeling.language_model.LanguageModel

A BERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1810.04805

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod from_scratch(vocab_size, name='bert', language='en')[source]
classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a pretrained model by supplying

  • the name of a remote model on s3 (“bert-base-cased” …)

  • OR a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • OR a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters

pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the BERT model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.Albert[source]

Bases: farm.modeling.language_model.LanguageModel

An ALBERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • the name of a remote model on s3 (“albert-base” …)

  • or a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • or a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the Albert model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.Roberta[source]

Bases: farm.modeling.language_model.LanguageModel

A RoBERTa model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1907.11692

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • the name of a remote model on s3 (“roberta-base” …)

  • or a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • or a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the Roberta model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.XLMRoberta[source]

Bases: farm.modeling.language_model.LanguageModel

An XLM-RoBERTa model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1911.02116

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • the name of a remote model on s3 (“xlm-roberta-base” …)

  • or a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • or a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the XLMRoberta model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.DistilBert[source]

Bases: farm.modeling.language_model.LanguageModel

A DistilBERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.

NOTE:

  • DistilBert doesn’t have token_type_ids, you don’t need to indicate which token belongs to which segment. Just separate your segments with the separation token tokenizer.sep_token (or [SEP]).

  • Unlike the other BERT variants, DistilBert does not output the pooled_output. An additional pooler is initialized.

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a pretrained model by supplying

  • the name of a remote model on s3 (“distilbert-base-german-cased” …)

  • OR a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • OR a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters

pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.

forward(input_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the DistilBERT model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.XLNet[source]

Bases: farm.modeling.language_model.LanguageModel

An XLNet model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1906.08237

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • the name of a remote model on s3 (“xlnet-base-cased” …)

  • or a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • or a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the XLNet model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.EmbeddingConfig(name=None, embeddings_filename=None, vocab_filename=None, vocab_size=None, hidden_size=None, language=None, **kwargs)[source]

Bases: object

Config for Word Embeddings Models. Necessary to work with Bert and other LM style functionality

__init__(name=None, embeddings_filename=None, vocab_filename=None, vocab_size=None, hidden_size=None, language=None, **kwargs)[source]
Parameters
  • name – Name of config

  • embeddings_filename

  • vocab_filename

  • vocab_size

  • hidden_size

  • language

  • kwargs

to_dict()[source]

Serializes this instance to a Python dictionary.

Returns:

Dict[str, any]: Dictionary of all the attributes that make up this configuration instance.

to_json_string()[source]

Serializes this instance to a JSON string.

Returns:

string: String containing all the attributes that make up this configuration instance in JSON format.

class farm.modeling.language_model.EmbeddingModel(embedding_file, config_dict, vocab_file)[source]

Bases: object

Embedding Model that combines:

  • Embeddings

  • Config Object

  • Vocab

Necessary to work with Bert and other LM style functionality.

__init__(embedding_file, config_dict, vocab_file)[source]
Parameters
  • embedding_file (str) – filename of embeddings. Usually in txt format, with the word and associated vector on each line

  • config_dict (dict) – dictionary containing config elements

  • vocab_file (str) – filename of vocab, each line contains a word

save(save_dir)[source]
resize_token_embeddings(new_num_tokens=None)[source]
class farm.modeling.language_model.WordEmbedding_LM[source]

Bases: farm.modeling.language_model.LanguageModel

A Language Model based only on word embeddings.

  • Inside FARM, WordEmbedding Language Models must have a fixed vocabulary

  • Each (known) word in some text input is projected to its vector representation

  • Pooling operations can be applied for representing whole text sequences

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • a local path of a model trained via FARM (“some_dir/farm_model”)

  • the name of a remote model on s3

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

save(save_dir)[source]

Save the model embeddings and its config file so that it can be loaded again.

TODO: make embeddings trainable and save trained embeddings. TODO: save model weights as a pytorch model bin for more efficient loading and saving.

Parameters

save_dir (str) – The directory in which the model should be saved.

forward(input_ids, **kwargs)[source]

Perform the forward pass of the word embedding model. This is just the mapping of words to their corresponding embeddings.

trim_vocab(token_counts, processor, min_threshold)[source]

Remove embeddings for rare tokens in your corpus (< min_threshold occurrences) to reduce model size

normalize_embeddings(zero_mean=True, pca_removal=False, pca_n_components=300, pca_n_top_components=10, use_mean_vec_for_special_tokens=True, n_special_tokens=5)[source]
Normalize word embeddings as in https://arxiv.org/pdf/1808.06305.pdf

(e.g. used for S3E Pooling of sentence embeddings)

Parameters
  • zero_mean (bool) – Whether to center embeddings via subtracting mean

  • pca_removal (bool) – Whether to remove PCA components

  • pca_n_components (int) – Number of PCA components to use for fitting

  • pca_n_top_components (int) – Number of PCA components to remove

  • use_mean_vec_for_special_tokens (bool) – Whether to replace embedding of special tokens with the mean embedding

  • n_special_tokens (int) – Number of special tokens like CLS, UNK etc. (used if use_mean_vec_for_special_tokens). Note: We expect the special tokens to be the first n_special_tokens entries of the vocab.

Returns

None

class farm.modeling.language_model.Electra[source]

Bases: farm.modeling.language_model.LanguageModel

ELECTRA is a new pre-training approach which trains two transformer models: the generator and the discriminator. The generator replaces tokens in a sequence, and is therefore trained as a masked language model. The discriminator, which is the model we’re interested in, tries to identify which tokens were replaced by the generator in the sequence.

The ELECTRA model here wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.

NOTE: Electra does not output the pooled_output. An additional pooler is initialized.

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a pretrained model by supplying

  • the name of a remote model on s3 (“google/electra-base-discriminator” …)

  • OR a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • OR a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters

pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.

forward(input_ids, segment_ids, padding_mask, **kwargs)[source]

Perform the forward pass of the ELECTRA model.

Parameters
  • input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.Camembert[source]

Bases: farm.modeling.language_model.Roberta

A CamemBERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a language model either by supplying

  • the name of a remote model on s3 (“camembert-base” …)

  • or a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • or a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters
  • pretrained_model_name_or_path – name or path of a model

  • language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.

Returns

Language Model

class farm.modeling.language_model.DPRQuestionEncoder[source]

Bases: farm.modeling.language_model.LanguageModel

A DPRQuestionEncoder model that wraps HuggingFace’s implementation

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a pretrained model by supplying

  • the name of a remote model on s3 (“facebook/dpr-question_encoder-single-nq-base” …)

  • OR a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • OR a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters

pretrained_model_name_or_path (str) – The path of the base pretrained language model whose weights are used to initialize DPRQuestionEncoder

forward(query_input_ids, query_segment_ids, query_attention_mask, **kwargs)[source]

Perform the forward pass of the DPRQuestionEncoder model.

Parameters
  • query_input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]

  • query_segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]

  • query_attention_mask (torch.Tensor) – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]
class farm.modeling.language_model.DPRContextEncoder[source]

Bases: farm.modeling.language_model.LanguageModel

A DPRContextEncoder model that wraps HuggingFace’s implementation

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, language=None, **kwargs)[source]

Load a pretrained model by supplying

  • the name of a remote model on s3 (“facebook/dpr-ctx_encoder-single-nq-base” …)

  • OR a local path of a model trained via transformers (“some_dir/huggingface_model”)

  • OR a local path of a model trained via FARM (“some_dir/farm_model”)

Parameters

pretrained_model_name_or_path (str) – The path of the base pretrained language model whose weights are used to initialize DPRContextEncoder

forward(passage_input_ids, passage_segment_ids, passage_attention_mask, **kwargs)[source]

Perform the forward pass of the DPRContextEncoder model.

Parameters
  • passage_input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, number_of_hard_negative_passages, max_seq_len]

  • passage_segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, number_of_hard_negative_passages, max_seq_len]

  • passage_attention_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, number_of_hard_negative_passages, max_seq_len]

Returns

Embeddings for each token in the input sequence.

enable_hidden_states_output()[source]
disable_hidden_states_output()[source]

Prediction Head

class farm.modeling.prediction_head.PredictionHead[source]

Bases: torch.nn.modules.module.Module

Takes word embeddings from a language model and generates logits for a given task. Can also convert logits to loss and logits to predictions.

classmethod create(prediction_head_name, layer_dims, class_weights=None)[source]

Create subclass of Prediction Head.

Parameters
  • prediction_head_name (str) – Classname (exact string!) of prediction head we want to create

  • layer_dims (List[Int]) – describing the feed forward block structure, e.g. [768,2]

  • class_weights (list[Float]) – The loss weighting to be assigned to certain label classes during training. Used to correct cases where there is a strong class imbalance.

Returns

Prediction Head of class prediction_head_name
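
For illustration, a minimal factory sketch creating a two-class text classification head; the layer dimensions are an assumed example (language model hidden size to number of classes).

    from farm.modeling.prediction_head import PredictionHead

    head = PredictionHead.create(
        prediction_head_name="TextClassificationHead",
        layer_dims=[768, 2],  # assumed: hidden size 768 -> 2 classes
        class_weights=None,
    )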

save_config(save_dir, head_num=0)[source]

Saves the config as a json file.

Parameters
  • save_dir (str or Path) – Path to save config to

  • head_num (int) – Which head to save

save(save_dir, head_num=0)[source]

Saves the prediction head state dict.

Parameters
  • save_dir (str or Path) – path to save prediction head to

  • head_num (int) – which head to save

generate_config()[source]

Generates config file from Class parameters (only for sensible config parameters).

classmethod load(config_file, strict=True, load_weights=True)[source]

Loads a Prediction Head. Infers the class of prediction head from config_file.

Parameters
  • config_file (str) – location where corresponding config is stored

  • strict (bool) – whether to strictly enforce that the keys loaded from saved model match the ones in the PredictionHead (see torch.nn.module.load_state_dict()). Set to False for backwards compatibility with PHs saved with older version of FARM.

Returns

PredictionHead

Return type

PredictionHead[T]

logits_to_loss(logits, labels)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss function to produce a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

resize_input(input_dim)[source]

This function compares the output dimensionality of the language model against the input dimensionality of the prediction head. If there is a mismatch, the prediction head will be resized to fit.

class farm.modeling.prediction_head.RegressionHead(layer_dims=[768, 1], task_name='regression', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims=[768, 1], task_name='regression', **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss function to produce a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

formatted_preds(logits, samples, **kwargs)[source]
class farm.modeling.prediction_head.TextClassificationHead(layer_dims=None, num_labels=None, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims=None, num_labels=None, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]
Parameters
  • layer_dims (list) – The size of the layers in the feed forward component. The feed forward will have as many layers as there are ints in this list. This param will be deprecated in future

  • num_labels (int) – The numbers of labels. Use to set the size of the final layer in the feed forward component. It is recommended to only set num_labels or layer_dims, not both.

  • class_weights

  • loss_ignore_index

  • loss_reduction

  • task_name

  • kwargs

classmethod load(pretrained_model_name_or_path, revision=None)[source]

Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following:

  a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json)

  b) Local path to a Transformers model (e.g. my-bert)

  c) Name of a public model from https://huggingface.co/models (e.g. distilbert-base-uncased-distilled-squad)

Parameters
  • pretrained_model_name_or_path

    local path of a saved model or name of a publicly available model. Exemplary public name: deepset/bert-base-german-cased-hatespeech-GermEval18Coarse

    See https://huggingface.co/models for full list

  • revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
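
For illustration, two hedged loading sketches: one from the public model named above, one from a local FARM prediction head config (the local path is a placeholder).

    from farm.modeling.prediction_head import TextClassificationHead

    # From a public model on the Hugging Face hub.
    head = TextClassificationHead.load("deepset/bert-base-german-cased-hatespeech-GermEval18Coarse")

    # From a saved FARM prediction head config (placeholder path).
    head_local = TextClassificationHead.load("my-bert/prediction_head_0_config.json")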

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss function to produce a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_probs(logits, return_class_probs, **kwargs)[source]
logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

formatted_preds(logits=None, preds=None, samples=None, return_class_probs=False, **kwargs)[source]

Like QuestionAnsweringHead.formatted_preds(), this function can operate on either logits or preds. This is needed since, at inference, the order of operations is very different depending on whether we are performing aggregation or not (compare Inferencer._get_predictions() vs Inferencer._get_predictions_and_aggregate()).

class farm.modeling.prediction_head.MultiLabelTextClassificationHead(layer_dims=None, num_labels=None, class_weights=None, loss_reduction='none', task_name='text_classification', pred_threshold=0.5, **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims=None, num_labels=None, class_weights=None, loss_reduction='none', task_name='text_classification', pred_threshold=0.5, **kwargs)[source]
Parameters
  • layer_dims (list) – The size of the layers in the feed forward component. The feed forward will have as many layers as there are ints in this list. This param will be deprecated in future

  • num_labels (int) – The numbers of labels. Use to set the size of the final layer in the feed forward component. It is recommended to only set num_labels or layer_dims, not both.

  • class_weights

  • loss_reduction

  • task_name

  • pred_threshold

  • kwargs

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should combine logits and labels with a loss function to produce a per sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_probs(logits, **kwargs)[source]
logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. Should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

formatted_preds(logits, samples, **kwargs)[source]
class farm.modeling.prediction_head.TokenClassificationHead(layer_dims=None, num_labels=None, task_name='ner', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(layer_dims=None, num_labels=None, task_name='ner', **kwargs)[source]
Parameters
  • layer_dims (list) – The size of the layers in the feed forward component. The feed forward will have as many layers as there are ints in this list. This param will be deprecated in future

  • num_labels (int) – The numbers of labels. Use to set the size of the final layer in the feed forward component. It is recommended to only set num_labels or layer_dims, not both.

  • task_name

  • kwargs

classmethod load(pretrained_model_name_or_path, revision=None)[source]

Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. bert-base-cased-finetuned-conll03-english)

Parameters

pretrained_model_name_or_path

local path of a saved model or name of a publicly available model. Exemplary public names: - bert-base-cased-finetuned-conll03-english

See https://huggingface.co/models for full list
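
Example (a hedged usage sketch loading one of the public NER models named above):

from farm.modeling.prediction_head import TokenClassificationHead

ner_head = TokenClassificationHead.load("bert-base-cased-finetuned-conll03-english")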

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, initial_mask, padding_mask=None, **kwargs)[source]

Implement this function in your special Prediction Head. It should combine logits and labels with a loss function to produce a per-sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits, initial_mask, **kwargs)[source]

Implement this function in your special Prediction Head. It should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

logits_to_probs(logits, initial_mask, return_class_probs, **kwargs)[source]
prepare_labels(initial_mask, **kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

static initial_token_only(seq, initial_mask)[source]
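
The name and signature suggest that initial_token_only() keeps only the entries of seq that correspond to the first subword token of each word, using initial_mask as the indicator. A conceptual sketch (an assumption, not the library implementation):

def initial_token_only_sketch(seq, initial_mask):
    # Keep elements of `seq` marked as word-initial (initial_mask == 1),
    # dropping entries for continuation subword tokens.
    return [s for s, is_initial in zip(seq, initial_mask) if is_initial]

# Subword tokens:       ["Ang", "##ela", "visited", "Paris"]
# initial_mask:         [1,      0,       1,         1]
# token-level preds:    ["B-PER","I-PER", "O",       "B-LOC"]
# word-level preds ->   ["B-PER", "O", "B-LOC"]
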
formatted_preds(logits, initial_mask, samples, return_class_probs=False, **kwargs)[source]
class farm.modeling.prediction_head.BertLMHead(hidden_size, vocab_size, hidden_act='gelu', task_name='lm', **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

__init__(hidden_size, vocab_size, hidden_act='gelu', task_name='lm', **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod load(pretrained_model_name_or_path, revision=None, n_added_tokens=0)[source]

Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. bert-base-cased)

Parameters
  • pretrained_model_name_or_path

    local path of a saved model or name of a publicly available model. Exemplary public names: - bert-base-cased

    See https://huggingface.co/models for full list

  • revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.

set_shared_weights(shared_embedding_weights)[source]
forward(hidden_states)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

logits_to_loss(logits, **kwargs)[source]

Implement this function in your special Prediction Head. It should combine logits and labels with a loss function to produce a per-sample loss.

Parameters
  • logits (object) – logits, can vary in shape and type, depending on task

  • labels (object) – labels, can vary in shape and type, depending on task

Returns

per sample loss as a torch.tensor of shape [batch_size]

logits_to_preds(logits, **kwargs)[source]

Implement this function in your special Prediction Head. It should turn logits into predictions.

Parameters

logits (object) – logits, can vary in shape and type, depending on task

Returns

predictions as a torch.tensor of shape [batch_size]

prepare_labels(**kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

class farm.modeling.prediction_head.NextSentenceHead(layer_dims=None, num_labels=None, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]

Bases: farm.modeling.prediction_head.TextClassificationHead

Almost identical to a TextClassificationHead. Only difference: we can load the weights from a pretrained language model that was saved in the Transformers style (all in one model).

classmethod load(pretrained_model_name_or_path)[source]

Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. bert-base-cased)

Parameters

pretrained_model_name_or_path

local path of a saved model or name of a publicly available model. Exemplary public names: - bert-base-cased

See https://huggingface.co/models for full list

class farm.modeling.prediction_head.FeedForwardBlock(layer_dims, **kwargs)[source]

Bases: torch.nn.modules.module.Module

A feed forward neural network of variable depth and width.

__init__(layer_dims, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class farm.modeling.prediction_head.QuestionAnsweringHead(layer_dims=[768, 2], task_name='question_answering', no_ans_boost=0.0, context_window_size=100, n_best=5, n_best_per_sample=None, duplicate_filtering=-1, **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

A question answering head predicts the start and end of the answer on token level.

__init__(layer_dims=[768, 2], task_name='question_answering', no_ans_boost=0.0, context_window_size=100, n_best=5, n_best_per_sample=None, duplicate_filtering=-1, **kwargs)[source]
Parameters
  • layer_dims (List[Int]) – dimensions of the feed forward block, e.g. [768, 2], for adjusting to the BERT embedding size. The output dimension should always be 2

  • kwargs (object) – placeholder for passing generic parameters

  • no_ans_boost (float) – How much the no_answer logit is boosted/increased. The higher the value, the more likely a “no answer possible given the input text” is returned by the model

  • context_window_size (int) – The size, in characters, of the window around the answer span that is used when displaying the context around the answer.

  • n_best (int) – The number of positive answer spans for each document.

  • n_best_per_sample (int) – The number of candidate answer spans to consider from each passage. Each passage also returns “no answer” info. This is decoupled from n_best on the document level, since predictions on the passage level are very similar. It should have a low value

  • duplicate_filtering (int) – Answers are filtered based on their position. Both the start and end positions of the answers are considered. The higher the value, the more answers that lie further apart are still filtered out as duplicates. 0 corresponds to exact duplicates. -1 turns off duplicate removal.
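
Example (a construction sketch using the documented defaults; the values below were chosen for illustration):

from farm.modeling.prediction_head import QuestionAnsweringHead

qa_head = QuestionAnsweringHead(
    layer_dims=[768, 2],       # 768-dim encoder output -> start/end logits
    no_ans_boost=0.0,          # no extra boost for "no answer"
    context_window_size=100,   # characters of context shown around each answer
    n_best=5,                  # positive answer spans returned per document
    duplicate_filtering=-1,    # keep near-duplicate spans
)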

classmethod load(pretrained_model_name_or_path, revision=None)[source]

Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. distilbert-base-uncased-distilled-squad)

Parameters
  • pretrained_model_name_or_path

    local path of a saved model or name of a publicly available model. Exemplary public names: - distilbert-base-uncased-distilled-squad - bert-large-uncased-whole-word-masking-finetuned-squad

    See https://huggingface.co/models for full list

  • revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.

forward(X)[source]

One forward pass through the prediction head model, starting with language model output on token level

logits_to_loss(logits, labels, **kwargs)[source]

Combine predictions and labels to a per sample loss.

logits_to_preds(logits, span_mask, start_of_word, seq_2_start_t, max_answer_length=1000, **kwargs)[source]

Get the predicted index of start and end token of the answer. Note that the output is at token level and not word level. Note also that these logits correspond to the tokens of a sample (i.e. special tokens, question tokens, passage_tokens)

get_top_candidates(sorted_candidates, start_end_matrix, sample_idx)[source]

Returns top candidate answers as a list of Span objects. Operates on a matrix of summed start and end logits. This matrix corresponds to a single sample (it includes special tokens, question tokens, and passage tokens). This method always returns a list of length n_best + 1 (the n_best positive answers along with the one no_answer)

formatted_preds(logits=None, preds=None, baskets=None, **kwargs)[source]

Takes a list of passage level predictions, each corresponding to one sample, and converts them into document level predictions. Leverages information in the SampleBaskets. Assumes that we are being passed predictions from ALL samples in the one SampleBasket i.e. all passages of a document. Logits should be None, because we have already converted the logits to predictions before calling formatted_preds. (see Inferencer._get_predictions_and_aggregate()).

to_qa_preds(top_preds, no_ans_gaps, baskets)[source]

Groups Span objects together in a QAPred object

static get_ground_truth(basket)[source]
static get_question(question_names, raw_dict)[source]
has_no_answer_idxs(sample_top_n)[source]
aggregate_preds(preds, passage_start_t, ids, seq_2_start_t=None, labels=None)[source]

Aggregate passage level predictions to create document level predictions. This method assumes that all passages of each document are contained in preds, i.e. that there are no incomplete documents. The output of this step is a set of prediction spans. No answer is represented by a (-1, -1) span on the document level

static reduce_labels(labels)[source]

Removes repeated answers. Represents a no-answer label as (-1, -1)

reduce_preds(preds)[source]

This function contains the logic for choosing the best answers from each passage. In the end, it returns the n_best predictions on the document level.

static deduplicate(flat_pos_answers)[source]
static get_no_answer_score(preds)[source]
static pred_to_doc_idxs(pred, passage_start_t)[source]

Converts the passage level predictions to document level predictions. Note that on the doc level we don’t have special tokens or question tokens. This means that a no answer cannot be represented by a (0, 0) qa_answer but will instead be represented by (-1, -1)

static label_to_doc_idxs(label, passage_start_t)[source]

Converts the passage level labels to document level labels. Note that on the doc level we don’t have special tokens or question tokens. This means that a no answer cannot be represented by a (0, 0) span but will instead be represented by (-1, -1)

prepare_labels(labels, start_of_word, **kwargs)[source]

Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.

Parameters

kwargs (object) – placeholder for passing generic parameters

Returns

labels in the right format

Return type

object

static merge_formatted_preds(preds_all)[source]

Merges results from the two prediction heads used for NQ style QA. Takes the prediction from QA head and assigns it the appropriate classification label. This mapping is achieved through passage_id. preds_all should contain [QuestionAnsweringHead.formatted_preds(), TextClassificationHead()]. The first item of this list should be of len=n_documents while the second item should be of len=n_passages

farm.modeling.prediction_head.pick_single_fn(heads, fn_name)[source]

Iterates over heads and returns a static method called fn_name if and only if one head has a method of that name. If no heads have such a method, None is returned. If more than one head has such a method, an Exception is thrown

class farm.modeling.prediction_head.TextSimilarityHead(similarity_function: str = 'dot_product', global_loss_buffer_size: int = 150000, **kwargs)[source]

Bases: farm.modeling.prediction_head.PredictionHead

Trains a head on predicting the similarity of two texts like in Dense Passage Retrieval.

__init__(similarity_function: str = 'dot_product', global_loss_buffer_size: int = 150000, **kwargs)[source]

Init the TextSimilarityHead.

Parameters
  • similarity_function – Function to calculate similarity between queries and passage embeddings. Choose either “dot_product” (Default) or “cosine”.

  • global_loss_buffer_size – Buffer size for all_gather() in DDP. Increase if errors like “encoded data exceeds max_size …” come up

  • kwargs
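
Example (a construction sketch with the documented default similarity function):

from farm.modeling.prediction_head import TextSimilarityHead

# DPR-style head; "dot_product" is the documented default.
similarity_head = TextSimilarityHead(similarity_function="dot_product")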

classmethod dot_product_scores(query_vectors, passage_vectors)[source]

Calculates dot product similarity scores for two 2-dimensional tensors

Parameters
  • query_vectors (torch.Tensor) – tensor of query embeddings from BiAdaptive model of dimension n1 x D, where n1 is the number of queries/batch size and D is embedding size

  • passage_vectors (torch.Tensor) – tensor of context/passage embeddings from BiAdaptive model of dimension n2 x D, where n2 is the number of passages/batch size and D is embedding size

Returns

dot product similarity score of each query with each context/passage (dimension: n1xn2)

classmethod cosine_scores(query_vectors, passage_vectors)[source]

Calculates cosine similarity scores for two 2-dimensional tensors

Parameters
  • query_vectors (torch.Tensor) – tensor of query embeddings from BiAdaptive model of dimension n1 x D, where n1 is the number of queries/batch size and D is embedding size

  • passage_vectors (torch.Tensor) – tensor of context/passage embeddings from BiAdaptive model of dimension n2 x D, where n2 is the number of passages/batch size and D is embedding size

Returns

cosine similarity score of each query with each context/passage (dimension: n1xn2)
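
The two scoring functions can be pictured with plain torch operations. A minimal sketch with random tensors of the documented shapes (n1 x D queries, n2 x D passages):

import torch
import torch.nn.functional as F

query_vectors = torch.randn(4, 768)     # n1 x D
passage_vectors = torch.randn(8, 768)   # n2 x D

# Dot product scores: an n1 x n2 matrix of raw inner products.
dot_scores = torch.matmul(query_vectors, passage_vectors.transpose(0, 1))

# Cosine scores: the same matrix after L2-normalizing both sides.
cos_scores = torch.matmul(
    F.normalize(query_vectors, dim=1),
    F.normalize(passage_vectors, dim=1).transpose(0, 1),
)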

get_similarity_function()[source]

Returns the type of similarity function used to compare queries and passages/contexts

forward(query_vectors: torch.Tensor, passage_vectors: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Only packs the embeddings from both language models into a tuple. No further modification. The similarity calculation is handled later to enable distributed training (DDP) while keeping the support for in-batch negatives. (Gather all embeddings from nodes => then do similarity scores + loss)

Parameters
  • query_vectors (torch.Tensor) – Tensor of query embeddings from BiAdaptive model of dimension n1 x D, where n1 is the number of queries/batch size and D is embedding size

  • passage_vectors (torch.Tensor) – Tensor of context/passage embeddings from BiAdaptive model of dimension n2 x D, where n2 is the number of passages/batch size and D is embedding size

Returns

(query_vectors, passage_vectors)

logits_to_loss(logits: Tuple[torch.Tensor, torch.Tensor], **kwargs)[source]

Computes the loss (Default: NLLLoss) by applying a similarity function (Default: dot product) to the input tuple of (query_vectors, passage_vectors) and afterwards applying the loss function on similarity scores.

Parameters

logits – Tuple of Tensors (query_embedding, passage_embedding) as returned from forward()

Returns

negative log likelihood loss from similarity scores

logits_to_preds(logits: Tuple[torch.Tensor, torch.Tensor], **kwargs)[source]

Returns the predicted ranks (by similarity) of the passages/contexts for each query

Parameters

logits (torch.Tensor) – tensor of log softmax similarity scores of each query with each context/passage (dimension: n1xn2)

Returns

predicted ranks of passages for each query

prepare_labels(**kwargs)[source]

Returns a tensor with passage labels (0: hard_negative, 1: positive) for each query

Returns

passage labels (0: hard_negative, 1: positive) for each query

formatted_preds(logits: Tuple[torch.Tensor, torch.Tensor], **kwargs)[source]

Optimization

class farm.modeling.optimization.WrappedDataParallel(module, device_ids=None, output_device=None, dim=0)[source]

Bases: torch.nn.parallel.data_parallel.DataParallel

A way of adapting attributes of underlying class to parallel mode. See: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html#dataparallel

This can run into recursion errors; for a workaround see: https://discuss.pytorch.org/t/access-att-of-model-wrapped-within-torch-nn-dataparallel-maximum-recursion-depth-exceeded/46975

class farm.modeling.optimization.WrappedDDP(module, device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, find_unused_parameters=False, check_reduction=False, gradient_as_bucket_view=False)[source]

Bases: torch.nn.parallel.distributed.DistributedDataParallel

A way of adapting attributes of the underlying class to distributed mode. Same as in WrappedDataParallel above. Even when using distributed training on a single machine with multiple GPUs, apex can speed up training significantly. Distributed code must be launched with “python -m torch.distributed.launch --nproc_per_node=1 run_script.py”

farm.modeling.optimization.initialize_optimizer(model, n_batches, n_epochs, device, learning_rate, optimizer_opts=None, schedule_opts=None, distributed=False, grad_acc_steps=1, local_rank=-1, use_amp=None)[source]

Initializes an optimizer and a learning rate scheduler, and converts the model if needed (e.g. for mixed precision). By default, we use transformers’ AdamW and a linear warmup schedule with warmup ratio 0.1. You can easily switch optimizer and schedule via optimizer_opts and schedule_opts.

Parameters
  • model (AdaptiveModel) – model to optimize (e.g. trimming weights to fp16 / mixed precision)

  • n_batches (int) – number of batches for training

  • n_epochs – number of epochs for training

  • device

  • learning_rate (float) – Learning rate

  • optimizer_opts – Dict to customize the optimizer. Choose any optimizer available from torch.optim, apex.optimizers or transformers.optimization by supplying the class name and the parameters for the constructor. Examples: 1) AdamW from Transformers (Default): {“name”: “TransformersAdamW”, “correct_bias”: False, “weight_decay”: 0.01} 2) SGD from pytorch: {“name”: “SGD”, “momentum”: 0.0} 3) FusedLAMB from apex: {“name”: “FusedLAMB”, “bias_correction”: True}

  • schedule_opts – Dict to customize the learning rate schedule. Choose any Schedule from Pytorch or Huggingface’s Transformers by supplying the class name and the parameters needed by the constructor. If the dict does not contain num_training_steps it will be set by calculating it from n_batches, grad_acc_steps and n_epochs. Examples: 1) Linear Warmup (Default): {“name”: “LinearWarmup”, “num_warmup_steps”: 0.1 * num_training_steps, “num_training_steps”: num_training_steps} 2) CosineWarmup: {“name”: “CosineWarmup”, “num_warmup_steps”: 0.1 * num_training_steps, “num_training_steps”: num_training_steps} 3) CyclicLR from pytorch: {“name”: “CyclicLR”, “base_lr”: 1e-5, “max_lr”:1e-4, “step_size_up”: 100}

  • distributed – Whether training on distributed machines

  • grad_acc_steps – Number of steps to accumulate gradients for. Helpful to mimic large batch_sizes on small machines.

  • local_rank – rank of the machine in a distributed setting

  • use_amp – Optimization level of nvidia’s automatic mixed precision (AMP). The higher the level, the faster the model. Options: “O0” (Normal FP32 training) “O1” (Mixed Precision => Recommended) “O2” (Almost FP16) “O3” (Pure FP16). See details on: https://nvidia.github.io/apex/amp.html

Returns

model, optimizer, scheduler
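
Example (a hedged usage sketch relying on the documented defaults; model, n_train_batches and device are assumed to come from your own training setup):

from farm.modeling.optimization import initialize_optimizer

model, optimizer, lr_schedule = initialize_optimizer(
    model=model,
    n_batches=n_train_batches,
    n_epochs=2,
    device=device,
    learning_rate=3e-5,
    # Defaults: transformers' AdamW plus a linear warmup schedule (warmup ratio 0.1).
    # Pass optimizer_opts / schedule_opts as documented above to switch either one.
)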

farm.modeling.optimization.get_scheduler(optimizer, opts)[source]

Get the scheduler based on dictionary with options. Options are passed to the scheduler constructor.

Parameters
  • optimizer – optimizer whose learning rate to control

  • opts – dictionary of args to be passed to constructor of schedule

Returns

created scheduler

farm.modeling.optimization.optimize_model(model, device, local_rank, optimizer=None, distributed=False, use_amp=None)[source]

Wraps multi-GPU or distributed usage around a model. No support for ONNX models.

Parameters
  • model (AdaptiveModel) – model to optimize (e.g. trimming weights to fp16 / mixed precision)

  • device – either gpu or cpu, get the device from initialize_device_settings()

  • distributed – Whether training on distributed machines

  • local_rank – rank of the machine in a distributed setting

  • use_amp – Optimization level of nvidia’s automatic mixed precision (AMP). The higher the level, the faster the model. Options: “O0” (Normal FP32 training) “O1” (Mixed Precision => Recommended) “O2” (Almost FP16) “O3” (Pure FP16). See details on: https://nvidia.github.io/apex/amp.html

Returns

model, optimizer
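
Example (a sketch for single-machine, non-distributed usage; model, device and optimizer are assumed to exist already):

from farm.modeling.optimization import optimize_model

model, optimizer = optimize_model(
    model=model,
    device=device,
    local_rank=-1,        # -1 = not running in a distributed setting
    optimizer=optimizer,
    distributed=False,
    use_amp=None,         # set e.g. "O1" for mixed precision via apex
)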

Tokenization

Tokenization classes.

class farm.modeling.tokenization.Tokenizer[source]

Bases: object

Simple Wrapper for Tokenizers from the transformers package. Enables loading of different Tokenizer classes with a uniform interface.

classmethod load(pretrained_model_name_or_path, revision=None, tokenizer_class=None, use_fast=True, **kwargs)[source]

Enables loading of different Tokenizer classes with a uniform interface. Either infer the class from model config or define it manually via tokenizer_class.

Parameters
  • pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name (e.g. bert-base-uncased)

  • revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.

  • tokenizer_class (str) – (Optional) Name of the tokenizer class to load (e.g. BertTokenizer)

  • use_fast (bool) – (Optional, True by default) Indicates whether FARM should try to load the fast version of the tokenizer (True) or use the Python one (False). Only DistilBERT, BERT and Electra fast tokenizers are supported.

  • kwargs

Returns

Tokenizer
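
Example (a minimal usage sketch with a public model name):

from farm.modeling.tokenization import Tokenizer

tokenizer = Tokenizer.load("bert-base-uncased")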

class farm.modeling.tokenization.EmbeddingTokenizer(vocab_file, do_lower_case=True, unk_token='[UNK]', sep_token='[SEP]', pad_token='[PAD]', cls_token='[CLS]', mask_token='[MASK]', **kwargs)[source]

Bases: transformers.tokenization_utils.PreTrainedTokenizer

Constructs an EmbeddingTokenizer.

__init__(vocab_file, do_lower_case=True, unk_token='[UNK]', sep_token='[SEP]', pad_token='[PAD]', cls_token='[CLS]', mask_token='[MASK]', **kwargs)[source]
Parameters
  • vocab_file (str) – Path to a one-word-per-line vocabulary file

  • do_lower_case (bool) – Flag whether to lower case the input

property vocab_size

int: Size of the base vocabulary (without the added tokens).

classmethod from_pretrained(pretrained_model_name_or_path, **kwargs)[source]

Load the tokenizer from local path or remote.

save_pretrained(vocab_path)[source]

Save the tokenizer vocabulary to a directory or file.

farm.modeling.tokenization.tokenize_with_metadata(text, tokenizer)[source]

Performs tokenization while storing some important metadata for each token:

  • offsets: (int) Character index where the token begins in the original text

  • start_of_word: (bool) If the token is the start of a word. Particularly helpful for NER and QA tasks.

We do this by first doing whitespace tokenization and then applying the model specific tokenizer to each “word”.

Note

We don’t assume that exact whitespace is preserved in the tokens! This means: tabs, newlines, multiple whitespace etc. will all resolve to a single " ". This doesn’t make a difference for BERT + XLNet but it does for RoBERTa. For RoBERTa it has the positive effect of a shorter sequence length, but some information about the whitespace type is lost, which might be helpful for certain NLP tasks (e.g. tabs for tables).

Parameters
  • text (str) – Text to tokenize

  • tokenizer – Tokenizer (e.g. from Tokenizer.load())

Returns

Dictionary with “tokens”, “offsets” and “start_of_word”

Return type

dict
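
Example (a usage sketch; the returned values shown in the comments are illustrative):

from farm.modeling.tokenization import Tokenizer, tokenize_with_metadata

tokenizer = Tokenizer.load("bert-base-uncased")
result = tokenize_with_metadata("Berlin is nice", tokenizer)
# result["tokens"]        -> e.g. ["berlin", "is", "nice"]
# result["offsets"]       -> e.g. [0, 7, 10]
# result["start_of_word"] -> e.g. [True, True, True]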

farm.modeling.tokenization.truncate_sequences(seq_a, seq_b, tokenizer, max_seq_len, truncation_strategy='longest_first', with_special_tokens=True, stride=0)[source]

Reduces a single sequence or a pair of sequences to a maximum sequence length. The sequences can contain tokens or any other elements (offsets, masks …). If with_special_tokens is enabled, it’ll remove some additional tokens to have exactly enough space for later adding special tokens (CLS, SEP etc.)

Supported truncation strategies:

  • longest_first: (default) Iteratively reduces the sequences until the total length is under the maximum, removing a token from the longest sequence at each step (when there is a pair of input sequences). Overflowing tokens only contain the overflow from the first sequence.

  • only_first: Only truncate the first sequence. Raises an error if the first sequence is shorter than or equal to the number of tokens that need to be removed.

  • only_second: Only truncate the second sequence.

  • do_not_truncate: Do not truncate (raises an error if the input sequence is longer than max_seq_len).

Parameters
  • seq_a (list) – First sequence of tokens/offsets/…

  • seq_b (None or list) – Optional second sequence of tokens/offsets/…

  • tokenizer – Tokenizer (e.g. from Tokenizer.load())

  • max_seq_len (int) –

  • truncation_strategy (str) – how the sequence(s) should be truncated down. Default: “longest_first” (see above for other options).

  • with_special_tokens (bool) – If true, it’ll remove some additional tokens to have exactly enough space for later adding special tokens (CLS, SEP etc.)

  • stride (int) – optional stride of the window during truncation

Returns

truncated seq_a, truncated seq_b, overflowing tokens
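
Example (a minimal sketch; the token lists are illustrative, and any parallel element lists such as offsets can be truncated the same way):

from farm.modeling.tokenization import Tokenizer, truncate_sequences

tokenizer = Tokenizer.load("bert-base-uncased")
seq_a = ["this", "is", "a", "rather", "long", "first", "sequence"]
seq_b = ["and", "a", "second", "one"]

# Fit the pair into 8 positions, leaving room for special tokens (CLS, SEP).
seq_a, seq_b, overflow = truncate_sequences(
    seq_a=seq_a,
    seq_b=seq_b,
    tokenizer=tokenizer,
    max_seq_len=8,
    truncation_strategy="longest_first",
    with_special_tokens=True,
)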

farm.modeling.tokenization.insert_at_special_tokens_pos(seq, special_tokens_mask, insert_element)[source]

Adds elements to a sequence at the positions that align with special tokens. This is useful for expanding label ids or masks, so that they align with corresponding tokens (incl. the special tokens)

Example:

# Tokens:  ["CLS", "some", "words","SEP"]
>>> special_tokens_mask =  [1,0,0,1]
>>> lm_label_ids =  [12,200]
>>> insert_at_special_tokens_pos(lm_label_ids, special_tokens_mask, insert_element=-1)
[-1, 12, 200, -1]
Parameters
  • seq (list) – List where you want to insert new elements

  • special_tokens_mask (list) – list with “1” at the positions of special tokens

  • insert_element – the value you want to insert

Returns

list

farm.modeling.tokenization.tokenize_batch_question_answering(pre_baskets, tokenizer, indices)[source]

Tokenizes text data for question answering tasks. Tokenization means splitting words into subwords, depending on the tokenizer’s vocabulary.

  • We first tokenize all documents in batch mode. (When using FastTokenizers Rust multithreading can be enabled by TODO add how to enable rust mt)

  • Then we tokenize each question individually

  • We construct dicts with question and corresponding document text + tokens + offsets + ids

Parameters
  • pre_baskets – input dicts with QA info #todo change to input objects

  • tokenizer – tokenizer to be used

  • indices – list, indices used during multiprocessing so that IDs assigned to our baskets are unique

Returns

baskets, list containing question and corresponding document information