Modeling¶
Adaptive Model¶
-
class
farm.modeling.adaptive_model.
BaseAdaptiveModel
(prediction_heads)[source]¶ Bases:
object
Base Class for implementing AdaptiveModel with frameworks like PyTorch and ONNX.
-
subclasses
= {'AdaptiveModel': <class 'farm.modeling.adaptive_model.AdaptiveModel'>, 'ONNXAdaptiveModel': <class 'farm.modeling.adaptive_model.ONNXAdaptiveModel'>, 'ONNXWrapper': <class 'farm.modeling.adaptive_model.ONNXWrapper'>}¶
-
classmethod
load
(**kwargs)[source]¶ Load the corresponding AdaptiveModel class (AdaptiveModel or ONNXAdaptiveModel) based on the files in load_dir.
- Parameters
kwargs – arguments to pass for loading the model.
- Returns
instance of a model
-
logits_to_preds
(logits, **kwargs)[source]¶ Get predictions from all prediction heads.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
label_maps (dict) – Maps from label encoding to label string
- Returns
A list of all predictions from all prediction heads
-
formatted_preds
(logits, **kwargs)[source]¶ Format predictions for inference.
- Parameters
logits (torch.tensor) – model logits
kwargs (object) – placeholder for passing generic parameters
- Returns
predictions in the right format
-
connect_heads_with_processor
(tasks, require_labels=True)[source]¶ Populates prediction head with information coming from tasks.
- Parameters
tasks – A dictionary where the keys are the names of the tasks and the values are the details of the task (e.g. label_list, metric, tensor name)
require_labels – If True, an error will be thrown when a task is not supplied with labels
- Returns
-
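In practice the tasks dictionary usually comes from a Processor. A minimal calling sketch (the model and processor variables are assumed to have been set up elsewhere; processor.tasks is the task registry built by the Processor):

# Wire the processor's task information into the prediction heads.
model.connect_heads_with_processor(processor.tasks, require_labels=True)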
-
farm.modeling.adaptive_model.
loss_per_head_sum
(loss_per_head, global_step=None, batch=None)[source]¶ Sums up the per-head losses. Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor).
-
class
farm.modeling.adaptive_model.
AdaptiveModel
(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device, loss_aggregation_fn=None)[source]¶ Bases:
torch.nn.modules.module.Module
, farm.modeling.adaptive_model.BaseAdaptiveModel
PyTorch implementation containing all the modelling needed for your NLP task. Combines a language model and a prediction head. Allows for gradient flow back to the language model component.
-
__init__
(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device, loss_aggregation_fn=None)[source]¶ - Parameters
language_model (LanguageModel) – Any model that turns token ids into vector representations
prediction_heads (list) – A list of models that take embeddings and return logits for a given task
embeds_dropout_prob (float) – The probability that a value in the embeddings returned by the language model will be zeroed.
lm_output_types (list or str) – How to extract the embeddings from the final layer of the language model. When set to “per_token”, one embedding will be extracted per input token. If set to “per_sequence”, a single embedding will be extracted to represent the full input sequence. Can either be a single string, or a list of strings, one for each prediction head.
device – The device on which this model will operate. Either “cpu” or “cuda”.
loss_aggregation_fn (function) – Function to aggregate the loss of multiple prediction heads. Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor). Default is a simple sum: lambda loss_per_head, global_step=None, batch=None: sum(loss_per_head). However, you can pass more complex functions that depend on the current step (e.g. for round-robin style multitask learning) or the actual content of the batch (e.g. certain labels). Note: The loss at this stage is per sample, i.e. one tensor of shape (batch_size) per prediction head.
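Example (a minimal construction sketch with assumed example values, combining a loaded language model with a text classification head and an explicit loss aggregation function):

from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import TextClassificationHead

language_model = LanguageModel.load("bert-base-german-cased")
prediction_head = TextClassificationHead(num_labels=2)

model = AdaptiveModel(
    language_model=language_model,
    prediction_heads=[prediction_head],
    embeds_dropout_prob=0.1,
    lm_output_types=["per_sequence"],  # one entry per prediction head
    device="cuda",
    # optional: custom aggregation of the per-head losses (default is a simple sum)
    loss_aggregation_fn=lambda loss_per_head, global_step=None, batch=None: sum(loss_per_head),
)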
-
fit_heads_to_lm
()[source]¶ This iterates over each prediction head and ensures that its input dimensionality matches the output dimensionality of the language model. If it doesn’t, the prediction head is resized so that it does.
-
bypass_ph
()[source]¶ Replaces methods in the prediction heads with dummy functions. Used for benchmarking where we want to isolate the language model run time from the prediction head run time.
-
save
(save_dir)[source]¶ Saves the language model and prediction heads. This will generate a config file and model weights for each.
- Parameters
save_dir (Path) – path to save to
-
classmethod
load
(load_dir, device, strict=True, lm_name=None, processor=None)[source]¶ Loads an AdaptiveModel from a directory. The directory must contain:
language_model.bin
language_model_config.json
prediction_head_X.bin (multiple prediction heads possible)
prediction_head_X_config.json
processor_config.json (config for transforming input)
vocab.txt (vocab file for the language model, turning text into WordPiece tokens)
- Parameters
load_dir (Path) – location where adaptive model is stored
device (torch.device) – the device to which we want to send the model, either cpu or cuda
lm_name (str) – the name to assign to the loaded language model
strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.module.load_state_dict()). Set to False for backwards compatibility with prediction heads saved with older versions of FARM.
processor (Processor) – populates prediction head with information coming from tasks
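Example (a minimal save/load round trip; the save_dir path and the processor variable are assumptions of this sketch, and the processor is typically saved to the same directory so that processor_config.json and vocab.txt are present in load_dir):

from pathlib import Path
from farm.modeling.adaptive_model import AdaptiveModel

save_dir = Path("saved_models/my_model")  # assumed example path
model.save(save_dir)
processor.save(save_dir)  # writes processor_config.json and vocab.txt next to the model files

reloaded = AdaptiveModel.load(load_dir=save_dir, device="cuda", processor=processor)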
-
logits_to_loss_per_head
(logits, **kwargs)[source]¶ Collect losses from each prediction head.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task.
- Returns
The per sample, per prediction head loss whose first two dimensions have lengths n_pred_heads and batch_size
-
logits_to_loss
(logits, global_step=None, **kwargs)[source]¶ Get losses from all prediction heads & reduce to single loss per sample.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
global_step (int) – number of current training step
kwargs (object) – placeholder for passing generic parameters. Note: Contains the batch (as dict of tensors), when called from Trainer.train().
- Return loss
torch.tensor that is the per sample loss (len: batch_size)
-
prepare_labels
(**kwargs)[source]¶ Label conversion to original label space, per prediction head.
- Parameters
label_maps (dict[int:str]) – dictionary for mapping ids to label strings
- Returns
labels in the right format
-
forward
(**kwargs)[source]¶ Push data through the whole model and return logits. The data will propagate through the language model and each of the attached prediction heads.
- Parameters
kwargs – Holds all arguments that need to be passed to the language model and prediction head(s).
- Returns
all logits as torch.tensor or multiple tensors.
-
verify_vocab_size
(vocab_size)[source]¶ Verifies that the model vocabulary fits the tokenizer vocabulary. They can diverge when a custom vocabulary has been added via tokenizer.add_tokens()
-
convert_to_transformers
()[source]¶ Convert an adaptive model to huggingface’s transformers format. Returns a list containing one model for each prediction head.
- Returns
List of huggingface transformers models.
-
classmethod
convert_from_transformers
(model_name_or_path, device, revision=None, task_type=None, processor=None)[source]¶ - Load a (downstream) model from huggingface’s transformers format. Use cases:
continue training in FARM (e.g. take a squad QA model and fine-tune on your own data)
compare models without switching frameworks
use model directly for inference
- Parameters
model_name_or_path –
local path of a saved model or name of a public one. Exemplary public names: distilbert-base-uncased-distilled-squad, deepset/bert-large-uncased-whole-word-masking-squad2
See https://huggingface.co/models for full list
revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
device – “cpu” or “cuda”
task_type – One of: ‘question_answering’, ‘text_classification’, ‘embeddings’. More tasks coming soon …
processor (Processor) – populates prediction head with information coming from tasks
- Returns
AdaptiveModel
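Example (a minimal sketch using one of the public model names listed above):

from farm.modeling.adaptive_model import AdaptiveModel

model = AdaptiveModel.convert_from_transformers(
    model_name_or_path="deepset/bert-large-uncased-whole-word-masking-squad2",
    device="cpu",
    task_type="question_answering",
)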
-
classmethod
convert_to_onnx
(model_name, output_path, task_type, convert_to_float16=False, quantize=False, opset_version=11)[source]¶ Convert a PyTorch model from transformers hub to an ONNX Model.
- Parameters
model_name (str) – transformers model name
output_path (Path) – output path to write the converted model to
task_type – Type of task for the model. Available options: “embeddings”, “question_answering”, “text_classification”, “ner”.
convert_to_float16 (bool) – By default, the model uses float32 precision. With half precision (float16), inference should be faster on Nvidia GPUs with Tensor Cores, like T4 or V100. On older GPUs, float32 might be more performant.
quantize (bool) – convert floating point numbers to integers
opset_version (int) – ONNX opset version
- Returns
-
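Example (a minimal export sketch; the output directory is an assumed example path). The exported directory can afterwards be passed to BaseAdaptiveModel.load or the FARM Inferencer, which pick the ONNXAdaptiveModel class based on the files they find in load_dir:

from pathlib import Path
from farm.modeling.adaptive_model import AdaptiveModel

onnx_dir = Path("onnx-export")  # assumed example output directory
AdaptiveModel.convert_to_onnx(
    model_name="deepset/bert-large-uncased-whole-word-masking-squad2",
    output_path=onnx_dir,
    task_type="question_answering",
    convert_to_float16=False,
    quantize=False,
    opset_version=11,
)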
-
class
farm.modeling.adaptive_model.
ONNXAdaptiveModel
(onnx_session, language_model_class, language, prediction_heads, device)[source]¶ Bases:
farm.modeling.adaptive_model.BaseAdaptiveModel
Implementation of ONNX Runtime for Inference of ONNX Models.
An existing PyTorch based FARM AdaptiveModel can be converted to ONNX format using AdaptiveModel.convert_to_onnx(). The conversion is currently only implemented for Question Answering models.
For inference, this class is compatible with the FARM Inferencer.
-
__init__
(onnx_session, language_model_class, language, prediction_heads, device)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
classmethod
load
(load_dir, device, **kwargs)[source]¶ Load the corresponding AdaptiveModel class (AdaptiveModel or ONNXAdaptiveModel) based on the files in load_dir.
- Parameters
kwargs – arguments to pass for loading the model.
- Returns
instance of a model
-
-
class
farm.modeling.adaptive_model.
ONNXWrapper
(language_model, prediction_heads, embeds_dropout_prob, lm_output_types, device, loss_aggregation_fn=None)[source]¶ Bases:
farm.modeling.adaptive_model.AdaptiveModel
Wrapper Class for converting PyTorch models to ONNX.
As of torch v1.4.0, torch.onnx.export only supports passing positional arguments to the forward pass of the model. However, the AdaptiveModel’s forward takes keyword arguments. This class circumvents the issue by converting positional arguments to keyword arguments.
-
forward
(*batch)[source]¶ Push data through the whole model and return logits. The data will propagate through the language model and each of the attached prediction heads.
- Parameters
batch – Holds all arguments (as positional arguments) that need to be passed to the language model and prediction head(s).
- Returns
all logits as torch.tensor or multiple tensors.
-
BiAdaptive Model¶
-
class
farm.modeling.biadaptive_model.
BaseBiAdaptiveModel
(prediction_heads)[source]¶ Bases:
object
Base Class for implementing AdaptiveModel with frameworks like PyTorch and ONNX.
-
subclasses
= {'BiAdaptiveModel': <class 'farm.modeling.biadaptive_model.BiAdaptiveModel'>}¶
-
classmethod
load
(**kwargs)[source]¶ Load the corresponding AdaptiveModel class (AdaptiveModel or ONNXAdaptiveModel) based on the files in load_dir.
- Parameters
kwargs – arguments to pass for loading the model.
- Returns
instance of a model
-
logits_to_preds
(logits, **kwargs)[source]¶ Get predictions from all prediction heads.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
label_maps (dict) – Maps from label encoding to label string
- Returns
A list of all predictions from all prediction heads
-
formatted_preds
(logits, language_model1, language_model2, **kwargs)[source]¶ Format predictions to strings for inference output
- Parameters
logits (torch.tensor) – model logits
kwargs (object) – placeholder for passing generic parameters
- Returns
predictions in the right format
-
connect_heads_with_processor
(tasks, require_labels=True)[source]¶ Populates prediction head with information coming from tasks.
- Parameters
tasks – A dictionary where the keys are the names of the tasks and the values are the details of the task (e.g. label_list, metric, tensor name)
require_labels – If True, an error will be thrown when a task is not supplied with labels
- Returns
-
-
farm.modeling.biadaptive_model.
loss_per_head_sum
(loss_per_head, global_step=None, batch=None)[source]¶ Sums up the per-head losses. Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor).
-
class
farm.modeling.biadaptive_model.
BiAdaptiveModel
(language_model1, language_model2, prediction_heads, embeds_dropout_prob=0.1, device='cuda', lm1_output_types=['per_sequence'], lm2_output_types=['per_sequence'], loss_aggregation_fn=None)[source]¶ Bases:
torch.nn.modules.module.Module
, farm.modeling.biadaptive_model.BaseBiAdaptiveModel
PyTorch implementation containing all the modelling needed for your NLP task. Combines 2 language models for representation of 2 sequences and a prediction head. Allows for gradient flow back to the 2 language model components.
-
__init__
(language_model1, language_model2, prediction_heads, embeds_dropout_prob=0.1, device='cuda', lm1_output_types=['per_sequence'], lm2_output_types=['per_sequence'], loss_aggregation_fn=None)[source]¶ - Parameters
language_model1 (LanguageModel) – Any model that turns token ids into vector representations
language_model2 (LanguageModel) – Any model that turns token ids into vector representations
prediction_heads (list) – A list of models that take 2 sequence embeddings and return logits for a given task
embeds_dropout_prob (float) – The probability that a value in the embeddings returned by either of the 2 language models will be zeroed.
lm1_output_types (list or str) – How to extract the embeddings from the final layer of the first language model. When set to “per_token”, one embedding will be extracted per input token. If set to “per_sequence”, a single embedding will be extracted to represent the full input sequence. Can either be a single string, or a list of strings, one for each prediction head.
lm2_output_types (list or str) – How to extract the embeddings from the final layer of the second language model. When set to “per_token”, one embedding will be extracted per input token. If set to “per_sequence”, a single embedding will be extracted to represent the full input sequence. Can either be a single string, or a list of strings, one for each prediction head.
device – The device on which this model will operate. Either “cpu” or “cuda”.
loss_aggregation_fn (function) – Function to aggregate the loss of multiple prediction heads. Input: loss_per_head (list of tensors), global_step (int), batch (dict). Output: aggregated loss (tensor). Default is a simple sum: lambda loss_per_head, global_step=None, batch=None: sum(loss_per_head). However, you can pass more complex functions that depend on the current step (e.g. for round-robin style multitask learning) or the actual content of the batch (e.g. certain labels). Note: The loss at this stage is per sample, i.e. one tensor of shape (batch_size) per prediction head.
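Example (a minimal construction sketch with two DPR encoders; TextSimilarityHead is assumed here as the prediction head used for text similarity):

from farm.modeling.biadaptive_model import BiAdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import TextSimilarityHead  # assumed head class for text similarity

query_encoder = LanguageModel.load("facebook/dpr-question_encoder-single-nq-base")
passage_encoder = LanguageModel.load("facebook/dpr-ctx_encoder-single-nq-base")

model = BiAdaptiveModel(
    language_model1=query_encoder,
    language_model2=passage_encoder,
    prediction_heads=[TextSimilarityHead()],
    embeds_dropout_prob=0.1,
    lm1_output_types=["per_sequence"],
    lm2_output_types=["per_sequence"],
    device="cuda",
)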
-
save
(save_dir, lm1_name='lm1', lm2_name='lm2')[source]¶ Saves the 2 language model weights and respective config files in directories lm1 and lm2 within save_dir.
- Parameters
save_dir (Path) – path to save to
-
classmethod
load
(load_dir, device, strict=False, lm1_name='lm1', lm2_name='lm2', processor=None)[source]¶ Loads a BiAdaptiveModel from a directory. The directory must contain:
directory “lm1_name” with the following files: language_model.bin, language_model_config.json
directory “lm2_name” with the following files: language_model.bin, language_model_config.json
prediction_head_X.bin (multiple prediction heads possible)
prediction_head_X_config.json
processor_config.json (config for transforming input)
vocab.txt (vocab file for the language model, turning text into WordPiece tokens)
special_tokens_map.json
- Parameters
load_dir (Path) – location where adaptive model is stored
device (torch.device) – the device to which we want to send the model, either cpu or cuda
lm1_name (str) – the name to assign to the first loaded language model (for encoding queries)
lm2_name (str) – the name to assign to the second loaded language model (for encoding contexts/passages)
strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.module.load_state_dict()). Set to False for backwards compatibility with prediction heads saved with older versions of FARM.
processor (Processor) – populates prediction head with information coming from tasks
-
logits_to_loss_per_head
(logits, **kwargs)[source]¶ Collect losses from each prediction head.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task.
- Returns
The per sample, per prediction head loss whose first two dimensions have lengths n_pred_heads and batch_size
-
logits_to_loss
(logits, global_step=None, **kwargs)[source]¶ Get losses from all prediction heads & reduce to single loss per sample.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
global_step (int) – number of current training step
kwargs (object) – placeholder for passing generic parameters. Note: Contains the batch (as dict of tensors), when called from Trainer.train().
- Return loss
torch.tensor that is the per sample loss (len: batch_size)
-
prepare_labels
(**kwargs)[source]¶ Label conversion to original label space, per prediction head.
- Parameters
label_maps (dict[int:str]) – dictionary for mapping ids to label strings
- Returns
labels in the right format
-
forward
(**kwargs)[source]¶ Push data through the whole model and return logits. The data will propagate through the first and second language model (based on the tensor names), and both encodings are then passed through each of the attached prediction heads.
- Parameters
kwargs – Holds all arguments that need to be passed to both the language models and prediction head(s).
- Returns
all logits as torch.tensor or multiple tensors.
-
forward_lm
(**kwargs)[source]¶ Forward pass for the BiAdaptive model.
- Parameters
kwargs –
- Returns
2 tensors of pooled_output from the 2 language models
-
verify_vocab_size
(vocab_size1, vocab_size2)[source]¶ Verifies that the model vocabularies fit the tokenizer vocabularies. They can diverge when a custom vocabulary has been added via tokenizer.add_tokens()
-
classmethod
convert_from_transformers
(model_name_or_path1, model_name_or_path2, device, task_type, processor=None, similarity_function='dot_product')[source]¶ - Load a (downstream) model from huggingface’s transformers format. Use cases:
continue training in FARM (e.g. take a squad QA model and fine-tune on your own data)
compare models without switching frameworks
use model directly for inference
- Parameters
model_name_or_path1 – local path of a saved model or name of a public one for the Question Encoder. Exemplary public names: facebook/dpr-question_encoder-single-nq-base, deepset/bert-large-uncased-whole-word-masking-squad2
model_name_or_path2 – local path of a saved model or name of a public one for the Context/Passage Encoder. Exemplary public names: facebook/dpr-ctx_encoder-single-nq-base, deepset/bert-large-uncased-whole-word-masking-squad2
device – “cpu” or “cuda”
task_type – ‘text_similarity’. More tasks coming soon …
processor (Processor) – populates prediction head with information coming from tasks
- Returns
BiAdaptiveModel
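Example (a minimal sketch using the public DPR encoders listed above):

from farm.modeling.biadaptive_model import BiAdaptiveModel

model = BiAdaptiveModel.convert_from_transformers(
    model_name_or_path1="facebook/dpr-question_encoder-single-nq-base",
    model_name_or_path2="facebook/dpr-ctx_encoder-single-nq-base",
    device="cuda",
    task_type="text_similarity",
    similarity_function="dot_product",
)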
-
Language Model¶
Acknowledgements: Many of the modeling parts here come from the great transformers repository: https://github.com/huggingface/transformers. Thanks for the great work!
-
class
farm.modeling.language_model.
LanguageModel
[source]¶ Bases:
torch.nn.modules.module.Module
The parent class for any kind of model that can embed language into a semantic vector space. Practically speaking, these models read in tokenized sentences and return vectors that capture the meaning of sentences or of tokens.
-
subclasses
= {'Albert': <class 'farm.modeling.language_model.Albert'>, 'Bert': <class 'farm.modeling.language_model.Bert'>, 'Camembert': <class 'farm.modeling.language_model.Camembert'>, 'DPRContextEncoder': <class 'farm.modeling.language_model.DPRContextEncoder'>, 'DPRQuestionEncoder': <class 'farm.modeling.language_model.DPRQuestionEncoder'>, 'DistilBert': <class 'farm.modeling.language_model.DistilBert'>, 'Electra': <class 'farm.modeling.language_model.Electra'>, 'Roberta': <class 'farm.modeling.language_model.Roberta'>, 'WordEmbedding_LM': <class 'farm.modeling.language_model.WordEmbedding_LM'>, 'XLMRoberta': <class 'farm.modeling.language_model.XLMRoberta'>, 'XLNet': <class 'farm.modeling.language_model.XLNet'>}¶
-
forward
(input_ids, padding_mask, **kwargs)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
classmethod
load
(pretrained_model_name_or_path, revision=None, n_added_tokens=0, language_model_class=None, **kwargs)[source]¶ Load a pretrained language model either by
specifying its name and downloading it
or pointing to the directory it is saved in.
Available remote models:
bert-base-uncased
bert-large-uncased
bert-base-cased
bert-large-cased
bert-base-multilingual-uncased
bert-base-multilingual-cased
bert-base-chinese
bert-base-german-cased
roberta-base
roberta-large
xlnet-base-cased
xlnet-large-cased
xlm-roberta-base
xlm-roberta-large
albert-base-v2
albert-large-v2
distilbert-base-german-cased
distilbert-base-multilingual-cased
google/electra-small-discriminator
google/electra-base-discriminator
google/electra-large-discriminator
facebook/dpr-question_encoder-single-nq-base
facebook/dpr-ctx_encoder-single-nq-base
See all supported model variations here: https://huggingface.co/models
The appropriate language model class is inferred automatically from model config or can be manually supplied via language_model_class.
- Parameters
pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.
revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
language_model_class (str) – (Optional) Name of the language model class to load (e.g. Bert)
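Example (a minimal loading sketch; the local directory is an assumed example path):

from farm.modeling.language_model import LanguageModel

# load a public model by name; the concrete class (Bert, Roberta, ...) is inferred from the config
lm = LanguageModel.load("bert-base-cased")

# or load from a local directory and force a specific class
lm = LanguageModel.load("some_dir/farm_model", language_model_class="Bert")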
-
save
(save_dir)[source]¶ Save the model state_dict and its config file so that it can be loaded again.
- Parameters
save_dir (str) – The directory in which the model should be saved.
-
formatted_preds
(logits, samples, ignore_first_token=True, padding_mask=None, input_ids=None, **kwargs)[source]¶ Extract vectors from the language model (e.g. for extracting sentence embeddings). Different pooling strategies and layers are available and will be determined from the object attributes extraction_layer and extraction_strategy. Both should be set via the Inferencer. Example: Inferencer(extraction_strategy='cls_token', extraction_layer=-1)
- Parameters
logits – Tuple of (sequence_output, pooled_output) from the language model. sequence_output: one vector per token; pooled_output: one vector for the whole sequence
samples – For each item in logits we need additional meta information to format the prediction (e.g. input text). This is created by the Processor and passed in here from the Inferencer.
ignore_first_token – Whether to exclude the first token from pooling operations (e.g. reduce_mean). Many models have a special token like [CLS] here that you don’t want to include in your average of token embeddings.
padding_mask – Mask for the padding tokens. Those will also not be included in the pooling operations to prevent a bias by the number of padding tokens.
input_ids – ids of the tokens in the vocab
kwargs – kwargs
- Returns
list of dicts containing preds, e.g. [{“context”: “some text”, “vec”: [-0.01, 0.5 …]}]
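Example (a minimal embedding-extraction sketch via the FARM Inferencer mentioned above; the import path farm.infer and the exact Inferencer.load arguments are assumptions of this sketch):

from farm.infer import Inferencer

inferencer = Inferencer.load(
    "bert-base-cased",
    task_type="embeddings",
    extraction_strategy="cls_token",
    extraction_layer=-1,
    gpu=False,
)
result = inferencer.inference_from_dicts(dicts=[{"text": "FARM turns sentences into vectors."}])
# result is a list of dicts such as [{"context": "...", "vec": [...]}]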
-
-
class
farm.modeling.language_model.
Bert
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
A BERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1810.04805
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a pretrained model by supplying
the name of a remote model on s3 (“bert-base-cased” …)
OR a local path of a model trained via transformers (“some_dir/huggingface_model”)
OR a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.
-
forward
(input_ids, segment_ids, padding_mask, **kwargs)[source]¶ Perform the forward pass of the BERT model.
- Parameters
input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]
padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
Albert
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
An ALBERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a language model either by supplying
the name of a remote model on s3 (“albert-base” …)
or a local path of a model trained via transformers (“some_dir/huggingface_model”)
or a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path – name or path of a model
language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.
- Returns
Language Model
-
forward
(input_ids, segment_ids, padding_mask, **kwargs)[source]¶ Perform the forward pass of the Albert model.
- Parameters
input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]
padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
Roberta
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
A RoBERTa model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1907.11692
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a language model either by supplying
the name of a remote model on s3 (“roberta-base” …)
or a local path of a model trained via transformers (“some_dir/huggingface_model”)
or a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path – name or path of a model
language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.
- Returns
Language Model
-
forward
(input_ids, segment_ids, padding_mask, **kwargs)[source]¶ Perform the forward pass of the Roberta model.
- Parameters
input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]
padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
XLMRoberta
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
An XLM-RoBERTa model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1907.11692
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a language model either by supplying
the name of a remote model on s3 (“xlm-roberta-base” …)
or a local path of a model trained via transformers (“some_dir/huggingface_model”)
or a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path – name or path of a model
language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.
- Returns
Language Model
-
forward
(input_ids, segment_ids, padding_mask, **kwargs)[source]¶ Perform the forward pass of the XLMRoberta model.
- Parameters
input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]
padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
DistilBert
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
A DistilBERT model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.
NOTE: DistilBert doesn’t have token_type_ids, so you don’t need to indicate which token belongs to which segment. Just separate your segments with the separation token tokenizer.sep_token (or [SEP]). Unlike the other BERT variants, DistilBert does not output the pooled_output; an additional pooler is initialized.
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a pretrained model by supplying
the name of a remote model on s3 (“distilbert-base-german-cased” …)
OR a local path of a model trained via transformers (“some_dir/huggingface_model”)
OR a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.
-
forward
(input_ids, padding_mask, **kwargs)[source]¶ Perform the forward pass of the DistilBERT model.
- Parameters
input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
XLNet
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
An XLNet model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class. Paper: https://arxiv.org/abs/1906.08237
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a language model either by supplying
the name of a remote model on s3 (“xlnet-base-cased” …)
or a local path of a model trained via transformers (“some_dir/huggingface_model”)
or a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path – name or path of a model
language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.
- Returns
Language Model
-
forward
(input_ids, segment_ids, padding_mask, **kwargs)[source]¶ Perform the forward pass of the XLNet model.
- Parameters
input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]
padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
EmbeddingConfig
(name=None, embeddings_filename=None, vocab_filename=None, vocab_size=None, hidden_size=None, language=None, **kwargs)[source]¶ Bases:
object
Config for Word Embeddings Models. Necessary to work with Bert and other LM style functionality
-
__init__
(name=None, embeddings_filename=None, vocab_filename=None, vocab_size=None, hidden_size=None, language=None, **kwargs)[source]¶ - Parameters
name – Name of config
embeddings_filename –
vocab_filename –
vocab_size –
hidden_size –
language –
kwargs –
-
-
class
farm.modeling.language_model.
EmbeddingModel
(embedding_file, config_dict, vocab_file)[source]¶ Bases:
object
Embedding Model that combines embeddings, a config object, and a vocab. Necessary to work with Bert and other LM style functionality
-
__init__
(embedding_file, config_dict, vocab_file)[source]¶ - Parameters
embedding_file (str) – filename of embeddings. Usually in txt format, with the word and associated vector on each line
config_dict (dict) – dictionary containing config elements
vocab_file (str) – filename of vocab, each line contains a word
-
-
class
farm.modeling.language_model.
WordEmbedding_LM
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
A Language Model based only on word embeddings. Inside FARM, WordEmbedding Language Models must have a fixed vocabulary. Each (known) word in some text input is projected to its vector representation, and pooling operations can be applied for representing whole text sequences.
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a language model either by supplying
a local path of a model trained via FARM (“some_dir/farm_model”)
the name of a remote model on s3
- Parameters
pretrained_model_name_or_path – name or path of a model
language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.
- Returns
Language Model
-
save
(save_dir)[source]¶ Save the model embeddings and its config file so that it can be loaded again. (TODO: make embeddings trainable and save trained embeddings. TODO: save model weights as a pytorch model bin for more efficient loading and saving.)
- Parameters
save_dir (str) – The directory in which the model should be saved.
-
forward
(input_ids, **kwargs)[source]¶ Perform the forward pass of the word embedding model. This is just the mapping of words to their corresponding embeddings.
-
trim_vocab
(token_counts, processor, min_threshold)[source]¶ Remove embeddings for rare tokens in your corpus (< min_threshold occurrences) to reduce model size
-
normalize_embeddings
(zero_mean=True, pca_removal=False, pca_n_components=300, pca_n_top_components=10, use_mean_vec_for_special_tokens=True, n_special_tokens=5)[source]¶ - Normalize word embeddings as in https://arxiv.org/pdf/1808.06305.pdf
(e.g. used for S3E Pooling of sentence embeddings)
- Parameters
zero_mean (bool) – Whether to center embeddings via subtracting mean
pca_removal (bool) – Whether to remove PCA components
pca_n_components (int) – Number of PCA components to use for fitting
pca_n_top_components (int) – Number of PCA components to remove
use_mean_vec_for_special_tokens (bool) – Whether to replace embedding of special tokens with the mean embedding
n_special_tokens (int) – Number of special tokens like CLS, UNK etc. (used if use_mean_vec_for_special_tokens). Note: We expect the special tokens to be the first n_special_tokens entries of the vocab.
- Returns
None
-
class
farm.modeling.language_model.
Electra
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
ELECTRA is a new pre-training approach which trains two transformer models: the generator and the discriminator. The generator replaces tokens in a sequence, and is therefore trained as a masked language model. The discriminator, which is the model we’re interested in, tries to identify which tokens were replaced by the generator in the sequence.
The ELECTRA model here wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.
NOTE: Electra does not output the pooled_output. An additional pooler is initialized.
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a pretrained model by supplying
the name of a remote model on s3 (“google/electra-base-discriminator” …)
OR a local path of a model trained via transformers (“some_dir/huggingface_model”)
OR a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name.
-
forward
(input_ids, segment_ids, padding_mask, **kwargs)[source]¶ Perform the forward pass of the ELECTRA model.
- Parameters
input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
padding_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
Camembert
[source]¶ Bases:
farm.modeling.language_model.Roberta
A Camembert model that wraps HuggingFace’s implementation (https://github.com/huggingface/transformers) to fit the LanguageModel class.
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a language model either by supplying
the name of a remote model on s3 (“camembert-base” …)
or a local path of a model trained via transformers (“some_dir/huggingface_model”)
or a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path – name or path of a model
language – (Optional) Name of language the model was trained for (e.g. “german”). If not supplied, FARM will try to infer it from the model name.
- Returns
Language Model
-
class
farm.modeling.language_model.
DPRQuestionEncoder
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
A DPRQuestionEncoder model that wraps HuggingFace’s implementation
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a pretrained model by supplying
the name of a remote model on s3 (“facebook/dpr-question_encoder-single-nq-base” …)
OR a local path of a model trained via transformers (“some_dir/huggingface_model”)
OR a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path (str) – The path of the base pretrained language model whose weights are used to initialize DPRQuestionEncoder
-
forward
(query_input_ids, query_segment_ids, query_attention_mask, **kwargs)[source]¶ Perform the forward pass of the DPRQuestionEncoder model.
- Parameters
query_input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, max_seq_len]
query_segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, max_seq_len]
query_attention_mask (torch.Tensor) – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
-
class
farm.modeling.language_model.
DPRContextEncoder
[source]¶ Bases:
farm.modeling.language_model.LanguageModel
A DPRContextEncoder model that wraps HuggingFace’s implementation
-
classmethod
load
(pretrained_model_name_or_path, language=None, **kwargs)[source]¶ Load a pretrained model by supplying
the name of a remote model on s3 (“facebook/dpr-ctx_encoder-single-nq-base” …)
OR a local path of a model trained via transformers (“some_dir/huggingface_model”)
OR a local path of a model trained via FARM (“some_dir/farm_model”)
- Parameters
pretrained_model_name_or_path (str) – The path of the base pretrained language model whose weights are used to initialize DPRContextEncoder
-
forward
(passage_input_ids, passage_segment_ids, passage_attention_mask, **kwargs)[source]¶ Perform the forward pass of the DPRContextEncoder model.
- Parameters
passage_input_ids (torch.Tensor) – The ids of each token in the input sequence. Is a tensor of shape [batch_size, number_of_hard_negative_passages, max_seq_len]
passage_segment_ids (torch.Tensor) – The id of the segment. For example, in next sentence prediction, the tokens in the first sentence are marked with 0 and those in the second are marked with 1. It is a tensor of shape [batch_size, number_of_hard_negative_passages, max_seq_len]
passage_attention_mask – A mask that assigns a 1 to valid input tokens and 0 to padding tokens of shape [batch_size, number_of_hard_negative_passages, max_seq_len]
- Returns
Embeddings for each token in the input sequence.
Prediction Head¶
-
class
farm.modeling.prediction_head.
PredictionHead
[source]¶ Bases:
torch.nn.modules.module.Module
Takes word embeddings from a language model and generates logits for a given task. Can also convert logits to loss and logits to predictions.
-
classmethod
create
(prediction_head_name, layer_dims, class_weights=None)[source]¶ Create subclass of Prediction Head.
- Parameters
prediction_head_name (str) – Classname (exact string!) of prediction head we want to create
layer_dims (List[Int]) – describing the feed forward block structure, e.g. [768,2]
class_weights (list[Float]) – The loss weighting to be assigned to certain label classes during training. Used to correct cases where there is a strong class imbalance.
- Returns
Prediction Head of class prediction_head_name
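Example (a minimal sketch with example values):

from farm.modeling.prediction_head import PredictionHead

# creates a TextClassificationHead with a 768 -> 2 feed forward block and
# class weights to counter label imbalance (example values)
head = PredictionHead.create(
    prediction_head_name="TextClassificationHead",
    layer_dims=[768, 2],
    class_weights=[0.3, 0.7],
)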
-
save_config
(save_dir, head_num=0)[source]¶ Saves the config as a json file.
- Parameters
save_dir (str or Path) – Path to save config to
head_num (int) – Which head to save
-
save
(save_dir, head_num=0)[source]¶ Saves the prediction head state dict.
- Parameters
save_dir (str or Path) – path to save prediction head to
head_num (int) – which head to save
-
generate_config
()[source]¶ Generates config file from Class parameters (only for sensible config parameters).
-
classmethod
load
(config_file, strict=True, load_weights=True)[source]¶ Loads a Prediction Head. Infers the class of prediction head from config_file.
- Parameters
config_file (str) – location where corresponding config is stored
strict (bool) – whether to strictly enforce that the keys loaded from the saved model match the ones in the PredictionHead (see torch.nn.module.load_state_dict()). Set to False for backwards compatibility with prediction heads saved with older versions of FARM.
- Returns
PredictionHead
- Return type
-
logits_to_loss
(logits, labels)[source]¶ Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
labels (object) – labels, can vary in shape and type, depending on task
- Returns
per sample loss as a torch.tensor of shape [batch_size]
-
logits_to_preds
(logits)[source]¶ Implement this function in your special Prediction Head. Should turn logits into predictions.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
- Returns
predictions as a torch.tensor of shape [batch_size]
-
class
farm.modeling.prediction_head.
RegressionHead
(layer_dims=[768, 1], task_name='regression', **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.PredictionHead
-
__init__
(layer_dims=[768, 1], task_name='regression', **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(x)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
logits_to_loss
(logits, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
labels (object) – labels, can vary in shape and type, depending on task
- Returns
per sample loss as a torch.tensor of shape [batch_size]
-
logits_to_preds
(logits, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should turn logits into predictions.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
- Returns
predictions as a torch.tensor of shape [batch_size]
-
-
class
farm.modeling.prediction_head.
TextClassificationHead
(layer_dims=None, num_labels=None, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.PredictionHead
-
__init__
(layer_dims=None, num_labels=None, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]¶ - Parameters
layer_dims (list) – The size of the layers in the feed forward component. The feed forward will have as many layers as there are ints in this list. This param will be deprecated in the future
num_labels (int) – The number of labels. Used to set the size of the final layer in the feed forward component. It is recommended to set only num_labels or layer_dims, not both.
class_weights –
loss_ignore_index –
loss_reduction –
task_name –
kwargs –
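Example (a minimal sketch; as noted above, it is recommended to set num_labels rather than layer_dims):

from farm.modeling.prediction_head import TextClassificationHead

head = TextClassificationHead(num_labels=3, task_name="text_classification")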
-
classmethod
load
(pretrained_model_name_or_path, revision=None)[source]¶ Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. distilbert-base-uncased-distilled-squad)
- Parameters
pretrained_model_name_or_path –
local path of a saved model or name of a publicly available model. Exemplary public name: - deepset/bert-base-german-cased-hatespeech-GermEval18Coarse
See https://huggingface.co/models for full list
revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
-
forward
(X)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
logits_to_loss
(logits, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
labels (object) – labels, can vary in shape and type, depending on task
- Returns
per sample loss as a torch.tensor of shape [batch_size]
-
logits_to_preds
(logits, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should turn logits into predictions.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
- Returns
predictions as a torch.tensor of shape [batch_size]
-
prepare_labels
(**kwargs)[source]¶ Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.
- Parameters
kwargs (object) – placeholder for passing generic parameters
- Returns
labels in the right format
- Return type
object
-
formatted_preds
(logits=None, preds=None, samples=None, return_class_probs=False, **kwargs)[source]¶ Like QuestionAnsweringHead.formatted_preds(), this fn can operate on either logits or preds. This is needed since at inference, the order of operations is very different depending on whether we are performing aggregation or not (compare Inferencer._get_predictions() vs Inferencer._get_predictions_and_aggregate())
-
-
class
farm.modeling.prediction_head.
MultiLabelTextClassificationHead
(layer_dims=None, num_labels=None, class_weights=None, loss_reduction='none', task_name='text_classification', pred_threshold=0.5, **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.PredictionHead
-
__init__
(layer_dims=None, num_labels=None, class_weights=None, loss_reduction='none', task_name='text_classification', pred_threshold=0.5, **kwargs)[source]¶ - Parameters
layer_dims (list) – The size of the layers in the feed forward component. The feed forward will have as many layers as there are ints in this list. This param will be deprecated in the future
num_labels (int) – The number of labels. Used to set the size of the final layer in the feed forward component. It is recommended to set only num_labels or layer_dims, not both.
class_weights –
loss_reduction –
task_name –
pred_threshold –
kwargs –
-
forward
(X)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
logits_to_loss
(logits, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
labels (object) – labels, can vary in shape and type, depending on task
- Returns
per sample loss as a torch.tensor of shape [batch_size]
-
logits_to_preds
(logits, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should turn logits into predictions.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
- Returns
predictions as a torch.tensor of shape [batch_size]
-
-
class
farm.modeling.prediction_head.
TokenClassificationHead
(layer_dims=None, num_labels=None, task_name='ner', **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.PredictionHead
-
__init__
(layer_dims=None, num_labels=None, task_name='ner', **kwargs)[source]¶ - Parameters
layer_dims (list) – The size of the layers in the feed forward component. The feed forward will have as many layers as there are ints in this list. This param will be deprecated in the future
num_labels (int) – The number of labels. Used to set the size of the final layer in the feed forward component. It is recommended to set only num_labels or layer_dims, not both.
task_name –
kwargs –
-
classmethod
load
(pretrained_model_name_or_path, revision=None)[source]¶ Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. bert-base-cased-finetuned-conll03-english)
- Parameters
pretrained_model_name_or_path –
local path of a saved model or name of a publicly available model. Exemplary public names: - bert-base-cased-finetuned-conll03-english
See https://huggingface.co/models for full list
-
forward
(X)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
logits_to_loss
(logits, initial_mask, padding_mask=None, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
labels (object) – labels, can vary in shape and type, depending on task
- Returns
per sample loss as a torch.tensor of shape [batch_size]
-
logits_to_preds
(logits, initial_mask, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should turn logits into predictions.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
- Returns
predictions as a torch.tensor of shape [batch_size]
-
prepare_labels
(initial_mask, **kwargs)[source]¶ Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.
- Parameters
kwargs (object) – placeholder for passing generic parameters
- Returns
labels in the right format
- Return type
object
-
-
class
farm.modeling.prediction_head.
BertLMHead
(hidden_size, vocab_size, hidden_act='gelu', task_name='lm', **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.PredictionHead
-
__init__
(hidden_size, vocab_size, hidden_act='gelu', task_name='lm', **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
classmethod
load
(pretrained_model_name_or_path, revision=None, n_added_tokens=0)[source]¶ Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. bert-base-cased)
- Parameters
pretrained_model_name_or_path –
local path of a saved model or name of a publicly available model. Exemplary public names: - bert-base-cased
See https://huggingface.co/models for full list
revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
-
forward
(hidden_states)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
logits_to_loss
(logits, **kwargs)[source]¶ Implement this function in your special Prediction Head. Should combine logits and labels with a loss fct to a per sample loss.
- Parameters
logits (object) – logits, can vary in shape and type, depending on task
labels (object) – labels, can vary in shape and type, depending on task
- Returns
per sample loss as a torch.tensor of shape [batch_size]
-
-
class
farm.modeling.prediction_head.
NextSentenceHead
(layer_dims=None, num_labels=None, class_weights=None, loss_ignore_index=-100, loss_reduction='none', task_name='text_classification', **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.TextClassificationHead
- Almost identical to a TextClassificationHead. Only difference: we can load the weights from a pretrained language model that was saved in the Transformers style (all in one model).
-
classmethod
load
(pretrained_model_name_or_path)[source]¶ Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. bert-base-cased)
- Parameters
pretrained_model_name_or_path –
local path of a saved model or name of a publicly available model. Exemplary public names: - bert-base-cased
See https://huggingface.co/models for full list
-
class
farm.modeling.prediction_head.
FeedForwardBlock
(layer_dims, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
A feed forward neural network of variable depth and width.
-
__init__
(layer_dims, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(X)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
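Example (a minimal sketch; the layer dimensions are purely illustrative):
import torch
from farm.modeling.prediction_head import FeedForwardBlock

ffn = FeedForwardBlock(layer_dims=[768, 256, 2])  # input size 768, output size 2
logits = ffn(torch.randn(8, 768))                 # shape: [8, 2]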
-
-
class
farm.modeling.prediction_head.
QuestionAnsweringHead
(layer_dims=[768, 2], task_name='question_answering', no_ans_boost=0.0, context_window_size=100, n_best=5, n_best_per_sample=None, duplicate_filtering=-1, **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.PredictionHead
A question answering head predicts the start and end of the answer on token level.
-
__init__
(layer_dims=[768, 2], task_name='question_answering', no_ans_boost=0.0, context_window_size=100, n_best=5, n_best_per_sample=None, duplicate_filtering=-1, **kwargs)[source]¶ - Parameters
layer_dims (List[Int]) – dimensions of the Feed Forward block, e.g. [768, 2], for adjusting to the BERT embedding size. The output dimension must always be 2.
kwargs (object) – placeholder for passing generic parameters
no_ans_boost (float) – How much the no_answer logit is boosted/increased. The higher the value, the more likely a “no answer possible given the input text” is returned by the model
context_window_size (int) – The size, in characters, of the window around the answer span that is used when displaying the context around the answer.
n_best (int) – The number of positive answer spans for each document.
n_best_per_sample (int) – Number of candidate answer spans to consider from each passage. Each passage also returns “no answer” info. This is decoupled from n_best on the document level, since predictions on the passage level tend to be very similar, so it should have a low value.
duplicate_filtering (int) – Answers are filtered based on their position. Both the start and end positions of the answers are considered. The higher the value, the further apart two answers may be while still being filtered out as duplicates. 0 corresponds to exact duplicates. -1 turns off duplicate removal.
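Example (a minimal sketch; all values are illustrative and only show how the constructor arguments fit together):
from farm.modeling.prediction_head import QuestionAnsweringHead

qa_head = QuestionAnsweringHead(
    layer_dims=[768, 2],       # input must match the LM hidden size, output is always 2 (start/end)
    no_ans_boost=-10.0,        # negative value makes "no answer" less likely
    context_window_size=150,   # characters of context returned around each answer
    n_best=5,                  # answers returned per document
    duplicate_filtering=0,     # drop answers with identical start and end positions
    task_name="question_answering",
)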
-
classmethod
load
(pretrained_model_name_or_path, revision=None)[source]¶ Load a prediction head from a saved FARM or transformers model. pretrained_model_name_or_path can be one of the following: a) Local path to a FARM prediction head config (e.g. my-bert/prediction_head_0_config.json) b) Local path to a Transformers model (e.g. my-bert) c) Name of a public model from https://huggingface.co/models (e.g. distilbert-base-uncased-distilled-squad)
- Parameters
pretrained_model_name_or_path –
local path of a saved model or name of a publicly available model. Exemplary public names: - distilbert-base-uncased-distilled-squad - bert-large-uncased-whole-word-masking-finetuned-squad
See https://huggingface.co/models for full list
revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
-
forward
(X)[source]¶ One forward pass through the prediction head model, starting with language model output on token level
-
logits_to_loss
(logits, labels, **kwargs)[source]¶ Combine predictions and labels to a per sample loss.
-
logits_to_preds
(logits, span_mask, start_of_word, seq_2_start_t, max_answer_length=1000, **kwargs)[source]¶ Get the predicted index of start and end token of the answer. Note that the output is at token level and not word level. Note also that these logits correspond to the tokens of a sample (i.e. special tokens, question tokens, passage_tokens)
-
get_top_candidates
(sorted_candidates, start_end_matrix, sample_idx)[source]¶ Returns top candidate answers as a list of Span objects. Operates on a matrix of summed start and end logits. This matrix corresponds to a single sample (includes special tokens, question tokens, passage tokens). This method always returns a list of len n_best + 1 (it is comprised of the n_best positive answers along with the one no_answer)
-
formatted_preds
(logits=None, preds=None, baskets=None, **kwargs)[source]¶ Takes a list of passage level predictions, each corresponding to one sample, and converts them into document level predictions. Leverages information in the SampleBaskets. Assumes that we are being passed predictions from ALL samples in the one SampleBasket i.e. all passages of a document. Logits should be None, because we have already converted the logits to predictions before calling formatted_preds. (see Inferencer._get_predictions_and_aggregate()).
-
to_qa_preds
(top_preds, no_ans_gaps, baskets)[source]¶ Groups Span objects together in a QAPred object
-
aggregate_preds
(preds, passage_start_t, ids, seq_2_start_t=None, labels=None)[source]¶ Aggregate passage level predictions to create document level predictions. This method assumes that all passages of each document are contained in preds i.e. that there are no incomplete documents. The output of this step are prediction spans. No answer is represented by a (-1, -1) span on the document level
-
static
reduce_labels
(labels)[source]¶ Removes repeat answers. Represents a no answer label as (-1,-1)
-
reduce_preds
(preds)[source]¶ This function contains the logic for choosing the best answers from each passage. In the end, it returns the n_best predictions on the document level.
-
static
pred_to_doc_idxs
(pred, passage_start_t)[source]¶ Converts the passage level predictions to document level predictions. Note that on the doc level we don’t have special tokens or question tokens. This means that a no answer cannot be represented by a (0,0) qa_answer but will instead be represented by (-1, -1)
-
static
label_to_doc_idxs
(label, passage_start_t)[source]¶ Converts the passage level labels to document level labels. Note that on the doc level we don’t have special tokens or question tokens. This means that a no answer cannot be represented by a (0,0) span but will instead be represented by (-1, -1)
-
prepare_labels
(labels, start_of_word, **kwargs)[source]¶ Some prediction heads need additional label conversion. E.g. NER needs word level labels turned into subword token level labels.
- Parameters
kwargs (object) – placeholder for passing generic parameters
- Returns
labels in the right format
- Return type
object
-
static
merge_formatted_preds
(preds_all)[source]¶ Merges results from the two prediction heads used for NQ style QA. Takes the prediction from QA head and assigns it the appropriate classification label. This mapping is achieved through passage_id. preds_all should contain [QuestionAnsweringHead.formatted_preds(), TextClassificationHead()]. The first item of this list should be of len=n_documents while the second item should be of len=n_passages
-
-
farm.modeling.prediction_head.
pick_single_fn
(heads, fn_name)[source]¶ Iterates over heads and returns a static method called fn_name if and only if one head has a method of that name. If no heads have such a method, None is returned. If more than one head has such a method, an Exception is thrown
-
class
farm.modeling.prediction_head.
TextSimilarityHead
(similarity_function: str = 'dot_product', global_loss_buffer_size: int = 150000, **kwargs)[source]¶ Bases:
farm.modeling.prediction_head.PredictionHead
Trains a head on predicting the similarity of two texts like in Dense Passage Retrieval.
-
__init__
(similarity_function: str = 'dot_product', global_loss_buffer_size: int = 150000, **kwargs)[source]¶ Init the TextSimilarityHead.
- Parameters
similarity_function – Function to calculate similarity between queries and passage embeddings. Choose either “dot_product” (Default) or “cosine”.
global_loss_buffer_size – Buffer size for all_gather() in DDP. Increase if errors like “encoded data exceeds max_size …” come up
kwargs –
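Example (a minimal sketch for a DPR-style setup with cosine similarity instead of the default dot product):
from farm.modeling.prediction_head import TextSimilarityHead

similarity_head = TextSimilarityHead(
    similarity_function="cosine",     # or "dot_product" (default)
    global_loss_buffer_size=150000,   # only relevant for all_gather() in DDP
)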
-
classmethod
dot_product_scores
(query_vectors, passage_vectors)[source]¶ Calculates dot product similarity scores for two 2-dimensional tensors
- Parameters
query_vectors (torch.Tensor) – tensor of query embeddings from BiAdaptive model of dimension n1 x D, where n1 is the number of queries/batch size and D is embedding size
passage_vectors (torch.Tensor) – tensor of context/passage embeddings from BiAdaptive model of dimension n2 x D, where n2 is the number of passages/batch size and D is embedding size
- Returns
dot product similarity score of each query with each context/passage (dimension: n1 x n2)
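Conceptually, the score matrix is just the matrix product of the two embedding tensors; a minimal sketch of the equivalent computation (shapes are illustrative):
import torch

query_vectors = torch.randn(4, 768)    # n1 x D
passage_vectors = torch.randn(8, 768)  # n2 x D

# one score per (query, passage) pair
scores = torch.matmul(query_vectors, passage_vectors.transpose(0, 1))  # shape: n1 x n2 = 4 x 8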
-
classmethod
cosine_scores
(query_vectors, passage_vectors)[source]¶ Calculates cosine similarity scores for two 2-dimensional tensors
- Parameters
query_vectors (torch.Tensor) – tensor of query embeddings from BiAdaptive model of dimension n1 x D, where n1 is the number of queries/batch size and D is embedding size
passage_vectors (torch.Tensor) – tensor of context/passage embeddings from BiAdaptive model of dimension n2 x D, where n2 is the number of passages/batch size and D is embedding size
- Returns
cosine similarity score of each query with each context/passage (dimension: n1xn2)
-
get_similarity_function
()[source]¶ Returns the type of similarity function used to compare queries and passages/contexts
-
forward
(query_vectors: torch.Tensor, passage_vectors: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶ Only packs the embeddings from both language models into a tuple. No further modification. The similarity calculation is handled later to enable distributed training (DDP) while keeping the support for in-batch negatives. (Gather all embeddings from nodes => then do similarity scores + loss)
- Parameters
query_vectors (torch.Tensor) – Tensor of query embeddings from BiAdaptive model of dimension n1 x D, where n1 is the number of queries/batch size and D is embedding size
passage_vectors (torch.Tensor) – Tensor of context/passage embeddings from BiAdaptive model of dimension n2 x D, where n2 is the number of passages/batch size and D is embedding size
- Returns
(query_vectors, passage_vectors)
-
logits_to_loss
(logits: Tuple[torch.Tensor, torch.Tensor], **kwargs)[source]¶ Computes the loss (Default: NLLLoss) by applying a similarity function (Default: dot product) to the input tuple of (query_vectors, passage_vectors) and afterwards applying the loss function on similarity scores.
- Parameters
logits – Tuple of Tensors (query_embedding, passage_embedding) as returned from forward()
- Returns
negative log likelihood loss from similarity scores
-
logits_to_preds
(logits: Tuple[torch.Tensor, torch.Tensor], **kwargs)[source]¶ Returns predicted ranks(similarity) of passages/context for each query
- Parameters
logits (torch.Tensor) – tensor of log softmax similarity scores of each query with each context/passage (dimension: n1xn2)
- Returns
predicted ranks of passages for each query
-
Optimization¶
-
class
farm.modeling.optimization.
WrappedDataParallel
(module, device_ids=None, output_device=None, dim=0)[source]¶ Bases:
torch.nn.parallel.data_parallel.DataParallel
A way of adapting attributes of underlying class to parallel mode. See: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html#dataparallel
Gets into recursion errors. Workaround see: https://discuss.pytorch.org/t/access-att-of-model-wrapped-within-torch-nn-dataparallel-maximum-recursion-depth-exceeded/46975
-
class
farm.modeling.optimization.
WrappedDDP
(module, device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, find_unused_parameters=False, check_reduction=False, gradient_as_bucket_view=False)[source]¶ Bases:
torch.nn.parallel.distributed.DistributedDataParallel
A way of adapting attributes of underlying class to distributed mode. Same as in WrappedDataParallel above. Even when using distributed on a single machine with multiple GPUs, apex can speed up training significantly. Distributed code must be launched with “python -m torch.distributed.launch --nproc_per_node=1 run_script.py”
-
farm.modeling.optimization.
initialize_optimizer
(model, n_batches, n_epochs, device, learning_rate, optimizer_opts=None, schedule_opts=None, distributed=False, grad_acc_steps=1, local_rank=-1, use_amp=None)[source]¶ Initializes an optimizer, a learning rate scheduler and converts the model if needed (e.g. for mixed precision). By default, we use transformers’ AdamW and a linear warmup schedule with warmup ratio 0.1. You can easily switch optimizer and schedule via optimizer_opts and schedule_opts.
- Parameters
model (AdaptiveModel) – model to optimize (e.g. trimming weights to fp16 / mixed precision)
n_batches (int) – number of batches for training
n_epochs – number of epochs for training
device –
learning_rate (float) – Learning rate
optimizer_opts – Dict to customize the optimizer. Choose any optimizer available from torch.optim, apex.optimizers or transformers.optimization by supplying the class name and the parameters for the constructor. Examples: 1) AdamW from Transformers (Default): {“name”: “TransformersAdamW”, “correct_bias”: False, “weight_decay”: 0.01} 2) SGD from pytorch: {“name”: “SGD”, “momentum”: 0.0} 3) FusedLAMB from apex: {“name”: “FusedLAMB”, “bias_correction”: True}
schedule_opts – Dict to customize the learning rate schedule. Choose any Schedule from Pytorch or Huggingface’s Transformers by supplying the class name and the parameters needed by the constructor. If the dict does not contain num_training_steps, it will be set by calculating it from n_batches, grad_acc_steps and n_epochs. Examples: 1) Linear Warmup (Default): {“name”: “LinearWarmup”, “num_warmup_steps”: 0.1 * num_training_steps, “num_training_steps”: num_training_steps} 2) CosineWarmup: {“name”: “CosineWarmup”, “num_warmup_steps”: 0.1 * num_training_steps, “num_training_steps”: num_training_steps} 3) CyclicLR from pytorch: {“name”: “CyclicLR”, “base_lr”: 1e-5, “max_lr”: 1e-4, “step_size_up”: 100}
distributed – Whether training on distributed machines
grad_acc_steps – Number of steps to accumulate gradients for. Helpful to mimic large batch_sizes on small machines.
local_rank – rank of the machine in a distributed setting
use_amp – Optimization level of nvidia’s automatic mixed precision (AMP). The higher the level, the faster the model. Options: “O0” (Normal FP32 training) “O1” (Mixed Precision => Recommended) “O2” (Almost FP16) “O3” (Pure FP16). See details on: https://nvidia.github.io/apex/amp.html
- Returns
model, optimizer, scheduler
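Example (a minimal sketch; model is an already built AdaptiveModel, device comes from initialize_device_settings(), and the numbers are illustrative):
from farm.modeling.optimization import initialize_optimizer

model, optimizer, lr_schedule = initialize_optimizer(
    model=model,
    n_batches=500,
    n_epochs=3,
    device=device,
    learning_rate=3e-5,
    optimizer_opts={"name": "TransformersAdamW", "correct_bias": False, "weight_decay": 0.01},
    schedule_opts={"name": "LinearWarmup", "num_warmup_steps": 150},  # num_training_steps is filled in automatically
)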
-
farm.modeling.optimization.
get_scheduler
(optimizer, opts)[source]¶ Get the scheduler based on dictionary with options. Options are passed to the scheduler constructor.
- Parameters
optimizer – optimizer whose learning rate to control
opts – dictionary of args to be passed to constructor of schedule
- Returns
created scheduler
-
farm.modeling.optimization.
optimize_model
(model, device, local_rank, optimizer=None, distributed=False, use_amp=None)[source]¶ Wraps MultiGPU or distributed usage around a model. No support for ONNX models.
- Parameters
model (AdaptiveModel) – model to optimize (e.g. trimming weights to fp16 / mixed precision)
device – either gpu or cpu, get the device from initialize_device_settings()
distributed – Whether training on distributed machines
local_rank – rank of the machine in a distributed setting
use_amp – Optimization level of nvidia’s automatic mixed precision (AMP). The higher the level, the faster the model. Options: “O0” (Normal FP32 training) “O1” (Mixed Precision => Recommended) “O2” (Almost FP16) “O3” (Pure FP16). See details on: https://nvidia.github.io/apex/amp.html
- Returns
model, optimizer
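Example (a minimal sketch for single-machine, non-distributed use; model, device and optimizer are assumed to exist already):
from farm.modeling.optimization import optimize_model

# Wraps the model for multi-GPU (DataParallel) if several GPUs are visible;
# with use_amp set, it would also be converted for mixed precision.
model, optimizer = optimize_model(model, device=device, local_rank=-1, optimizer=optimizer, distributed=False)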
Tokenization¶
Tokenization classes.
-
class
farm.modeling.tokenization.
Tokenizer
[source]¶ Bases:
object
Simple Wrapper for Tokenizers from the transformers package. Enables loading of different Tokenizer classes with a uniform interface.
-
classmethod
load
(pretrained_model_name_or_path, revision=None, tokenizer_class=None, use_fast=True, **kwargs)[source]¶ Enables loading of different Tokenizer classes with a uniform interface. Either infer the class from model config or define it manually via tokenizer_class.
- Parameters
pretrained_model_name_or_path (str) – The path of the saved pretrained model or its name (e.g. bert-base-uncased)
revision (str) – The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
tokenizer_class (str) – (Optional) Name of the tokenizer class to load (e.g. BertTokenizer)
use_fast (bool) – (Optional, True by default) Indicate if FARM should try to load the fast version of the tokenizer (True) or use the Python one (False). Only DistilBERT, BERT and Electra fast tokenizers are supported.
kwargs –
- Returns
Tokenizer
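Example (a minimal sketch; the tokenizer class is inferred from the model config):
from farm.modeling.tokenization import Tokenizer

tokenizer = Tokenizer.load("bert-base-uncased", use_fast=True)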
-
-
class
farm.modeling.tokenization.
EmbeddingTokenizer
(vocab_file, do_lower_case=True, unk_token='[UNK]', sep_token='[SEP]', pad_token='[PAD]', cls_token='[CLS]', mask_token='[MASK]', **kwargs)[source]¶ Bases:
transformers.tokenization_utils.PreTrainedTokenizer
Constructs an EmbeddingTokenizer.
-
__init__
(vocab_file, do_lower_case=True, unk_token='[UNK]', sep_token='[SEP]', pad_token='[PAD]', cls_token='[CLS]', mask_token='[MASK]', **kwargs)[source]¶ - Parameters
vocab_file (str) – Path to a one-word-per-line vocabulary file
do_lower_case (bool) – Flag whether to lower case the input
-
property
vocab_size
¶ int: Size of the base vocabulary (without the added tokens).
-
-
farm.modeling.tokenization.
tokenize_with_metadata
(text, tokenizer)[source]¶ Performs tokenization while storing some important metadata for each token:
offsets: (int) Character index where the token begins in the original text
start_of_word: (bool) If the token is the start of a word. Particularly helpful for NER and QA tasks.
We do this by first doing whitespace tokenization and then applying the model-specific tokenizer to each “word”.
Note
We don’t assume to preserve exact whitespaces in the tokens! This means: tabs, new lines, multiple whitespaces etc. will all resolve to a single “ ”. This doesn’t make a difference for BERT + XLNet but it does for RoBERTa. For RoBERTa it has the positive effect of a shorter sequence length, but some information about whitespace type is lost which might be helpful for certain NLP tasks (e.g. tabs for tables).
- Parameters
text (str) – Text to tokenize
tokenizer – Tokenizer (e.g. from Tokenizer.load())
- Returns
Dictionary with “tokens”, “offsets” and “start_of_word”
- Return type
dict
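Example (a minimal sketch; the tokenizer would typically come from Tokenizer.load()):
from farm.modeling.tokenization import Tokenizer, tokenize_with_metadata

tokenizer = Tokenizer.load("bert-base-uncased")
tokenized = tokenize_with_metadata("Berlin is the capital of Germany", tokenizer)
# tokenized["tokens"]        -> subword tokens
# tokenized["offsets"]       -> character index where each token starts in the original text
# tokenized["start_of_word"] -> True for tokens that begin a new whitespace-separated word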
-
farm.modeling.tokenization.
truncate_sequences
(seq_a, seq_b, tokenizer, max_seq_len, truncation_strategy='longest_first', with_special_tokens=True, stride=0)[source]¶ Reduces a single sequence or a pair of sequences to a maximum sequence length. The sequences can contain tokens or any other elements (offsets, masks …). If with_special_tokens is enabled, it’ll remove some additional tokens to have exactly enough space for later adding special tokens (CLS, SEP etc.)
Supported truncation strategies:
longest_first: (default) Iteratively reduce the input sequences until the input is under max_length, removing one token at a time from the longest sequence (when there is a pair of input sequences). Overflowing tokens only contain overflow from the first sequence.
only_first: Only truncate the first sequence. Raises an error if the first sequence is shorter than or equal to num_tokens_to_remove.
only_second: Only truncate the second sequence
do_not_truncate: Does not truncate (raise an error if the input sequence is longer than max_length)
- Parameters
seq_a (list) – First sequence of tokens/offsets/…
seq_b (None or list) – Optional second sequence of tokens/offsets/…
tokenizer – Tokenizer (e.g. from Tokenizer.load())
max_seq_len (int) –
truncation_strategy (str) – how the sequence(s) should be truncated down. Default: “longest_first” (see above for other options).
with_special_tokens (bool) – If true, it’ll remove some additional tokens to have exactly enough space for later adding special tokens (CLS, SEP etc.)
stride (int) – optional stride of the window during truncation
- Returns
truncated seq_a, truncated seq_b, overflowing tokens
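Example (a minimal sketch truncating a question/passage pair so that both fit into max_seq_len together with the special tokens; the texts are illustrative):
from farm.modeling.tokenization import Tokenizer, tokenize_with_metadata, truncate_sequences

tokenizer = Tokenizer.load("bert-base-uncased")
question = tokenize_with_metadata("What is the capital of Germany?", tokenizer)["tokens"]
passage = tokenize_with_metadata("Berlin is the capital and largest city of Germany.", tokenizer)["tokens"]

seq_a, seq_b, overflow = truncate_sequences(
    seq_a=question,
    seq_b=passage,
    tokenizer=tokenizer,
    max_seq_len=32,
    truncation_strategy="longest_first",  # default; removes tokens from the longer sequence first
    with_special_tokens=True,             # reserve room for [CLS]/[SEP]
)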
-
farm.modeling.tokenization.
insert_at_special_tokens_pos
(seq, special_tokens_mask, insert_element)[source]¶ Adds elements to a sequence at the positions that align with special tokens. This is useful for expanding label ids or masks, so that they align with corresponding tokens (incl. the special tokens)
Example:
# Tokens: ["CLS", "some", "words", "SEP"]
>>> special_tokens_mask = [1, 0, 0, 1]
>>> lm_label_ids = [12, 200]
>>> insert_at_special_tokens_pos(lm_label_ids, special_tokens_mask, insert_element=-1)
[-1, 12, 200, -1]
- Parameters
seq (list) – List where you want to insert new elements
special_tokens_mask (list) – list with “1” for positions of special chars
insert_element – the value you want to insert
- Returns
list
-
farm.modeling.tokenization.
tokenize_batch_question_answering
(pre_baskets, tokenizer, indices)[source]¶ Tokenizes text data for question answering tasks. Tokenization means splitting words into subwords, depending on the tokenizer’s vocabulary.
We first tokenize all documents in batch mode. (When using FastTokenizers Rust multithreading can be enabled by TODO add how to enable rust mt)
Then we tokenize each question individually
We construct dicts with question and corresponding document text + tokens + offsets + ids
- Parameters
pre_baskets – input dicts with QA info #todo change to input objects
tokenizer – tokenizer to be used
indices – list, indices used during multiprocessing so that IDs assigned to our baskets are unique
- Returns
baskets, list containing question and corresponding document information
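A minimal sketch (the SQuAD-style keys “context”, “qas”, “question” and “id” shown below are an assumption about the pre_basket format; the exact expected structure is defined by the QA processor):
from farm.modeling.tokenization import Tokenizer, tokenize_batch_question_answering

tokenizer = Tokenizer.load("distilbert-base-uncased-distilled-squad", use_fast=True)
pre_baskets = [
    {
        "context": "Berlin is the capital and largest city of Germany.",  # assumed key
        "qas": [{"question": "What is the capital of Germany?", "id": "q1"}],  # assumed keys
    }
]
# indices keep basket IDs unique when this runs inside multiprocessing workers
baskets = tokenize_batch_question_answering(pre_baskets, tokenizer, indices=list(range(len(pre_baskets))))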