Examples ================================ You can find exemplary scripts for the major down-stream tasks in :code:`examples/` Document Classification ########################## (see :code:`examples/doc_classification.py` for full script) 1.Create a tokenizer:: tokenizer = Tokenizer.load( pretrained_model_name_or_path=lang_model, do_lower_case=False) 2. Create a DataProcessor that handles all the conversion from raw text into a pytorch Dataset:: processor = GermEval18CoarseProcessor(tokenizer=tokenizer, max_seq_len=128, data_dir="../data/germeval18") 3. Create a DataSilo that loads several datasets (train/dev/test), provides DataLoaders for them and calculates a few descriptive statistics of our datasets:: data_silo = DataSilo( processor=processor, batch_size=batch_size) 4. Create an AdaptiveModel a) which consists of a pretrained language model as a basis:: language_model = LanguageModel.load(lang_model) b) and a prediction head on top that is suited for our task => Text classification:: prediction_head = TextClassificationHead(layer_dims=[768, len(processor.label_list)]) model = AdaptiveModel( language_model=language_model, prediction_heads=[prediction_head], embeds_dropout_prob=0.1, lm_output_types=["per_sequence"], device=device) 5. Create an optimizer and optionally optimize model and optimizer with AMP:: model, optimizer, warmup_linear = initialize_optimizer( model=model, learning_rate=2e-5, warmup_proportion=0.1, n_examples=data_silo.n_samples("train"), batch_size=batch_size, n_epochs=1) 6. Feed everything to the Trainer, which keeps care of growing our model into powerful plant and evaluates it from time to time:: trainer = Trainer( optimizer=optimizer, data_silo=data_silo, epochs=n_epochs, n_gpu=1, warmup_linear=warmup_linear, evaluate_every=evaluate_every, device=device) 7. Let it grow:: model = trainer.train(model) 8. Hooray! You have a model. Store it:: save_dir = "save/bert-german-GNAD-tutorial" model.save(save_dir) processor.save(save_dir) 9. Load it & harvest your fruits (Inference):: basic_texts = [ {"text": "Schartau sagte dem Tagesspiegel, dass Fischer ein Idiot ist"}, {"text": "Martin Müller spielt Fussball"}, ] model = Inferencer(save_dir) result = model.inference_from_dicts(dicts=basic_texts) print(result)