
Data loader error when trying to fine-tune RoBERTa #26

@evelynkyl

❓ Questions & Help

I'm trying to fine-tune a pretrained XLM-RoBERTa model using run_lm_finetuning.py and the following script. However, it fails with RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [2] at entry 1. I used the same dataset as in the README on the examples page. I then tried changing model_type and model_name_or_path to roberta, but the same error occurs.

export TRAIN_FILE=/home/evelyn/usr/examples/wikitext-2-raw/wiki.train.raw
export TEST_FILE=/home/evelyn/usr/examples/wikitext-2-raw/wiki.test.raw

CUDA_VISIBLE_DEVICES=1 python3 run_lm_finetuning.py \
    --output_dir=output \
    --model_type=roberta \
    --model_name_or_path=roberta-base \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE \
    --mlm

Below is the full output log:

02/28/2022 00:08:18 - WARNING - __main__ -   Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
02/28/2022 00:08:23 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_epsilon=1e-08, block_size=510, cache_dir='', config_name='', device=device(type='cpu'), do_eval=True, do_lower_case=False, do_train=True, eval_all_checkpoints=False, eval_data_file='/home/evelyn/usr/examples/wikitext-2-raw/wiki.test.raw', evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, local_rank=-1, logging_steps=50, max_grad_norm=1.0, max_steps=-1, mlm=True, mlm_probability=0.15, model_name_or_path='roberta-base', model_type='roberta', n_gpu=0, no_cuda=False, num_train_epochs=1.0, output_dir='output', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=4, save_steps=50, save_total_limit=None, seed=42, server_ip='', server_port='', tokenizer_name='', train_data_file='/home/evelyn/usr/examples/wikitext-2-raw/wiki.train.raw', warmup_steps=0, weight_decay=0.0)
02/28/2022 00:08:23 - INFO - __main__ -   Loading features from cached file /home/evelyn/usr/examples/wikitext-2-raw/cached_lm_510_wiki.train.raw
/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
02/28/2022 00:08:23 - INFO - __main__ -   ***** Running training *****
02/28/2022 00:08:23 - INFO - __main__ -     Num examples = 36718
02/28/2022 00:08:23 - INFO - __main__ -     Num Epochs = 1
02/28/2022 00:08:23 - INFO - __main__ -     Instantaneous batch size per GPU = 4
02/28/2022 00:08:23 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 4
02/28/2022 00:08:23 - INFO - __main__ -     Gradient Accumulation steps = 1
02/28/2022 00:08:23 - INFO - __main__ -     Total optimization steps = 9180
Iteration:   0%|                                                                                            | 0/9180 [00:00<?, ?it/s]
Epoch:   0%|                                                                                                   | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_lm_finetuning.py", line 596, in <module>
    main()
  File "run_lm_finetuning.py", line 548, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_lm_finetuning.py", line 259, in train
    for step, batch in enumerate(epoch_iterator):
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/tqdm-4.62.3-py3.8.egg/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [2] at entry 1
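
As far as I can tell, the error comes from the DataLoader's default_collate calling torch.stack on examples of different lengths. Here is a minimal repro outside the script (the dataset contents are made up to match the [12] and [2] sizes in the error message):

import torch
from torch.utils.data import DataLoader, Dataset

# Toy dataset with two variable-length examples, mirroring the sizes
# reported in the traceback above.
class ToyDataset(Dataset):
    def __init__(self):
        self.examples = [torch.zeros(12, dtype=torch.long),
                         torch.zeros(2, dtype=torch.long)]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return self.examples[i]

# default_collate tries torch.stack, which requires equal sizes, so this
# raises: RuntimeError: stack expects each tensor to be equal size,
# but got [12] at entry 0 and [2] at entry 1
batch = next(iter(DataLoader(ToyDataset(), batch_size=2)))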

Any pointers on how to resolve this issue? Thanks so much!
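
For what it's worth, I can get past the collate error with a padding collate_fn like the sketch below (pad_token_id=1 is RoBERTa's <pad> id; other tokenizers may use a different value), but I'm not sure it addresses the root cause:

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Pad variable-length examples to the longest sequence in the batch
# instead of stacking them directly.
def padding_collate(examples, pad_token_id=1):
    return pad_sequence(examples, batch_first=True,
                        padding_value=pad_token_id)

# Passed to the DataLoader built in train(), e.g.:
# train_dataloader = DataLoader(train_dataset, sampler=train_sampler,
#                               batch_size=args.train_batch_size,
#                               collate_fn=padding_collate)

That said, the odd example sizes might instead point to a stale cached features file (the log shows overwrite_cache=False and a cached_lm_510 file being loaded), so padding may just be masking the real problem.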
