- Pre-tokenize the entire text dataset ahead of time.
- Pre-process the images ahead of time.
- Distribute the dataset across nodes.
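
As a rough illustration of the first and third points, the sketch below tokenizes a tiny text dataset once, caches the token ids on disk, and lets each node read only its own slice. Everything in it is an assumption made for the example: the whitespace tokenizer, the JSON-lines cache format, the round-robin split, and the helper names `build_vocab`, `pretokenize`, and `load_shard` are hypothetical and not taken from any particular library. Image pre-processing would follow the same cache-once pattern but is not shown.

```python
import json
import tempfile
from pathlib import Path


def build_vocab(texts):
    """Map each whitespace-separated token to an integer id (toy tokenizer)."""
    vocab = {}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def pretokenize(texts, vocab, cache_path):
    """Tokenize every example once and cache the ids on disk as JSON lines."""
    with open(cache_path, "w", encoding="utf-8") as f:
        for text in texts:
            ids = [vocab[token] for token in text.split()]
            f.write(json.dumps(ids) + "\n")


def load_shard(cache_path, rank, world_size):
    """Read only the examples assigned to this node (round-robin by line index)."""
    shard = []
    with open(cache_path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i % world_size == rank:
                shard.append(json.loads(line))
    return shard


# Toy dataset standing in for a real corpus (assumption for the example).
texts = ["the cat sat", "the dog ran", "a cat ran", "a dog sat"]
vocab = build_vocab(texts)

# One-time offline step: write the token cache before training starts.
cache_path = Path(tempfile.mkdtemp()) / "tokens.jsonl"
pretokenize(texts, vocab, cache_path)

# At training time, each node loads its own disjoint slice of the cache.
shard0 = load_shard(cache_path, rank=0, world_size=2)
shard1 = load_shard(cache_path, rank=1, world_size=2)
```

With two nodes, each one ends up with half of the examples and never re-runs the tokenizer during training. A real setup would likely swap in the project's actual tokenizer and a binary or memory-mapped cache format.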