Hi,
When training Vision Mamba (VMamba) on CIFAR-10 from scratch using TRADES, I would like to evaluate its robustness accuracy. What changes should I make to the get_val_loader function from classification/generate_adv_images.py?
The current function is as follows:
```python
def get_val_loader(data_path, batch_size):
    transform = transforms.Compose([
        transforms.Resize(256, interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])
    # Load the ImageNet validation dataset
    val_dataset = ImageNet5k(root=os.path.join(data_path, "val"), transform=transform)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    return val_loader, val_dataset
```
During training, the following build_transform function is applied:
```python
def build_transform(is_train, config):
    resize_im = config.DATA.IMG_SIZE > 32
    if is_train:
        # this should always dispatch to transforms_imagenet_train
        transform = create_transform(
            input_size=config.DATA.IMG_SIZE,
            is_training=True,
            color_jitter=config.AUG.COLOR_JITTER if config.AUG.COLOR_JITTER > 0 else None,
            auto_augment=config.AUG.AUTO_AUGMENT if config.AUG.AUTO_AUGMENT != 'none' else None,
            re_prob=config.AUG.REPROB,
            re_mode=config.AUG.REMODE,
            re_count=config.AUG.RECOUNT,
            interpolation=config.DATA.INTERPOLATION,
        )
        if not resize_im:
            # replace RandomResizedCropAndInterpolation with RandomCrop
            transform.transforms[0] = transforms.RandomCrop(config.DATA.IMG_SIZE, padding=4)
        return transform

    t = []
    if resize_im:
        if config.TEST.CROP:
            # to maintain the same ratio w.r.t. 224 images
            size = int((256 / 224) * config.DATA.IMG_SIZE)
            t.append(transforms.Resize(size, interpolation=_pil_interp(config.DATA.INTERPOLATION)))
            t.append(transforms.CenterCrop(config.DATA.IMG_SIZE))
        else:
            t.append(transforms.Resize((config.DATA.IMG_SIZE, config.DATA.IMG_SIZE),
                                       interpolation=_pil_interp(config.DATA.INTERPOLATION)))
    t.append(transforms.ToTensor())
    t.append(transforms.Normalize(IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD))
    return transforms.Compose(t)
```
It appears that for CIFAR-10 the model processes images of shape 3x32x32 (since resize_im = False) for both the train and test sets. Should I modify the get_val_loader function so that the only transformation applied is:

```python
transform = transforms.Compose([transforms.ToTensor()])
```
Additionally, it would be great if you could share your fine-tuning setup for CIFAR-10, both with and without TRADES, including details such as the number of epochs, learning rate, etc.