TensorFlow training
You will need gsutil on your path. In Matlab we configure that with a command, probably mcCloudConfigure.
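As a minimal sketch, the setup might just prepend the Cloud SDK bin directory to the environment PATH. The install location below is an assumption; adjust it for your system.

% Hypothetical sketch of the path setup; the SDK location is an assumption.
gsutilDir = '/usr/local/google-cloud-sdk/bin';
if ~contains(getenv('PATH'), gsutilDir)
    setenv('PATH', [getenv('PATH') ':' gsutilDir]);
end
% Confirm gsutil is now callable; system returns 0 on success.
assert(system('gsutil version') == 0, 'gsutil is not on the path');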
Maybe one basic routine we need is
- Download a data set from a URL
- Extract the tar file
- Use gsutil to copy the result somewhere
Or maybe copy the tar file to the cloud and extract it there. A Matlab sketch of the first version follows.
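Here is a minimal sketch of that routine, shelling out to gsutil. The function name fetchAndUpload and its arguments are assumptions, not existing mc/gCloud code.

function fetchAndUpload(url, bucketDir)
    % Download the tar.gz archive to a temporary folder.
    archive = websave(fullfile(tempdir, 'download.tar.gz'), url);
    % Extract it locally.
    localDir = fullfile(tempdir, 'extracted');
    untar(archive, localDir);
    % Recursively copy the extracted tree to the cloud bucket.
    cmd = sprintf('gsutil -m cp -r %s %s', localDir, bucketDir);
    assert(system(cmd) == 0, 'gsutil copy failed');
end

For example, fetchAndUpload with the pet images URL below and a gs:// destination would stage the data in one call.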
Do this outside of Matlab. Go get your data and annotations and put them somewhere. For example, this is how you download and extract the data for the 'pet' example.
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf images.tar.gz
tar -xvf annotations.tar.gz
Each data set needs a method that converts the images and annotations into the TFRecord format. Find the method for your data and run it. For the 'pet' data the command is
# convert pet format to tf-format
python object_detection/dataset_tools/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`
Then copy the records to the cloud. Alternatively, you might tar all the files, copy the tar files, and extract them in the cloud.
gsutil cp pet_train_with_masks.record gs://${YOUR_GCS_BUCKET}/data/pet_train.record
gsutil cp pet_val_with_masks.record gs://${YOUR_GCS_BUCKET}/data/pet_val.record
gsutil cp object_detection/data/pet_label_map.pbtxt gs://${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt
You also need a way to get your model. The pattern is familiar by now: download, extract, and copy.
# download a coco-pretrained model
wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://${YOUR_GCS_BUCKET}/data/
Edit the config file and copy it up. This is the 'pet' example; it is not yet clear how general this is.
# Edit the faster_rcnn_resnet101_pets.config template. Note that
# PATH_TO_BE_CONFIGURED appears in multiple places and must be set to the
# gs:// data directory. (The -i '' form is for macOS sed; on Linux use -i.)
sed -i '' "s|PATH_TO_BE_CONFIGURED|gs://${YOUR_GCS_BUCKET}/data|g" \
    object_detection/samples/configs/faster_rcnn_resnet101_pets.config
# Copy edited template to cloud.
gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
The cloud.yml file defines the GPU resources. We could write a function that returns this command based on parameters stored in the mc object; this could be a gCloud method (see the sketch after the command).
gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://${YOUR_GCS_BUCKET}/train \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
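A minimal sketch of such a method, assuming a gCloud class with bucket, region, packages, configFile, and pipelineConfig properties. The class, the property names, and the method name jobCmd are all assumptions, not existing code. Branching on a mode argument lets the same sketch produce the evaluation command shown next.

function cmd = jobCmd(obj, mode)
    % mode is 'train' or 'eval'. Build a unique job name from user and time.
    jobName = sprintf('%s_object_detection_%s_%d', ...
        getenv('USER'), mode, floor(posixtime(datetime('now'))));
    head = sprintf(['gcloud ml-engine jobs submit training %s ' ...
        '--runtime-version 1.2 --job-dir=gs://%s/train ' ...
        '--packages %s --module-name object_detection.%s --region %s'], ...
        jobName, obj.bucket, obj.packages, mode, obj.region);
    if strcmp(mode, 'train')
        % Training takes its GPU resources from the cloud.yml config file.
        tail = sprintf(['--config %s -- --train_dir=gs://%s/train ' ...
            '--pipeline_config_path=gs://%s/data/%s'], ...
            obj.configFile, obj.bucket, obj.bucket, obj.pipelineConfig);
    else
        % Evaluation runs on a single GPU machine.
        tail = sprintf(['--scale-tier BASIC_GPU -- ' ...
            '--checkpoint_dir=gs://%s/train --eval_dir=gs://%s/eval ' ...
            '--pipeline_config_path=gs://%s/data/%s'], ...
            obj.bucket, obj.bucket, obj.bucket, obj.pipelineConfig);
    end
    cmd = [head ' ' tail];
end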
There could be a gCloud method that returns this evaluate command based on the parameters stored in the gCloud object; the jobCmd sketch above already allows for it.
gcloud ml-engine jobs submit training `whoami`_object_detection_eval_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \
--eval_dir=gs://${YOUR_GCS_BUCKET}/eval \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
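Assuming the hypothetical jobCmd sketch above, submitting the evaluation job from Matlab might look like:

gc = gCloud;   % hypothetical object with bucket, region, packages, etc. set
assert(system(gc.jobCmd('eval')) == 0, 'job submission failed');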
Use the gCloud object to bring up TensorBoard. That could be a method, also; a sketch follows the commands.
# Monitor progress with TensorBoard. The auth login is needed only the first time.
gcloud auth application-default login
tensorboard --logdir=gs://${YOUR_GCS_BUCKET}
# Then navigate to localhost:6006.
# Note that it may take TensorBoard a couple of minutes to populate with data.
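A possible sketch of that method, again assuming the hypothetical gCloud object; the method name and the bucket property are assumptions. It launches TensorBoard in the background and opens the page in the system browser.

function tensorboard(obj)
    % One-time setup outside Matlab: gcloud auth application-default login
    % Launch TensorBoard in the background, pointed at the bucket.
    cmd = sprintf('tensorboard --logdir=gs://%s &', obj.bucket);
    assert(system(cmd) == 0, 'could not launch tensorboard');
    % Open the page; TensorBoard may take a couple of minutes to populate.
    web('http://localhost:6006', '-browser');
end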