TensorFlow training
You will need gsutil on your path. In Matlab we configure that with a command, probably mcCloudConfigure.
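As a minimal sketch, the setup might just prepend the Cloud SDK bin directory to the environment PATH. The install location below is an assumption; adjust it for your system.

% Hypothetical sketch of the path setup; the SDK location is an assumption.
gsutilDir = '/usr/local/google-cloud-sdk/bin';
if ~contains(getenv('PATH'), gsutilDir)
    setenv('PATH', [getenv('PATH') ':' gsutilDir]);
end
% Confirm gsutil is now callable; system returns 0 on success.
assert(system('gsutil version') == 0, 'gsutil is not on the path');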
Maybe one basic routine we need is
- Download a data set from a URL
- Extract the tar file
- Use gsutil to copy the result somewhere
Or maybe copy the tar file to the cloud and extract it there. A Matlab sketch of the first version follows.
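Here is a minimal sketch of that routine, shelling out to gsutil. The function name fetchAndUpload and its arguments are assumptions, not existing mc/gCloud code.

function fetchAndUpload(url, bucketDir)
    % Download the tar.gz archive to a temporary folder.
    archive = websave(fullfile(tempdir, 'download.tar.gz'), url);
    % Extract it locally.
    localDir = fullfile(tempdir, 'extracted');
    untar(archive, localDir);
    % Recursively copy the extracted tree to the cloud bucket.
    cmd = sprintf('gsutil -m cp -r %s %s', localDir, bucketDir);
    assert(system(cmd) == 0, 'gsutil copy failed');
end

For example, fetchAndUpload with the pet images URL below and a gs:// destination would stage the data in one call.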
Do this outside of Matlab. Go get your data and annotations and put them somewhere. For example, this is how you download and extract the data for the 'pet' example.
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf images.tar.gz
tar -xvf annotations.tar.gz
Each data set needs a method that converts the images and annotations into the TFRecord format. Find the method for your data and run it. For the 'pet' data the command is
# convert pet format to tf-format
python object_detection/dataset_tools/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`
Then copy the records to the cloud. Alternatively, you might tar all the files, copy the tar files, and extract them in the cloud.
gsutil cp pet_train_with_masks.record gs://${YOUR_GCS_BUCKET}/data/pet_train.record
gsutil cp pet_val_with_masks.record gs://${YOUR_GCS_BUCKET}/data/pet_val.record
gsutil cp object_detection/data/pet_label_map.pbtxt gs://${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt
You also need a way to get your model. The pattern is familiar by now: download, extract, and copy.
# download a coco-pretrained model
wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://${YOUR_GCS_BUCKET}/data/
Edit the config file and copy it up. This is the 'pet' example; it is not yet clear how general this is.
# Edit the faster_rcnn_resnet101_pets.config template. Note that
# PATH_TO_BE_CONFIGURED appears in multiple places and must be set to the
# gs:// data directory. (The -i '' form is for macOS sed; on Linux use -i.)
sed -i '' "s|PATH_TO_BE_CONFIGURED|gs://${YOUR_GCS_BUCKET}/data|g" \
    object_detection/samples/configs/faster_rcnn_resnet101_pets.config
# Copy edited template to cloud.
gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
The cloud.yml file defines the GPU resources. We could write a function that returns this command based on parameters stored in the mc object; this could be a gCloud method (see the sketch after the command).
gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://${YOUR_GCS_BUCKET}/train \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
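A minimal sketch of such a method, assuming a gCloud class with bucket, region, packages, configFile, and pipelineConfig properties. The class, the property names, and the method name jobCmd are all assumptions, not existing code. Branching on a mode argument lets the same sketch produce the evaluation command shown next.

function cmd = jobCmd(obj, mode)
    % mode is 'train' or 'eval'. Build a unique job name from user and time.
    jobName = sprintf('%s_object_detection_%s_%d', ...
        getenv('USER'), mode, floor(posixtime(datetime('now'))));
    head = sprintf(['gcloud ml-engine jobs submit training %s ' ...
        '--runtime-version 1.2 --job-dir=gs://%s/train ' ...
        '--packages %s --module-name object_detection.%s --region %s'], ...
        jobName, obj.bucket, obj.packages, mode, obj.region);
    if strcmp(mode, 'train')
        % Training takes its GPU resources from the cloud.yml config file.
        tail = sprintf(['--config %s -- --train_dir=gs://%s/train ' ...
            '--pipeline_config_path=gs://%s/data/%s'], ...
            obj.configFile, obj.bucket, obj.bucket, obj.pipelineConfig);
    else
        % Evaluation runs on a single GPU machine.
        tail = sprintf(['--scale-tier BASIC_GPU -- ' ...
            '--checkpoint_dir=gs://%s/train --eval_dir=gs://%s/eval ' ...
            '--pipeline_config_path=gs://%s/data/%s'], ...
            obj.bucket, obj.bucket, obj.bucket, obj.pipelineConfig);
    end
    cmd = [head ' ' tail];
end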
There could be a gCloud method that returns this evaluate command based on the parameters stored in the gCloud object; the jobCmd sketch above already allows for it.
gcloud ml-engine jobs submit training `whoami`_object_detection_eval_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \
--eval_dir=gs://${YOUR_GCS_BUCKET}/eval \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
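Assuming the hypothetical jobCmd sketch above, submitting the evaluation job from Matlab might look like:

gc = gCloud;   % hypothetical object with bucket, region, packages, etc. set
assert(system(gc.jobCmd('eval')) == 0, 'job submission failed');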
Use the gCloud object to bring up TensorBoard. That could be a method, also; a sketch follows the commands.
# Monitor progress with TensorBoard. The auth login is needed only the first time.
gcloud auth application-default login
tensorboard --logdir=gs://${YOUR_GCS_BUCKET}
# Then navigate to localhost:6006.
# Note that it may take TensorBoard a couple of minutes to populate with data.
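A possible sketch of that method, again assuming the hypothetical gCloud object; the method name and the bucket property are assumptions. It launches TensorBoard in the background and opens the page in the system browser.

function tensorboard(obj)
    % One-time setup outside Matlab: gcloud auth application-default login
    % Launch TensorBoard in the background, pointed at the bucket.
    cmd = sprintf('tensorboard --logdir=gs://%s &', obj.bucket);
    assert(system(cmd) == 0, 'could not launch tensorboard');
    % Open the page; TensorBoard may take a couple of minutes to populate.
    web('http://localhost:6006', '-browser');
end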