From 1f2e32b4216af1cd6ca7438ead81ff9d8392bc11 Mon Sep 17 00:00:00 2001
From: Scott Sievert
Date: Sun, 17 Apr 2016 10:59:43 -0500
Subject: [PATCH] provides monospace formatting for code/filenames

---
 README.md | 44 ++++++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 46aee4a..f85d89a 100644
--- a/README.md
+++ b/README.md
@@ -34,21 +34,23 @@ And this package is needed to create and manipulate netcdf data files with pytho
 
 To build RNNLIB do
 
+``` shell
 $ cmake -DCMAKE_BUILD_TYPE=Release .
 $ cmake --build .
+```
 
-Cmake run creates the binary files 'rnnlib', 'rnnsynth' and 'gradient_check' in the current directory.
+Cmake run creates the binary files `rnnlib`, `rnnsynth` and `gradient_check` in the current directory.
 It is recommended that you add the directory containing the 'rnnlib' binary to your path,
-as otherwise the tools in the 'utilities' directory will not work.
+as otherwise the tools in the `utilities` directory will not work.
 
-Project files for the integrated development environments can be generated by cmake. Run cmake --help
+Project files for the integrated development environments can be generated by cmake. Run `cmake --help`
 to get list of supported IDEs.
 
 # Handwriting synthesis
 
-Step in to examples/online_prediction and go through few steps below to prepare the
+Step in to `examples/online_prediction` and go through few steps below to prepare the
 training data, train the model and eventually plot the results of the synthesis
 
 ## Downloading online handwriting dataset
 
@@ -57,26 +59,28 @@ Start by registering and downloading pen strokes data from
 http://www.iam.unibe.ch/~fkiwww/iamondb/data/lineStrokes-all.tar.gz
 Text lables for strokes can be found here
 http://www.iam.unibe.ch/~fkiwww/iamondb/data/ascii-all.tar.gz
-Then unzip ./lineStrokes and ./ascii under examples/online_prediction.
+Then unzip `./lineStrokes` and `./ascii` under `examples/online_prediction`.
 Data format in the downloaded files can not be used as is and requires further preprocessing to convert pen coordinates to offsets from previous point and merge them into the single file of netcdf format.
 
 ## Preparing the training data
 
-Run ./build_netcdf.sh to split dataset to training and validation sets.
+Run `./build_netcdf.sh` to split dataset to training and validation sets.
 The same script does all necessary preprocessing including normalisation
-of the input and makes corresponding online.nc and online_validation.nc
+of the input and makes corresponding `online.nc` and `online_validation.nc`
 files for use with rnnlib .
 
-Each point in the input sequences from online.nc consists of three numbers:
+Each point in the input sequences from `online.nc` consists of three numbers:
 the x and y offset from the previous point, and the binary end-of-stroke feature.
 
 ## Gradient check
 
 To gain some confidence that the build is fine run the gradient check:
 
+``` shell
 gradient_check --autosave=false check_synth2.config
+```
 
 ## Training
@@ -87,7 +91,9 @@ too slow convergence rate.
 
 ### Step 1
 
+``` shell
 rnnlib --verbose=false synth1d.config
+```
 
 Where synth1d.config is 1st step configuration file that defines network topology:
 3 LSTM hidden layers of 400 cells, 20 gaussian mixtures as output layer, 10 mixtures
@@ -96,7 +102,7 @@ Somewhere between training epoch 10-15 it will find optimal solution and will do
 "early stopping" w/o improvement for 20 epoch. "Early" here takes 3 days on Intel Sandybridge CPU. Normally training can be stopped as long as loss starts rising up for 2-3 consequent epochs.
-The best solution found is stored in synth1d@
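
The preprocessing the patched README describes for `build_netcdf.sh` (turning absolute pen coordinates into the three-number points stored in `online.nc`: x/y offsets from the previous point plus a binary end-of-stroke flag) can be sketched as follows. This is only an illustration of the data transformation, not the actual script; the function name and input layout are assumptions.

``` python
# Sketch: convert pen strokes given as absolute (x, y) coordinates into the
# 3-number points the README describes for online.nc -- the x and y offsets
# from the previous point, plus a binary end-of-stroke feature.
# Illustrative only; not the real build_netcdf.sh preprocessing.

def strokes_to_offsets(strokes):
    """strokes: list of strokes, each a list of (x, y) pen positions.
    Returns one flat sequence of [dx, dy, end_of_stroke] points."""
    sequence = []
    prev_x, prev_y = 0, 0          # offsets are taken from the previous point
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            eos = 1 if i == len(stroke) - 1 else 0   # 1 marks the pen lift
            sequence.append([x - prev_x, y - prev_y, eos])
            prev_x, prev_y = x, y
    return sequence

if __name__ == "__main__":
    # Two strokes: one of two points, one of a single point.
    demo = [[(0, 0), (1, 2)], [(4, 2)]]
    print(strokes_to_offsets(demo))  # -> [[0, 0, 0], [1, 2, 1], [3, 0, 1]]
```

The real pipeline additionally normalises the inputs and writes the sequences into the netcdf files (`online.nc`, `online_validation.nc`) that rnnlib consumes.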