Eugene Wu edited this page Nov 4, 2016 · 2 revisions

This document covers the coding side of research: setting up and running experiments, tools for statistical analysis and plotting, and simplifying code deployments.

Running Experiments

    tldr; put all the raw data into a database

You want your experiments to be reproducible because they will fail the first (and second, and third) time, and you will need an easy way to examine and debug them. The following are some basic instructions.

Put things in a database (say, SQLite3). You probably want a schema similar to the following:

    experiment_name   // what experiment are you running?  
    run_id            // you probably want to run multiple times and compute mean/std/CI statistics
    dataset(s)        // if you are using different datasets 
    seed              // if there is ANY source of randomness in your experiment, use a seed so you can reproduce it later
    parameter(s)      // what are you varying?  Each should be an attribute.  If unused, set the value to NULL
    measure(s)        // what you are measuring.  precision, recall, latency etc.
                      // store the RAW measures that can be used to compute aggregate measures
    timestamp(s)      // in systems work, you should be collecting timestamps.  Store them!
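
As a rough sketch of what this could look like with Python's built-in sqlite3 module: the table name `runs`, the parameter column `batch_size`, the measure column `latency_ms`, and the helpers `open_db` and `log_run` are all made up for illustration, so swap in whatever your experiment actually varies and measures.

```python
import sqlite3
import time

# Hypothetical schema following the attributes listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    experiment_name TEXT    NOT NULL,
    run_id          INTEGER NOT NULL,
    dataset         TEXT,
    seed            INTEGER NOT NULL,
    batch_size      INTEGER,  -- example parameter; NULL if unused
    latency_ms      REAL,     -- example RAW measure, one row per observation
    started_at      REAL,     -- timestamps as seconds since the epoch
    finished_at     REAL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the results database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_run(conn, experiment_name, run_id, dataset, seed,
            batch_size, latency_ms, started_at, finished_at):
    """Insert one raw observation; aggregates are computed later, not here."""
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (experiment_name, run_id, dataset, seed,
         batch_size, latency_ms, started_at, finished_at),
    )
    conn.commit()


conn = open_db()
start = time.time()
log_run(conn, "warmup_test", 0, "toy", 42, None, 12.5, start, time.time())
```

Passing a file path instead of `":memory:"` keeps the results around between runs.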

Some rules of thumb:

  • Parameters go on the x axis
  • Metrics are computed from the measures, and go on the y axis
    • Never ever compute aggregate statistics in your code and log only those. Always log all of the data and compute statistics later!
  • Always use a seed and record what it is
  • If the above table gets too wide, it's ok to denormalize.
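
Here is a small sketch of these rules in practice, using only the Python standard library. The experiment itself (`run_once`) is a stand-in that fakes latencies, and the table and column names are assumptions for illustration. The point is that every raw observation is logged together with its seed, and the mean and standard deviation are computed afterwards from the stored rows.

```python
import random
import sqlite3
import statistics


def run_once(seed, batch_size):
    """Stand-in experiment: returns fake latencies, reproducible from the seed."""
    rng = random.Random(seed)  # a private RNG, so the seed fully determines the run
    return [batch_size * rng.uniform(0.9, 1.1) for _ in range(5)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measures (run_id INTEGER, seed INTEGER, "
    "batch_size INTEGER, latency_ms REAL)"
)

# Log every raw observation along with the seed that produced it.
for batch_size in (10, 100):
    for run_id in range(3):
        seed = 1000 + run_id
        for latency in run_once(seed, batch_size):
            conn.execute(
                "INSERT INTO measures VALUES (?, ?, ?, ?)",
                (run_id, seed, batch_size, latency),
            )
conn.commit()


def summarize(conn):
    """Compute (mean, stdev) per parameter value from the raw rows."""
    by_param = {}
    for batch_size, latency in conn.execute(
        "SELECT batch_size, latency_ms FROM measures"
    ):
        by_param.setdefault(batch_size, []).append(latency)
    return {
        p: (statistics.mean(v), statistics.stdev(v))
        for p, v in by_param.items()
    }


summary = summarize(conn)
```

Because the raw rows are still in the table, you can later switch to medians, percentiles, or confidence intervals without rerunning anything.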

If you want to go whole hog, try ReproZip, which packages an experiment together with its dependencies so it can be rerun elsewhere.

Plotting experiments

Simple advice

  • Did you compute aggregate values (e.g., mean latency)? Show the standard deviation or, more meaningfully, bootstrapped confidence intervals. See the appendix for UDF code to compute them in PostgreSQL.
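
If the raw measures are already in Python, a bootstrapped confidence interval for the mean can be sketched with the standard library alone. Note that this uses the plain percentile method, which is not necessarily the method a dedicated bootstrap library uses by default; the function name `bootstrap_ci` and its parameters are made up for illustration.

```python
import random
import statistics


def bootstrap_ci(values, alpha=0.05, n_resamples=2000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Resamples `values` with replacement `n_resamples` times, computes the
    mean of each resample, and returns the alpha/2 and 1 - alpha/2
    percentiles of those means as (lower, upper).
    """
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # seeded so the interval itself is reproducible
    n = len(values)
    means = sorted(
        statistics.mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


latencies = [10.0, 12.0, 11.0, 13.0, 9.0, 10.5, 11.5, 12.5]
lower, upper = bootstrap_ci(latencies)
```

Plot `lower` and `upper` as error bars around the mean for each parameter value.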

Flow chart for picking plots

  • TBD

Tools

  • For the vast majority of plots, ggplot2 in R is the way to go. Find a tutorial and follow it.
  • If you use Python for everything like I do, try the pygg library. It gives you ggplot2 syntax in Python. It can't handle multiple layers, which you probably shouldn't be using anyway.

Presenting Experiments

A few simple steps will help you better understand what's going on. These are also the steps for presenting an experiment. Do the following:

  1. List your hypothesis
  2. What ideal plots will help validate or debunk your hypothesis?
    1. Draw out what the plots should look like if your hypothesis is correct and if incorrect
    2. Include the x and y axes and the shape of the curves
  3. What will you do to generate these plots?
  4. Why do the plots look this way?
    1. Which parts confirm your hypothesis?
    2. What's surprising/does not confirm your hypothesis? Why?
  5. What are the next steps to
    1. answer the questions raised in step 4.2
    2. use what you learned in your system
  6. Go to step 1

Deployments

You'll likely set up many application/server deployments as part of research. Automate your deployment.

SSH

Fabric

Releasing Code

Releasing working code is a very good idea. If you are writing a Python application, make it pip installable. The following repo provides a skeleton and simple example code for structuring a Python package so it can be uploaded to PyPI.

Appendix

Useful Code

PostgreSQL code for defining 95% confidence intervals

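    -- Needs the untrusted PL/Python language, plus the scikits.bootstrap
    -- package installed for the Python interpreter the server uses.
    CREATE EXTENSION IF NOT EXISTS plpythonu;

    DROP FUNCTION IF EXISTS ci_final(numeric[]) CASCADE;
    CREATE FUNCTION ci_final(vs numeric[])
    RETURNS numeric[]
    AS $$
      from scikits import bootstrap
      # skip NULLs; numeric values may arrive as Decimal, so convert to float
      data = [float(v) for v in vs if v is not None]
      # return a plain list [lower, upper] so it maps onto numeric[]
      return [float(b) for b in bootstrap.ci(data, alpha=0.05)]
    $$ LANGUAGE plpythonu;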

    DROP AGGREGATE IF EXISTS ci(numeric);
    CREATE AGGREGATE ci (numeric) (
      SFUNC = array_append,
      STYPE = numeric[],
      initcond = '{}',
      FINALFUNC = ci_final
    );


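    -- an example query (PostgreSQL arrays are 1-indexed, and subscripting
    -- a function result requires the extra parentheses)

    SELECT (ci(measure))[1] AS lower_bound,
           (ci(measure))[2] AS upper_bound
    FROM dataset;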
