This document covers setting up and running experiments, tools for statistical analysis and plotting, and simplifying code deployments.
tl;dr: put all the raw data into a database
You want your experiments to be reproducible because they will fail the first (and second, third) times, so you will want an easy way to examine and debug them. The following are some basic instructions.
Put things in a database (say, SQLite3). You probably want a schema similar to the following:
experiment_name // what experiment are you running?
run_id // you probably want to run multiple times and compute mean/std/CI statistics
dataset(s) // if you are using different datasets
seed // if there is ANY source of randomness in your experiment, use a seed so you can reproduce it later
parameter(s) // what are you varying? Each should be an attribute. If unused, set the value to NULL
measure(s) // what you are measuring. precision, recall, latency etc.
// store the RAW measures that can be used to compute aggregate measures
timestamp(s) // in systems work, you should be collecting timestamps. Store them!
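As a sketch of the schema above using Python's built-in sqlite3 (table and column names here are illustrative, not prescribed; use one column per parameter and per raw measure):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a real experiment
conn.execute("""
    CREATE TABLE results (
        experiment_name TEXT,
        run_id          INTEGER,
        dataset         TEXT,
        seed            INTEGER,
        param_workers   INTEGER,  -- one column per parameter; NULL if unused
        latency_ms      REAL,     -- a RAW measure, not an aggregate
        ts              TEXT DEFAULT CURRENT_TIMESTAMP
    )""")
conn.execute(
    "INSERT INTO results "
    "(experiment_name, run_id, dataset, seed, param_workers, latency_ms) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("scan_bench", 0, "tpch", 42, 4, 12.3))
row = conn.execute(
    "SELECT experiment_name, latency_ms FROM results").fetchone()
```

Aggregates (means, confidence intervals) are then one SQL query away, computed from the raw rows rather than baked into your experiment code.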
Some rules of thumb:
- Parameters go on the x axis
- Metrics are computed from the measures, and go on the y axis
- Never ever compute aggregate statistics in your code and log only those. Always log all of the data and compute statistics later!
- Always use a seed and record what it is
- If the above table gets too wide, it's ok to split it into several normalized tables.
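The seed rule can be sketched in a few lines; `run_trial` and its body are hypothetical, and the point is simply that one recorded seed makes a run exactly repeatable:

```python
import random

def run_trial(seed):
    # Seed every source of randomness from one recorded value,
    # and store that value alongside the results.
    rng = random.Random(seed)
    sample = [rng.random() for _ in range(3)]
    return seed, sample

# The same seed reproduces the same "random" run exactly.
s1 = run_trial(7)
s2 = run_trial(7)
```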
If you want to go whole hog, try ReproZip
Simple advice
- Did you compute aggregate values (e.g., mean latency etc)? Show the standard deviation, or more meaningfully, bootstrapped confidence intervals. See the appendix for UDF code to compute it in PostgreSQL
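If you are not ready for a database UDF (the PostgreSQL version is in the appendix), a percentile-bootstrap confidence interval is only a few lines of plain Python. A minimal sketch, with a made-up function name and made-up latency numbers:

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample with replacement, take the mean of
    # each resample, and read off the alpha/2 and 1 - alpha/2 quantiles.
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

latencies = [12.1, 11.8, 13.0, 12.4, 15.2, 11.9, 12.7, 14.1]
lo, hi = bootstrap_ci(latencies)
```

Report (lo, hi) next to the mean; with raw measures in the database, you can recompute the interval at any alpha later.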
Flow chart for picking plots
- TBD
Tools
- For the vast majority of plots, ggplot2 in R is the way to go. Find a tutorial and follow it.
- If you use python for everything like I do, try the pygg library. It gives you ggplot2 syntax in python. It can't handle multiple layers, which you probably shouldn't be doing anyways.
The following simple steps will help you better understand what's going on. They are also the steps for presenting an experiment.
- List your hypothesis
- What ideal plots will help validate or debunk your hypothesis?
- Draw out what the plots should look like if your hypothesis is correct and if incorrect
- include x and y axes, shape of curves
- What will you do to generate these plots?
- Why do the plots look this way?
- Which parts confirm your hypothesis?
- What's surprising/does not confirm your hypothesis? Why?
- What are the next steps to
- answer 4ii
- use what you learned in your system
- Go to step 1
You'll likely set up many application/server deployments as part of research. Automate your deployment.
SSH
- google for "passwordless ssh"
- http://www.linuxproblem.org/art_9.html
Fabric
- Create a fabfile.py in your project directory:

    from fabric.api import run, env, local

    env.hosts = ['clic.cs.columbia.edu']

    def taskA():
        run('ls')    # runs on your env.hosts machines

    def taskB():
        local('ls')  # runs on your local machine

- Type the following in the same directory as fabfile.py to list the commands:

    fab -l

- See the Fabric documentation for more.
Releasing working code is a very good idea. If you are writing a python application, make it pip installable. The following repo provides a skeleton and simple example code for structuring a python package so it can be uploaded to pypi.
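As a rough sketch of what pip-installable means (the package name and metadata below are placeholders; see the skeleton repo mentioned above for a complete example):

```python
# setup.py -- minimal sketch; all names and metadata are placeholders
from setuptools import setup, find_packages

setup(
    name="mypackage",          # hypothetical package name
    version="0.1.0",
    packages=find_packages(),  # picks up mypackage/ with an __init__.py
    install_requires=[],       # list runtime dependencies here
)
```

With this in place, `pip install -e .` installs the package in development mode, and the same file is what packaging tools use to build a distribution for pypi.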
PostgreSQL code for defining 95% confidence intervals
DROP LANGUAGE IF EXISTS plpythonu CASCADE;
CREATE LANGUAGE plpythonu;
DROP FUNCTION IF EXISTS ci_final(numeric[]) cascade;
CREATE FUNCTION ci_final(vs numeric[])
RETURNS numeric[]
as $$
    # scikits.bootstrap returns a numpy array; convert the numeric
    # (Decimal) inputs to floats and the result to a list for PostgreSQL
    from scikits import bootstrap
    return list(bootstrap.ci([float(v) for v in vs], alpha=0.05))
$$ language plpythonu;
DROP AGGREGATE IF EXISTS ci(numeric);
CREATE AGGREGATE ci (numeric) (
SFUNC = array_append,
STYPE = numeric[],
initcond = '{}',
FINALFUNC = ci_final
);
-- an example query (note: PostgreSQL arrays are 1-indexed, and the
-- aggregate call must be parenthesized before subscripting)
SELECT (ci(measure))[1] AS lower_bound,
       (ci(measure))[2] AS upper_bound
FROM dataset;