
Tips: Folder Structure

jmgclark edited this page May 18, 2016 · 1 revision

Folder Structure Best Practices


The following is a suggested structure for organizing the data and code for a research project so that others can more easily understand and replicate it. We've created a template folder (download here) that includes a basic folder structure, a readme file, and suggested headings for code in R and Stata. These specifics may not work for every situation, but the concept can be adapted to just about any project. In general:

  • Data folders should be broken down into raw data, cleaned data, and/or final data for analysis (and any other categories that may be necessary).

  • Data folders should not contain code.

  • Code subfolders should be numbered in the order in which they should be run.

    • E.g. 01_Cleaning, 02_Data_Prep, 03_Analysis.
    • For each of these folders, it is helpful to have a master run file (e.g. a master do-file for those using Stata) that runs all of the code within that folder.
    • There should also be a top-level master file that runs all of the "sub-master" files. Ideally, someone downloading your code and data should be able to replicate the entire project by running that single file.
  • Results should be saved to a separate output folder.

  • The master folder should contain a readme file with information that helps the user navigate and understand the folder contents.
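
The guidelines above can be sketched as a small script. This is only an illustration, not the template folder itself: the folder names in `LAYOUT`, the readme file name, and the helper functions `scaffold`, `run_order`, and `code_in_data_folders` are all hypothetical, and the code is written in Python rather than R or Stata purely for convenience. It creates the folders, returns the numbered code subfolders in the order they should be run, and flags any code files that have ended up inside the data folders.

```python
from pathlib import Path
from typing import List

# Hypothetical layout based on the guidelines; rename to suit your project.
LAYOUT = [
    "data/raw",
    "data/cleaned",
    "data/final",
    "code/01_Cleaning",
    "code/02_Data_Prep",
    "code/03_Analysis",
    "output",
]


def scaffold(root: Path) -> None:
    """Create the folder layout and a placeholder readme under root."""
    for rel in LAYOUT:
        (root / rel).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"  # assumed file name
    if not readme.exists():
        readme.write_text("Describe the folder contents and how to run the project.\n")


def run_order(code_dir: Path) -> List[str]:
    """Return the code subfolder names sorted by their numeric prefix."""
    folders = [p.name for p in code_dir.iterdir() if p.is_dir()]
    return sorted(folders, key=lambda name: int(name.split("_", 1)[0]))


def code_in_data_folders(root: Path, extensions=(".do", ".R", ".py")) -> List[Path]:
    """List code files that were placed under data/, which the guidelines discourage."""
    data_dir = root / "data"
    return [p for p in data_dir.rglob("*") if p.is_file() and p.suffix in extensions]
```

Calling `scaffold(Path("my_project"))` would create the folders in the current directory. A top-level master file could then loop over `run_order(Path("my_project") / "code")` and call each folder's own master run file in turn.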

[Screenshot: a folder structure following these guidelines]
