PDEL Data Transparency Services
Researchers, especially junior faculty under pressure to publish for tenure, often have neither the time nor the research support needed to clean, annotate, and publish their data files at the conclusion of their research. Too often, competing research demands and new projects pull their attention away from this all-important step. Yet it is a crucial part of conducting transparent and reproducible social science. It bolsters the quality of research by permitting replication, and it increases the return on the investment made in generating new data by allowing other authors to extend existing data in directions the original PIs did not envision.
The intent of PDEL’s data publication services is to take datasets that are not currently in the public domain and do the work required to anonymize and annotate them, so that they are ready for publication and dissemination. The end goal is to migrate the cleaned datasets and code into the UCSD Library Digital Collection at the end of each project.
Our research assistants (RAs) can help with this process by providing the following support:
- Carefully anonymize data in compliance with IRB requirements;
- Clean, streamline, and annotate code and data to meet standards for transparency and replicability;
- Replicate data analyses and cross-check with results in the paper;
- Assist in setting up new projects on GitHub (an online hosting service for Git repositories that provides version control) to store all the code and files used in the research process;
- Work with the UCSD Library to enter the data and metadata into the Digital Collection, which provides unlimited data storage and permanent Digital Object Identifiers (DOIs, unique persistent identifiers) for the researchers’ benefit.
When considering these services, we ask that PIs send a short description of the data and be able to answer “yes” to the first four of the following questions:
- Is the data an original dataset, i.e., either original survey or experimental data, or a novel dataset that merges publicly available data in a new way?
- Do you have final results that have been or are about to be submitted to a journal?
- Can you share the data and code for replication?
- Can we publish the raw data within two years? (We can embargo publication for this period.)
- How likely do you think it is that other researchers will use these data? (On a scale from 1 to 10.)