This Streamlit application automatically redacts sensitive information from transcript files using Microsoft Presidio. It is designed for anonymizing transcriptions of Stanford Deliberative Democracy Lab sessions, but can be used for any text-based content.
- File formats supported: `.txt` and `.xlsx`
- User-configurable settings:
  - Select which entity types to redact from a predefined list (e.g., `PERSON`, `LOCATION`, `EMAIL_ADDRESS`, `PHONE_NUMBER`, `ORGANIZATION`, etc.)
  - Choose your own redaction replacement text (default: `REDACTED`)
- Customizable column selection for Excel files
- Clear, consistent replacement for all detected entities
- Download redacted output in the same format as uploaded
- Choose which entity types Presidio should detect.
- Set the replacement text to be used for redacted entities.
- Upload a `.txt` or `.xlsx` transcript file.
- For `.xlsx` files, choose which column contains the text to redact.
- The app detects only your selected entities and replaces them with your chosen text.
- Download the anonymized file.
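
The detect-and-replace step above can be sketched roughly as follows. This is a minimal illustration and not the app's actual code: the helper names `detect_spans` and `apply_redactions` are hypothetical, and the sketch assumes Presidio's `AnalyzerEngine` with an English spaCy model installed. The Presidio import is done lazily so the replacement helper can be tried on its own.

```python
from typing import List, Tuple

# (start offset, end offset, entity type), mirroring the fields Presidio reports
Span = Tuple[int, int, str]


def detect_spans(text: str, entities: List[str]) -> List[Span]:
    """Hypothetical helper: ask Presidio for the selected entity types only."""
    # Imported lazily so apply_redactions() works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine

    analyzer = AnalyzerEngine()  # assumes an English spaCy model is available
    results = analyzer.analyze(text=text, entities=entities, language="en")
    return [(r.start, r.end, r.entity_type) for r in results]


def apply_redactions(text: str, spans: List[Span], replacement: str = "REDACTED") -> str:
    """Hypothetical helper: replace every detected span with the same text."""
    # Merge overlapping spans so a region is never replaced twice.
    merged: List[List[int]] = []
    for start, end, _ in sorted(spans):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Substitute right to left so earlier offsets stay valid.
    for start, end in reversed(merged):
        text = text[:start] + replacement + text[end:]
    return text


if __name__ == "__main__":
    # Hand-written spans stand in for Presidio output in this demo.
    sample = "Alice met Bob in Paris."
    spans = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 22, "LOCATION")]
    print(apply_redactions(sample, spans))  # REDACTED met REDACTED in REDACTED.
```

With Presidio installed, `apply_redactions(text, detect_spans(text, ["PERSON", "LOCATION"]))` would chain the two steps. Presidio also ships its own anonymizer package, which may be what the app uses instead of manual string replacement.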
The tool is deployed online — no installation required.
- Visit ddl-transcript-anonymizer.streamlit.app
- Select entity types to redact from the dropdown.
- Enter your preferred replacement text.
- Upload your `.txt` or `.xlsx` transcript file.
- (For Excel files) Select the column containing the text to redact.
- View the redacted text directly in the browser.
- Download your anonymized file in the same format as uploaded.
- Entities to redact: Configured via the dropdown menu; defaults to `PERSON` and `LOCATION`.
- Replacement text: Configured via the input field; defaults to `REDACTED`.
- Exclusion list: Certain location terms like "United States" are not redacted (can be changed in code).
- Entity list: You can expand or reduce `ENTITY_OPTIONS` in the code.
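
As a rough picture of what these code-level settings might look like, here is a small sketch. Only `ENTITY_OPTIONS` is named in this README; `DEFAULT_ENTITIES`, `DEFAULT_REPLACEMENT`, `EXCLUDED_TERMS`, and `drop_excluded` are hypothetical names, the entity list is an illustrative subset, and the case-insensitive matching is an assumption rather than confirmed app behavior.

```python
from typing import Iterable, List, Tuple

# (start offset, end offset, entity type)
Span = Tuple[int, int, str]

# Entity types offered in the dropdown (illustrative subset; edit to expand or reduce).
ENTITY_OPTIONS = ["PERSON", "LOCATION", "EMAIL_ADDRESS", "PHONE_NUMBER", "ORGANIZATION"]

# Hypothetical names for the documented defaults.
DEFAULT_ENTITIES = ["PERSON", "LOCATION"]
DEFAULT_REPLACEMENT = "REDACTED"

# Hypothetical exclusion list: detected terms that should be left untouched.
EXCLUDED_TERMS = {"United States"}


def drop_excluded(text: str, spans: List[Span],
                  excluded: Iterable[str] = EXCLUDED_TERMS) -> List[Span]:
    """Keep only the spans whose matched text is not on the exclusion list."""
    # Case-insensitive comparison is an assumption made for this sketch.
    blocked = {term.casefold() for term in excluded}
    return [
        span for span in spans
        if text[span[0]:span[1]].strip().casefold() not in blocked
    ]
```

Filtering detected spans before replacement is one simple way to implement an exclusion list. The app may instead configure this inside Presidio itself.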