
Video Annotation Tool

Automated video annotation and analysis tool using Google Gemini Vision Language Model for behavioral research

Transform hours of manual video annotation into minutes of automated analysis while maintaining research-grade accuracy. Originally developed at Cornell's Interaction Research Lab for human-robot interaction studies.

How It Works

📁 Box Video → 🤖 Gemini VLM → 📊 Excel Annotations

The tool automatically (outlined in the sketch after this list):

  1. Downloads videos from Box (with OAuth authentication)
  2. Chunks large files (>2GB) into processable segments
  3. Analyzes each segment with Gemini's vision capabilities
  4. Extracts interactions, dialogue, emotions, and behaviors
  5. Combines results into structured Excel output
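
Conceptually, the entry point composes these five stages. The outline below is illustrative only; every helper name other than complete_box_to_annotations_pipeline is hypothetical and does not reflect the tool's actual internals.

# Illustrative outline of the pipeline; helper names are hypothetical
def complete_box_to_annotations_pipeline(box_file_id, output_path):
    video_path = download_from_box(box_file_id)        # 1. OAuth download from Box
    chunks = split_if_needed(video_path, max_bytes=2 * 1024**3)  # 2. chunk files >2GB
    annotations = []
    for start_offset, chunk_path in chunks:
        # 3-4. Gemini analyzes each segment; offsets keep timestamps absolute
        annotations += analyze_with_gemini(chunk_path, start_offset)
    df = to_dataframe(annotations)                     # 5. combine into one table
    df.to_excel(output_path, index=False)              # structured Excel output
    return df, annotations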

Quick Start

Prerequisites

  • Google Cloud account with Gemini API access
  • Box account with videos to analyze
  • Google Colab (strongly recommended; it links directly with Google Drive)

One-Command Setup

# In Google Colab - run this cell:
!pip install google-generativeai boxsdk pandas openpyxl

# Configure your credentials
GEMINI_API_KEY = "your-gemini-api-key"
BOX_CLIENT_ID = "your-box-client-id" 
BOX_CLIENT_SECRET = "your-box-client-secret"
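
Hard-coding keys in a notebook risks leaking them when the notebook is shared. If you store the same three values in Colab's Secrets panel, you can load them instead:

# Optional: pull credentials from Colab's Secrets panel rather than hard-coding them
from google.colab import userdata

GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")
BOX_CLIENT_ID = userdata.get("BOX_CLIENT_ID")
BOX_CLIENT_SECRET = userdata.get("BOX_CLIENT_SECRET")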

Run Analysis

# Get Box file ID from URL: box.com/file/123456789 → use "123456789"
BOX_FILE_ID = "123456789"
OUTPUT_PATH = "annotations.xlsx"

df, annotations = complete_box_to_annotations_pipeline(BOX_FILE_ID, OUTPUT_PATH)
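
The pipeline returns a pandas DataFrame alongside the raw annotations, so you can sanity-check results in the notebook before opening the spreadsheet. The column names here assume the output schema described under "What You Get" (e.g. an emotion column); adjust to match your actual output:

# Quick sanity checks on the returned DataFrame (column names assumed)
print(df.head())                     # first few annotated interactions
print(df["emotion"].value_counts())  # distribution of emotional classifications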

What You Get

Structured Output

Each interaction includes (an example record follows this list):

  • Timestamps (start/end times)
  • Interaction type (approaching, talking, photographing, etc.)
  • Detailed observations (behavior, body language, context)
  • Direct quotes from audio
  • Emotional classification (positive, negative, neutral, mixed)
  • Confidence scores (0.0-1.0)
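
As a concrete illustration, a single annotation record might look like the following; the field names are indicative of the list above, not a guaranteed schema:

# One annotation record (field names are indicative, not a fixed schema)
annotation = {
    "start_time": "02:15",
    "end_time": "02:34",
    "interaction_type": "talking",
    "observations": "Woman approaches robot hesitantly, takes photo while laughing with friend",
    "dialogue": "What is this thing? It's so cool!",
    "emotion": "positive",
    "confidence": 0.92,  # hypothetical score for this example
}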

Sample Results

Time        | Event    | Observations                                                               | Dialogue                            | Emotion
02:15-02:34 | talking  | Woman approaches robot hesitantly, takes photo while laughing with friend | "What is this thing? It's so cool!" | positive
05:42-06:01 | avoiding | Man deliberately walks wide path around robot, shakes head                | "I don't trust those things"        | negative

Performance Validation

Tested on 5-minute HRI video segments:

  • Precision: 90% (9/10 auto-detected interactions were valid)
  • Recall: 75% (9/12 human-annotated interactions were found)
  • Agreement on interaction types: 89%
  • Time savings: ~95% reduction in annotation time

The system consistently identifies major social interactions while occasionally missing very brief or ambiguous behaviors.
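
For reference, the metrics above follow the standard precision/recall definitions applied to the reported counts:

# Precision and recall from the reported validation counts
true_positives = 9    # auto-detected interactions confirmed by the human coder
auto_detected = 10    # total interactions the tool flagged
human_detected = 12   # total interactions in the manual annotation

precision = true_positives / auto_detected   # 9/10 = 0.90
recall = true_positives / human_detected     # 9/12 = 0.75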

Technical Features

Smart File Handling

  • Small files (<2GB): Direct processing
  • Large files (>2GB): Automatic chunking with timestamp coordination (sketched after this list)
  • Any length: From minutes to hours of footage
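
A minimal sketch of that size check, assuming a 2GB threshold and a hypothetical chunking helper:

import os

MAX_BYTES = 2 * 1024**3  # 2GB per-file limit

def plan_processing(video_path):
    """Choose direct vs. chunked processing (illustrative sketch)."""
    if os.path.getsize(video_path) <= MAX_BYTES:
        return [video_path]               # small file: process directly
    return split_into_chunks(video_path)  # hypothetical helper: split, keep offsets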

Cost Efficiency

  • ~$0.04 per 26-minute video (Gemini Flash)
  • ~$2-5 for 20 hours of video analysis (checked in the sketch below)
  • Automatic cleanup of temporary files
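
As a back-of-the-envelope check on those numbers, assuming cost scales roughly linearly with footage length:

# Rough cost estimate from the per-video figure above
cost_per_video = 0.04    # USD per 26-minute video (Gemini Flash)
minutes_per_video = 26
hours_of_footage = 20

est_cost = (hours_of_footage * 60 / minutes_per_video) * cost_per_video
print(f"~${est_cost:.2f}")  # ≈ $1.85, at the low end of the $2-5 range above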

Research Integration

  • Excel output compatible with statistical software
  • Confidence scoring for quality control
  • Maintains original video timestamps across chunks (see the sketch after this list)
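
Keeping original timestamps amounts to adding each chunk's start offset back onto its chunk-relative event times. A minimal sketch, assuming times in seconds and a hypothetical helper name:

def to_absolute_seconds(chunk_start, relative_time):
    """Map a chunk-relative timestamp back to the original video's timeline."""
    return chunk_start + relative_time

# An event 135s into a chunk that begins 1800s into the source video
assert to_absolute_seconds(1800, 135) == 1935  # i.e. 32:15 in the original footage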

Use Cases

  • Human-Robot Interaction: Public reactions to autonomous systems
  • Public Space Studies: Social dynamics in urban environments
  • Accessibility Research: Barrier identification and accommodation analysis
  • Customer Behavior: Shopping and service interaction patterns
  • Educational Research: Classroom interaction analysis

Setup Guide

1. Google Cloud Setup

  1. Create Google Cloud account
  2. Enable Gemini API
  3. Create API key
  4. Set billing alerts ($10/month recommended)

2. Box Integration

  1. Create Box developer app
  2. Get Client ID and Secret
  3. Set redirect URL to: https://irl.tech.cornell.edu/video-annotator/box-oauth-redirect.html (the resulting OAuth flow is sketched below)
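
With the Client ID and Secret in hand, the standard boxsdk OAuth2 flow looks like this; the redirect URL is the one configured above, and the authorization code is copied from that redirect page back into the notebook:

from boxsdk import OAuth2, Client

oauth = OAuth2(client_id=BOX_CLIENT_ID, client_secret=BOX_CLIENT_SECRET)

# Open this URL, authorize the app, then copy the "code" parameter
# from the redirect URL back into the notebook
auth_url, csrf_token = oauth.get_authorization_url(
    "https://irl.tech.cornell.edu/video-annotator/box-oauth-redirect.html"
)
print(auth_url)

access_token, refresh_token = oauth.authenticate("paste-authorization-code-here")
client = Client(oauth)  # authenticated client for downloading videos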

3. Run in Colab

  1. Open the notebook in Google Colab
  2. Add your credentials
  3. Run the complete pipeline function

Limitations & Best Practices

What Works Best

  • Clear video quality with audible speech
  • Direct human interactions and behaviors
  • Videos with obvious social dynamics

Current Limitations

  • File size: 2GB max per chunk (handled automatically)
  • Subtle behaviors: May miss very brief or ambiguous actions
  • Audio dependent: Requires clear speech for dialogue capture
  • Processing time: ~2-3 minutes of processing per minute of video

Recommended Workflow

  1. Test on a sample video first
  2. Review low-confidence annotations manually (see the filtering sketch after this list)
  3. Validate results against manual coding subset
  4. Document any prompt modifications for reproducibility
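
For step 2, the confidence scores make triage straightforward. This assumes a confidence column as in the output schema; the 0.7 cutoff is only a starting point:

# Flag low-confidence annotations for manual review (cutoff is a starting point)
needs_review = df[df["confidence"] < 0.7]
print(f"{len(needs_review)} of {len(df)} annotations flagged for review")
needs_review.to_excel("annotations_to_review.xlsx", index=False)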

Research Applications

Originally developed for human-robot interaction studies at Cornell's Interaction Research Lab under Dr. Wendy Ju. The tool addresses a common research bottleneck: large volumes of recorded video that sit in storage unanalyzed.

Citation

If you use this tool in your research, please cite the Interaction Research Lab at Cornell University.

Contributing

This tool is actively developed for research use. Suggestions and improvements are welcome for:

  • Additional interaction categories
  • Domain-specific prompt templates
  • Integration with other video platforms
  • Statistical analysis extensions

Cornell Interaction Research Lab | Making video analysis easier for researchers!
