Create prediction platform to test altitude and distance indices #3

BrandonTrigueros · 2025-12-28T04:28:10Z

Resolves #2

Implements a comprehensive testing platform to evaluate whether altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness, using real-world data from the Copenhagenize Index 2025.

Results Summary

Analysis of 13 cities completed successfully:

Hypothesis 1: Lower A_i → Better for cycling CONFIRMED

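Pearson correlation: r = -0.657, p = 0.0147 (significant)
Spearman correlation: ρ = -0.522, p = 0.067 (not significant at the conventional 0.05 level)
R² = 0.431 (43% of variance explained)
Conclusion: Flatter terrain correlates significantly with better cycling conditions under the Pearson test; the rank-based Spearman test does not reach significance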

Hypothesis 2: D_i closer to 1 → Better for cycling NOT SIGNIFICANT

Pearson correlation: r = -0.483, p = 0.095 (not significant)
Spearman correlation: ρ = -0.313, p = 0.297 (not significant)
R² = 0.233
Conclusion: Network connectivity shows no statistically significant relationship with cycling scores
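For a single-predictor linear regression, R² equals the square of the Pearson r, which gives a quick consistency check on the figures above. The following is a minimal sketch using only the standard library; the helper names and the sample values are illustrative assumptions, not the PR's code or data (the PR itself lists scipy and scikit-learn for this).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared_of_linear_fit(xs, ys):
    """R² of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic placeholder values (NOT taken from the PR's results).
altitude_index = [0.2, 0.5, 0.9, 1.4, 2.0, 2.3]
score = [70.0, 66.5, 68.0, 58.0, 55.5, 52.0]

r = pearson_r(altitude_index, score)
r2 = r_squared_of_linear_fit(altitude_index, score)

# With one predictor, the regression R² and r² agree up to float error.
identity_holds = abs(r ** 2 - r2) < 1e-9

# The reported pairs agree with this identity within rounding of r.
h1_consistent = abs((-0.657) ** 2 - 0.431) < 0.002
h2_consistent = abs((-0.483) ** 2 - 0.233) < 0.002
```

The check only confirms that the reported r and R² values are mutually consistent; it says nothing about the p-values, which depend on the sample size of 13.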

Data Source

Copenhagenize Index 2025 Edition (EIT Urban Mobility)
URL: https://copenhagenizeindex.eu/
30 top-ranked bicycle-friendly cities (scores 50.3-71.1)
13 of the 15 sampled cities successfully analyzed (2 skipped due to size limitations)

Components Added

Data Collection (`scripts/`)

retrieve_data.py: Fetch and structure Copenhagenize Index data
calculate_indices.py: Calculate A_i (altitude) and D_i (distance) indices
- Uses OpenStreetMap bike network data via OSMnx
- Open Topo Data API for elevation (free, no API key required)

Analysis Platform (`analysis/`)

prediction_platform.py: Main statistical analysis tool
- Automated city sampling (top 5, middle 5, bottom 5)
- Pearson/Spearman correlations + linear regression
- 4-panel visualization generation
- CSV export of results
- 5-minute timeout per city to handle problematic areas
- Enhanced logging with progress tracking and timing
- Early exit when calculations fail (prevents cascading timeouts)
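The "top 5, middle 5, bottom 5" sampling could be sketched as below. This is an illustration under assumptions: the function name is hypothetical, and "middle" is taken to mean the five entries centred in the ranked list, which the PR does not specify.

```python
def sample_ranked_cities(ranked, group_size=5):
    """Pick the first, central, and last `group_size` entries of a ranked list."""
    n = len(ranked)
    if n < 3 * group_size:
        raise ValueError(f"need at least {3 * group_size} cities, got {n}")
    # Assumption: the middle group is centred in the ranking.
    mid_start = (n - group_size) // 2
    top = ranked[:group_size]
    middle = ranked[mid_start:mid_start + group_size]
    bottom = ranked[-group_size:]
    return top + middle + bottom


# Placeholder names standing in for the 30 ranked cities.
ranked = [f"city_{rank:02d}" for rank in range(1, 31)]
sample = sample_ranked_cities(ranked)
```

With 30 ranked entries this yields 15 distinct cities, matching the sample size mentioned in the README.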

Data & Results

data/copenhagenize_index_2025.csv: Reference dataset
results/: Generated outputs (CSV, PNG visualizations)

Documentation

Comprehensive README with methodology and usage instructions
CHANGELOG documenting development history
requirements-platform.txt: Python dependencies

Technical Improvements

Timeout mechanism (5 min/city) to skip problematic large areas
Early exit strategy: if altitude calculation fails, skip distance calculation
Real-time progress logging with flush=True for immediate visibility
Graceful handling of KeyboardInterrupt
Architecture refactor: scripts-based platform (removed obsolete package structure)
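The timeout and early-exit behavior listed above could look roughly like the following. This is a minimal sketch, not the PR's implementation: `run_with_timeout` is a hypothetical helper, the signature of `calculate_indices_for_city` is assumed, and the altitude and distance functions are stand-ins passed in as arguments.

```python
import concurrent.futures
import time


def run_with_timeout(func, timeout_s, *args):
    """Return func(*args), or None if it raises or exceeds timeout_s seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        print(f"  timeout after {timeout_s}s", flush=True)
        return None
    except Exception as exc:
        print(f"  error: {exc}", flush=True)
        return None
    finally:
        # Stop waiting without blocking; the worker thread is not killed.
        executor.shutdown(wait=False)


def calculate_indices_for_city(city, altitude_fn, distance_fn, timeout_s=300):
    """Compute both indices, skipping the distance step if altitude fails."""
    altitude = run_with_timeout(altitude_fn, timeout_s, city)
    if altitude is None:
        return None  # early exit: do not start the distance calculation
    distance = run_with_timeout(distance_fn, timeout_s, city)
    if distance is None:
        return None
    return {"city": city, "altitude_index": altitude, "distance_index": distance}


# Stand-in calculations (hypothetical; the real ones query OSM and elevation APIs).
def slow_altitude(city):
    time.sleep(0.5)
    return 0.1


distance_calls = []


def fake_distance(city):
    distance_calls.append(city)
    return 1.2


result_timeout = calculate_indices_for_city(
    "Example City", slow_altitude, fake_distance, timeout_s=0.1
)
calls_after_timeout = list(distance_calls)
result_ok = calculate_indices_for_city(
    "Example City", lambda city: 0.1, fake_distance, timeout_s=1.0
)
```

A thread-based timeout only stops waiting for the result; the underlying work keeps running until it finishes. A process-based approach would be needed to actually terminate a stuck request.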

Known Limitations

Québec skipped (its area is about 900x the Overpass API limit and would require hours to process)
Caching speeds up repeated runs (~40s → ~15s per city), but first runs remain slow
Large metropolitan areas may time out (expected behavior)

Technical Stack

Python 3.12+, OSMnx, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, networkx, geopandas

Files Changed

8 commits, ~500 lines added
New: scripts/, analysis/, data/, results/ directories
Removed: bikenv/ package (unused), setup.py (obsolete)

…es (#2)

Implements a comprehensive testing platform to evaluate if altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data.

## Data Source
- Copenhagenize Index 2025 Edition (EIT Urban Mobility)
- Source: https://copenhagenizeindex.eu/
- Official name: 'The Global Ranking of Bicycle-Friendly Cities'
- Top 30 cities with scores from 50.3 to 71.1

## Components Added

### Data Collection
- retrieve_data.py: Script to fetch/update Copenhagenize Index data
- copenhagenize_index_2025.csv: Reference dataset (30 cities)

### Index Calculations
- calculate_indices.py: Functions to compute A_i and D_i
  * altitude_index: Measures hilliness using OSM elevation data
  * distance_index: Measures network connectivity/compactness

### Analysis Platform
- prediction_platform.py: Main statistical analysis tool
  * Pearson/Spearman correlation tests
  * Linear regression modeling
  * Visualization generation
  * CSV export of results
- demo_platform.py: Simplified demo with synthetic data
- verify_structure.py: Project structure validation

### Documentation
- Comprehensive README with methodology and usage
- CHANGELOG with development history
- requirements-platform.txt for dependencies

## Hypotheses Tested
1. H1: Lower A_i → Better for cycling (flat terrain)
2. H2: D_i closer to 1 → Better for cycling (direct routes)

## Technical Stack
Python 3.12+, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, osmnx, networkx, geopandas

Resolves #2

- Tests data file integrity (CSV structure, data types)
- Validates analysis platform structure and functions
- Tests hypothesis logic with mock data
- Confirms documentation completeness

Results: 4/5 tests passing
- ✓ Data file (2025 Copenhagenize Index)
- ✓ Analysis platform structure
- ✓ Documentation files
- ✓ Hypothesis testing logic
- ⚠ Index functions (requires OSMnx installation)

Note: Full testing with OSMnx requires dependencies installation
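A data-file integrity check of the kind this commit describes might look like the sketch below. The column names `city` and `score` and the function name are assumptions for illustration; the actual CSV layout is not shown in this PR. The row count and score range come from the description (30 cities, scores 50.3 to 71.1).

```python
import csv
import io


def validate_index_rows(csv_text, expected_rows=30, score_range=(50.3, 71.1)):
    """Return a list of problems found; an empty list means the data looks sane."""
    problems = []
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    low, high = score_range
    for i, row in enumerate(rows, start=1):
        try:
            # "score" is a hypothetical column name.
            score = float(row["score"])
        except (KeyError, TypeError, ValueError):
            problems.append(f"row {i}: missing or non-numeric score")
            continue
        if not low <= score <= high:
            problems.append(f"row {i}: score {score} outside [{low}, {high}]")
    return problems


# Tiny in-memory examples instead of the real data file.
good_csv = "city,score\nA,71.1\nB,50.3\n"
bad_csv = "city,score\nA,abc\nB,60.0\n"

good_problems = validate_index_rows(good_csv, expected_rows=2)
bad_problems = validate_index_rows(bad_csv, expected_rows=2)
```

Returning a list of problems rather than raising on the first one lets a test report every issue in the file at once.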

- Add results/ to .gitignore (generated outputs)
- Verified demo platform works successfully

- Fixed f-string formatting error in prediction_platform.py
- Created run_analysis_skip_problematic.py to use successfully calculated cities
- Analyzed 13 cities from Copenhagenize Index 2025
- Added cache/ directories to .gitignore (OSMnx temporary files)

Results:
- H1 (Altitude): SUPPORTED (r=-0.604, p=0.0288)
  Lower altitude index correlates with higher bicycle scores
- H2 (Distance): NOT SIGNIFICANT (r=-0.475, p=0.101)
  No significant relationship found

Skipped cities with problematic areas (Quebec: 900x size limit)

The bikenv/ package was not being imported or used anywhere. All functionality is in scripts/ and analysis/ directories.

…ging

Features:
- Added 5-minute timeout per city to skip problematic large areas (like Québec)
- Enhanced progress logging with flush=True for immediate output visibility
- Show [X/Y] progress counter for each city
- Display elapsed time for each city calculation
- Better error messages distinguishing timeouts, area limits, and other errors
- Improved exception handling with KeyboardInterrupt support
- Main function now validates minimum data requirements

Changes:
- Removed run_analysis_skip_problematic.py (no longer needed)
- Old results cleared (will be regenerated with improved platform)

The platform now handles edge cases gracefully and provides clear feedback during the ~10-15 minute analysis process.

Resolves #2

Critical fix:
- calculate_indices_for_city now exits early if altitude calculation fails
- Prevents attempting distance calculation after timeout (was causing 67min hang)
- Québec timeout now properly stops after 5 minutes instead of continuing

Cleanup:
- Removed setup.py (project is no longer an installable package)
- Updated README to document new structure as scripts-based analysis platform
- All dependencies managed via requirements-platform.txt

This fixes the issue where Québec timed out on altitude (5min) but then tried to calculate distance anyway, causing another 67-minute hang before user interruption.

Results from 13 successfully analyzed cities:
- H1 (Altitude): CONFIRMED (r=-0.657, p=0.0147, R²=0.431)
- H2 (Distance): NOT SIGNIFICANT (r=-0.483, p=0.095)

Files:
- results/cities_with_indices.csv (13 cities with calculated indices)
- results/statistical_results.csv (hypothesis test results)
- results/hypothesis_testing_results.png (visualizations)
- results/altitude_index_plot.png (individual plot)

Note: Québec skipped due to area size (900x Overpass limit)

Copilot

Pull request overview

This PR implements a comprehensive testing platform to evaluate whether altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data from the Copenhagenize Index 2025. The analysis of 13 cities found significant correlation between flat terrain and better cycling conditions (p=0.0147), while network connectivity showed no statistically significant relationship.

Key changes:

Implements altitude and distance index calculation functions using OpenStreetMap data
Creates statistical analysis platform with correlation tests and linear regression
Adds comprehensive documentation and data from Copenhagenize Index 2025
Refactors from package structure to scripts-based platform

Reviewed changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 21 comments.

| File | Description |
| --- | --- |
| setup.py | Removed obsolete package setup file as project is now scripts-based |
| bikenv/module.py, bikenv/_api.py, bikenv/__init__.py | Removed placeholder package module in favor of functional scripts |
| scripts/retrieve_data.py | Manual data entry script for Copenhagenize Index 2025 with 30 top cities |
| scripts/calculate_indices.py | Core calculation functions for altitude (hilliness) and distance (connectivity) indices using OSMnx |
| analysis/prediction_platform.py | Main statistical analysis platform with timeout handling, correlation tests, and visualization |
| analysis/demo_platform.py | Demo version using synthetic data for testing without API dependencies |
| analysis/README.md | Comprehensive documentation of methodology, usage, and results interpretation |
| data/copenhagenize_index_2025.csv | Reference dataset with 30 bicycle-friendly cities and their scores |
| results/statistical_results.csv | Output file with correlation and regression statistics |
| results/cities_with_indices.csv | Calculated indices for analyzed cities |
| requirements-platform.txt | Python dependencies for the analysis platform |
| README.md | Updated project overview reflecting new scripts-based structure |
| CHANGELOG.md | Development history documenting implementation details |
| .gitignore | Added results/ and cache/ directories to ignore list |

Copilot · 2025-12-28T04:33:42Z

scripts/calculate_indices.py

+        print(f"Error calculating distance index for {city_name}: {e}")
+        return None


The error handling also returns None without raising an exception. This is the same pattern as in calculate_altitude_index. Consider using a more explicit error handling strategy that provides better diagnostic information to the caller.

Suggested change

-        print(f"Error calculating distance index for {city_name}: {e}")
-        return None
+        error_message = f"Error calculating distance index for {city_name}: {e}"
+        print(error_message)
+        raise RuntimeError(error_message) from e

Copilot · 2025-12-28T04:33:42Z

scripts/retrieve_data.py

+    return cities_data
+
+
+def save_to_csv(data: List[Dict], output_file: str = "../data/copenhagenize_index_2025.csv"):


The relative path '../data/copenhagenize_index_2025.csv' assumes the script is run from the scripts/ directory. Consider using file-based path resolution or adding validation that the file exists with a helpful error message.
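File-based path resolution, as suggested here, could be sketched with `pathlib`. The helper name `project_path` is hypothetical, and the sketch assumes the script lives one directory below the project root (as `scripts/` and `analysis/` do in this PR).

```python
from pathlib import Path


def project_path(script_file, *parts):
    """Resolve `parts` against the project root, taken to be the parent of
    the directory that contains `script_file`."""
    project_root = Path(script_file).resolve().parent.parent
    return project_root.joinpath(*parts)


# Inside scripts/retrieve_data.py one would pass __file__; a literal path
# is used here so the example runs on its own.
example = project_path(
    "/repo/scripts/retrieve_data.py", "data", "copenhagenize_index_2025.csv"
)
```

Because the result depends on the script's location rather than the current working directory, the default output path stays the same no matter where the script is launched from.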

Copilot · 2025-12-28T04:33:42Z

analysis/README.md

+
+1. **Sample Size**: Analysis uses 15 cities for computational efficiency
+2. **API Dependencies**: Requires OpenStreetMap data access
+3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations


The documentation mentions "Elevation data may require Google Elevation API key for accurate altitude calculations" but the code in calculate_indices.py actually uses the Open Topo Data API (which is free and doesn't require an API key). This is misleading and should be corrected to accurately reflect the implementation.

Suggested change

-3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations
+3. **Elevation Data**: Uses the Open Topo Data API (no API key required); accuracy depends on its data coverage and resolution

Copilot · 2025-12-28T04:33:43Z

analysis/demo_platform.py

+    os.makedirs('../results', exist_ok=True)
+    output_path = '../results/demo_results.png'
+    plt.savefig(output_path, dpi=300, bbox_inches='tight')
+    print(f"\n✓ Saved visualization to: {output_path}")
+
+    # Save data
+    output_csv = '../results/demo_cities_with_indices.csv'


The relative paths '../data/copenhagenize_index_2025.csv', '../results/demo_results.png', and '../results/demo_cities_with_indices.csv' assume the script is run from the analysis/ directory. Use file-based path resolution for robustness.

Suggested change

-    os.makedirs('../results', exist_ok=True)
-    output_path = '../results/demo_results.png'
-    plt.savefig(output_path, dpi=300, bbox_inches='tight')
-    print(f"\n✓ Saved visualization to: {output_path}")
-
-    # Save data
-    output_csv = '../results/demo_cities_with_indices.csv'
+    results_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'results')
+    os.makedirs(results_dir, exist_ok=True)
+    output_path = os.path.join(results_dir, 'demo_results.png')
+    plt.savefig(output_path, dpi=300, bbox_inches='tight')
+    print(f"\n✓ Saved visualization to: {output_path}")
+
+    # Save data
+    output_csv = os.path.join(results_dir, 'demo_cities_with_indices.csv')

Copilot · 2025-12-28T04:33:43Z

scripts/calculate_indices.py

+        # Get the road network with elevation data
+        G = ox.graph_from_place(query, network_type="bike")
+
+        # Add elevation data using free Open Topo Data API (no key needed)


The function name ox.add_node_elevations_google is misleading when used with the Open Topo Data API. While the configuration at lines 14-15 correctly sets the URL to opentopodata.org, the function name still references Google. This could confuse future maintainers about which API is actually being used. Consider adding a comment explaining this discrepancy or checking if OSMnx has a more appropriately named function for custom elevation APIs.

Suggested change

-        # Add elevation data using free Open Topo Data API (no key needed)
+        # Add elevation data using free Open Topo Data API (no key needed).
+        # Note: ox.add_node_elevations_google uses ox.settings.elevation_url_template,
+        # which we configured above to point to the Open Topo Data endpoint, so
+        # despite the function name, this does NOT call the Google Elevation API.

Copilot · 2025-12-28T04:33:46Z

analysis/prediction_platform.py

+
+    print(f"✓ Saved individual plot: altitude_index_plot.png")
+
+    plt.show()


The plt.show() call at line 386 is problematic in non-interactive environments (e.g., running on a server, in CI/CD, or without a display). This will cause the script to hang or fail. Consider making this call optional via a command-line flag or environment variable, or wrapping it in a try-except block to handle environments where display is not available.

Suggested change

-    plt.show()
+    # Optionally show plots in interactive environments.
+    # Controlled via environment variable to avoid blocking in headless/CI environments.
+    if os.environ.get("PREDICTION_PLATFORM_SHOW_PLOTS", "").lower() in ("1", "true", "yes"):
+        try:
+            plt.show()
+        except Exception as e:
+            print(f"⚠ Unable to display plots (plt.show failed): {e}")

Copilot · 2025-12-28T04:33:46Z

analysis/prediction_platform.py

+def save_results(df: pd.DataFrame, altitude_results: dict, distance_results: dict, 
+                 output_dir: str = '../results'):


Similarly, the relative path '../results' assumes execution from the analysis/ directory. This will create results in unexpected locations if the script is run from elsewhere. Use file-based path resolution for consistency.

Copilot · 2025-12-28T04:33:47Z

analysis/demo_platform.py

+    print("Creating Visualizations")
+    print("="*70)
+
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))


Variable fig is not used.

Suggested change

-    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+    _, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

Copilot · 2025-12-28T04:33:47Z

scripts/calculate_indices.py

+"""
+
+import osmnx as ox
+import networkx as nx


Import of 'nx' is not used.

Suggested change

-import networkx as nx

Copilot · 2025-12-28T04:33:47Z

scripts/retrieve_data.py

+Source: https://copenhagenizeindex.eu/
+"""
+
+import re


Import of 're' is not used.

Suggested change

-import re

BrandonTrigueros added 9 commits December 27, 2025 16:38

chore: Update gitignore and fix test file path

77072c4

- Add results/ to .gitignore (generated outputs)
- Verified demo platform works successfully

fix: Properly format gitignore to ignore results directory

fb9566a

Remove unused bikenv/ package folder

355f240

The bikenv/ package was not being imported or used anywhere. All functionality is in scripts/ and analysis/ directories.

BrandonTrigueros requested review from Copilot and fabianabarca December 28, 2025 04:28

Copilot started reviewing on behalf of BrandonTrigueros December 28, 2025 04:28

Copilot AI reviewed Dec 28, 2025

Fix file path handling in data loading and saving functions

98cc6d8
