
@BrandonTrigueros

Resolves #2

Implements a comprehensive testing platform to evaluate whether altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness, using real-world data from the Copenhagenize Index 2025.

Results Summary

Analysis of 13 cities completed successfully:

Hypothesis 1: Lower A_i → Better for cycling CONFIRMED

  • Pearson correlation: r = -0.657, p = 0.0147 (significant)
  • Spearman correlation: r = -0.522, p = 0.067
  • R² = 0.431 (43% of variance explained)
  • Conclusion: Flat terrain significantly correlates with better cycling conditions

Hypothesis 2: D_i closer to 1 → Better for cycling NOT SIGNIFICANT

  • Pearson correlation: r = -0.483, p = 0.095 (not significant)
  • Spearman correlation: r = -0.313, p = 0.297
  • R² = 0.233
  • Conclusion: Network connectivity shows no statistically significant relationship
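The correlation figures above can be sanity-checked with a few lines of plain Python. This is a minimal sketch, not the platform's code: the project lists scipy in its technical stack, whereas the helpers here (`pearson_r`, `average_ranks`, `spearman_rho`, `t_statistic`) are hypothetical stand-ins that use only the standard library.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples of at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: rho = 0; compare against t with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
```

Plugging in the H1 figures, `t_statistic(-0.657, 13)` is about -2.89 on 11 degrees of freedom, and (-0.657)² ≈ 0.43 lines up with the reported R², because for a one-predictor linear regression R² is the square of Pearson's r. Turning t into a p-value still needs a t-distribution CDF, for example `scipy.stats.t.sf`.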

Data Source

  • Copenhagenize Index 2025 Edition (EIT Urban Mobility)
  • URL: https://copenhagenizeindex.eu/
  • 30 top-ranked bicycle-friendly cities (scores 50.3-71.1)
  • 13 cities successfully analyzed (2 skipped due to size limitations)

Components Added

Data Collection (scripts/)

  • retrieve_data.py: Fetch and structure Copenhagenize Index data
  • calculate_indices.py: Calculate A_i (altitude) and D_i (distance) indices
    • Uses OpenStreetMap bike network data via OSMnx
    • Open Topo Data API for elevation (free, no API key required)
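The PR describes A_i only as "hilliness" derived from elevation data. A minimal sketch, assuming A_i is the length-weighted mean absolute edge grade (the per-edge quantity OSMnx's `add_edge_grades` derives from node elevations); the `Edge` type and `altitude_index` signature are hypothetical and not taken from `calculate_indices.py`.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    u: int           # start node id
    v: int           # end node id
    length_m: float  # edge length in metres


def altitude_index(elevation_m: dict[int, float], edges: list[Edge]) -> float:
    """Hypothetical A_i: length-weighted mean of |grade| over the network.

    Since |grade| * length == |rise|, this equals total |rise| / total length.
    """
    total_rise = 0.0
    total_length = 0.0
    for e in edges:
        if e.length_m <= 0:
            continue  # skip degenerate edges to avoid dividing by zero
        total_rise += abs(elevation_m[e.v] - elevation_m[e.u])
        total_length += e.length_m
    if total_length == 0:
        raise ValueError("network has no edges with positive length")
    return total_rise / total_length
```

A perfectly flat network gives 0.0, and steeper networks give larger values, which matches the direction of H1 (lower A_i is better).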

Analysis Platform (analysis/)

  • prediction_platform.py: Main statistical analysis tool
    • Automated city sampling (top 5, middle 5, bottom 5)
    • Pearson/Spearman correlations + linear regression
    • 4-panel visualization generation
    • CSV export of results
    • 5-minute timeout per city to handle problematic areas
    • Enhanced logging with progress tracking and timing
    • Early exit when calculations fail (prevents cascading timeouts)
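The top/middle/bottom sampling can be sketched as below, assuming the 30 cities are already sorted by Copenhagenize rank. Where the "middle 5" window starts is an assumption (centred on the list midpoint here), and `sample_top_middle_bottom` is a hypothetical name.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_top_middle_bottom(ranked: Sequence[T], k: int = 5) -> list[T]:
    """Pick the first k, the k centred on the midpoint, and the last k items.

    `ranked` must already be ordered best-first.
    """
    n = len(ranked)
    if n <= 3 * k:
        return list(ranked)  # too few items for three separate groups of k
    mid_start = n // 2 - k // 2
    top = list(ranked[:k])
    middle = list(ranked[mid_start:mid_start + k])
    bottom = list(ranked[n - k:])
    return top + middle + bottom
```

For a 30-city list this selects ranks 1-5, 14-18 and 26-30, i.e. 15 cities, which matches the 15-city sample size mentioned in the README limitations.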

Data & Results

  • data/copenhagenize_index_2025.csv: Reference dataset
  • results/: Generated outputs (CSV, PNG visualizations)

Documentation

  • Comprehensive README with methodology and usage instructions
  • CHANGELOG documenting development history
  • requirements-platform.txt: Python dependencies

Technical Improvements

  • Timeout mechanism (5 min/city) to skip problematic large areas
  • Early exit strategy: if altitude calculation fails, skip distance calculation
  • Real-time progress logging with flush=True for immediate visibility
  • Graceful handling of KeyboardInterrupt
  • Architecture refactor: scripts-based platform (removed obsolete package structure)

Known Limitations

  • Québec skipped (its area is about 900× the Overpass API query-area limit; fetching it would take hours)
  • Cache improves performance on repeated runs (~40s → ~15s per city)
  • Large metropolitan areas may timeout (expected behavior)

Technical Stack

Python 3.12+, OSMnx, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, networkx, geopandas

Files Changed

  • 8 commits, ~500 lines added
  • New: scripts/, analysis/, data/, results/ directories
  • Removed: bikenv/ package (unused), setup.py (obsolete)

…es (#2)

Implements a comprehensive testing platform to evaluate if altitude_index (A_i)
and distance_index (D_i) can predict cycling friendliness using real-world data.

## Data Source
- Copenhagenize Index 2025 Edition (EIT Urban Mobility)
- Source: https://copenhagenizeindex.eu/
- Official name: 'The Global Ranking of Bicycle-Friendly Cities'
- Top 30 cities with scores from 50.3 to 71.1

## Components Added

### Data Collection
- retrieve_data.py: Script to fetch/update Copenhagenize Index data
- copenhagenize_index_2025.csv: Reference dataset (30 cities)

### Index Calculations
- calculate_indices.py: Functions to compute A_i and D_i
  * altitude_index: Measures hilliness using OSM elevation data
  * distance_index: Measures network connectivity/compactness

### Analysis Platform
- prediction_platform.py: Main statistical analysis tool
  * Pearson/Spearman correlation tests
  * Linear regression modeling
  * Visualization generation
  * CSV export of results
- demo_platform.py: Simplified demo with synthetic data
- verify_structure.py: Project structure validation

### Documentation
- Comprehensive README with methodology and usage
- CHANGELOG with development history
- requirements-platform.txt for dependencies

## Hypotheses Tested
1. H1: Lower A_i → Better for cycling (flat terrain)
2. H2: D_i closer to 1 → Better for cycling (direct routes)

## Technical Stack
Python 3.12+, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn,
osmnx, networkx, geopandas

Resolves #2
- Tests data file integrity (CSV structure, data types)
- Validates analysis platform structure and functions
- Tests hypothesis logic with mock data
- Confirms documentation completeness
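A data-integrity check of the kind listed above might look like this sketch. The column names `rank`, `city`, `score` and the 0-100 score scale are assumptions, not the real header of `copenhagenize_index_2025.csv`; `validate_index_csv` is a hypothetical helper.

```python
import csv
from typing import TextIO

# Hypothetical column names; adjust to the real CSV header.
REQUIRED_COLUMNS = {"rank", "city", "score"}


def validate_index_csv(f: TextIO, expected_rows: int = 30) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    reader = csv.DictReader(f)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    rows = list(reader)
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        try:
            int(row["rank"])
            score = float(row["score"])
        except (TypeError, ValueError):  # None (short row) or unparsable text
            problems.append(f"data row {i}: non-numeric rank or score")
            continue
        if not 0.0 <= score <= 100.0:  # assumes scores are on a 0-100 scale
            problems.append(f"data row {i}: score {score} outside 0-100")
        if not (row["city"] or "").strip():
            problems.append(f"data row {i}: empty city name")
    return problems
```

Returning a list of problems rather than raising on the first one lets a test report every defect in the file at once.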

Results: 4/5 tests passing
- ✓ Data file (2025 Copenhagenize Index)
- ✓ Analysis platform structure
- ✓ Documentation files
- ✓ Hypothesis testing logic
- ⚠ Index functions (requires OSMnx installation)

Note: Full testing of the index functions requires installing OSMnx and the other dependencies
- Add results/ to .gitignore (generated outputs)
- Verified demo platform works successfully
- Fixed f-string formatting error in prediction_platform.py
- Created run_analysis_skip_problematic.py to use successfully calculated cities
- Analyzed 13 cities from Copenhagenize Index 2025
- Added cache/ directories to .gitignore (OSMnx temporary files)

Results:
- H1 (Altitude): SUPPORTED (r=-0.604, p=0.0288)
  Lower altitude index correlates with higher bicycle scores
- H2 (Distance): NOT SIGNIFICANT (r=-0.475, p=0.101)
  No significant relationship found

Skipped cities with problematic areas (Quebec: 900x size limit)
The bikenv/ package was not being imported or used anywhere.
All functionality is in scripts/ and analysis/ directories.
…ging

Features:
- Added 5-minute timeout per city to skip problematic large areas (like Québec)
- Enhanced progress logging with flush=True for immediate output visibility
- Show [X/Y] progress counter for each city
- Display elapsed time for each city calculation
- Better error messages distinguishing timeouts, area limits, and other errors
- Improved exception handling with KeyboardInterrupt support
- Main function now validates minimum data requirements
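A hedged sketch of the per-city loop these features describe: an `[X/Y]` counter, `flush=True` output, per-city elapsed time, Ctrl+C handling and a minimum-data check. The function name, the `calculate` callback, and the threshold of 3 cities (the fewest that leave a Pearson t-test with one degree of freedom) are assumptions.

```python
import time
from typing import Callable, Optional


def run_all_cities(
    cities: list[str],
    calculate: Callable[[str], Optional[dict]],
    min_cities: int = 3,
) -> dict[str, dict]:
    """Process cities with an [X/Y] counter and per-city timing.

    Cities for which `calculate` returns None are skipped. Ctrl+C stops the
    loop but keeps the results gathered so far.
    """
    results: dict[str, dict] = {}
    total = len(cities)
    try:
        for i, city in enumerate(cities, start=1):
            print(f"[{i}/{total}] {city} ...", flush=True)
            start = time.perf_counter()
            indices = calculate(city)
            elapsed = time.perf_counter() - start
            status = "ok" if indices is not None else "skipped"
            print(f"    {status} in {elapsed:.1f}s", flush=True)
            if indices is not None:
                results[city] = indices
    except KeyboardInterrupt:
        print(f"\nInterrupted; keeping {len(results)} completed cities", flush=True)
    if len(results) < min_cities:
        raise RuntimeError(
            f"only {len(results)} cities have indices; need at least {min_cities}"
        )
    return results
```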

Changes:
- Removed run_analysis_skip_problematic.py (no longer needed)
- Old results cleared (will be regenerated with improved platform)

The platform now handles edge cases gracefully and provides clear feedback
during the ~10-15 minute analysis process.
Resolves #2

Critical fix:
- calculate_indices_for_city now exits early if altitude calculation fails
- Prevents attempting the distance calculation after a timeout (previously caused a 67-minute hang)
- Québec timeout now properly stops after 5 minutes instead of continuing

Cleanup:
- Removed setup.py (project is no longer an installable package)
- Updated README to document new structure as scripts-based analysis platform
- All dependencies managed via requirements-platform.txt

This fixes the issue where Québec timed out on altitude (5min) but then
tried to calculate distance anyway, causing another 67-minute hang before
user interruption.
Results from 13 successfully analyzed cities:
- H1 (Altitude): CONFIRMED (r=-0.657, p=0.0147, R²=0.431)
- H2 (Distance): NOT SIGNIFICANT (r=-0.483, p=0.095)

Files:
- results/cities_with_indices.csv (13 cities with calculated indices)
- results/statistical_results.csv (hypothesis test results)
- results/hypothesis_testing_results.png (visualizations)
- results/altitude_index_plot.png (individual plot)

Note: Québec skipped due to area size (900x Overpass limit)

Copilot AI left a comment


Pull request overview

This PR implements a comprehensive testing platform to evaluate whether altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data from the Copenhagenize Index 2025. The analysis of 13 cities found significant correlation between flat terrain and better cycling conditions (p=0.0147), while network connectivity showed no statistically significant relationship.

Key changes:

  • Implements altitude and distance index calculation functions using OpenStreetMap data
  • Creates statistical analysis platform with correlation tests and linear regression
  • Adds comprehensive documentation and data from Copenhagenize Index 2025
  • Refactors from package structure to scripts-based platform

Reviewed changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 21 comments.

  • setup.py: Removed obsolete package setup file as project is now scripts-based
  • bikenv/module.py, bikenv/_api.py, bikenv/__init__.py: Removed placeholder package module in favor of functional scripts
  • scripts/retrieve_data.py: Manual data entry script for Copenhagenize Index 2025 with 30 top cities
  • scripts/calculate_indices.py: Core calculation functions for altitude (hilliness) and distance (connectivity) indices using OSMnx
  • analysis/prediction_platform.py: Main statistical analysis platform with timeout handling, correlation tests, and visualization
  • analysis/demo_platform.py: Demo version using synthetic data for testing without API dependencies
  • analysis/README.md: Comprehensive documentation of methodology, usage, and results interpretation
  • data/copenhagenize_index_2025.csv: Reference dataset with 30 bicycle-friendly cities and their scores
  • results/statistical_results.csv: Output file with correlation and regression statistics
  • results/cities_with_indices.csv: Calculated indices for analyzed cities
  • requirements-platform.txt: Python dependencies for the analysis platform
  • README.md: Updated project overview reflecting new scripts-based structure
  • CHANGELOG.md: Development history documenting implementation details
  • .gitignore: Added results/ and cache/ directories to ignore list


Comment on lines +145 to +146
print(f"Error calculating distance index for {city_name}: {e}")
return None

Copilot AI Dec 28, 2025


The error handling also returns None without raising an exception. This is the same pattern as in calculate_altitude_index. Consider using a more explicit error handling strategy that provides better diagnostic information to the caller.

Suggested change
- print(f"Error calculating distance index for {city_name}: {e}")
- return None
+ error_message = f"Error calculating distance index for {city_name}: {e}"
+ print(error_message)
+ raise RuntimeError(error_message) from e

Copilot uses AI. Check for mistakes.
return cities_data


def save_to_csv(data: List[Dict], output_file: str = "../data/copenhagenize_index_2025.csv"):

Copilot AI Dec 28, 2025


The relative path '../data/copenhagenize_index_2025.csv' assumes the script is run from the scripts/ directory. Consider using file-based path resolution or adding validation that the file exists with a helpful error message.

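Both suggestions (file-based resolution and an existence check with a helpful message) can be combined in a small pathlib sketch. It assumes each script sits one directory below the repository root (as `scripts/` and `analysis/` do); the helper names are hypothetical.

```python
from pathlib import Path


def repo_root(script_file: str) -> Path:
    """Repository root, assuming script_file sits one directory below it
    (e.g. scripts/retrieve_data.py or analysis/prediction_platform.py)."""
    return Path(script_file).resolve().parent.parent


def data_file(root: Path) -> Path:
    """Path to the reference CSV, with an actionable error if it is missing."""
    path = root / "data" / "copenhagenize_index_2025.csv"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; run scripts/retrieve_data.py to create it"
        )
    return path


def results_dir(root: Path) -> Path:
    """results/ under the repository root, created if it does not exist."""
    path = root / "results"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

In a script, `root = repo_root(__file__)` followed by `data_file(root)` or `results_dir(root) / "demo_results.png"` resolves to the same files regardless of the current working directory.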

1. **Sample Size**: Analysis uses 15 cities for computational efficiency
2. **API Dependencies**: Requires OpenStreetMap data access
3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations

Copilot AI Dec 28, 2025


The documentation mentions "Elevation data may require Google Elevation API key for accurate altitude calculations" but the code in calculate_indices.py actually uses the Open Topo Data API (which is free and doesn't require an API key). This is misleading and should be corrected to accurately reflect the implementation.

Suggested change
- 3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations
+ 3. **Elevation Data**: Uses the Open Topo Data API (no API key required); accuracy depends on its data coverage and resolution

Comment on lines +128 to +134
os.makedirs('../results', exist_ok=True)
output_path = '../results/demo_results.png'
plt.savefig(output_path, dpi=300, bbox_inches='tight')
print(f"\n✓ Saved visualization to: {output_path}")

# Save data
output_csv = '../results/demo_cities_with_indices.csv'

Copilot AI Dec 28, 2025


The relative paths '../data/copenhagenize_index_2025.csv', '../results/demo_results.png', and '../results/demo_cities_with_indices.csv' assume the script is run from the analysis/ directory. Use file-based path resolution for robustness.

Suggested change
- os.makedirs('../results', exist_ok=True)
- output_path = '../results/demo_results.png'
- plt.savefig(output_path, dpi=300, bbox_inches='tight')
- print(f"\n✓ Saved visualization to: {output_path}")
- # Save data
- output_csv = '../results/demo_cities_with_indices.csv'
+ results_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'results')
+ os.makedirs(results_dir, exist_ok=True)
+ output_path = os.path.join(results_dir, 'demo_results.png')
+ plt.savefig(output_path, dpi=300, bbox_inches='tight')
+ print(f"\n✓ Saved visualization to: {output_path}")
+ # Save data
+ output_csv = os.path.join(results_dir, 'demo_cities_with_indices.csv')

# Get the road network with elevation data
G = ox.graph_from_place(query, network_type="bike")

# Add elevation data using free Open Topo Data API (no key needed)

Copilot AI Dec 28, 2025


The function name ox.add_node_elevations_google is misleading when used with the Open Topo Data API. While the configuration at lines 14-15 correctly sets the URL to opentopodata.org, the function name still references Google. This could confuse future maintainers about which API is actually being used. Consider adding a comment explaining this discrepancy or checking whether OSMnx has a more appropriately named function for custom elevation APIs.

Suggested change
- # Add elevation data using free Open Topo Data API (no key needed)
+ # Add elevation data using free Open Topo Data API (no key needed).
+ # Note: ox.add_node_elevations_google uses ox.settings.elevation_url_template,
+ # which we configured above to point to the Open Topo Data endpoint, so
+ # despite the function name, this does NOT call the Google Elevation API.

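To make the data flow concrete, the requests sent to the configured endpoint can be sketched without OSMnx: batches of `lat,lon` pairs joined by `|`, with a JSON response listing one `elevation` per location. The 100-locations-per-request cap is assumed here to be the public API's limit, and the dataset name `srtm30m` and both helper names are assumptions rather than what `calculate_indices.py` configures.

```python
import json
from typing import Optional

OPENTOPODATA_URL = "https://api.opentopodata.org/v1/{dataset}?locations={locations}"


def elevation_request_urls(
    points: list[tuple[float, float]], dataset: str = "srtm30m", batch_size: int = 100
) -> list[str]:
    """Build one request URL per batch of (lat, lon) points, joined by '|'."""
    urls = []
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        locations = "|".join(f"{lat:.6f},{lon:.6f}" for lat, lon in batch)
        urls.append(OPENTOPODATA_URL.format(dataset=dataset, locations=locations))
    return urls


def parse_elevations(response_text: str) -> list[Optional[float]]:
    """Extract elevations (metres; None where the dataset has no value)."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK":
        raise RuntimeError(f"Open Topo Data error: {payload.get('error', payload)}")
    return [result["elevation"] for result in payload["results"]]
```

Batching is the main reason large areas are slow: every 100 nodes cost one HTTP round trip, and public endpoints typically rate-limit those calls.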

print(f"✓ Saved individual plot: altitude_index_plot.png")

plt.show()

Copilot AI Dec 28, 2025


The plt.show() call at line 386 is problematic in non-interactive environments (e.g., running on a server, in CI/CD, or without a display). This will cause the script to hang or fail. Consider making this call optional via a command-line flag or environment variable, or wrapping it in a try-except block to handle environments where display is not available.

Suggested change
- plt.show()
+ # Optionally show plots in interactive environments.
+ # Controlled via environment variable to avoid blocking in headless/CI environments.
+ if os.environ.get("PREDICTION_PLATFORM_SHOW_PLOTS", "").lower() in ("1", "true", "yes"):
+     try:
+         plt.show()
+     except Exception as e:
+         print(f"⚠ Unable to display plots (plt.show failed): {e}")

Comment on lines +389 to +390
def save_results(df: pd.DataFrame, altitude_results: dict, distance_results: dict,
output_dir: str = '../results'):

Copilot AI Dec 28, 2025


Similarly, the relative path '../results' assumes execution from the analysis/ directory. This will create results in unexpected locations if the script is run from elsewhere. Use file-based path resolution for consistency.

print("Creating Visualizations")
print("="*70)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

Copilot AI Dec 28, 2025


Variable fig is not used.

Suggested change
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+ _, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

"""

import osmnx as ox
import networkx as nx

Copilot AI Dec 28, 2025


Import of 'nx' is not used.

Suggested change
- import networkx as nx

Source: https://copenhagenizeindex.eu/
"""

import re

Copilot AI Dec 28, 2025


Import of 're' is not used.

Suggested change
- import re


Development

Successfully merging this pull request may close these issues.

Create platform to test prediction accuracy

2 participants