Create prediction platform to test altitude and distance indices #3
…es (#2)

Implements a comprehensive testing platform to evaluate if altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data.

## Data Source
- Copenhagenize Index 2025 Edition (EIT Urban Mobility)
- Source: https://copenhagenizeindex.eu/
- Official name: 'The Global Ranking of Bicycle-Friendly Cities'
- Top 30 cities with scores from 50.3 to 71.1

## Components Added

### Data Collection
- retrieve_data.py: Script to fetch/update Copenhagenize Index data
- copenhagenize_index_2025.csv: Reference dataset (30 cities)

### Index Calculations
- calculate_indices.py: Functions to compute A_i and D_i
  * altitude_index: Measures hilliness using OSM elevation data
  * distance_index: Measures network connectivity/compactness

### Analysis Platform
- prediction_platform.py: Main statistical analysis tool
  * Pearson/Spearman correlation tests
  * Linear regression modeling
  * Visualization generation
  * CSV export of results
- demo_platform.py: Simplified demo with synthetic data
- verify_structure.py: Project structure validation

### Documentation
- Comprehensive README with methodology and usage
- CHANGELOG with development history
- requirements-platform.txt for dependencies

## Hypotheses Tested
1. H1: Lower A_i → Better for cycling (flat terrain)
2. H2: D_i closer to 1 → Better for cycling (direct routes)

## Technical Stack
Python 3.12+, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, osmnx, networkx, geopandas

Resolves #2
- Tests data file integrity (CSV structure, data types)
- Validates analysis platform structure and functions
- Tests hypothesis logic with mock data
- Confirms documentation completeness

Results: 4/5 tests passing
- ✓ Data file (2025 Copenhagenize Index)
- ✓ Analysis platform structure
- ✓ Documentation files
- ✓ Hypothesis testing logic
- ⚠ Index functions (requires OSMnx installation)

Note: Full testing with OSMnx requires installing the dependencies first.
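A data-integrity check of the kind listed above might look roughly like the sketch below. It is only an illustration: the column names `city` and `score`, the helper name `validate_index_rows`, and the 0 to 100 score bounds are assumptions, since the actual CSV layout and test code are not shown in this PR page. The sample rows are placeholders, not entries from the dataset.

```python
import csv
import io
from typing import Dict, List


def validate_index_rows(
    rows: List[Dict[str, str]],
    expected_count: int = 30,
    min_score: float = 0.0,
    max_score: float = 100.0,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the rows look sane.

    Assumes hypothetical 'city' and 'score' columns.
    """
    problems: List[str] = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")
    for line_no, row in enumerate(rows, start=1):
        city = (row.get("city") or "").strip()
        if not city:
            problems.append(f"row {line_no}: missing city name")
        try:
            score = float(row.get("score", ""))
        except (TypeError, ValueError):
            problems.append(f"row {line_no}: score is not numeric")
            continue
        if not (min_score <= score <= max_score):
            problems.append(f"row {line_no}: score {score} out of range")
    return problems


# Placeholder data for demonstration only (not taken from the real index).
sample_csv = "city,score\nCity A,71.1\nCity B,50.3\n"
sample_rows = list(csv.DictReader(io.StringIO(sample_csv)))
sample_problems = validate_index_rows(sample_rows, expected_count=2)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a data file in a single test run.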
- Add results/ to .gitignore (generated outputs)
- Verified demo platform works successfully
- Fixed f-string formatting error in prediction_platform.py
- Created run_analysis_skip_problematic.py to use successfully calculated cities
- Analyzed 13 cities from Copenhagenize Index 2025
- Added cache/ directories to .gitignore (OSMnx temporary files)

Results:
- H1 (Altitude): SUPPORTED (r=-0.604, p=0.0288). Lower altitude index correlates with higher bicycle scores
- H2 (Distance): NOT SIGNIFICANT (r=-0.475, p=0.101). No significant relationship found

Skipped cities with problematic areas (Quebec: 900x size limit)
The bikenv/ package was not being imported or used anywhere. All functionality is in scripts/ and analysis/ directories.
…ging

Features:
- Added 5-minute timeout per city to skip problematic large areas (like Québec)
- Enhanced progress logging with flush=True for immediate output visibility
- Show [X/Y] progress counter for each city
- Display elapsed time for each city calculation
- Better error messages distinguishing timeouts, area limits, and other errors
- Improved exception handling with KeyboardInterrupt support
- Main function now validates minimum data requirements

Changes:
- Removed run_analysis_skip_problematic.py (no longer needed)
- Old results cleared (will be regenerated with improved platform)

The platform now handles edge cases gracefully and provides clear feedback during the ~10-15 minute analysis process.
Resolves #2

Critical fix:
- calculate_indices_for_city now exits early if altitude calculation fails
- Prevents attempting distance calculation after timeout (was causing 67min hang)
- Québec timeout now properly stops after 5 minutes instead of continuing

Cleanup:
- Removed setup.py (project is no longer an installable package)
- Updated README to document new structure as scripts-based analysis platform
- All dependencies managed via requirements-platform.txt

This fixes the issue where Québec timed out on altitude (5min) but then tried to calculate distance anyway, causing another 67-minute hang before user interruption.
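The early-exit behaviour described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from prediction_platform.py: the helper `run_with_timeout`, the injected `altitude_fn` and `distance_fn` callables, and the thread-based timeout mechanism are all hypothetical. Only the function name `calculate_indices_for_city` and the 5-minute limit come from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional, Tuple


def run_with_timeout(
    func: Callable[[str], float], city: str, timeout_s: float
) -> Optional[float]:
    """Run func(city) in a worker thread; return None if it exceeds timeout_s.

    Caveat: a Python thread cannot be killed, so a timed-out worker keeps
    running in the background. We only stop waiting for it.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, city)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # Do not block here on a worker that may still be running.
        executor.shutdown(wait=False)


def calculate_indices_for_city(
    city: str,
    altitude_fn: Callable[[str], float],
    distance_fn: Callable[[str], float],
    timeout_s: float = 300.0,  # 5 minutes, as in the commit message
) -> Optional[Tuple[float, float]]:
    """Return (altitude_index, distance_index), or None if either step fails."""
    altitude = run_with_timeout(altitude_fn, city, timeout_s)
    if altitude is None:
        # Early exit: skip the distance calculation entirely.
        return None
    distance = run_with_timeout(distance_fn, city, timeout_s)
    if distance is None:
        return None
    return altitude, distance
```

Because the worker thread is abandoned rather than terminated, a process-based timeout may be a better fit when the slow call must actually be stopped; the thread version is used here only to keep the sketch short.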
Results from 13 successfully analyzed cities:
- H1 (Altitude): CONFIRMED (r=-0.657, p=0.0147, R²=0.431)
- H2 (Distance): NOT SIGNIFICANT (r=-0.483, p=0.095)

Files:
- results/cities_with_indices.csv (13 cities with calculated indices)
- results/statistical_results.csv (hypothesis test results)
- results/hypothesis_testing_results.png (visualizations)
- results/altitude_index_plot.png (individual plot)

Note: Québec skipped due to area size (900x Overpass limit)
Pull request overview
This PR implements a comprehensive testing platform to evaluate whether altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data from the Copenhagenize Index 2025. The analysis of 13 cities found a significant negative correlation between the altitude index and cycling scores (r=-0.657, p=0.0147), meaning flatter cities tend to score higher, while the distance index showed no statistically significant relationship (r=-0.483, p=0.095).
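As a sanity check on the reported figures, the relationship between Pearson's r, R², and the test statistic can be worked through in a few lines. The sketch below is illustrative only: the function names are hypothetical, it does not reproduce the PR's scipy-based analysis, and it plugs in only the reported r and the sample size of 13 rather than the underlying city data. It relies on two standard facts: for a simple linear regression R² equals r², and the usual significance test for r uses t = r·sqrt((n-2)/(1-r²)) with n-2 degrees of freedom.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Values reported for the altitude hypothesis in this PR.
r_reported = -0.657
n_cities = 13

r_squared = r_reported ** 2                   # roughly 0.43, in line with the reported R²
t_value = t_statistic(r_reported, n_cities)   # roughly -2.89 with 11 degrees of freedom
```

With only 13 cities the test has limited power, which is worth keeping in mind when reading the "not significant" result for the distance index.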
Key changes:
- Implements altitude and distance index calculation functions using OpenStreetMap data
- Creates statistical analysis platform with correlation tests and linear regression
- Adds comprehensive documentation and data from Copenhagenize Index 2025
- Refactors from package structure to scripts-based platform
Reviewed changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| setup.py | Removed obsolete package setup file as project is now scripts-based |
| bikenv/module.py, bikenv/_api.py, bikenv/__init__.py | Removed placeholder package module in favor of functional scripts |
| scripts/retrieve_data.py | Manual data entry script for Copenhagenize Index 2025 with 30 top cities |
| scripts/calculate_indices.py | Core calculation functions for altitude (hilliness) and distance (connectivity) indices using OSMnx |
| analysis/prediction_platform.py | Main statistical analysis platform with timeout handling, correlation tests, and visualization |
| analysis/demo_platform.py | Demo version using synthetic data for testing without API dependencies |
| analysis/README.md | Comprehensive documentation of methodology, usage, and results interpretation |
| data/copenhagenize_index_2025.csv | Reference dataset with 30 bicycle-friendly cities and their scores |
| results/statistical_results.csv | Output file with correlation and regression statistics |
| results/cities_with_indices.csv | Calculated indices for analyzed cities |
| requirements-platform.txt | Python dependencies for the analysis platform |
| README.md | Updated project overview reflecting new scripts-based structure |
| CHANGELOG.md | Development history documenting implementation details |
| .gitignore | Added results/ and cache/ directories to ignore list |
| print(f"Error calculating distance index for {city_name}: {e}") |
| return None |
Copilot
AI
Dec 28, 2025
The error handling also returns None without raising an exception. This is the same pattern as in calculate_altitude_index. Consider using a more explicit error handling strategy that provides better diagnostic information to the caller.
Suggested change:
| - print(f"Error calculating distance index for {city_name}: {e}") |
| - return None |
| + error_message = f"Error calculating distance index for {city_name}: {e}" |
| + print(error_message) |
| + raise RuntimeError(error_message) from e |
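If the function raises instead of returning None, the calling loop has to catch the error per city so that one failure does not abort the whole run. A minimal sketch of that pattern follows. The `compute` parameter and the `collect_distance_indices` helper are hypothetical stand-ins introduced for illustration (the real function presumably calls OSMnx directly), and the two-argument signature of `calculate_distance_index` is an assumption.

```python
from typing import Callable, Dict, Iterable, Tuple


def calculate_distance_index(city_name: str, compute: Callable[[str], float]) -> float:
    """Wrap any failure in a RuntimeError that names the city and keeps the cause."""
    try:
        return compute(city_name)
    except Exception as e:
        error_message = f"Error calculating distance index for {city_name}: {e}"
        raise RuntimeError(error_message) from e


def collect_distance_indices(
    cities: Iterable[str], compute: Callable[[str], float]
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Compute indices for every city, recording failures instead of stopping."""
    results: Dict[str, float] = {}
    failures: Dict[str, str] = {}
    for city in cities:
        try:
            results[city] = calculate_distance_index(city, compute)
        except RuntimeError as err:
            failures[city] = str(err)
    return results, failures
```

The `from e` clause preserves the original exception as `__cause__`, so the traceback still shows what actually went wrong inside the calculation.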
| return cities_data |
| |
| |
| def save_to_csv(data: List[Dict], output_file: str = "../data/copenhagenize_index_2025.csv"): |
Copilot
AI
Dec 28, 2025
The relative path '../data/copenhagenize_index_2025.csv' assumes the script is run from the scripts/ directory. Consider using file-based path resolution or adding validation that the file exists with a helpful error message.
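One way to do the file-based resolution suggested here is with `pathlib`. The sketch below is an illustration, not the PR's code: the helper names `default_data_path` and `require_file` are hypothetical, and the layout (a script inside scripts/, data in a sibling data/ directory) is inferred from the paths quoted in this review.

```python
from pathlib import Path


def default_data_path(script_file: str) -> Path:
    """Resolve data/copenhagenize_index_2025.csv relative to the script's own
    location instead of the current working directory.

    Assumes the script lives one level below the project root (e.g. scripts/).
    """
    project_root = Path(script_file).resolve().parent.parent
    return project_root / "data" / "copenhagenize_index_2025.csv"


def require_file(path: Path) -> Path:
    """Return path unchanged if it exists, otherwise fail with a helpful message."""
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected data file at {path}. "
            "Run the data retrieval script first or pass an explicit path."
        )
    return path
```

Inside the script itself the call would be `default_data_path(__file__)`, which gives the same result regardless of the directory the script is launched from.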
| |
| 1. **Sample Size**: Analysis uses 15 cities for computational efficiency |
| 2. **API Dependencies**: Requires OpenStreetMap data access |
| 3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations |
Copilot
AI
Dec 28, 2025
The documentation mentions "Elevation data may require Google Elevation API key for accurate altitude calculations" but the code in calculate_indices.py actually uses the Open Topo Data API (which is free and doesn't require an API key). This is misleading and should be corrected to accurately reflect the implementation.
Suggested change:
| - 3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations |
| + 3. **Elevation Data**: Uses the Open Topo Data API (no API key required); accuracy depends on its data coverage and resolution |
| os.makedirs('../results', exist_ok=True) |
| output_path = '../results/demo_results.png' |
| plt.savefig(output_path, dpi=300, bbox_inches='tight') |
| print(f"\n✓ Saved visualization to: {output_path}") |
| |
| # Save data |
| output_csv = '../results/demo_cities_with_indices.csv' |
Copilot
AI
Dec 28, 2025
The relative paths '../data/copenhagenize_index_2025.csv', '../results/demo_results.png', and '../results/demo_cities_with_indices.csv' assume the script is run from the analysis/ directory. Use file-based path resolution for robustness.
Suggested change:
| - os.makedirs('../results', exist_ok=True) |
| - output_path = '../results/demo_results.png' |
| - plt.savefig(output_path, dpi=300, bbox_inches='tight') |
| - print(f"\n✓ Saved visualization to: {output_path}") |
| - # Save data |
| - output_csv = '../results/demo_cities_with_indices.csv' |
| + results_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'results') |
| + os.makedirs(results_dir, exist_ok=True) |
| + output_path = os.path.join(results_dir, 'demo_results.png') |
| + plt.savefig(output_path, dpi=300, bbox_inches='tight') |
| + print(f"\n✓ Saved visualization to: {output_path}") |
| + # Save data |
| + output_csv = os.path.join(results_dir, 'demo_cities_with_indices.csv') |
| # Get the road network with elevation data |
| G = ox.graph_from_place(query, network_type="bike") |
| |
| # Add elevation data using free Open Topo Data API (no key needed) |
Copilot
AI
Dec 28, 2025
The function name ox.add_node_elevations_google is misleading when used with the Open Topo Data API. While the configuration at line 14-15 correctly sets the URL to opentopodata.org, the function name still references Google. This could confuse future maintainers about which API is actually being used. Consider adding a comment explaining this discrepancy or checking if OSMnx has a more appropriately named function for custom elevation APIs.
Suggested change:
| - # Add elevation data using free Open Topo Data API (no key needed) |
| + # Add elevation data using free Open Topo Data API (no key needed). |
| + # Note: ox.add_node_elevations_google uses ox.settings.elevation_url_template, |
| + # which we configured above to point to the Open Topo Data endpoint, so |
| + # despite the function name, this does NOT call the Google Elevation API. |
| |
| print(f"✓ Saved individual plot: altitude_index_plot.png") |
| |
| plt.show() |
Copilot
AI
Dec 28, 2025
The plt.show() call at line 386 is problematic in non-interactive environments (e.g., running on a server, in CI/CD, or without a display). This will cause the script to hang or fail. Consider making this call optional via a command-line flag or environment variable, or wrapping it in a try-except block to handle environments where display is not available.
Suggested change:
| - plt.show() |
| + # Optionally show plots in interactive environments. |
| + # Controlled via environment variable to avoid blocking in headless/CI environments. |
| + if os.environ.get("PREDICTION_PLATFORM_SHOW_PLOTS", "").lower() in ("1", "true", "yes"): |
| +     try: |
| +         plt.show() |
| +     except Exception as e: |
| +         print(f"⚠ Unable to display plots (plt.show failed): {e}") |
| def save_results(df: pd.DataFrame, altitude_results: dict, distance_results: dict, |
|                  output_dir: str = '../results'): |
Copilot
AI
Dec 28, 2025
Similarly, the relative path '../results' assumes execution from the analysis/ directory. This will create results in unexpected locations if the script is run from elsewhere. Use file-based path resolution for consistency.
| print("Creating Visualizations") |
| print("="*70) |
| |
| fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) |
Copilot
AI
Dec 28, 2025
Variable fig is not used.
Suggested change:
| - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) |
| + _, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) |
| """ |
| |
| import osmnx as ox |
| import networkx as nx |
Copilot
AI
Dec 28, 2025
Import of 'nx' is not used.
Suggested change:
| - import networkx as nx |
| Source: https://copenhagenizeindex.eu/ |
| """ |
| |
| import re |
Copilot
AI
Dec 28, 2025
Import of 're' is not used.
Suggested change:
| - import re |
Resolves #2
Implements a comprehensive testing platform to evaluate if altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data from the Copenhagenize Index 2025.
Results Summary
Analysis of 13 cities completed successfully:
Hypothesis 1: Lower A_i → Better for cycling CONFIRMED
Hypothesis 2: D_i closer to 1 → Better for cycling NOT SIGNIFICANT
Data Source
Components Added
Data Collection (scripts/)
- retrieve_data.py: Fetch and structure Copenhagenize Index data
- calculate_indices.py: Calculate A_i (altitude) and D_i (distance) indices
Analysis Platform (analysis/)
- prediction_platform.py: Main statistical analysis tool
Data & Results
- data/copenhagenize_index_2025.csv: Reference dataset
- results/: Generated outputs (CSV, PNG visualizations)
Documentation
- requirements-platform.txt: Python dependencies
Technical Improvements
Known Limitations
Technical Stack
Python 3.12+, OSMnx, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, networkx, geopandas
Files Changed
- scripts/, analysis/, data/, results/ directories
- bikenv/ package (unused), setup.py (obsolete)