Restructure as Python Package, Internalize Input Builder, and Improve Robustness#6

Open
zsarnoczay wants to merge 14 commits into OpenPBEE:main from zsarnoczay:create_python_package

@zsarnoczay
Collaborator

Summary

This PR transforms the codebase from a collection of scripts into a structured, installable Python package (atc138). It modernizes the workflow by introducing a standard CLI, internalizes and slightly refactors the input generation script for robustness, and fixes an error in the repair schedule simulation.

Key Changes

1. Package Structure & CLI (Major)

  • Repo Restructure: Moved all source code to a standard src/atc138 layout to support packaging and distribution.
  • pyproject.toml: Added modern build configuration to replace setup scripts and define dependencies (NumPy, Pandas, etc.).
  • CLI Implementation: Created src/atc138/cli.py to provide a unified entry point.
    • Usage: python -m atc138.cli <model_dir> <output_dir> (see README.md for more information).
  • Relative Imports: Updated internal modules (e.g., driver.py, repair_schedule) to use relative imports, ensuring the package works locally and when installed.
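
As a rough illustration of the CLI shape described above, the following is a minimal sketch of an entry point that takes the two positional directories. The actual contents of src/atc138/cli.py are not shown in this PR description, so the function names `build_parser` and `main` and the help strings are assumptions; a real entry point would hand the parsed paths to the driver.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two positional arguments named in the usage line."""
    parser = argparse.ArgumentParser(
        prog="atc138",
        description="Run a recovery analysis (illustrative sketch only).",
    )
    parser.add_argument("model_dir", help="directory containing the model inputs")
    parser.add_argument("output_dir", help="directory where outputs are written")
    return parser


def main(argv=None):
    """Parse the arguments and return them; the hand-off to the driver is omitted."""
    args = build_parser().parse_args(argv)
    # Hypothetical: the real CLI would call the driver with these two paths here.
    return args.model_dir, args.output_dir


paths = main(["examples/ICSB", "examples/ICSB/output"])
```

Accepting `argv` as a parameter keeps the entry point testable without touching `sys.argv`.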

2. Robust Input Builder Refactor

  • Internalization: Refactored the standalone build_inputs.py into the atc138.input_builder module, integrating it into the main pipeline.
  • Auto-Trigger: The CLI now detects when simulated_inputs.json is missing and automatically triggers generation from raw inputs, removing manual setup steps.
  • Robustness Improvements:
    • Implemented clean_types for safe JSON serialization of NumPy types (preserving NaN).
    • Switched to path-independent file I/O (replacing os.chdir).
    • Replaced scalar extraction with robust Pandas .iloc accessors.
    • Improved optional inputs merging (loading default_inputs.json + optional_inputs.json override).
  • Cleanup: Removed inputs/ folder in favor of a clean examples/ directory structure.
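
The auto-trigger behavior can be sketched as follows. This is not the package's code: `load_or_build_inputs` and the `build_fn` callback are hypothetical names, and the sketch only shows the "use the file if present, otherwise generate it" decision with path-independent file access.

```python
import json
import os
import tempfile


def load_or_build_inputs(model_dir, build_fn):
    """Return simulated inputs, generating the JSON file only when it is missing."""
    path = os.path.join(model_dir, "simulated_inputs.json")
    if not os.path.exists(path):
        # Hypothetical builder callback standing in for the input builder module.
        data = build_fn(model_dir)
        with open(path, "w") as f:
            json.dump(data, f)
    with open(path) as f:
        return json.load(f)


# Demonstration in a throwaway directory: the builder runs once, then the file is reused.
with tempfile.TemporaryDirectory() as model_dir:
    calls = []

    def fake_builder(directory):
        calls.append(directory)
        return {"n_stories": 1}

    first = load_or_build_inputs(model_dir, fake_builder)
    second = load_or_build_inputs(model_dir, fake_builder)
```

Because every path is built from `model_dir`, the function behaves the same regardless of the current working directory.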

3. Repair Schedule Fix

  • Scalar Safety: Enforced explicit scalar extraction (e.g., [0][0]) for system indices in np.where calls to prevent dimension mismatch errors in downstream calculations.
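
The difference between `[0]` and `[0][0]` on an `np.where` result is easy to see on a toy array. The system names and the matrix below are made up for illustration and are not taken from the package.

```python
import numpy as np

# Hypothetical list of system names; the real table lives in the package data.
systems = np.array(["structural", "exterior", "interior", "stairs"])

idx_array = np.where(systems == "interior")[0]      # one-element array
idx_scalar = np.where(systems == "interior")[0][0]  # scalar integer

# Hypothetical matrix with one column per system.
matrix = np.arange(12).reshape(3, 4)

col_2d = matrix[:, idx_array]   # indexing with an array keeps an extra axis
col_1d = matrix[:, idx_scalar]  # indexing with a scalar drops it
```

The one-element array silently produces a `(3, 1)` result where a `(3,)` result is expected, which is the kind of dimension mismatch the fix avoids.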

4. Documentation

  • Updated README: Updated README.md to reflect the package changes.
    • Installation instructions (pip install -e .).
    • Documented the CLI and new workflow.
    • Clarified configuration options and input data requirements.
  • Git Hygiene: Added git clone instructions and cleaned up requirements.txt.

Verification

  • Installation: Verified pip install -e . works correctly in a virtual environment.
  • Example Run: Ran the ICSB example via the new CLI:
    python -m atc138.cli examples/ICSB examples/ICSB/output
  • Results: Verified successful generation of simulated_inputs.json and recovery_outputs.json.

Establishes the foundational src-layout for the Python package, renaming the primary module to `atc138`.

Changes:
- Created `src/atc138` directory structure.
- Moved and renamed core scripts:
  - `main_PBEE_recovery.py` -> `src/atc138/engine.py`
  - `driver_PBEE_recovery.py` -> `src/atc138/driver.py`
  - `fn_red_tag.py` -> `src/atc138/red_tag.py`
- Moved data directories:
  - `static_tables/` -> `src/atc138/data/`
- Moved submodules to package root:
  - `functionality/`
  - `impedance/`
  - `preprocessing/`
  - `repair_schedule/`
- Added `src/atc138/__init__.py`.

Adds `pyproject.toml` to define the build system and dependencies for the `atc138` package.

Changes:
- Added `pyproject.toml` with `setuptools` build backend.
- Defined package metadata (name, version, authors).
- Listed runtime dependencies: `numpy`, `pandas`, `scipy`, `matplotlib`, `seaborn`.
- Verified editable installation.

Adds a command-line interface `atc138` and refactors I/O to be robust and independent of CWD.

Changes:
- Added `src/atc138/cli.py` with `atc138` entry point accepting explicit `input_dir` and `output_dir`.
- Registered `atc138` script in `pyproject.toml`.
- Refactored `src/atc138/driver.py`:
  - `run_analysis` now accepts `input_dir` and `output_dir` instead of deriving paths.
  - Removed internal dependency on relative directory structures (`inputs/example_inputs`).
  - Updated path resolution for static data (`src/atc138/data`).
- Refactored `src/atc138/engine.py` to use relative imports.
- Added `__init__.py` to submodules to make them proper packages.
- Moved input generation logic to `src/atc138/input_builder.py`. This is the implementation in the old `inputs/Inputs2Copy/build_input.py` with minimal adjustments to make it compliant with package structure.
- Moved `inputs/example_inputs` to `examples/` and removed `inputs/` directory.
- Added `src/atc138/data/default_inputs.json` for centralized configuration. This file is equivalent to the output generated by the old `optional_inputs.py`.
- Refactored `driver.py` to support auto-generation of inputs with the new `input_builder` module. If the input folder already contains a `simulated_inputs.json` file, it is used; otherwise, it is generated. `driver.py` also improves how story indices in some of the inputs are converted from strings to `int`: the updated version handles both int and string representations of story index inputs.
- Refactored submodules (`functionality`, `impedance`, `repair_schedule`) to use relative imports.
- Updated `.gitignore` to ignore output directories.
- Moved imports to the module level.
- Removed usage of `os.chdir()` which mutated global state.
- Updated all file operations to use `os.path.join(model_dir, ...)` for explicit path resolution.
- This creates path independence, allowing the builder to function correctly regardless of the current working directory.
- Scalar/Array handling: Changed `comp_attr[0, [(col_idx)]]` to `comp_attr[0, col_idx]` in component info loop. This explicitly extracts the scalar value from the DataFrame/array, preventing `ValueError: setting an array element with a sequence` when the target list expects scalars.
- Key Generation: Enforced integer suffixes for `qty_dir_X` keys (e.g., `qty_dir_1` vs `qty_dir_1.0`) to match downstream expectations in `red_tag.py`.
- Missing Data Handling: Added zero-filling for missing story-direction pairs in `building_model`. This prevents `KeyError` in `red_tag.py` (which iterates hardcoded directions 1-3).
  - Verified: Downstream usage in `red_tag.py` calculates `ratio = damage / quantity`. With zero quantity (and zero damage implied), this results in `NaN`, which evaluates to `False` against red tag thresholds, safely avoiding false positives or crashes.
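
The key generation, zero-filling, and NaN reasoning above can be checked with a small sketch. The dictionary layout, the sample quantities, and the threshold value are assumptions made for illustration; only the `qty_dir_X` key pattern and the three directions come from the description.

```python
import numpy as np

# Directions read from a table often arrive as floats; int() keeps the key suffix clean.
directions_from_csv = [1.0]
story = {f"qty_dir_{int(d)}": 10.0 for d in directions_from_csv}

# Zero-fill any direction that has no components so lookups for 1-3 never fail.
for direction in (1, 2, 3):
    story.setdefault(f"qty_dir_{direction}", 0.0)

quantity = np.array([story[f"qty_dir_{d}"] for d in (1, 2, 3)])
damage = np.array([3.0, 0.0, 0.0])  # hypothetical damaged quantities

with np.errstate(divide="ignore", invalid="ignore"):
    ratio = damage / quantity   # 0 / 0 yields NaN for the zero-filled directions
    exceeds = ratio > 0.25      # hypothetical threshold; NaN compares as False
```

Any comparison against NaN is False under IEEE 754 rules, so the zero-filled directions can never trigger the threshold.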
- Added `clean_types` recursive helper function to handle type conversion.
  - Converts Numpy types (`int64`, `float32`, `ndarray`) to native Python types (`int`, `float`, `list`) ensuring successful `json.dump`.
  - Preserves `NaN` values as `float('nan')` instead of `None` or crashing. This is critical for downstream engine compatibility where `NaN` is used to skip calculations (e.g., missing temp repair times).
- Replaced manual type conversion loops with a single pass of `clean_types` on the final `simulated_inputs` dictionary.
- Simplified `tenant_units` DataFrame-to-Dict conversion using `.to_dict(orient='list')`.
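
A recursive converter along these lines might look like the sketch below. It is an assumption about how such a helper could be written, not the code in `input_builder.py`; the sample dictionary keys are invented.

```python
import json
import math

import numpy as np


def clean_types(obj):
    """Recursively convert NumPy containers and scalars to native Python types."""
    if isinstance(obj, dict):
        return {key: clean_types(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [clean_types(value) for value in obj]
    if isinstance(obj, np.ndarray):
        return clean_types(obj.tolist())
    if isinstance(obj, np.generic):
        # .item() returns the matching Python scalar; NaN stays a float NaN.
        return obj.item()
    return obj


raw = {
    "n": np.int64(3),
    "t": np.float32(np.nan),
    "a": np.array([1.5, np.nan]),
    "nested": [np.bool_(True)],
}
cleaned = clean_types(raw)
text = json.dumps(cleaned)  # json writes NaN as the literal NaN by default
```

Note that `json.dumps` emits `NaN` only because `allow_nan` defaults to `True`; the result is not strict JSON, but Python's `json.load` reads it back as `float('nan')`.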
- Added `recursive_update` helper to support deep merging of dictionaries.
- Updated "OPTIONAL INPUTS" logic to:
  1. Load complete default configuration from `src/atc138/data/default_inputs.json`.
  2. Load user-provided `optional_inputs.json` (if present) from the model directory.
  3. Recursively merge user options into defaults.
- This ensures the simulation always has a complete set of configuration parameters, even if the user provides a partial overrides file, significantly improving robustness against missing configuration keys.
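
The three merge steps above can be sketched with a small deep-merge helper. The body of the real `recursive_update` is not shown in this PR, so this version (which returns a new dictionary rather than mutating in place) and the option names in the example are assumptions.

```python
import copy


def recursive_update(base, overrides):
    """Return a deep copy of base with overrides merged in, key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge instead of replacing the whole branch.
            merged[key] = recursive_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical option names, for illustration only.
defaults = {"impedance": {"include": True, "delay_days": 30}, "seed": 1}
user_options = {"impedance": {"delay_days": 45}}

options = recursive_update(defaults, user_options)
```

A plain `dict.update` would have replaced the whole `impedance` branch and dropped `include`, which is the failure mode a partial overrides file would otherwise cause.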
- Refactored logic to use `.loc` / `.iloc` for strict scalar extraction.
  - Previous code filtered dataframes (returning a 1-row Series/DataFrame) and tried to assign them to scalar slots. This raises errors in modern Pandas because "setting an array element with a sequence" is ambiguous.
  - New logic uses `.iloc[0]` to explicitly extract the scalar value from the first row of the filtered result, ensuring safe assignment.
  - Because variables (e.g. `ds_attr`) are now Series rather than single-row DataFrames, redundant `[0]` indexing was removed.
- Replaced fragile chained indexing (e.g., `df['col'][idx] = val`) with robust loc-based assignment (`df.loc[idx, 'col'] = val`) to prevent `SettingWithCopyWarning`.
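
A toy table shows both patterns: `.iloc[0]` to pull one row out of a filtered result, and `.loc` for assignment. The column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical damage-state table.
ds_table = pd.DataFrame({"comp_id": ["A", "B"], "repair_days": [12.0, 4.0]})

filtered = ds_table[ds_table["comp_id"] == "B"]  # still a one-row DataFrame
ds_attr = filtered.iloc[0]                       # a Series for that single row
repair_days = ds_attr["repair_days"]             # a plain scalar, no extra [0]

# Single-step assignment through .loc instead of chained df['col'][idx] = val.
ds_table.loc[ds_table["comp_id"] == "A", "repair_days"] = 15.0
```

With `.loc`, the row selection and the column selection happen in one indexing call, so pandas writes to the original frame rather than to a possible temporary copy.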
- **Regex Loop Improvement:** Updated regex loop to append `True`/`False` instead of `1`/`0`. This ensures the resulting array is treated as a Boolean Mask by Pandas, preventing ambiguity where `0`/`1` integer arrays could be misinterpreted as column keys.
- **Logic Cleanup:** Refactored `ds_sub_id` handling to resolve NaNs *before* appending to the list.
- Removed unnecessary `DataFrame -> NumPy -> DataFrame` round-trip conversion, utilizing direct Pandas filtering which preserves original data types better.
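
The mask ambiguity is reproducible on a small frame. The identifiers and the pattern below are placeholders rather than the ones used in the builder.

```python
import re

import pandas as pd

# Hypothetical component table.
df = pd.DataFrame({"id": ["B.10", "C.20", "B.30"], "qty": [1, 2, 3]})
pattern = re.compile(r"^B\.")

bool_mask = [bool(pattern.match(s)) for s in df["id"]]       # True/False flags
int_flags = [1 if pattern.match(s) else 0 for s in df["id"]]  # 1/0 flags

selected = df[bool_mask]  # a boolean list is treated as a row mask

try:
    df[int_flags]  # an integer list is treated as a list of column labels
    int_lookup_failed = False
except KeyError:
    int_lookup_failed = True
```

Here the integer list fails loudly because no columns are named `0` or `1`; on a frame with integer column labels it would instead select columns silently.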
- **Resource Management:** Switched from `json.loads(open(...).read())` to `with open(...)` context managers for all file reads. This ensures file handles are properly closed, preventing resource leaks.
- **Exceptions:** Replaced `sys.exit('error...')` with `raise ValueError('error...')`. Raising exceptions is preferred for library/module code as it allows the calling application to handle the error rather than abruptly terminating the interpreter.
- **Repair Cost Logic:** Fixed checking for 'repair_cost_ratio_engineering' using `if key not in dict` instead of `if key in dict.keys() == False`. Refactored the calculation to use efficient Numpy array accumulation instead of loop-based list updates.
- **CSV Loading:** Removed redundant arguments (`header=0`, `encoding='unicode_escape'`) where standard `pd.read_csv` defaults suffice, aligning with clean codebase standards.
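
The old membership check is worth a closer look, because Python parses `key in d.keys() == False` as a chained comparison, that is, `(key in d.keys()) and (d.keys() == False)`. A short sketch (the sample dictionary contents are hypothetical):

```python
# Hypothetical dictionary that lacks the engineering cost ratio.
costs = {"repair_cost_ratio_total": 0.4}
key = "repair_cost_ratio_engineering"

# Chained comparison: evaluates to False whether or not the key is present.
old_check_missing = key in costs.keys() == False
old_check_present = "repair_cost_ratio_total" in costs.keys() == False

# The idiomatic form reports a missing key correctly.
new_check_missing = key not in costs
new_check_present = "repair_cost_ratio_total" not in costs
```

Since the old expression can never be true, the branch guarded by it would never run, which is why the rewrite to `not in` changes behavior and is not just a style fix.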
…y models

- Updated the normalization logic for single-story building models to check if attributes are already lists/sequences before wrapping them.
- This idempotency ensures that if the input `building_model.json` is already correctly formatted (as a list), the logic doesn't corrupt the data by creating nested lists (e.g. `[[val]]` instead of `[val]`).
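
An idempotent wrapper of this kind can be as small as the sketch below. The name `ensure_list` is hypothetical, and the real normalization may check for other sequence types as well.

```python
def ensure_list(value):
    """Wrap a scalar in a list; leave lists (and tuples) un-nested."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


wrapped_scalar = ensure_list(3.5)              # scalar from a single-story model
wrapped_list = ensure_list([3.5])              # already a list, left flat
wrapped_twice = ensure_list(ensure_list(3.5))  # applying it again changes nothing
```

Checking for `list` and `tuple` explicitly, rather than for any sequence, keeps strings from being split into characters.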
- Activated the function docstring for `build_simulated_inputs` to improve code discoverability and documentation in IDEs.
- Updated index retrieval in `fn_set_repair_constraints` to explicitly extract the scalar integer from `np.where` results.
- Added `[0]` (resulting in `[0][0]`) to `np.where(...)` calls when looking up the 'interior' and 'structural' system indices.
- This ensures the indices are passed as native integers (or scalar numpy ints) rather than single-element arrays, preventing dimension mismatch errors when finding constraints or indexing into matrices.

Updates readme to reflect CLI installation process
@dustin-cook
Collaborator

I reviewed each commit, tested the functionality and then updated the ReadMe. This looks good to me. @hgp297 please merge when you feel this is ready.

Collaborator

@dustin-cook dustin-cook left a comment

Looks good
