Skip to content

Add fact parser validation, tests, and bulk fact loading with tests#100

Open
ColtonPayne wants to merge 22 commits intomainfrom
input-validation
Open

Add fact parser validation, tests, and bulk fact loading with tests#100
ColtonPayne wants to merge 22 commits intomainfrom
input-validation

Conversation

@ColtonPayne
Copy link
Collaborator

@ColtonPayne ColtonPayne commented Jan 21, 2026

Summary

This PR adds comprehensive input validation to the fact parser and implements bulk fact loading from CSV files with extensive test coverage.

Fact Parser Validation (fact_parser.py) (Issue #91 )

Bulk Fact Loading (pyreason.py)

  • Implemented add_fact_in_bulk() function to load facts from CSV files
  • Supports optional header row detection
  • Handles optional columns: name, start_time, end_time, static
  • Provides warnings for invalid data (malformed facts, invalid times, etc.) without crashing
  • Supports multiple boolean formats for static field (True/true/1/yes, False/false/0/no)

Test Coverage

  • 333 new lines in test_fact_parser.py:
    • Tests for valid fact parsing (node/edge facts, intervals, negation, etc.)
    • Tests for invalid inputs (missing parentheses, empty fields, invalid characters, etc.)
    • Edge cases and boundary conditions
  • 204 new lines in test_pyreason_file_loading.py:
    • Tests for bulk fact loading from CSV
    • Warning validation for invalid facts
    • Tests with/without headers, various static value formats
    • Error handling tests

Test Data

  • Created example_facts.csv with comprehensive test scenarios including both valid and invalid facts
  • Created example_facts_no_header.csv for testing CSV without headers

Implementation Notes

  • All validation errors raise ValueError with descriptive messages
  • Bulk loading continues processing even when individual rows fail (with warnings)
  • Predicate validation regex: ^[a-zA-Z_][a-zA-Z0-9_]*$ (follows Python identifier rules)

🤖 Generated with Claude Code

@ColtonPayne ColtonPayne added the AI PR contains AI Generated Code label Jan 21, 2026
@dyumanaditya dyumanaditya self-requested a review January 29, 2026 17:39
Copy link
Contributor

@dyumanaditya dyumanaditya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ColtonPayne Everything looks good except for my one comment on explicit bound inverses.

- `'pred@name(node)'` - invalid characters in predicate
- `'pred(node1,node2,node3)'` - more than 2 components
- `'pred(node):[1.5,2.0]'` - values out of range [0,1]
- `'~pred(node):[0.2,0.8]'` - negation with explicit bound
Copy link
Contributor

@dyumanaditya dyumanaditya Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

negation with explicit bound This should actually be allowed. The negation of an explicit bound is defined and has an explicit formula. Currently it is not supported but it needs to be.

The formula for the inverse of a bound [l, u] is: ~[l, u] = [1-u, 1-l]

"name": "seen-fact-zach",
"start_time": 0,
"end_time": 3,
"static": false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think json should have bounds for facts too. For example a fact can be: at(house):[0.5,1]; t:0->3.
Current structure has everything except the bounds.

Make it optional. If field left emoty or not included, assume [1,1]

Copy link
Collaborator Author

@ColtonPayne ColtonPayne Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON should load in the parameters expected from the public fact class. The bounds are extracted here in the parser and are validated in the parser code in the PR. We want to load facts from the JSON the same way we load them in add_fact(), since all this function is doing is calling add_fact() multiple times for each fact extracted from the JSON.

class Fact:
def init(self, fact_text: str, name: str = None, start_time: int = 0, end_time: int = 0, static: bool = False):

def add_fact_in_bulk(json_path: str, raise_errors = True) -> None:
"""Load multiple facts from a JSON file.

The CSV should have columns representing Fact attributes in this order:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keep load_facts_from_csv as a method. JSON should be default, but since we have built the functionality for CSV already, keep that in. Someone may already have a working project that uses CSV input. Someone may download a dataset in CSV format etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back support for fact loading from csv.

raise ValueError(f"Item {idx}: Invalid end_time '{fact_obj.get('end_time')}'") from None
warnings.warn(f"Item {idx}: Invalid end_time '{fact_obj.get('end_time')}', using default value")
end_time = 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a check that end_time > start_time. Else throw warning and put end_time = start_time and then load fact.

Copy link
Collaborator Author

@ColtonPayne ColtonPayne Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this check here. The check you are referring to is just making sure the user entered an integer in the csv so we can apply the other validation checks correctly.

{
"fact_text": "Viewed(EmptyOptionals)"
}
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add cases where bounds are specified. Edge cases where bounds are wrong - non-numeric or outside [0,1] or lower > upper.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks are a part of the fact parser updates in the PR here. The load_from_json() validates that the csv inputs are integer values - see my response to the comment you left above.

Copy link
Member

@kmukherji kmukherji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments are inline.

name = row[1].strip() if len(row) > 1 and row[1].strip() else None
name = fact_obj.get('name')
if name is not None:
name = str(name).strip() if str(name).strip() else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we want to handle duplicates here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question - I added a check that requires all fact names in the json/csv to be distinct.
There is no such validation in the regular add_fact() function, so there is nothing stopping a user
from adding multiple facts with the same name using the traditional method of loading facts. We
could add a check in pyreason.py to prohibit this as well, but it would be a breaking change for
anyone who is currently running experiments with facts that have duplicate names. Because fact
names are only used in the atom trace and not in the fp operator for convergence, having duplicate
names will not impact correctness and would just leave to a confusing atom trace. I decided to handle this discrepancy by adding a warning to the add_fact method for users who add duplicate facts this way.

lower, upper = round(1 - upper, 10), round(1 - lower, 10)

bound = interval.closed(lower, upper)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dyumanaditya Looks good to me, can you check this once?

@ColtonPayne ColtonPayne added the Ready for Review Awaiting PR Review label Feb 4, 2026
ColtonPayne added a commit that referenced this pull request Feb 6, 2026
Implements add_rule_from_csv() and add_rule_from_json() for bulk rule
loading, following the same pattern as bulk fact loading from PR #100.
Updates add_rules_from_file() with raise_errors parameter for
backwards-compatible error handling. Adds comprehensive test coverage.

Resolves #117

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI PR contains AI Generated Code Ready for Review Awaiting PR Review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants