Add fact parser validation, tests, and bulk fact loading with tests #100
ColtonPayne wants to merge 22 commits into main.
Conversation
dyumanaditya
left a comment
@ColtonPayne Everything looks good except for my one comment on explicit bound inverses.
pyreason/scripts/facts/fact.py
Outdated
- `'pred@name(node)'` - invalid characters in predicate
- `'pred(node1,node2,node3)'` - more than 2 components
- `'pred(node):[1.5,2.0]'` - values out of range [0,1]
- `'~pred(node):[0.2,0.8]'` - negation with explicit bound
Regarding "negation with explicit bound": this should actually be allowed. The negation of an explicit bound is defined and has an explicit formula. Currently it is not supported, but it needs to be.
The formula for the inverse of a bound [l, u] is: ~[l, u] = [1-u, 1-l]
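For illustration, a minimal sketch of that formula in Python; the function name is hypothetical and this is not the parser's actual code:

```python
def negate_bound(lower: float, upper: float) -> tuple:
    """Complement of an explicit bound: ~[l, u] = [1 - u, 1 - l]."""
    if not (0.0 <= lower <= upper <= 1.0):
        raise ValueError(f"Invalid bound [{lower}, {upper}]")
    # Rounding mirrors the implementation quoted later in this thread.
    return round(1 - upper, 10), round(1 - lower, 10)

# negate_bound(0.2, 0.8) -> (0.2, 0.8); negate_bound(0.5, 1.0) -> (0.0, 0.5)
```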
| "name": "seen-fact-zach", | ||
| "start_time": 0, | ||
| "end_time": 3, | ||
| "static": false |
I think the JSON should have bounds for facts too. For example, a fact can be: `at(house):[0.5,1]; t:0->3`.
The current structure has everything except the bounds.
Make it optional: if the field is left empty or not included, assume [1,1].
Reply: The JSON should load the parameters expected by the public Fact class. The bounds are extracted and validated in the fact parser code in this PR. We want to load facts from the JSON the same way we load them in add_fact(), since all this function does is call add_fact() once for each fact extracted from the JSON.
```python
class Fact:
    def __init__(self, fact_text: str, name: str = None, start_time: int = 0, end_time: int = 0, static: bool = False):
```
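To make the mapping concrete, here is a hedged sketch of loading such JSON entries through the public constructor above. The JSON keys and the package-level `Fact`/`add_fact` calls come from this thread; the file name and the loop itself are illustrative, not the PR's implementation:

```python
import json

import pyreason as pr  # assumes Fact and add_fact are exposed at package level

with open("facts.json") as f:          # hypothetical file of fact objects
    for obj in json.load(f):
        fact = pr.Fact(
            fact_text=obj["fact_text"],       # e.g. "at(house):[0.5,1]"
            name=obj.get("name"),
            start_time=obj.get("start_time", 0),
            end_time=obj.get("end_time", 0),
            static=obj.get("static", False),
        )
        pr.add_fact(fact)                     # same path as manual fact loading
```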
pyreason/pyreason.py
Outdated
```python
def add_fact_in_bulk(json_path: str, raise_errors = True) -> None:
    """Load multiple facts from a JSON file.

    The CSV should have columns representing Fact attributes in this order:
```
I think we should keep load_facts_from_csv as a method. JSON should be the default, but since we have already built the functionality for CSV, keep that in. Someone may already have a working project that uses CSV input, or may download a dataset in CSV format, etc.
Reply: Added back support for fact loading from CSV.
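As a rough sketch of what the CSV path can look like when it simply funnels rows through add_fact(): the helper name, column order, and header handling below are assumptions for illustration, not the PR's actual implementation.

```python
import csv

import pyreason as pr  # assumes Fact and add_fact are exposed at package level

def load_facts_from_csv(csv_path: str) -> None:
    """Hypothetical wrapper: build a Fact from each CSV row and add it."""
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[0].strip() == "fact_text":
                continue  # skip blank lines and an optional header row
            fact_text = row[0].strip()
            name = row[1].strip() if len(row) > 1 and row[1].strip() else None
            start_time = int(row[2]) if len(row) > 2 and row[2].strip() else 0
            end_time = int(row[3]) if len(row) > 3 and row[3].strip() else 0
            static = len(row) > 4 and row[4].strip().lower() == "true"
            pr.add_fact(pr.Fact(fact_text, name, start_time, end_time, static))
```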
pyreason/pyreason.py
Outdated
| raise ValueError(f"Item {idx}: Invalid end_time '{fact_obj.get('end_time')}'") from None | ||
| warnings.warn(f"Item {idx}: Invalid end_time '{fact_obj.get('end_time')}', using default value") | ||
| end_time = 0 | ||
|
|
Add a check that end_time > start_time. Otherwise, throw a warning, set end_time = start_time, and then load the fact.
Reply: We have this check here. The check you are referring to just makes sure the user entered an integer in the CSV, so we can apply the other validation checks correctly.
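For reference, a minimal sketch of the ordering check being discussed, with illustrative names rather than the PR's actual code:

```python
import warnings

def clamp_end_time(idx: int, start_time: int, end_time: int) -> int:
    """Warn and clamp when end_time precedes start_time, then load the fact anyway."""
    if end_time < start_time:
        warnings.warn(
            f"Item {idx}: end_time {end_time} precedes start_time {start_time}; "
            "using end_time = start_time"
        )
        return start_time
    return end_time
```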
```json
  {
    "fact_text": "Viewed(EmptyOptionals)"
  }
]
```
Add cases where bounds are specified, plus edge cases where bounds are wrong: non-numeric, outside [0,1], or lower > upper.
Reply: These checks are part of the fact parser updates in this PR. load_from_json() validates that the CSV inputs are integer values; see my response to the comment you left above.
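For illustration, a hedged sketch of the kind of bound checks being described; the function name is hypothetical, and the real checks live in the fact parser changes in this PR:

```python
def validate_bound(lower_str: str, upper_str: str) -> tuple:
    """Reject non-numeric bounds, values outside [0,1], and lower > upper."""
    try:
        lower, upper = float(lower_str), float(upper_str)
    except ValueError:
        raise ValueError(f"Non-numeric bound: [{lower_str},{upper_str}]") from None
    if not (0.0 <= lower <= 1.0 and 0.0 <= upper <= 1.0):
        raise ValueError(f"Bound values out of range [0,1]: [{lower},{upper}]")
    if lower > upper:
        raise ValueError(f"Lower bound exceeds upper bound: [{lower},{upper}]")
    return lower, upper
```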
kmukherji
left a comment
Some more comments are inline.
pyreason/pyreason.py
Outdated
```python
name = row[1].strip() if len(row) > 1 and row[1].strip() else None
name = fact_obj.get('name')
if name is not None:
    name = str(name).strip() if str(name).strip() else None
```
How do we want to handle duplicates here?
Reply: This is a good question. I added a check that requires all fact names in the JSON/CSV to be distinct. There is no such validation in the regular add_fact() function, so nothing stops a user from adding multiple facts with the same name using the traditional method of loading facts. We could add a check in pyreason.py to prohibit this as well, but it would be a breaking change for anyone who is currently running experiments with facts that have duplicate names. Because fact names are only used in the atom trace and not in the fp operator for convergence, duplicate names will not affect correctness; they would just lead to a confusing atom trace. I decided to handle this discrepancy by adding a warning to the add_fact() method for users who add duplicate facts this way.
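A small sketch of the two behaviours described above (strict for bulk JSON/CSV loading, warning only for add_fact()); the helper and module-level set are illustrative, not the PR's code:

```python
import warnings

_seen_fact_names = set()

def check_fact_name(name: str, strict: bool = False) -> None:
    """Require distinct names when loading in bulk; only warn for add_fact()."""
    if name in _seen_fact_names:
        if strict:
            raise ValueError(f"Duplicate fact name: '{name}'")
        warnings.warn(f"Fact name '{name}' already exists; the atom trace may be confusing")
    _seen_fact_names.add(name)
```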
```python
lower, upper = round(1 - upper, 10), round(1 - lower, 10)

bound = interval.closed(lower, upper)
```
@dyumanaditya Looks good to me, can you check this once?
Implements add_rule_from_csv() and add_rule_from_json() for bulk rule loading, following the same pattern as bulk fact loading from PR #100. Updates add_rules_from_file() with a raise_errors parameter for backwards-compatible error handling. Adds comprehensive test coverage. Resolves #117

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR adds comprehensive input validation to the fact parser and implements bulk fact loading from CSV files with extensive test coverage.
Fact Parser Validation (`fact_parser.py`) (Issue #91)

Bulk Fact Loading (`pyreason.py`)
- `add_fact_in_bulk()` function to load facts from CSV files

Test Coverage
- `test_fact_parser.py`
- `test_pyreason_file_loading.py`

Test Data
- `example_facts.csv` with comprehensive test scenarios including both valid and invalid facts
- `example_facts_no_header.csv` for testing CSV without headers

Implementation Notes
- Invalid input raises `ValueError` with descriptive messages
- Predicate names are validated against `^[a-zA-Z_][a-zA-Z0-9_]*$` (follows Python identifier rules)

🤖 Generated with Claude Code
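As a quick illustration of the predicate-name rule above (this is an example, not the parser's code):

```python
import re

# Same pattern as in the implementation notes: Python-identifier-style names.
PREDICATE_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

for pred in ["Viewed", "pred_1", "pred@name", "2fast"]:
    print(pred, bool(PREDICATE_RE.match(pred)))
# Viewed True, pred_1 True, pred@name False, 2fast False
```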