Add fact parser validation, tests, and bulk fact loading with tests by ColtonPayne · Pull Request #100 · lab-v2/pyreason

ColtonPayne · 2026-01-21T18:18:05Z

Summary

This PR adds comprehensive input validation to the fact parser and implements bulk fact loading from CSV files with extensive test coverage.

Fact Parser Validation (`fact_parser.py`) (Issue #91)

- Added validation for empty/whitespace-only input
- Validates parentheses structure and placement (Issue #89, BUG-038: Missing Parenthesis Validation - Causes Silent Data Corruption in Fact Parser)
- Enforces valid predicate naming (must start with a letter or underscore; letters, digits, and underscores allowed after that)
- Validates component structure (no nested parentheses, colons, etc.)
- Validates that interval bounds are within the [0, 1] range and in the proper format (Issue #90, BUG-039: No Validation for Interval Bound Format - Crashes on Malformed Input)
- Validates interval lower <= upper bound
- Prevents double negation and negation with explicit bounds
- Provides clear, specific error messages for each validation failure
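
For readers who want a concrete picture of the checks listed above, here is a minimal sketch. It is not the code in `fact_parser.py`: the function name `validate_fact_text`, the regular expressions other than the predicate pattern, and the error messages are all assumptions made for illustration. It handles only numeric `[lower,upper]` intervals, defaults to `[1, 1]` when no interval is given, and does not take a position on negation combined with explicit bounds.

```python
import re

# Hypothetical helper that mirrors the listed checks; not PyReason's parser.
PREDICATE_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
# Optional '~', predicate, one parenthesised component list, optional ':bound'.
FACT_RE = re.compile(r"^(~?)([^()~]*)\(([^():]*)\)(?::(.*))?$")
BOUND_RE = re.compile(r"\[([^,\[\]]+),([^,\[\]]+)\]")


def validate_fact_text(fact_text):
    """Return (negated, predicate, components, (lower, upper)) or raise ValueError."""
    if not fact_text or not fact_text.strip():
        raise ValueError("Fact text is empty")
    text = "".join(fact_text.split())  # drop all whitespace
    if text.startswith("~~"):
        raise ValueError("Double negation is not allowed")
    match = FACT_RE.match(text)
    if match is None:
        raise ValueError(f"Malformed fact structure in '{fact_text}'")
    negation, predicate, component, bound_text = match.groups()
    if not PREDICATE_RE.match(predicate):
        raise ValueError(f"Invalid predicate name '{predicate}'")
    components = component.split(",")
    if len(components) > 2 or any(c == "" for c in components):
        raise ValueError(f"Expected one node or two nodes, got '{component}'")
    lower, upper = 1.0, 1.0  # assumed default when no interval is given
    if bound_text is not None:
        inner = BOUND_RE.fullmatch(bound_text)
        if inner is None:
            raise ValueError(f"Invalid interval format '{bound_text}'")
        try:
            lower, upper = float(inner.group(1)), float(inner.group(2))
        except ValueError:
            raise ValueError(f"Non-numeric interval bound in '{bound_text}'") from None
        if not (0.0 <= lower <= 1.0 and 0.0 <= upper <= 1.0):
            raise ValueError(f"Interval bounds must be in [0, 1], got '{bound_text}'")
        if lower > upper:
            raise ValueError(f"Lower bound exceeds upper bound in '{bound_text}'")
    return bool(negation), predicate, components, (lower, upper)
```

Each failure raises `ValueError` with a message naming the offending part, which matches the behaviour described in the Implementation Notes.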

Bulk Fact Loading (`pyreason.py`)

- Implemented `add_fact_in_bulk()` function to load facts from CSV files
- Supports optional header row detection
- Handles optional columns: name, start_time, end_time, static
- Provides warnings for invalid data (malformed facts, invalid times, etc.) without crashing
- Supports multiple boolean formats for the static field (True/true/1/yes, False/false/0/no)
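
The row handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `add_fact_in_bulk()`: the helper names `parse_static` and `read_fact_rows` are hypothetical, the column order `fact_text, name, start_time, end_time, static` is assumed, and for simplicity a bad row is skipped with a warning instead of being partially repaired.

```python
import csv
import io
import warnings

TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def parse_static(value):
    """Map the accepted spellings to a bool; an empty cell means False."""
    text = value.strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES or text == "":
        return False
    raise ValueError(f"invalid static value '{value}'")


def read_fact_rows(handle, has_header=True):
    """Return a list of fact dicts; rows that fail are skipped with a warning."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # discard the header row
    facts = []
    for row_number, row in enumerate(reader, start=1):
        if not row:
            continue  # blank line
        # Pad short rows so the optional columns can simply be left out.
        cells = [cell.strip() for cell in row] + [""] * 5
        fact_text, name, start, end, static = cells[:5]
        try:
            if not fact_text:
                raise ValueError("missing fact_text")
            facts.append({
                "fact_text": fact_text,
                "name": name or None,
                "start_time": int(start) if start else 0,
                "end_time": int(end) if end else 0,
                "static": parse_static(static),
            })
        except ValueError as exc:
            warnings.warn(f"Row {row_number}: {exc}; row skipped")
    return facts


sample = io.StringIO(
    "fact_text,name,start_time,end_time,static\n"
    "Viewed(Zach),seen-fact-zach,0,3,False\n"
    "Viewed(Mary),,,,yes\n"
)
loaded = read_fact_rows(sample)
```

One practical note: an edge fact such as `Friends(a,b)` contains a comma, so in a CSV file that cell has to be quoted for `csv.reader` to keep it in one column.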

Test Coverage

333 new lines in test_fact_parser.py:
- Tests for valid fact parsing (node/edge facts, intervals, negation, etc.)
- Tests for invalid inputs (missing parentheses, empty fields, invalid characters, etc.)
- Edge cases and boundary conditions
204 new lines in test_pyreason_file_loading.py:
- Tests for bulk fact loading from CSV
- Warning validation for invalid facts
- Tests with/without headers, various static value formats
- Error handling tests

Test Data

- Created example_facts.csv with comprehensive test scenarios including both valid and invalid facts
- Created example_facts_no_header.csv for testing CSV without headers

Implementation Notes

- All validation errors raise ValueError with descriptive messages
- Bulk loading continues processing even when individual rows fail (with warnings)
- Predicate validation regex: `^[a-zA-Z_][a-zA-Z0-9_]*$` (the ASCII subset of Python identifier rules)

🤖 Generated with Claude Code

dyumanaditya

@ColtonPayne Everything looks good except for my one comment on explicit bound inverses.

dyumanaditya · 2026-01-29T17:43:31Z

pyreason/scripts/facts/fact.py

+            - `'pred@name(node)'` - invalid characters in predicate
+            - `'pred(node1,node2,node3)'` - more than 2 components
+            - `'pred(node):[1.5,2.0]'` - values out of range [0,1]
+            - `'~pred(node):[0.2,0.8]'` - negation with explicit bound


"negation with explicit bound": This should actually be allowed. The negation of an explicit bound is defined and has an explicit formula. Currently it is not supported but it needs to be.

The formula for the inverse of a bound [l, u] is: ~[l, u] = [1-u, 1-l]
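
As a quick worked illustration of that formula, the sketch below negates a bound in plain Python. The function name `negate_bound` and the range check are assumptions for this example; the rounding to 10 decimal places follows the diff hunk quoted further down in this thread and is there because expressions like `1 - 0.7` are not exact in floating point.

```python
def negate_bound(lower, upper, ndigits=10):
    """Return the negation of [lower, upper], i.e. [1 - upper, 1 - lower]."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError(f"Invalid bound [{lower}, {upper}]")
    # Rounding hides float noise such as 1 - 0.7 == 0.30000000000000004.
    return round(1 - upper, ndigits), round(1 - lower, ndigits)


print(negate_bound(0.7, 1.0))  # (0.0, 0.3)
print(negate_bound(0.0, 0.0))  # (1.0, 1.0)
```

Applying the function twice returns the original bound, which is a handy sanity check for any implementation of the formula.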

kmukherji · 2026-02-02T17:43:02Z

pyreason/pyreason.py

+            "name": "seen-fact-zach",
+            "start_time": 0,
+            "end_time": 3,
+            "static": false


I think json should have bounds for facts too. For example a fact can be: at(house):[0.5,1]; t:0->3.
Current structure has everything except the bounds.

Make it optional. If the field is left empty or not included, assume [1,1].

The JSON should load in the parameters expected from the public fact class. The bounds are extracted here in the parser and are validated in the parser code in the PR. We want to load facts from the JSON the same way we load them in add_fact(), since all this function is doing is calling add_fact() multiple times for each fact extracted from the JSON.

class Fact:
    def __init__(self, fact_text: str, name: str = None, start_time: int = 0, end_time: int = 0, static: bool = False):
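
To make the point concrete, here is a small sketch of reading fact objects from JSON into the same parameters as the constructor above, with the bound carried inside `fact_text`. The `FactSpec` dataclass is a stand-in defined only for this example (it is not PyReason's `Fact` class), and `facts_from_json` is a hypothetical helper name.

```python
import json
from dataclasses import dataclass
from typing import Optional

FIELDS = ("fact_text", "name", "start_time", "end_time", "static")


@dataclass
class FactSpec:
    """Stand-in with the same parameters as the Fact constructor shown above."""
    fact_text: str
    name: Optional[str] = None
    start_time: int = 0
    end_time: int = 0
    static: bool = False


def facts_from_json(text):
    """Parse a JSON array of fact objects into FactSpec instances."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("Expected a JSON array of fact objects")
    specs = []
    for idx, item in enumerate(items):
        if not isinstance(item, dict) or "fact_text" not in item:
            raise ValueError(f"Item {idx}: missing 'fact_text'")
        # Keep only the known keys; omitted ones fall back to the defaults.
        specs.append(FactSpec(**{k: item[k] for k in FIELDS if k in item}))
    return specs


example = """
[
    {"fact_text": "at(house):[0.5,1]", "name": "seen-fact-zach",
     "start_time": 0, "end_time": 3, "static": false},
    {"fact_text": "Viewed(EmptyOptionals)"}
]
"""
specs = facts_from_json(example)
```

With this shape no separate bounds field is needed: the interval stays part of the fact string and is validated by the parser along with everything else in it.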

kmukherji · 2026-02-02T17:44:57Z

pyreason/pyreason.py

+def add_fact_in_bulk(json_path: str, raise_errors = True) -> None:
+    """Load multiple facts from a JSON file.

-    The CSV should have columns representing Fact attributes in this order:


I think keep load_facts_from_csv as a method. JSON should be default, but since we have built the functionality for CSV already, keep that in. Someone may already have a working project that uses CSV input. Someone may download a dataset in CSV format etc.

Added back support for fact loading from csv.

kmukherji · 2026-02-02T17:46:56Z

pyreason/pyreason.py

+                    raise ValueError(f"Item {idx}: Invalid end_time '{fact_obj.get('end_time')}'") from None
+                warnings.warn(f"Item {idx}: Invalid end_time '{fact_obj.get('end_time')}', using default value")
                end_time = 0



Add a check that end_time > start_time. Else throw warning and put end_time = start_time and then load fact.

We have this check here. The check you are referring to is just making sure the user entered an integer in the csv so we can apply the other validation checks correctly.
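
For reference, the behaviour the reviewer asks for could look roughly like the sketch below. It is an assumption about the shape of such a check, not the code the reply links to: `normalize_times` is a hypothetical name, and equal start and end times are treated as valid because both default to 0.

```python
import warnings


def normalize_times(start_time, end_time, raise_errors=True):
    """Ensure end_time is not before start_time; raise or warn and clamp."""
    if end_time < start_time:
        message = f"end_time {end_time} is before start_time {start_time}"
        if raise_errors:
            raise ValueError(message)
        warnings.warn(message + ", using end_time = start_time")
        end_time = start_time
    return start_time, end_time
```

The `raise_errors` switch follows the parameter of the same name on `add_fact_in_bulk()`, so strict callers get an exception while lenient callers still load the fact.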

kmukherji · 2026-02-02T17:48:51Z

tests/api_tests/test_files/example_facts.json

+    {
+        "fact_text": "Viewed(EmptyOptionals)"
+    }
+]


Add cases where bounds are specified. Edge cases where bounds are wrong - non-numeric or outside [0,1] or lower > upper.

These checks are a part of the fact parser updates in the PR here. The load_from_json() validates that the csv inputs are integer values - see my response to the comment you left above.

kmukherji

Some more comments are inline.

kmukherji · 2026-02-02T17:49:27Z

pyreason/pyreason.py

-            name = row[1].strip() if len(row) > 1 and row[1].strip() else None
+            name = fact_obj.get('name')
+            if name is not None:
+                name = str(name).strip() if str(name).strip() else None


How do we want to handle duplicates here?

This is a good question - I added a check that requires all fact names in the json/csv to be distinct.
There is no such validation in the regular add_fact() function, so there is nothing stopping a user
from adding multiple facts with the same name using the traditional method of loading facts. We
could add a check in pyreason.py to prohibit this as well, but it would be a breaking change for
anyone who is currently running experiments with facts that have duplicate names. Because fact
names are only used in the atom trace and not in the fp operator for convergence, having duplicate
names will not impact correctness and would just lead to a confusing atom trace. I decided to handle this discrepancy by adding a warning to the add_fact method for users who add duplicate facts this way.
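
A minimal sketch of the two policies described above, strict for bulk loading and warning-only for `add_fact()`, might look like this. The helper names `find_duplicate_names` and `check_unique_names` are hypothetical, and unnamed facts (`None`) are assumed to be exempt from the check.

```python
import warnings
from collections import Counter


def find_duplicate_names(names):
    """Return the sorted fact names that occur more than once (None is ignored)."""
    counts = Counter(name for name in names if name is not None)
    return sorted(name for name, count in counts.items() if count > 1)


def check_unique_names(names, raise_errors=True):
    """Raise for the bulk-loading case, or only warn for the lenient case."""
    duplicates = find_duplicate_names(names)
    if not duplicates:
        return
    message = f"Duplicate fact names: {', '.join(duplicates)}"
    if raise_errors:
        raise ValueError(message)
    warnings.warn(message)
```

Keeping detection separate from the raise-or-warn decision lets both entry points share one definition of what counts as a duplicate.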

kmukherji · 2026-02-02T17:53:55Z

pyreason/scripts/utils/fact_parser.py

+            lower, upper = round(1 - upper, 10), round(1 - lower, 10)
+
+        bound = interval.closed(lower, upper)



@dyumanaditya Looks good to me, can you check this once?


Implements add_rule_from_csv() and add_rule_from_json() for bulk rule loading, following the same pattern as bulk fact loading from PR #100. Updates add_rules_from_file() with raise_errors parameter for backwards-compatible error handling. Adds comprehensive test coverage. Resolves #117 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add fact parser validation, tests, and bulk fact loading with tests

972b263

ColtonPayne added the AI PR contains AI Generated Code label Jan 21, 2026

ColtonPayne added 10 commits January 21, 2026 13:29

Don't hardcode default values

e498f4a

Prevent predicates from starting with a digit

0e2e395

Fix typo in MAKEFILE

cf592e2

Improve CSV loader tests

0908471

Add fact string formatting rules in docstring

5a78cea

Remove extranious f string for linter

83a2452

Fix api test file loading

ee0ea04

Add test for example with no header

f1395dc

Make invalid csv file loads raise exceptions by default

0e3db89

Upd tests

a402321

dyumanaditya self-requested a review January 29, 2026 17:39

dyumanaditya requested changes Jan 29, 2026


ColtonPayne added 5 commits January 30, 2026 07:18

Add support for negated interval and negated explicit true/false

d1cb309

Load facts from json instead of csv

b903281

Update file loading tests

cad2d95

Revert

3eecf32

Final cleanup

05b3748

kmukherji reviewed Feb 2, 2026


kmukherji requested changes Feb 2, 2026


ColtonPayne and others added 2 commits February 2, 2026 19:35

Add back csv file loading and add duplicate name checks

0c79988

Merge branch 'main' into input-validation

b1ea06c

ColtonPayne requested review from dyumanaditya and kmukherji February 4, 2026 13:35

ColtonPayne added 2 commits February 4, 2026 08:38

CSV Formatting

01a20a4

Add back load rules from file

42d54e2

ColtonPayne added 2 commits February 4, 2026 08:40

Merge branch 'input-validation' of github.com:lab-v2/pyreason into in…

f4fbcb0

…put-validation

Requrie exact header match for csv headers

3ba07ec

ColtonPayne assigned ColtonPayne and kmukherji and unassigned ColtonPayne Feb 4, 2026

ColtonPayne added the Ready for Review Awaiting PR Review label Feb 4, 2026

ColtonPayne mentioned this pull request Feb 6, 2026

Add bulk rule loading from CSV and JSON files #120

Open

3 tasks
