Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
964b93a
fix mishandling of ^ inside an expression
marler8997 Mar 6, 2021
7fb7a51
add cbmc verify and fix a --conversion-check
Jun 10, 2022
69afafe
extend CBMC checks to all APIs
Jun 10, 2022
e448651
fix GH #76 out-of-bounds
Jun 10, 2022
bd55c35
refactor cbmc proofs a bit
Jun 10, 2022
9d25c22
support "\\\\" pattern, and disallow "..\\"
Jun 10, 2022
7bd15de
Clarify python2 is needed
Jun 10, 2022
0388df3
re-enable INV_CHAR_CLASS
Jun 10, 2022
f334c5b
prepare multi-byte support
Jun 10, 2022
148e229
TODOs and new tests
Jun 20, 2022
89f513f
fix ranges with ending -
Jun 20, 2022
e9f6a87
fix end anchor match length
perazz Dec 18, 2022
359a38c
add end anchor test to makefile
perazz Dec 18, 2022
45dfbe4
Update formal_verification.md
kokke Jun 11, 2021
fd17b66
use flat memory layout
marler8997 Mar 6, 2021
87502b5
simplify matchplus and matchstar
marler8997 Mar 7, 2021
b5d1b7e
fix mishandling of ^ inside an expression
marler8997 Mar 6, 2021
ad71eb1
use flat memory layout
marler8997 Mar 6, 2021
46af3b8
simplify matchplus and matchstar
marler8997 Mar 7, 2021
7febe14
Merge branch 'master' of https://github.com/shahar99s/tiny-regex-c
shahar99s Dec 23, 2022
623e3fd
silence gcc tests warnings
shahar99s May 25, 2024
973e990
use default python in makefile
shahar99s May 25, 2024
1072ecf
cleanup tests + add windows support
shahar99s May 25, 2024
0ecadd4
prevent '*' expansion in scripts
shahar99s May 25, 2024
b9fa9cc
support character class starting with '-'
shahar99s May 25, 2024
9d7176f
Merge pull request #5 from shahar99s/fix-tests
shahar99s May 25, 2024
6fec7bb
return error for invalid escaping
shahar99s May 25, 2024
d2f6b57
Support compiling regex in Python
shahar99s May 26, 2024
25152bf
Add Python regex compilation tests
shahar99s May 26, 2024
5beb2b3
Remove redundent character class null terminator
shahar99s May 26, 2024
806da4b
Merge pull request #6 from shahar99s/support-python-regex-compilation
shahar99s May 26, 2024
1a2544d
Fix character class matching
shahar99s May 26, 2024
73081fe
Merge pull request #7 from shahar99s/fix-match-charclass
shahar99s May 26, 2024
22e1080
Merge pull request #8 from perazz/master
shahar99s May 27, 2024
2967b89
Fix 'test_compile.c' warning
shahar99s May 27, 2024
de2a008
Merge pull request #9 from shahar99s/fix-tests-warnings
shahar99s May 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 62 additions & 14 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -4,27 +4,21 @@ CC := gcc
# Number of random text expressions to generate, for random testing
NRAND_TESTS := 1000

PYTHON != if (python --version 2>&1 | grep -q 'Python 2\..*'); then \
echo 'python'; \
elif command -v python2 >/dev/null 2>&1; then \
echo 'python2'; \
else \
echo 'Error: no compatible python version found.' >&2; \
exit 1; \
fi
PYTHON := python

# Flags to pass to compiler
CFLAGS := -O3 -Wall -Wextra -std=c99 -I.

all:
@$(CC) $(CFLAGS) re.c tests/test1.c -o tests/test1
@$(CC) $(CFLAGS) re.c tests/test2.c -o tests/test2
@$(CC) $(CFLAGS) re.c tests/test_rand.c -o tests/test_rand
@$(CC) $(CFLAGS) re.c tests/test_rand_neg.c -o tests/test_rand_neg
@$(CC) $(CFLAGS) re.c tests/test_compile.c -o tests/test_compile
@$(CC) $(CFLAGS) re.c tests/test1.c -o tests/test1
@$(CC) $(CFLAGS) re.c tests/test2.c -o tests/test2
@$(CC) $(CFLAGS) re.c tests/test_rand.c -o tests/test_rand
@$(CC) $(CFLAGS) re.c tests/test_rand_neg.c -o tests/test_rand_neg
@$(CC) $(CFLAGS) re.c tests/test_compile.c -o tests/test_compile
@$(CC) $(CFLAGS) re.c tests/test_end_anchor.c -o tests/test_end_anchor

clean:
@rm -f tests/test1 tests/test2 tests/test_rand tests/test_compile
@rm -f tests/test1 tests/test2 tests/test_rand tests/test_compile tests/test_end_anchor
@#@$(foreach test_bin,$(TEST_BINS), rm -f $(test_bin) ; )
@rm -f a.out
@rm -f *.o
Expand All @@ -37,6 +31,49 @@ test: all
@./tests/test1
@echo Testing handling of invalid regex patterns
@./tests/test_compile
@echo Compiling patterns in both Python and C and verifying the results are the same:
@echo
@$(PYTHON) ./scripts/regex_test_compile.py \\d+\\w?\\D\\d
@$(PYTHON) ./scripts/regex_test_compile.py \\s+[a-zA-Z0-9?]*
@$(PYTHON) ./scripts/regex_test_compile.py \\w*\\d?\\w\\?
@$(PYTHON) ./scripts/regex_test_compile.py [^\\d]+\\\\?\\s
@$(PYTHON) ./scripts/regex_test_compile.py [^\\w][^-1-4]
@$(PYTHON) ./scripts/regex_test_compile.py [^\\w]
@$(PYTHON) ./scripts/regex_test_compile.py [^1-4]
@$(PYTHON) ./scripts/regex_test_compile.py [^-1-4]
@$(PYTHON) ./scripts/regex_test_compile.py [^\\d]+\\s?[\\w]*
@$(PYTHON) ./scripts/regex_test_compile.py a+b*[ac]*.+.*.[\\.].
@$(PYTHON) ./scripts/regex_test_compile.py a?b[ac*]*.?[\\]+[?]?
@$(PYTHON) ./scripts/regex_test_compile.py [1-5-]+[-1-2]-[-]
@$(PYTHON) ./scripts/regex_test_compile.py [-1-3]-[-]+
@$(PYTHON) ./scripts/regex_test_compile.py [1-5]+[-1-2]-[\\-]
@$(PYTHON) ./scripts/regex_test_compile.py [-1-2]*
@$(PYTHON) ./scripts/regex_test_compile.py \\s?[a-fKL098]+-?
@$(PYTHON) ./scripts/regex_test_compile.py [\\-]*
@$(PYTHON) ./scripts/regex_test_compile.py [\\\\]+
@$(PYTHON) ./scripts/regex_test_compile.py [0-9a-fA-F]+
@$(PYTHON) ./scripts/regex_test_compile.py [1379][2468][abcdef]
@$(PYTHON) ./scripts/regex_test_compile.py [012345-9]?[0123-789]
@$(PYTHON) ./scripts/regex_test_compile.py [012345-9]
@$(PYTHON) ./scripts/regex_test_compile.py [0-56789]
@$(PYTHON) ./scripts/regex_test_compile.py [abc-zABC-Z]
@$(PYTHON) ./scripts/regex_test_compile.py [a\d]?1234
@$(PYTHON) ./scripts/regex_test_compile.py .*123faerdig
@$(PYTHON) ./scripts/regex_test_compile.py .?\\w+jsj$
@$(PYTHON) ./scripts/regex_test_compile.py [?to][+to][?ta][*ta]
@$(PYTHON) ./scripts/regex_test_compile.py \\d+
@$(PYTHON) ./scripts/regex_test_compile.py [a-z]+
@$(PYTHON) ./scripts/regex_test_compile.py \\s+[a-zA-Z0-9?]*
@$(PYTHON) ./scripts/regex_test_compile.py \\w
@$(PYTHON) ./scripts/regex_test_compile.py \\d
@$(PYTHON) ./scripts/regex_test_compile.py [\\d]
@$(PYTHON) ./scripts/regex_test_compile.py [^\\d]
@$(PYTHON) ./scripts/regex_test_compile.py [^-1-4]
@$(PYTHON) ./scripts/regex_test_compile.py \\x01[^\\xff][^
@$(PYTHON) ./scripts/regex_test_compile.py \\x01[^\\xff][\
@echo
@echo
@echo
@echo Testing patterns against $(NRAND_TESTS) random strings matching the Python implementation and comparing:
@echo
@$(PYTHON) ./scripts/regex_test.py \\d+\\w?\\D\\d $(NRAND_TESTS)
Expand Down Expand Up @@ -101,9 +138,20 @@ test: all
@$(PYTHON) ./scripts/regex_test_neg.py [012345-9] $(NRAND_TESTS)
@$(PYTHON) ./scripts/regex_test_neg.py [0-56789] $(NRAND_TESTS)
@$(PYTHON) ./scripts/regex_test_neg.py .*123faerdig $(NRAND_TESTS)
@$(PYTHON) ./scripts/regex_test_neg.py a^ $(NRAND_TESTS)
@echo
@echo
@./tests/test2
@echo
@echo
@echo
@echo
@./tests/test_end_anchor
@echo
@echo

CBMC := cbmc

# unwindset: loop max MAX_REGEXP_OBJECTS patterns
verify:
$(CBMC) -DCPROVER --unwindset 8 --unwind 16 --depth 16 --bounds-check --pointer-check --memory-leak-check --div-by-zero-check --signed-overflow-check --unsigned-overflow-check --pointer-overflow-check --conversion-check --undefined-shift-check --enum-range-check $(CBMC_ARGS) re.c
18 changes: 9 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,8 +51,6 @@ int re_match(const char* pattern, const char* text, int* matchlength);
### Supported regex-operators
The following features / regex-operators are supported by this library.

NOTE: inverted character classes are buggy - see the test harness for concrete examples.


- `.` Dot, matches any character
- `^` Start anchor, matches beginning of string
Expand All @@ -63,7 +61,7 @@ NOTE: inverted character classes are buggy - see the test harness for concrete e
- `[abc]` Character class, match if one of {'a', 'b', 'c'}
- `[^abc]` Inverted class, match if NOT one of {'a', 'b', 'c'}
- `[a-zA-Z]` Character ranges, the character set of the ranges { a-z | A-Z }
- `\s` Whitespace, \t \f \r \n \v and spaces
- `\s` Whitespace, '\t' '\f' '\r' '\n' '\v' and spaces
- `\S` Non-whitespace
- `\w` Alphanumeric, [a-zA-Z0-9_]
- `\W` Non-alphanumeric
Expand All @@ -90,7 +88,7 @@ int match_length;
/* Standard null-terminated C-string to search: */
const char* string_to_search = "ahem.. 'hello world !' ..";

/* Compile a simple regular expression using character classes, meta-char and greedy + non-greedy quantifiers: */
/* Compile a simple regular expression using character classes, meta-char and greedy quantifiers: */
re_t pattern = re_compile("[Hh]ello [Ww]orld\\s*[!]?");

/* Check if the regex matches the text: */
Expand All @@ -104,10 +102,15 @@ if (match_idx != -1)
For more usage examples I encourage you to look at the code in the `tests`-folder.

### TODO
- Fix the implementation of inverted character classes.
- Fix implementation of branches (`|`), and see if that can lead us closer to groups as well, e.g. `(a|b)+`.
- Fix implementation of branches (`|`) (see the branch), and add groups as well, e.g. `(a|b)+`.
- `re_match_capture()` with groups.
- Add `example.c` that demonstrates usage.
- Add `tests/test_perf.c` for performance and time measurements.
- Add optional multibyte support (e.g. UTF-8). On non-wchar systems roll our own.
- Word boundary: \b \B
- non-greedy, lazy quantifiers (??, +?, *?, {n,m}?)
- case-insensitive option or API. `re_matchi()`
- '.' may not match '\r' nor '\n', unless a single-line option is given.
- Testing: Improve pattern rejection testing.

### FAQ
Expand All @@ -118,6 +121,3 @@ For more usage examples I encourage you to look at the code in the `tests`-folde
### License
All material in this repository is in the public domain.




5 changes: 5 additions & 0 deletions formal_verification.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,3 +140,8 @@ sys 9m34.654s
klee@780432c1aaae0:~$
```

----

For the formal verifier CBMC just call make verify.
This verifier is much faster and better than klee.
https://www.cprover.org/cbmc/
Loading