Feature db connector solr synonyms api #1918
* init * some unit tests * fix marshmallow load * unit tests * gunicorn config * restore var * clean up build * attempt to fix CI
* init * some unit tests * fix marshmallow load * unit tests * gunicorn config * restore var * clean up build * attempt to fix CI * put back gunicorn config
    designation_rx = re.compile(r'({0})|{1}'.format(exception_designation_rx, ws_generic_rx), re.I)
    exception_designation_rx = "|".join(map(re.escape, exception_designation))
    ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(prefixes)
    designation_rx = re.compile(r"({0})|{1}".format(exception_designation_rx, ws_generic_rx), re.I)
Check failure
Code scanning / CodeQL
Regular expression injection High
This regular expression depends on a user-provided value and is executed by re.sub.
Copilot Autofix
AI 3 months ago
To fix this issue, we must escape all metacharacters in the prefixes string before interpolating it into the regex pattern, so that even if a user supplies input containing special regular expression characters, they are treated as literals and cannot manipulate the regular expression's behavior.
How to fix:
- Split the `prefixes` string into its component words (assuming it is currently a pipe-separated string, e.g., `"foo|bar"`), escape each component with `re.escape`, and then join them back together with the pipe symbol as before. This will ensure that all input terms are treated as literals in the resulting regular expression.
- Use the sanitized string when building `ws_generic_rx`.
Where to edit:
- In `solr-synonyms-api/synonyms/services/synonyms/synonym.py`, in method `regex_prefixes`, lines 253–256.
What is needed:
- Use `re.escape` to escape each prefix.
- No new imports are necessary as `re` is already imported.
    @@ -252,9 +252,9 @@
         @classmethod
         def regex_prefixes(cls, text, prefixes, exception_designation):
             exception_designation_rx = "|".join(map(re.escape, exception_designation))
    -        ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(prefixes)
    +        escaped_prefixes = "|".join(re.escape(p) for p in prefixes.split("|")) if prefixes else ""
    +        ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(escaped_prefixes)
             designation_rx = re.compile(r"({0})|{1}".format(exception_designation_rx, ws_generic_rx), re.I)

             text = designation_rx.sub(lambda x: x.group(1) or (x.group(2) + x.group(4)), text)

             return " ".join(text.split())
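The effect of the suggested escaping can be checked in isolation. The following is a minimal sketch, not the project's code: the helper names `escape_prefixes` and `build_ws_generic_rx` are hypothetical, and it assumes, as the autofix does, that `prefixes` arrives as a pipe-separated string.

```python
import re


def escape_prefixes(prefixes: str) -> str:
    """Escape each pipe-separated prefix so it is matched literally.

    Hypothetical helper; assumes `prefixes` looks like "foo|bar".
    """
    if not prefixes:
        return ""
    return "|".join(re.escape(p) for p in prefixes.split("|"))


def build_ws_generic_rx(prefixes: str) -> "re.Pattern[str]":
    """Compile the prefix pattern from the diff using escaped prefixes."""
    pattern = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(
        escape_prefixes(prefixes)
    )
    return re.compile(pattern, re.I)


# With escaping, "." in a prefix only matches a literal dot.
rx = build_ws_generic_rx("a.b")
literal_hit = rx.search("a.b & Sons") is not None
wildcard_hit = rx.search("axb & Sons") is not None
```

Here `literal_hit` is true and `wildcard_hit` is false, whereas interpolating the raw string `"a.b"` would also match `"axb & Sons"`. Note that a prefix that legitimately contains a literal `|` cannot be represented in this scheme.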
    text = re.sub(
        r"(?<=[a-zA-Z\.])\'[Ss]|\(.*\d+.*\)|\(?No.?\s*\d+\)?|\(?lot.?\s*\d+[-]?\d*\)?",
        "",
        text,
Check failure
Code scanning / CodeQL
Polynomial regular expression used on uncontrolled data High
This regular expression depends on a user-provided value.
    text = re.sub(
        r"(\b[A-Za-z]{1,2}\b)\s+(?=[a-zA-Z]{1,2}\b)|\s+$",
        r"\1",
        text,
Check failure
Code scanning / CodeQL
Polynomial regular expression used on uncontrolled data High
This regular expression depends on a user-provided value.
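One possible way to reduce the backtracking risk in the pattern above is to handle its trailing-whitespace branch (`\s+$`) with `str.rstrip()` instead of a regex alternative. The sketch below is an assumption about how that could look, not a fix proposed in this PR; the function name `join_short_words` is hypothetical, and it only addresses the `\s+$` branch.

```python
import re

# First alternative of the flagged pattern, kept unchanged.
_SHORT_WORD_GAP = re.compile(r"(\b[A-Za-z]{1,2}\b)\s+(?=[a-zA-Z]{1,2}\b)")


def join_short_words(text: str) -> str:
    """Join runs of 1-2 letter words, then drop trailing whitespace.

    The `\\s+$` branch is replaced by rstrip(), which runs in linear time
    and never retries from every position inside a long whitespace run.
    """
    text = _SHORT_WORD_GAP.sub(r"\1", text)
    return text.rstrip()


result = join_short_words("A B C Holdings  ")
```

For this input `result` is `"ABC Holdings"`, the same string the original single-pass `re.sub` produces.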
    def remove_french(text):
        text = re.sub(r'(^\w+(?:[^\w\n]+\w+)+[^\w\n]*)/(\w+(?:[^\w\n]+\w+)+[^\w\n]*$)?',
                      r'\1 ',
        text = re.sub(r"(^\w+(?:[^\w\n]+\w+)+[^\w\n]*)/(\w+(?:[^\w\n]+\w+)+[^\w\n]*$)?",
Check failure
Code scanning / CodeQL
Inefficient regular expression High
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 9442085 | Triggered | Generic Password | ab5d5c3 | solr-synonyms-api/config.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and to ease remediation.
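As an illustration of the "replace and store your secret safely" step, a configuration module can read the value from the process environment rather than embedding it. This is a generic sketch, not the contents of `solr-synonyms-api/config.py`; the helper `read_secret` and the variable name `DEMO_DB_PASSWORD` are hypothetical.

```python
import os


def read_secret(name: str) -> str:
    """Return a secret from the environment, failing loudly if it is unset.

    `name` is whatever variable the deployment defines; nothing is hardcoded.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demonstration only: a real deployment would set this outside the code,
# for example through the platform's secret store.
os.environ["DEMO_DB_PASSWORD"] = "example-only"
password = read_secret("DEMO_DB_PASSWORD")
```

Failing at startup when the variable is missing avoids silently falling back to a default credential.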
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the namex license (Apache 2.0).