@bolyachevets
Collaborator

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the namex license (Apache 2.0).

* init

* some unit tests

* fix marshmallow load

* unit tests

* gunicorn config

* restore var

* clean up build

* attempt to fix CI

* put back gunicorn config
designation_rx = re.compile(r'({0})|{1}'.format(exception_designation_rx, ws_generic_rx), re.I)
exception_designation_rx = "|".join(map(re.escape, exception_designation))
ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(prefixes)
designation_rx = re.compile(r"({0})|{1}".format(exception_designation_rx, ws_generic_rx), re.I)

Check failure

Code scanning / CodeQL

Regular expression injection High

This regular expression depends on a user-provided value and is executed by re.sub.

Copilot Autofix

AI 3 months ago

To fix this issue, we must escape all metacharacters in the prefixes string before interpolating it into the regex pattern, so that even if a user supplies input containing special regular expression characters, they are treated as literals and cannot manipulate the regular expression's behavior.

How to fix:

  • Split the prefixes string into its component words (assuming it is currently a pipe-separated string, e.g., "foo|bar"), escape each component with re.escape, and then join them back together with the pipe symbol as before. This will ensure that all input terms are treated as literals in the resulting regular expression.
  • Use the sanitized string when building ws_generic_rx.

Where to edit:

  • In solr-synonyms-api/synonyms/services/synonyms/synonym.py, in method regex_prefixes, lines 253–256.

What is needed:

  • Use re.escape to escape each prefix.
  • No new imports are necessary as re is already imported.
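The effect of the escaping step can be illustrated with a minimal sketch. It assumes, as the suggestion above does, that prefixes is a pipe-separated string; the helper name build_prefix_pattern and the sample values are hypothetical and only serve the illustration.

```python
import re


def build_prefix_pattern(prefixes: str) -> str:
    """Escape each pipe-separated prefix so that it is matched literally."""
    # Assumption: prefixes looks like "foo|bar"; an empty value yields an empty group.
    escaped = "|".join(re.escape(p) for p in prefixes.split("|")) if prefixes else ""
    return r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(escaped)


# Hypothetical prefix list containing a regex metacharacter (".").
pattern = re.compile(build_prefix_pattern("a.b|co"), re.I)

# The literal prefix still matches.
literal_hit = pattern.search("a.b-tech") is not None
# Once escaped, "." no longer stands for "any character", so "axb" is rejected.
wildcard_hit = pattern.search("axb-tech") is not None
```

If the prefixes value is intentionally a regular-expression fragment rather than a list of literal words, escaping changes what it matches, so the callers of regex_prefixes should be checked before applying the patch.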

Suggested changeset 1
solr-synonyms-api/synonyms/services/synonyms/synonym.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/solr-synonyms-api/synonyms/services/synonyms/synonym.py b/solr-synonyms-api/synonyms/services/synonyms/synonym.py
--- a/solr-synonyms-api/synonyms/services/synonyms/synonym.py
+++ b/solr-synonyms-api/synonyms/services/synonyms/synonym.py
@@ -252,9 +252,9 @@
     @classmethod
     def regex_prefixes(cls, text, prefixes, exception_designation):
         exception_designation_rx = "|".join(map(re.escape, exception_designation))
-        ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(prefixes)
+        escaped_prefixes = "|".join(re.escape(p) for p in prefixes.split("|")) if prefixes else ""
+        ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(escaped_prefixes)
         designation_rx = re.compile(r"({0})|{1}".format(exception_designation_rx, ws_generic_rx), re.I)
-
         text = designation_rx.sub(lambda x: x.group(1) or (x.group(2) + x.group(4)), text)
 
         return " ".join(text.split())
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
text = re.sub(
r"(?<=[a-zA-Z\.])\'[Ss]|\(.*\d+.*\)|\(?No.?\s*\d+\)?|\(?lot.?\s*\d+[-]?\d*\)?",
"",
text,

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This regular expression that depends on a user-provided value may run slow on strings starting with '(' and with many repetitions of '0'.
This regular expression that depends on a user-provided value may run slow on strings starting with '(0' and with many repetitions of '0'.
text = re.sub(
r"(\b[A-Za-z]{1,2}\b)\s+(?=[a-zA-Z]{1,2}\b)|\s+$",
r"\1",
text,

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This regular expression that depends on a user-provided value may run slow on strings with many repetitions of ' '.
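One possible way to address this alert, sketched under assumptions rather than taken from the PR: the `\s+$` alternative is what gets retried from every space in a long run, so it can be split out and replaced with str.rstrip(), which scans the tail once. The function name join_short_words is hypothetical.

```python
import re

# First alternative of the flagged pattern, kept unchanged.
_SHORT_WORD_GAP = re.compile(r"(\b[A-Za-z]{1,2}\b)\s+(?=[a-zA-Z]{1,2}\b)")


def join_short_words(text: str) -> str:
    """Join adjacent one- or two-letter words, then drop trailing whitespace."""
    text = _SHORT_WORD_GAP.sub(r"\1", text)
    # Replaces the `\s+$` alternative without re-scanning every run of spaces.
    return text.rstrip()


print(join_short_words("A B C  "))  # prints "ABC"
```

On the inputs tried here the result matches the original single-pattern call, but the two are not guaranteed to agree on every edge case, so the existing unit tests should be rerun.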
def remove_french(text):
text = re.sub(r'(^\w+(?:[^\w\n]+\w+)+[^\w\n]*)/(\w+(?:[^\w\n]+\w+)+[^\w\n]*$)?',
r'\1 ',
text = re.sub(r"(^\w+(?:[^\w\n]+\w+)+[^\w\n]*)/(\w+(?:[^\w\n]+\w+)+[^\w\n]*$)?",

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with 'a' and containing many repetitions of '{Z'.

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with 'a/' and containing many repetitions of 'aZ'.

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with 'a/a/a' and containing many repetitions of '/Z'.

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with 'a/a/a/' and containing many repetitions of 'aZ'.
@gitguardian

gitguardian bot commented Nov 11, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 9442085 | Triggered | Generic Password | ab5d5c3 | solr-synonyms-api/config.py |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
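
As an illustration of step 2, a minimal sketch of reading the password from the environment instead of hardcoding it in config.py. The variable name SYNONYMS_DB_PASSWORD and the helper load_db_password are hypothetical; the actual setting flagged in the scan is not shown in this conversation.

```python
import os


def load_db_password(env=os.environ, var_name="SYNONYMS_DB_PASSWORD"):
    """Return the password from the environment, failing loudly if it is absent."""
    # Hypothetical variable name; substitute whatever the deployment defines.
    password = env.get(var_name)
    if not password:
        raise RuntimeError(f"{var_name} is not set")
    return password
```

Failing at startup when the variable is missing avoids silently falling back to a default credential. The leaked value still needs to be revoked and rotated as described in step 3.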

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@sonarqubecloud

Quality Gate failed

Failed conditions
2 Security Hotspots
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


@bolyachevets bolyachevets merged commit 8e2747b into main Nov 12, 2025
14 of 18 checks passed
@bolyachevets bolyachevets deleted the feature-db-connector-solr-synonyms-api branch November 14, 2025 16:47