Feature db connector solr synonyms api #1918

bolyachevets · 2025-11-11T07:14:59Z

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the namex license (Apache 2.0).

* init * some unit tests * fix marshmallow load * unit tests * gunicorn config * restore var * clean up build * attempt to fix CI

* init * some unit tests * fix marshmallow load * unit tests * gunicorn config * restore var * clean up build * attempt to fix CI * put back gunicorn config

solr-synonyms-api/synonyms/services/synonyms/synonym.py

-        designation_rx = re.compile(r'({0})|{1}'.format(exception_designation_rx, ws_generic_rx), re.I)
+        exception_designation_rx = "|".join(map(re.escape, exception_designation))
+        ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(prefixes)
+        designation_rx = re.compile(r"({0})|{1}".format(exception_designation_rx, ws_generic_rx), re.I)


To fix this issue, we must escape all metacharacters in the prefixes string before interpolating it into the regex pattern, so that even if a user supplies input containing special regular expression characters, they are treated as literals and cannot manipulate the regular expression's behavior.

How to fix:

Split the prefixes string into its component words (assuming it is currently a pipe-separated string, e.g., "foo|bar"), escape each component with re.escape, and then join them back together with the pipe symbol as before. This will ensure that all input terms are treated as literals in the resulting regular expression.

Use the sanitized string when building ws_generic_rx.

Where to edit:

In solr-synonyms-api/synonyms/services/synonyms/synonym.py, in method regex_prefixes, lines 253–256.

What is needed:

Use re.escape to escape each prefix.

No new imports are necessary as re is already imported.
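
The escaping step described above can be sketched as follows. This is a minimal illustration, assuming `prefixes` arrives as a pipe-separated string; the helper names `escape_prefixes` and `build_ws_generic_rx` and the sample values are hypothetical and not part of the namex code.

```python
import re


def escape_prefixes(prefixes):
    """Escape each alternative in a pipe-separated prefix string (hypothetical helper)."""
    if not prefixes:
        return ""
    # Split on the pipe, escape every term, then rebuild the alternation.
    return "|".join(re.escape(p) for p in prefixes.split("|"))


def build_ws_generic_rx(prefixes):
    """Build the prefix pattern shown in the diff from the sanitized alternation."""
    return r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(escape_prefixes(prefixes))


# After escaping, "a.b" matches only a literal dot, not any character.
pattern = re.compile(build_ws_generic_rx("re|a.b"), re.I)
print(bool(pattern.search("re-use")))   # True
print(bool(pattern.search("axb-use")))  # False
```

One consequence of splitting on the pipe is that a prefix containing a literal `|` would be treated as two alternatives, so this approach only fits if the pipe is reserved as the separator.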

solr-synonyms-api/synonyms/services/synonyms/synonym.py

+        text = re.sub(
+            r"(?<=[a-zA-Z\.])\'[Ss]|\(.*\d+.*\)|\(?No.?\s*\d+\)?|\(?lot.?\s*\d+[-]?\d*\)?",
+            "",
+            text,


solr-synonyms-api/synonyms/services/synonyms/synonym.py

+        text = re.sub(
+            r"(\b[A-Za-z]{1,2}\b)\s+(?=[a-zA-Z]{1,2}\b)|\s+$",
+            r"\1",
+            text,


solr-synonyms-api/synonyms/utils/service_utils.py

 def remove_french(text):
-    text = re.sub(r'(^\w+(?:[^\w\n]+\w+)+[^\w\n]*)/(\w+(?:[^\w\n]+\w+)+[^\w\n]*$)?',
-                  r'\1 ',
+    text = re.sub(r"(^\w+(?:[^\w\n]+\w+)+[^\w\n]*)/(\w+(?:[^\w\n]+\w+)+[^\w\n]*$)?",


gitguardian · 2025-11-11T07:15:03Z

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 9442085 | Triggered | Generic Password | `ab5d5c3` | solr-synonyms-api/config.py |

🛠 Guidelines to remediate hardcoded secrets

1. Understand the implications of revoking this secret by investigating where it is used in your code.
2. Replace and store your secret safely, following best practices for secret management.
3. Revoke and rotate this secret.
4. If possible, rewrite git history. Rewriting git history is not a trivial act: you might completely break other contributing developers' workflow, and you risk accidentally deleting legitimate data.
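
One common way to act on the "replace and store your secret safely" step is to read the value from the environment instead of committing it. The sketch below is only an assumption about how that could look; the variable name `DATABASE_PASSWORD`, the helper `get_secret`, and the `Config` class are hypothetical and are not taken from solr-synonyms-api/config.py.

```python
import os


def get_secret(name, default=None):
    """Return a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Required environment variable {name} is not set")
    return value


class Config:
    """Hypothetical config class; no credential is stored in source control."""

    @property
    def db_password(self):
        # Resolved at access time so importing this module never needs the secret.
        return get_secret("DATABASE_PASSWORD")
```

Raising on a missing variable, rather than falling back to a default password, keeps a misconfigured deployment from silently running with a known credential.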

To avoid such incidents in the future, consider:

* following best practices for managing and storing secrets, including API keys and other credentials
* installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

sonarqubecloud · 2025-11-11T07:16:01Z

Quality Gate failed

Failed conditions
2 Security Hotspots
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


bolyachevets added 2 commits November 10, 2025 14:22

Solr synonyms api cleanup (#1916)

ab5d5c3

* init * some unit tests * fix marshmallow load * unit tests * gunicorn config * restore var * clean up build * attempt to fix CI

Solr synonyms api cleanup (#1917)

c0e5e9e

* init * some unit tests * fix marshmallow load * unit tests * gunicorn config * restore var * clean up build * attempt to fix CI * put back gunicorn config

bolyachevets requested review from eve-git, mengdong19, ozamani9gh, rarmitag, shaangill025 and stevenc987 as code owners November 11, 2025 07:15

github-advanced-security bot found potential problems Nov 11, 2025

View reviewed changes

bolyachevets mentioned this pull request Nov 12, 2025

namex-solr-synonyms-api needs regex checked/optimized bcgov/entity#31221

Open

bolyachevets merged commit 8e2747b into main Nov 12, 2025
14 of 18 checks passed

bolyachevets deleted the feature-db-connector-solr-synonyms-api branch November 14, 2025 16:47

mengdong19 mentioned this pull request Dec 18, 2025

Deploy NameX Jobs and Services to Test and Finish Testing bcgov/entity#31736

Open

84 tasks

@@ -252,9 +252,9 @@
                 @classmethod
                 def regex_prefixes(cls, text, prefixes, exception_designation):
                     exception_designation_rx = "|".join(map(re.escape, exception_designation))
-                    ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(prefixes)
+                    escaped_prefixes = "|".join(re.escape(p) for p in prefixes.split("|")) if prefixes else ""
+                    ws_generic_rx = r"(?<![a-zA-Z0-9_.])({0})\s*([ &/.-])\s*([A-Za-z]+)".format(escaped_prefixes)
                     designation_rx = re.compile(r"({0})|{1}".format(exception_designation_rx, ws_generic_rx), re.I)
                     text = designation_rx.sub(lambda x: x.group(1) or (x.group(2) + x.group(4)), text)
                     return " ".join(text.split())
