Apply feedback from documentation-website to PPL command docs #4997

ritvibhatt · 2025-12-24T23:43:31Z

Description

Apply feedback from technical writers from documentation-website PR (Export PPL Command docs from SQL plugin repo documentation-website#11688) to SQL repo to keep documentation in sync
Update script to auto-convert codeblock tables to markdown tables, support single directory export, remove empty tables, improve Jekyll angle bracket/asterisk escaping, convert markdown emphasis to Jekyll attributes
Add README with SOP for docs exporter script

When the docs exporter script is applied now, the differences from the docs website are: links to sections not yet ported to documentation-website are commented out in the docs website (functions, admin), whitespace differences, different nav order for some command files, and some differences in escape characters for angle brackets and asterisks. Everything except the commented out admin links will be addressed in the next documentation-website PR (opensearch-project/documentation-website#11747).

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kyle Hounslow <kylhouns@amazon.com>

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai · 2025-12-24T23:43:36Z

📝 Walkthrough

Walkthrough

Reformats ~50 PPL command and function docs into consistent Syntax/Parameters/Examples layouts, expands and splits ML/AD docs into time-series vs non-time-series variants, and upgrades the docs exporter tool with directory mappings, title overrides, SQL-table→Markdown conversion, redirect handling, and a CLI for selective exports. Only exporter script contains code changes.

Changes

Cohort / File(s)	Summary
PPL command docs `docs/user/ppl/cmd/...` `ad.md`, `addcoltotals.md`, `addtotals.md`, `append.md`, `appendcol.md`, `appendpipe.md`, `bin.md`, `chart.md`, `dedup.md`, `describe.md`, `eval.md`, `eventstats.md`, `expand.md`, `explain.md`, `fields.md`, `fillnull.md`, `flatten.md`, `grok.md`, `head.md`, `index.md`, `join.md`, `kmeans.md`, `lookup.md`, `ml.md`, `multisearch.md`, `parse.md`, `patterns.md`, `rare.md`, `regex.md`, `rename.md`, `replace.md`, `reverse.md`, `rex.md`, `search.md`, `showdatasources.md`, `sort.md`, `spath.md`, `stats.md`, `streamstats.md`, `subquery.md`, `syntax.md`, `table.md`, `timechart.md`, `top.md`, `trendline.md`, `where.md`	Large, consistent rewrites: add fenced Syntax blocks, Parameters tables, standardized Examples and outputs; split/adapt `ad.md` and `ml.md` for time-series vs non-time-series RCF variants; add deprecation/plug‑in notes where applicable. Documentation-only edits.
PPL function docs `docs/user/ppl/functions/...` `aggregations.md`, `collection.md`, `condition.md`, `conversion.md`, `cryptographic.md`, `datetime.md`, `ip.md`, `json.md`, `math.md`, `statistical.md`, `string.md`, `system.md`	Presentation standardization: code-formatted signatures, bolded type labels, explicit "Example" headings, and consistent formatting across many function pages. No behavioral changes.
Docs exporter script `scripts/docs_exporter/export_to_docs_website.py`	Significant refactor and features: replace base constants with parent/base variants, add `CUSTOM_REDIRECTS`, `DIR_PATH_MAPPINGS`, `CMD_TITLE_OVERRIDES`, and `map_directory_path`; add SQL table→Markdown converters (`convert_sql_table_to_markdown`, `convert_tables_in_code_blocks`); broaden content pipeline (table conversion, link/path rewrites, Jekyll note conversions); change front-matter redirect handling to lists; extend `export_docs` signature to accept `auto_yes` and `only_dirs`; add argparse CLI flags for non-interactive and selective exports; other path/index generation and nav-order tweaks.
Docs exporter README `scripts/docs_exporter/README.md`	New README describing exporter purpose, SOP, usage flags (`--yes`, `--only-dirs`), and transformations performed (front-matter injection, table conversion, link fixes, Jekyll compatibility).

Sequence Diagram(s)

sequenceDiagram
    participant User as CLI user
    participant CLI as export_to_docs_website CLI
    participant Exporter as export_to_docs_website.py
    participant FS as File System (source/target)
    participant Processor as Markdown processing pipeline

    User->>CLI: run script (--yes / --only-dirs)
    CLI->>Exporter: call export_docs(auto_yes, only_dirs)
    Exporter->>FS: read source markdown files
    Exporter->>Processor: convert_tables_in_code_blocks(content)
    Processor-->>Exporter: transformed content (SQL tables → Markdown)
    Exporter->>Processor: process_content(content) (link/path rewrites, Jekyll notes, redirects)
    Processor-->>Exporter: processed content
    Exporter->>FS: write target files with front-matter and redirects
    Exporter->>FS: generate per-dir index and redirects
    Exporter-->>CLI: exit status / summary

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

[DOC] PPL docs website exporter script #4950 — overlapping edits to scripts/docs_exporter/export_to_docs_website.py (path mappings, helpers, front-matter/link processing).
Migrate PPL Documentation from RST to Markdown #4912 — related PPL docs migration affecting many docs/user/ppl/* pages.
Feature addtotals and addcoltotals #4754 — directly related to addtotals / addcoltotals documentation changes.

Suggested labels

PPL, maintenance, infrastructure

Suggested reviewers

kylehounslow
penghuo
anirudha
yuancu
kavithacm
ps48
GumpacG
qianheng-aws
MaxKsyunz

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main objective of the PR: applying feedback from documentation-website to PPL command docs.
Description check	✅ Passed	The pull request description clearly explains applying documentation-website feedback, updating the exporter script with specific improvements, and adding SOP documentation.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/user/ppl/functions/collection.md (1)
629-691: Apply standardized formatting to MVFIND and MVMAP type labels and example headers.

Lines 629–691 (MVFIND) and 809–875 (MVMAP) are missing the bold and inline-code formatting consistently applied throughout the file. Update Argument type: and Return type: to use **Argument type:** with backtick-wrapped values, and change Example headers to ### Example to match other functions like MVINDEX (line 710).
Changes required for MVFIND (lines 634–641):
Change:
Argument type: array: ARRAY, regex: STRING
Return type: INTEGER (nullable)
Example

To:
**Argument type:** `array: ARRAY, regex: STRING`
**Return type:** `INTEGER (nullable)`
### Example
Apply the same formatting updates to MVMAP (lines 814–820).

🤖 Fix all issues with AI agents

In @docs/user/ppl/cmd/addcoltotals.md:
- Line 27: Change the heading for "Example 1" to match the other examples by
replacing "### Example 1: Basic example" with "## Example 1: Basic example" so
all examples ("Example 1", "Example 2", "Example 3") use the same heading level
for consistency.

In @docs/user/ppl/cmd/eval.md:
- Around line 16-23: The markdown table row for the `<expression>` parameter has
trailing whitespace after the final pipe which can cause inconsistent rendering;
edit the table so the `<expression>` row exactly matches the other rows (remove
the extra spaces after the final `|` and ensure the row is `| `<expression>` |
Required | The expression to evaluate. |`), and verify the header and separator
rows (`| Parameter | Required/Optional | Description |` and `| --- | --- | ---
|`) are properly aligned with no extra trailing spaces.

In @docs/user/ppl/cmd/fillnull.md:
- Around line 191-195: Change the code block language marker for the example
that uses PPL syntax (the block starting with the comment "# This FAILS - same
value for mixed-type fields" and the line "source=accounts | fillnull value=0
firstname, age") from ```sql to ```ppl so the snippet is correctly highlighted
as PPL; leave the block contents and closing backticks unchanged.

In @docs/user/ppl/cmd/join.md:
- Around line 72-80: Update the description for the `max` parameter to
explicitly state that it has no effect for `semi` and `anti` join types; modify
the `<max>` table row (the "max | Optional | ..." entry) to add a sentence like
"Ignored for `semi` and `anti` joins, which only test existence and do not honor
result limits." so readers know `max` applies to other join types but is ignored
for `semi`/`anti`.
- Around line 216-222: The table rows contain inline code blocks with unescaped
pipe characters (e.g., the queries like `source=table1 | join left=t1 right=t2
on t1.id=t2.id table2 | eval a = 1`, `source=table1 | join on
table1.id=table2.id table2 | eval a = 1`, `source=table1 | join on
table1.id=t2.id table2 as t2 | eval a = 1`, and `source=table1 | join right=tt
on table1.id=t2.id [ source=table2 as t2 | eval b = id ] | eval a = 1`) which
break Markdown table parsing; escape each pipe inside the inline code with a
backslash (use `\|`) so the inline code remains a single cell and matches
repository documentation style.

In @docs/user/ppl/cmd/syntax.md:
- Around line 88-94: The markdown table for the columns starting with
"account_number | firstname | address | balance | gender | city | employer |
state | age | email | lastname" has a malformed separator row and missing
trailing pipes on data rows; fix it by adding a leading pipe to the header
separator line (so the ":--- | :--- | ..." row begins with "|") and ensure each
data row ends with a trailing pipe (append "|" to rows containing "1  | Amber
... Duke", "6  | Hattie ... Bond", "13 | Nanette ... Bates", and "18 | Dale ...
Adams") to match the other tables' formatting and satisfy markdownlint.
- Around line 103-106: The Markdown table has an inconsistent separator and row
pipe usage: ensure the separator line matches the header by adding a leading
pipe to the separator (change ":--- | :--- | ..." to "| :--- | :--- | ...") and
add trailing pipes to each data row so every row ends with a pipe (e.g., ensure
the rows matching "| 1 | Amber | 880 Holmes Lane | ... | Duke |" and "| 13 |
Nanette | ..." both end with a trailing "|").

In @docs/user/ppl/cmd/timechart.md:
- Around line 196-203: The example description for "Example 5" is inconsistent:
it says "1-second interval" but the query uses timechart with span=1h; update
the prose above the code block to say "1-hour interval" (or change the query to
use span=1s if the intent was seconds) so the description matches the query
(refer to the Example 5 heading and the timechart line containing "span=1h
count() by category").

In @docs/user/ppl/functions/condition.md:
- Around line 214-218: The usage line for the nested IFNULL example has a
misplaced closing backtick that breaks the code span; update the text so the
entire nested call starting with ifnull(field1, ifnull(field2, ifnull(field3,
default_value))) is wrapped in a single code span (move the closing backtick to
after the final parenthesis) so the IFNULL nesting is rendered as one inline
code block.

In @docs/user/ppl/functions/datetime.md:
- Line 140: The CONVERT_TZ documentation sentence for the convert_tz(timestamp,
from_timezone, to_timezone) function has a typo and awkward/redundant phrasing;
update the text to fix "timeszone" → "timezone" and simplify the redundant
clause so it reads e.g. "CONVERT_TZ returns null when any argument is invalid —
the timestamp is not in format yyyy-MM-dd HH:mm:ss, the timezone is not in
(+/-)HH:mm, or the date/time is invalid (e.g., February 30th) — and invalid
timezones are those outside -13:59 to +14:00." Ensure you edit the CONVERT_TZ
description line accordingly.

In @docs/user/ppl/functions/ip.md:
- Line 65: The list item describing `options` has a leading space before the
hyphen and uses "comma separated" without a hyphen; remove the extra leading
space so the `-` aligns with the other list items and change "comma separated"
to "comma-separated" in the sentence that starts with `- ` and mentions
`options` and the example fields ("country_iso_code", "country_name", ...).

In @docs/user/ppl/functions/json.md:
- Around line 58-59: Remove the trailing spaces inside the inline code spans for
the type annotations—change the backtick-wrapped tokens shown as `STRING  ` and
`BOOLEAN  ` to `STRING` and `BOOLEAN` respectively (i.e., trim the extra spaces
inside the code spans) so the markdown no longer contains spaces inside the
inline code elements.

In @scripts/docs_exporter/export_to_docs_website.py:
- Around line 321-331: Remove the duplicated only_dirs filtering block: there
are two identical if only_dirs: md_files = [f for f in md_files if
f.relative_to(source_dir).parts[0] in only_dirs] sections; keep a single
instance and delete the redundant copy so md_files is filtered only once using
only_dirs, referencing the variables md_files, only_dirs, and source_dir in the
existing function.

🧹 Nitpick comments (22)

docs/user/ppl/cmd/bin.md (1)
24-24: Align link text with section heading for consistency.

Line 24 references "Align options", but the actual section heading (line 50) is "Align time options". While the anchor reference is correct, the link text should match the section title exactly for clarity and consistency.
📝 Proposed fix
-| `aligntime` | Optional | Align the bin times for time-based fields. Valid only for time-based discretization. Valid values are `earliest`, `latest`, or a specific time. See [Align options](#align-time-options).|
+| `aligntime` | Optional | Align the bin times for time-based fields. Valid only for time-based discretization. Valid values are `earliest`, `latest`, or a specific time. See [Align time options](#align-time-options).|
docs/user/ppl/cmd/fillnull.md (1)
184-195: Redundant explanation in Limitations section.

Lines 189–190 repeat the type-matching constraint twice in consecutive sentences. Consider consolidating into a single, clearer statement to improve readability.
♻️ Suggested consolidation
 The `fillnull` command has the following limitations:
 
-* When applying the same value to all fields without specifying field names, all fields must be of the same type. For mixed types, use separate `fillnull` commands or explicitly specify fields.
-* The replacement value type must match all field types in the field list. When applying the same value to multiple fields, all fields must be of the same type (all strings or all numeric). The following query shows the error that occurs when this rule is violated:
+* The replacement value type must match all field types in the field list. When applying the same value to multiple fields, all fields must be of the same type (all strings or all numeric). For mixed types, use separate `fillnull` commands or explicitly specify fields. The following query shows the error that occurs when this rule is violated:
docs/user/ppl/cmd/streamstats.md (2)

30-43: Align example-syntax code-block label with eventstats.md.

The example-syntax block uses "ppl ignore" while eventstats.md uses "syntax" for the same purpose. For consistency across related commands, consider aligning the label. If "ppl ignore" is intentional for validation reasons, clarify in comments or standardize usage.

79-79: Standardize link text for aggregation functions reference.

The link text differs across files: streamstats and stats use "[Aggregation Functions]" while eventstats uses "[Functions]". Update line 79 to match the standard "[Aggregation Functions]" for consistency.

docs/user/ppl/cmd/eventstats.md (2)

42-42: Standardize parameter default wording across commands.

Line 42 states "Default is determined by" for bucket_nullable, while stats.md and streamstats.md use "Default is the value of." Update to match the standard phrasing for consistency.

77-77: Standardize link text for aggregation functions reference.

Line 77 uses link text "[Functions]" while stats.md and streamstats.md use "[Aggregation Functions]". Update for consistency across related command documentation.
docs/user/ppl/cmd/reverse.md (1)
8-16: Consider formalizing the Parameters section header.

The documentation restructuring introduces formal Syntax, Parameters, and Examples sections. While the Syntax section has a proper header (line 8), the Parameters section is only a descriptive sentence (line 16) without a ## Parameters header. For consistency with the standardization pattern, consider adding a formal Parameters section header even for commands with no parameters.
♻️ Proposed format adjustment
 The `reverse` command has the following syntax:
 
 ```syntax
 reverse
+## Parameters
+
The reverse command takes no parameters.
</details>

</blockquote></details>
<details>
<summary>docs/user/ppl/functions/math.md (1)</summary><blockquote>

`951-951`: **Consider simplifying wordy phrase for clarity.**

Line 951 uses "the seed is initialized prior to execution," which could be more concise. Consider "before execution" as a simpler alternative.



Based on static analysis, this is a minor style improvement that improves readability without affecting technical accuracy.

</blockquote></details>
<details>
<summary>docs/user/ppl/functions/string.md (2)</summary><blockquote>

`88-94`: **Consider simplifying wordy phrases in LIKE and ILIKE descriptions.**

Lines 92 and 123 both use "often used in conjunction with," which could be more concise. Consider using "used with" instead for better readability.



These are minor style improvements suggested by static analysis. The current phrasing is clear but slightly verbose.



Also applies to: 122-126

---

`527-548`: **Clarify terminology: avoid mixing "regexp" and "regex" in comments.**

The REGEXP_REPLACE function documentation mixes terminology: the function name is correct as `regexp_replace`, but comments should consistently use "regex" (the general term) rather than sometimes using "regexp." While the current mixing is not incorrect (PCRE = Perl Compatible Regular Expressions), standardizing on "regex" in explanatory text would improve clarity.



For example, the introductory "Usage:" line could clarify that the pattern is a regex, and the note at the end could refer consistently to "regex patterns" rather than mixing terms.

</blockquote></details>
<details>
<summary>docs/user/ppl/cmd/chart.md (1)</summary><blockquote>

`10-12`: **Complex syntax block may benefit from visual clarity.**

The syntax on line 11 is comprehensive but quite dense, making it challenging to parse at a glance:
chart [limit=(top|bottom) ] [useother=] [usenull=] [nullstr=] [otherstr=] <aggregation_function> [ by <row_split> <column_split> ] | [over <row_split> ] [ by <column_split>]
Consider breaking this into multiple lines or adding visual structure to highlight the primary patterns (basic form vs. `by` variant vs. `over/by` variant). This would improve readability for users quickly scanning the syntax.


<details>
<summary>📖 Suggested syntax block restructuring for clarity</summary>
chart [limit=(top|bottom) ] [useother=]
[usenull=] [nullstr=] [otherstr=]
<aggregation_function>
[ by <row_split> [<column_split>] ]
| [over <row_split>] [by <column_split>]
This multi-line format makes it clearer that users can either use `by` syntax or `over...by` syntax.
</details>

</blockquote></details>
<details>
<summary>docs/user/ppl/cmd/ad.md (1)</summary><blockquote>

`27-27`: **Resolve duplicate heading "### Parameters" (MD024).**

Both variants have identically named parameter sections at lines 27 and 55. While contextually appropriate (each variant has distinct parameters), markdownlint flags this as MD024 (no-duplicate-heading). Consider restructuring to use hierarchical headings such as `#### Time-Series Parameters` and `#### Non-Time-Series Parameters`, or merge both variants into a single Parameters section with variant-specific subsections.


<details>
<summary>Example: Using hierarchical headings</summary>

```diff
### Anomaly detection for time-series data

...

-### Parameters
+#### Time-Series Parameters

The fixed-in-time RCF algorithm supports the following parameters.

...

### Anomaly detection for non-time-series data

...

-### Parameters
+#### Non-Time-Series Parameters

The batch RCF algorithm supports the following parameters.
Also applies to: 55-55
docs/user/ppl/cmd/explain.md (1)
20-22: Minor style improvement: "able to" → "can".

Line 21 uses "able to pushdown" which could be more concise as "can pushdown" per common style guidance.
Proposed edit
-| `<mode>` | Optional | The explain mode. Valid values are: - `standard`: Displays the logical and physical plan along with pushdown information (query domain-specific language [DSL]). Available in both v2 and v3 engines. - `simple`: Displays the logical plan tree without attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). - `cost`: Displays the standard information plus plan cost attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). - `extended`: Displays the standard information plus the generated code. If the whole plan is able to pushdown, it is equal to the standard mode. Requires the v3 engine (`plugins.calcite.enabled` = `true`). Default is `standard`. |
+| `<mode>` | Optional | The explain mode. Valid values are: - `standard`: Displays the logical and physical plan along with pushdown information (query domain-specific language [DSL]). Available in both v2 and v3 engines. - `simple`: Displays the logical plan tree without attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). - `cost`: Displays the standard information plus plan cost attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). - `extended`: Displays the standard information plus the generated code. If the whole plan can pushdown, it is equal to the standard mode. Requires the v3 engine (`plugins.calcite.enabled` = `true`). Default is `standard`. |
docs/user/ppl/cmd/rare.md (1)
6-8: Fix Markdown blockquote formatting (MD028).

Blank lines inside blockquotes must contain the > character to keep the quote together. Line 7 is a blank line separating two blockquote blocks. Add > to line 7 to maintain blockquote continuity.
Proposed fix
> **Note**: The command returns up to 10 results for each distinct combination of values in the group-by fields.
-
+>
> **Note**: The `rare` command is not rewritten to [query domain-specific language (DSL)](https://docs.opensearch.org/latest/query-dsl/). It is only executed on the coordinating node.
docs/user/ppl/cmd/search.md (1)

54-56: Minor: Reduce repetition of "Time modifiers" at sentence boundaries.

Lines 54–55 contain "Time modifiers filter search results…" immediately followed by "Time modifiers support the following formats." Consider rewording one sentence to avoid the repetition, for example: "Time modifiers filter search results by a time range using the implicit @timestamp field and support the following formats."
scripts/docs_exporter/README.md (2)
3-3: Minor: Capitalize "Markdown" as a proper noun.

"markdown" should be capitalized to "Markdown" as it refers to the formatting language name.
📝 Suggested fix
-Exports PPL documentation to the OpenSearch documentation website. Auto-injects Jekyll front-matter, converts SQL CLI tables to markdown, fixes relative links, and adds copy buttons.
+Exports PPL documentation to the OpenSearch documentation website. Auto-injects Jekyll front-matter, converts SQL CLI tables to Markdown, fixes relative links, and adds copy buttons.
74-75: Minor: Capitalize "Markdown" as a proper noun.

Same capitalization issue as above.
📝 Suggested fix
 - Injects Jekyll front-matter (title, parent, nav_order, etc.)
-- Converts SQL CLI table output to markdown tables
+- Converts SQL CLI table output to Markdown tables
docs/user/ppl/cmd/syntax.md (1)
67-79: Consider using ## headings for examples for consistency.

Other command documentation files in this PR (e.g., timechart.md, addcoltotals.md) use ## Example N: headings, while this file uses bold text (**Example N:**). Consider aligning with the heading style for consistency and better navigation.
📝 Suggested changes
 ## Examples

-**Example 1: Search through accounts index**
+## Example 1: Search through accounts index

 In the following query, the `search` command refers to an `accounts` index as the source and uses the `fields` and `where` commands for the conditions:

-**Example 2: Get all documents**
+## Example 2: Get all documents

 To get all documents from the `accounts` index, specify it as the `source`:

-**Example 3: Get documents that match a condition**
+## Example 3: Get documents that match a condition
docs/user/ppl/cmd/top.md (1)
20-25: Vary the sentence structure in the top-options parameter description.

The three consecutive parameter definitions in line 23 each begin with "Default is", which reduces readability. Rephrase for better flow.
♻️ Proposed refactor
-| `top-options` | Optional | `showcount`: Whether to create a field in the output that represents a count of the tuple of values. Default is `true`. `countfield`: The name of the field that contains the count. Default is `count`. `usenull`: Whether to output `null` values. Default is the value of `plugins.ppl.syntax.legacy.preferred`. |
+| `top-options` | Optional | `showcount`: Whether to create a field in the output that represents a count of the tuple of values. Defaults to `true`. `countfield`: The name of the field that contains the count; defaults to `count`. `usenull`: Whether to output `null` values, with a default of `plugins.ppl.syntax.legacy.preferred`. |
docs/user/ppl/cmd/addtotals.md (1)
14-14: Parameter table notation should match syntax block formatting.

The syntax block shows field-list (unquoted), but the parameters table references it as <field-list> (with angle brackets). Similarly, parameters like row, col, labelfield, label, and fieldname appear unquoted in the syntax but are backtick-quoted in the table.

Standardize the notation across both sections to avoid reader confusion. If angle brackets (<...>) denote required placeholders and plain names denote options/literals, this distinction should be explicitly clear and consistently applied.
Proposed notation alignment

Option 1: Make syntax match table notation:
-addtotals [field-list] [label=<string>] [labelfield=<field>] [row=<boolean>] [col=<boolean>] [fieldname=<field>]
+addtotals [<field-list>] [label=<string>] [labelfield=<field>] [row=<boolean>] [col=<boolean>] [fieldname=<field>]
Option 2: Make table match syntax notation (if parameters are case-insensitive or literal keywords):
Update table to reference field-list without angle brackets for consistency, or add a legend explaining bracket conventions.
Also applies to: 21-28
docs/user/ppl/cmd/describe.md (1)

78-80: Example 3 lacks runnable code and output; reference-only example may feel incomplete.

Unlike Examples 1–2, Example 3 provides only a title and a link to external documentation without a query snippet or expected output. This creates an asymmetry in the documentation structure and may leave readers wanting a hands-on illustration.

Consider adding a minimal example query (e.g., describe <prometheus-datasource>.metrics) with a sample result, or explicitly document this as a "See also" reference rather than a numbered example.

docs/user/ppl/cmd/trendline.md (1)

18-26: Consider escaping pipe characters in code blocks within markdown tables for consistency and to avoid potential parsing issues.

The parameter names [+|-] (line 20) and (sma | wma) (line 22) contain pipe characters inside backticks. While standard markdown parsers handle this correctly, escaping the pipes as [+\|-] and (sma \| wma) improves compatibility with tools that parse tables by splitting on pipe delimiters and follows markdown best practices.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77633ef and bac0202.

📒 Files selected for processing (60)

docs/user/ppl/cmd/ad.md
docs/user/ppl/cmd/addcoltotals.md
docs/user/ppl/cmd/addtotals.md
docs/user/ppl/cmd/append.md
docs/user/ppl/cmd/appendcol.md
docs/user/ppl/cmd/appendpipe.md
docs/user/ppl/cmd/bin.md
docs/user/ppl/cmd/chart.md
docs/user/ppl/cmd/dedup.md
docs/user/ppl/cmd/describe.md
docs/user/ppl/cmd/eval.md
docs/user/ppl/cmd/eventstats.md
docs/user/ppl/cmd/expand.md
docs/user/ppl/cmd/explain.md
docs/user/ppl/cmd/fields.md
docs/user/ppl/cmd/fillnull.md
docs/user/ppl/cmd/flatten.md
docs/user/ppl/cmd/grok.md
docs/user/ppl/cmd/head.md
docs/user/ppl/cmd/index.md
docs/user/ppl/cmd/join.md
docs/user/ppl/cmd/kmeans.md
docs/user/ppl/cmd/lookup.md
docs/user/ppl/cmd/ml.md
docs/user/ppl/cmd/multisearch.md
docs/user/ppl/cmd/parse.md
docs/user/ppl/cmd/patterns.md
docs/user/ppl/cmd/rare.md
docs/user/ppl/cmd/regex.md
docs/user/ppl/cmd/rename.md
docs/user/ppl/cmd/replace.md
docs/user/ppl/cmd/reverse.md
docs/user/ppl/cmd/rex.md
docs/user/ppl/cmd/search.md
docs/user/ppl/cmd/showdatasources.md
docs/user/ppl/cmd/sort.md
docs/user/ppl/cmd/spath.md
docs/user/ppl/cmd/stats.md
docs/user/ppl/cmd/streamstats.md
docs/user/ppl/cmd/subquery.md
docs/user/ppl/cmd/syntax.md
docs/user/ppl/cmd/table.md
docs/user/ppl/cmd/timechart.md
docs/user/ppl/cmd/top.md
docs/user/ppl/cmd/trendline.md
docs/user/ppl/cmd/where.md
docs/user/ppl/functions/aggregations.md
docs/user/ppl/functions/collection.md
docs/user/ppl/functions/condition.md
docs/user/ppl/functions/conversion.md
docs/user/ppl/functions/cryptographic.md
docs/user/ppl/functions/datetime.md
docs/user/ppl/functions/ip.md
docs/user/ppl/functions/json.md
docs/user/ppl/functions/math.md
docs/user/ppl/functions/statistical.md
docs/user/ppl/functions/string.md
docs/user/ppl/functions/system.md
scripts/docs_exporter/README.md
scripts/docs_exporter/export_to_docs_website.py

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

📚 Learning: 2025-12-02T17:27:55.938Z

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

docs/user/ppl/cmd/index.md
scripts/docs_exporter/README.md
docs/user/ppl/cmd/syntax.md

📚 Learning: 2025-12-11T05:27:39.856Z

Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

docs/user/ppl/cmd/join.md

🪛 LanguageTool

docs/user/ppl/cmd/explain.md

[style] ~21-~21: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...s the generated code. If the whole plan is able to pushdown, it is equal to the standard m...

(BE_ABLE_TO)

scripts/docs_exporter/README.md

[uncategorized] ~3-~3: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ront-matter, converts SQL CLI tables to markdown, fixes relative links, and adds copy bu...

(MARKDOWN_NNP)

[uncategorized] ~75-~75: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...tc.) - Converts SQL CLI table output to markdown tables - Removes empty tables and their...

(MARKDOWN_NNP)

docs/user/ppl/functions/string.md

[style] ~92-~92: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ... There are two wildcards often used in conjunction with the LIKE operator: * % - The percent ...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

[style] ~123-~123: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...e**. There are two wildcards often used in conjunction with the ILIKE operator: * % - The percent...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

[uncategorized] ~527-~527: Do not mix variants of the same word (‘regexp’ and ‘regex’) within a single text.
Context: ...XP_REPLACE ### Description Usage: regexp_replace(str, pattern, replacement) rep...

(EN_WORD_COHERENCY)

docs/user/ppl/cmd/rename.md

[style] ~10-~10: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...is removed from the search results. * *Renaming an existing field to an existing field...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~29-~29: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...arget-field>` | Required | The name you want to rename to. Must contain the same number...

(REP_WANT_TO_VB)

docs/user/ppl/functions/datetime.md

[grammar] ~140-~140: Ensure spelling is correct
Context: ...n the format yyyy-MM-dd HH:mm:ss or the timeszone is not in (+/-)HH:mm. It also is invali...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~140-~140: This phrase is redundant. Consider using “outside”.
Context: ...h and invalid timezones, which are ones outside of -13:59 and +14:00. Argument type: `...

(OUTSIDE_OF)

[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...

(QB_NEW_EN_HYPHEN)

[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...

(QB_NEW_EN_HYPHEN)

[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...

(QB_NEW_EN_HYPHEN)

[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/functions/math.md

[style] ~951-~951: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...N is specified, the seed is initialized prior to execution. One implication of this beha...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

docs/user/ppl/cmd/search.md

[style] ~55-~55: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... using the implicit @timestamp field. Time modifiers support the following formats...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[grammar] ~695-~695: Use a hyphen to join words.
Context: ...----+ ``` ### Time rounding Use time rounding expressions to filter events re...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/top.md

[style] ~23-~23: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...null: Whether to output nullvalues. Default is the value ofplugins.ppl.syntax.leg...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

docs/user/ppl/functions/ip.md

[grammar] ~65-~65: Use a hyphen to join words.
Context: ...options is an optional String of comma separated fields to output: the selectio...

(QB_NEW_EN_HYPHEN)

docs/user/ppl/cmd/patterns.md

[style] ~68-~68: Opting for a less wordy alternative here may improve the clarity of your writing.
Context: ...bled, the Calcite engine's label mode not only labels the text pattern but also assigns numbered placeholders to variable token...

(NOT_ONLY_ALSO)

🪛 markdownlint-cli2 (0.18.1)

docs/user/ppl/cmd/multisearch.md

41-41: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

docs/user/ppl/cmd/ad.md

55-55: Multiple headings with the same content

(MD024, no-duplicate-heading)

docs/user/ppl/cmd/trendline.md

20-20: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

22-22: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

docs/user/ppl/cmd/table.md

21-21: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

docs/user/ppl/functions/datetime.md

164-164: Multiple headings with the same content

(MD024, no-duplicate-heading)

184-184: Multiple headings with the same content

(MD024, no-duplicate-heading)

204-204: Multiple headings with the same content

(MD024, no-duplicate-heading)

224-224: Multiple headings with the same content

(MD024, no-duplicate-heading)

244-244: Multiple headings with the same content

(MD024, no-duplicate-heading)

264-264: Multiple headings with the same content

(MD024, no-duplicate-heading)

284-284: Multiple headings with the same content

(MD024, no-duplicate-heading)

304-304: Multiple headings with the same content

(MD024, no-duplicate-heading)

324-324: Multiple headings with the same content

(MD024, no-duplicate-heading)

344-344: Multiple headings with the same content

(MD024, no-duplicate-heading)

692-692: Multiple headings with the same content

(MD024, no-duplicate-heading)

docs/user/ppl/cmd/lookup.md

51-51: Multiple headings with the same content

(MD024, no-duplicate-heading)

73-73: Multiple headings with the same content

(MD024, no-duplicate-heading)

docs/user/ppl/cmd/rare.md

7-7: Blank line inside blockquote

(MD028, no-blanks-blockquote)

docs/user/ppl/cmd/join.md

218-218: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

219-219: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

220-220: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

221-221: Table column count
Expected: 2; Actual: 5; Too many cells, extra data will be missing

(MD056, table-column-count)

docs/user/ppl/functions/json.md

58-58: Spaces inside code span elements

(MD038, no-space-in-code)

59-59: Spaces inside code span elements

(MD038, no-space-in-code)

docs/user/ppl/cmd/syntax.md

69-69: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

79-79: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

89-89: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)

89-89: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)

90-90: Bare URL used

(MD034, no-bare-urls)

90-90: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

91-91: Bare URL used

(MD034, no-bare-urls)

91-91: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

92-92: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

93-93: Bare URL used

(MD034, no-bare-urls)

93-93: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

95-95: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

104-104: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)

104-104: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)

105-105: Bare URL used

(MD034, no-bare-urls)

docs/user/ppl/cmd/ml.md

51-51: Multiple headings with the same content

(MD024, no-duplicate-heading)

73-73: Multiple headings with the same content

(MD024, no-duplicate-heading)

docs/user/ppl/cmd/fields.md

21-21: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing

(MD056, table-column-count)

docs/user/ppl/functions/ip.md

65-65: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: build-linux (21, unit)
GitHub Check: build-linux (25, integration)
GitHub Check: build-linux (21, integration)
GitHub Check: build-linux (25, unit)
GitHub Check: build-windows-macos (macos-14, 21, integration)
GitHub Check: build-windows-macos (macos-14, 25, integration)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (macos-14, 21, unit)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)

docs/user/ppl/cmd/addcoltotals.md

coderabbitai · 2026-01-07T18:04:16Z

docs/user/ppl/cmd/eval.md

+## Parameters
+
+The `eval` command supports the following parameters.
+
+| Parameter | Required/Optional | Description |
+| --- | --- | --- |
+| `<field>` | Required | The name of the field to create or update. If the field does not exist, a new field is added. If it already exists, its value is overwritten. |
+| `<expression>` | Required | The expression to evaluate. |  


🛠️ Refactor suggestion | 🟠 Major

Parameter table formatting appears incomplete.

The parameter table on lines 20–23 has a trailing space issue on line 23 (<expression> row has extra spaces/content after the Description column). While this may render correctly in Markdown, ensure the table structure is clean:

| Parameter | Required/Optional | Description | | --- | --- | --- | | `<field>` | Required | The name of the field to create or update. If the field does not exist, a new field is added. If it already exists, its value is overwritten. | | `<expression>` | Required | The expression to evaluate. |

The trailing spaces on line 23 should be removed for consistency.

🧹 Proposed fix for table formatting

| Parameter | Required/Optional | Description | | --- | --- | --- | | `<field>` | Required | The name of the field to create or update. If the field does not exist, a new field is added. If it already exists, its value is overwritten. | -| `<expression>` | Required | The expression to evaluate. | +| `<expression>` | Required | The expression to evaluate. |

🤖 Prompt for AI Agents

In @docs/user/ppl/cmd/eval.md around lines 16 - 23, The markdown table row for the `<expression>` parameter has trailing whitespace after the final pipe which can cause inconsistent rendering; edit the table so the `<expression>` row exactly matches the other rows (remove the extra spaces after the final `|` and ensure the row is `| `<expression>` | Required | The expression to evaluate. |`), and verify the header and separator rows (`| Parameter | Required/Optional | Description |` and `| --- | --- | --- |`) are properly aligned with no extra trailing spaces.

coderabbitai · 2026-01-07T18:04:16Z

docs/user/ppl/cmd/fillnull.md

+    ```sql
+      # This FAILS - same value for mixed-type fields
+      source=accounts | fillnull value=0 firstname, age
+      # ERROR: fillnull failed: replacement value type INTEGER is not compatible with field 'firstname' (type: VARCHAR). The replacement value type must match the field type.
+    ```


⚠️ Potential issue | 🔴 Critical

Incorrect language marker for PPL syntax in Limitations example.

Line 191 marks the code block as sql, but the example uses PPL syntax (source=accounts |). This should be marked as ppl.

🐛 Proposed fix

- ```sql + ```ppl # This FAILS - same value for mixed-type fields source=accounts | fillnull value=0 firstname, age # ERROR: fillnull failed: replacement value type INTEGER is not compatible with field 'firstname' (type: VARCHAR). The replacement value type must match the field type. - ``` + ```

🤖 Prompt for AI Agents

In @docs/user/ppl/cmd/fillnull.md around lines 191 - 195, Change the code block language marker for the example that uses PPL syntax (the block starting with the comment "# This FAILS - same value for mixed-type fields" and the line "source=accounts | fillnull value=0 firstname, age") from ```sql to ```ppl so the snippet is correctly highlighted as PPL; leave the block contents and closing backticks unchanged.

coderabbitai · 2026-01-07T18:04:16Z

docs/user/ppl/cmd/join.md

+| Parameter | Required/Optional | Description |
+| --- | --- | --- |
+| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. |
+| `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |  
+| `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |
+| `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. |
+| `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. |
+| `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). |
+| `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. |


⚠️ Potential issue | 🟠 Major

Clarify that the max parameter has no effect for semi and anti join types.

The documentation should note that while max is listed as an optional parameter applying to all join types, it has no effect for semi and anti join types. These types only use the right dataset to determine record existence and filter accordingly, regardless of the max value.

Proposed documentation update

| `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). | + | `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). **Note:** This parameter has no effect for `semi` and `anti` join types, which only filter records based on existence of matches in the right dataset. |

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In @docs/user/ppl/cmd/join.md around lines 72 - 80, Update the description for the `max` parameter to explicitly state that it has no effect for `semi` and `anti` join types; modify the `<max>` table row (the "max | Optional | ..." entry) to add a sentence like "Ignored for `semi` and `anti` joins, which only test existence and do not honor result limits." so readers know `max` applies to other join types but is ignored for `semi`/`anti`.

docs/user/ppl/cmd/join.md

docs/user/ppl/functions/condition.md

docs/user/ppl/functions/datetime.md

coderabbitai · 2026-01-07T18:04:17Z

docs/user/ppl/functions/ip.md

 - `dataSourceName` must be an established dataSource on OpenSearch GeoSpatial plugin, detail of configuration can be found: https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/  
 - `ip` can be an IPv4 or an IPv6 address  
- - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema.  For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"  
+ - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema.  For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"  


⚠️ Potential issue | 🟡 Minor

Fix markdown indentation and hyphenation.

Line 65 has two issues:

Markdown indentation: Remove the leading space before the - to align with the list items above (lines 33–34).

Grammar: Change comma separated to comma-separated (hyphenate the compound adjective).

✏️ Proposed fix

- - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" + - `options` is an optional String of comma-separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"

- `options` is an optional String of comma-separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"

🧰 Tools

🪛 LanguageTool

[grammar] ~65-~65: Use a hyphen to join words.
Context: ...options is an optional String of comma separated fields to output: the selectio...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)

65-65: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

🤖 Prompt for AI Agents

In @docs/user/ppl/functions/ip.md at line 65, The list item describing `options` has a leading space before the hyphen and uses "comma separated" without a hyphen; remove the extra leading space so the `-` aligns with the other list items and change "comma separated" to "comma-separated" in the sentence that starts with `- ` and mentions `options` and the example fields ("country_iso_code", "country_name", ...).

coderabbitai · 2026-01-07T18:04:17Z

docs/user/ppl/functions/json.md

+**Argument type:** `STRING  `
+**Return type:** `BOOLEAN  `


⚠️ Potential issue | 🟡 Minor

Remove trailing spaces inside code spans (MD038).

Lines 58–59 contain extra spaces within the inline code elements. Trim these to comply with markdown linting standards.

🐛 Proposed fix

- **Argument type:** `STRING ` - **Return type:** `BOOLEAN ` + **Argument type:** `STRING` + **Return type:** `BOOLEAN`

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

**Argument type:** `STRING `

**Return type:** `BOOLEAN `

**Argument type:** `STRING`

**Return type:** `BOOLEAN`

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

58-58: Spaces inside code span elements

(MD038, no-space-in-code)

59-59: Spaces inside code span elements

(MD038, no-space-in-code)

🤖 Prompt for AI Agents

In @docs/user/ppl/functions/json.md around lines 58 - 59, Remove the trailing spaces inside the inline code spans for the type annotations—change the backtick-wrapped tokens shown as `STRING ` and `BOOLEAN ` to `STRING` and `BOOLEAN` respectively (i.e., trim the extra spaces inside the code spans) so the markdown no longer contains spaces inside the inline code elements.

coderabbitai · 2026-01-07T18:04:17Z

scripts/docs_exporter/export_to_docs_website.py

+    # Filter to only specified directories if provided
+    if only_dirs:
+        md_files = [
+            f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs
+        ]
+
+    # Filter to only specified directories if provided
+    if only_dirs:
+        md_files = [
+            f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs
+        ]


⚠️ Potential issue | 🟡 Minor

Duplicate filtering block should be removed.

The only_dirs filtering logic is duplicated. Lines 322-325 and lines 328-331 perform the exact same filtering operation.

🔧 Suggested fix

# Filter to only specified directories if provided if only_dirs: md_files = [ f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs ] - # Filter to only specified directories if provided - if only_dirs: - md_files = [ - f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs - ] - # Group files by directory for local nav_order

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Filter to only specified directories if provided

if only_dirs:

md_files = [

f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs

]

# Filter to only specified directories if provided

if only_dirs:

md_files = [

f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs

]

# Filter to only specified directories if provided

if only_dirs:

md_files = [

f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs

]

🤖 Prompt for AI Agents

In @scripts/docs_exporter/export_to_docs_website.py around lines 321 - 331, Remove the duplicated only_dirs filtering block: there are two identical if only_dirs: md_files = [f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs] sections; keep a single instance and delete the redundant copy so md_files is filtered only once using only_dirs, referencing the variables md_files, only_dirs, and source_dir in the existing function.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @docs/user/ppl/cmd/join.md:
- Around line 72-81: Update the `type` parameter's valid-values list in the
table so it explicitly includes `inner` (e.g., change the existing "left, outer
(same as left), semi, anti, and performance-sensitive types (right, full, and
cross)" string to also list `inner`), ensuring the documented valid values match
the stated default; edit the row for the `type` parameter in the join table (the
`type` parameter entry) to add `inner` to the comma-separated valid values.
- Around line 41-47: The documentation omits "inner" from the joinType
valid-values list while stating the default is inner; update the joinType row to
explicitly include "inner" among valid values (e.g., `inner`, `left`, `semi`,
`anti`, and performance-sensitive types `right`, `full`, `cross`) or
alternatively change the description to clarify that "inner" is the implicit
default behavior—modify the `joinType` entry in the table to reflect this and
ensure the default remains stated as `inner`.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bac0202 and eae078b.

📒 Files selected for processing (1)

docs/user/ppl/cmd/join.md

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

📚 Learning: 2025-12-11T05:27:39.856Z

Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

docs/user/ppl/cmd/join.md

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: build-linux (21, integration)
GitHub Check: bwc-tests-rolling-upgrade (25)
GitHub Check: build-linux (25, unit)
GitHub Check: build-linux (25, integration)
GitHub Check: build-linux (25, doc)
GitHub Check: build-linux (21, unit)
GitHub Check: build-linux (21, doc)
GitHub Check: bwc-tests-full-restart (21)
GitHub Check: bwc-tests-full-restart (25)
GitHub Check: bwc-tests-rolling-upgrade (21)
GitHub Check: security-it-linux (21)
GitHub Check: security-it-linux (25)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (macos-14, 25, integration)
GitHub Check: build-windows-macos (macos-14, 25, doc)
GitHub Check: build-windows-macos (macos-14, 21, integration)
GitHub Check: build-windows-macos (macos-14, 25, unit)
GitHub Check: build-windows-macos (macos-14, 21, doc)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (macos-14, 21, unit)
GitHub Check: security-it-windows-macos (windows-latest, 25)
GitHub Check: security-it-windows-macos (macos-14, 21)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
GitHub Check: security-it-windows-macos (windows-latest, 21)
GitHub Check: security-it-windows-macos (macos-14, 25)
GitHub Check: CodeQL-Scan (java)

🔇 Additional comments (3)

docs/user/ppl/cmd/join.md (3)

218-221: Pipe escaping in table cells is correct.

The unescaped pipes that triggered markdownlint MD056 errors in the prior review have been properly escaped with \| in the table rows, aligning with repository documentation standards. This fix resolves the column count parsing issues.

84-97: Configuration section is well-structured.

The new Configuration section clearly documents the plugins.ppl.join.subsearch_maxout setting with appropriate defaults and includes a concrete example for updating it. This aligns with the documentation standards established across the PR.

1-207: Documentation structure aligns well with PR standards.

The file has been thoughtfully restructured with clear Syntax sections (Basic and Extended), dedicated parameter tables, a Configuration section, consistently-captioned examples, and a structured Limitations section. This standardized format is consistent with other PPL command documentation updates in this PR.

coderabbitai · 2026-01-07T19:12:04Z

docs/user/ppl/cmd/join.md

+| Parameter | Required/Optional | Description |
+| --- | --- | --- |
+| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. |
+| `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |
+| `joinType` | Optional | The type of join to perform. Valid values are `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |
+| `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. |
+| `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. |


🛠️ Refactor suggestion | 🟠 Major

Add inner to the list of valid join types.

Line 45 states the default is inner but the valid values list omits it. Either inner should be explicitly listed as a valid value, or the default should be clarified if inner is derived behavior.

- | `joinType` | Optional | The type of join to perform. Valid values are `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. | + | `joinType` | Optional | The type of join to perform. Valid values are `inner`, `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |

🤖 Prompt for AI Agents

In @docs/user/ppl/cmd/join.md around lines 41 - 47, The documentation omits "inner" from the joinType valid-values list while stating the default is inner; update the joinType row to explicitly include "inner" among valid values (e.g., `inner`, `left`, `semi`, `anti`, and performance-sensitive types `right`, `full`, `cross`) or alternatively change the description to clarify that "inner" is the implicit default behavior—modify the `joinType` entry in the table to reflect this and ensure the default remains stated as `inner`.

coderabbitai · 2026-01-07T19:12:04Z

docs/user/ppl/cmd/join.md

+| Parameter | Required/Optional | Description |
+| --- | --- | --- |
+| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. |
+| `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |  
+| `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |
+| `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. |
+| `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. |
+| `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). |
+| `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. |
+| `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. |


🛠️ Refactor suggestion | 🟠 Major

Add inner to the list of valid join types in extended syntax.

Line 76 states the default is inner but the valid values list omits it. For consistency with the basic syntax parameters, inner should be explicitly listed.

- | `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. | + | `type` | Optional | The join type when using extended syntax. Valid values are `inner`, `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| Parameter | Required/Optional | Description |

| --- | --- | --- |

| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. |

| `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |

| `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |

| `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. |

| `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. |

| `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). |

| `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. |

| `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. |

| Parameter | Required/Optional | Description |

| --- | --- | --- |

| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. |

| `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |

| `type` | Optional | The join type when using extended syntax. Valid values are `inner`, `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |

| `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. |

| `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. |

| `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). |

| `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. |

| `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. |

🤖 Prompt for AI Agents

In @docs/user/ppl/cmd/join.md around lines 72 - 81, Update the `type` parameter's valid-values list in the table so it explicitly includes `inner` (e.g., change the existing "left, outer (same as left), semi, anti, and performance-sensitive types (right, full, and cross)" string to also list `inner`), ensuring the documented valid values match the stated default; edit the row for the `type` parameter in the join table (the `type` parameter entry) to add `inner` to the comma-separated valid values.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In @docs/user/ppl/functions/datetime.md:
- Line 140: The sentence describing convert_tz has redundant wording: change
"which are ones outside of -13:59 and +14:00" to the simplified phrase "which
are outside -13:59 and +14:00" in the documentation entry for convert_tz (the
Usage line referencing convert_tz(timestamp, from_timezone, to_timezone)) so the
sentence reads clearly and removes the redundant "ones" and "of".
- Line 1135: Replace the phrase "time of day value" with the hyphenated compound
"time-of-day value" in the sentence describing hour(time) so the compound
adjective is clear; update the sentence that reads "Different from the time of
day value, the time value has a large range..." to use "Different from the
time-of-day value..." while keeping the rest of the text intact and preserving
the reference to hour(time).
- Line 1162: Update the HOUR_OF_DAY description to hyphenate "time-of-day"
consistently: change the sentence "Different from the time of day value, the
time value has a large range..." to use "time-of-day" (e.g., "Different from the
time-of-day value, the time value has a large range...") and ensure the function
name hour_of_day(time) / HOUR_OF_DAY remains unchanged.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eae078b and 7b7733f.

📒 Files selected for processing (3)

docs/user/ppl/cmd/addcoltotals.md
docs/user/ppl/functions/condition.md
docs/user/ppl/functions/datetime.md

🚧 Files skipped from review as they are similar to previous changes (1)

docs/user/ppl/cmd/addcoltotals.md

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

🪛 LanguageTool

docs/user/ppl/functions/datetime.md

[style] ~140-~140: This phrase is redundant. Consider using “outside”.
Context: ...h and invalid timezones, which are ones outside of -13:59 and +14:00. Argument type: `...

(OUTSIDE_OF)

[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...

(QB_NEW_EN_HYPHEN)

[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...

(QB_NEW_EN_HYPHEN)

[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...

(QB_NEW_EN_HYPHEN)

[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)

docs/user/ppl/functions/condition.md

218-218: Multiple headings with the same content

(MD024, no-duplicate-heading)

docs/user/ppl/functions/datetime.md

164-164: Multiple headings with the same content

(MD024, no-duplicate-heading)

184-184: Multiple headings with the same content

(MD024, no-duplicate-heading)

204-204: Multiple headings with the same content

(MD024, no-duplicate-heading)

224-224: Multiple headings with the same content

(MD024, no-duplicate-heading)

244-244: Multiple headings with the same content

(MD024, no-duplicate-heading)

264-264: Multiple headings with the same content

(MD024, no-duplicate-heading)

284-284: Multiple headings with the same content

(MD024, no-duplicate-heading)

304-304: Multiple headings with the same content

(MD024, no-duplicate-heading)

324-324: Multiple headings with the same content

(MD024, no-duplicate-heading)

344-344: Multiple headings with the same content

(MD024, no-duplicate-heading)

692-692: Multiple headings with the same content

(MD024, no-duplicate-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: security-it-linux (21)
GitHub Check: security-it-linux (25)
GitHub Check: build-linux (25, doc)
GitHub Check: build-linux (21, unit)
GitHub Check: build-linux (21, integration)
GitHub Check: build-linux (25, unit)
GitHub Check: build-linux (21, doc)
GitHub Check: build-linux (25, integration)
GitHub Check: bwc-tests-rolling-upgrade (25)
GitHub Check: bwc-tests-rolling-upgrade (21)
GitHub Check: bwc-tests-full-restart (21)
GitHub Check: bwc-tests-full-restart (25)
GitHub Check: build-windows-macos (macos-14, 21, integration)
GitHub Check: build-windows-macos (macos-14, 25, unit)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (macos-14, 21, unit)
GitHub Check: build-windows-macos (macos-14, 25, integration)
GitHub Check: build-windows-macos (macos-14, 21, doc)
GitHub Check: build-windows-macos (macos-14, 25, doc)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
GitHub Check: security-it-windows-macos (windows-latest, 21)
GitHub Check: security-it-windows-macos (macos-14, 25)
GitHub Check: security-it-windows-macos (macos-14, 21)
GitHub Check: security-it-windows-macos (windows-latest, 25)
GitHub Check: CodeQL-Scan (java)

🔇 Additional comments (2)

docs/user/ppl/functions/condition.md (2)

214-238: Excellent documentation structure and content quality.

The file demonstrates consistent, well-structured documentation across all condition functions. Usage syntax is properly code-formatted, argument/return types are clearly labeled, and examples include expected output tables for clarity. The IFNULL nested pattern section (lines 214–238) is particularly well-documented with contextual notes about OpenSearch version compatibility.

The backtick placement issue at line 217 from the previous review has been correctly resolved.

1-829: Address duplicate heading linting warning.

The static analysis tool flagged line 218's ### Example heading as a duplicate of earlier heading content in the file (MD024). While repeated ### Example headings are standard in technical documentation and functionally harmless, this warning suggests the project's linter is configured to flag them. Consider verifying with project contributors whether this is a style preference to address or an expected linting exception for documentation files.

coderabbitai · 2026-01-07T19:16:51Z

docs/user/ppl/functions/datetime.md

-Usage: convert_tz(timestamp, from_timezone, to_timezone) constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timeszone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00.
-Argument type: TIMESTAMP/STRING, STRING, STRING
-Return type: TIMESTAMP
+Usage: `convert_tz(timestamp, from_timezone, to_timezone)` constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00.


⚠️ Potential issue | 🟡 Minor

Complete the previously requested grammar fix for line 140.

The past review requested simplifying the redundant phrasing in this sentence. While the spelling error ("timeszone" → "timezone") was correctly fixed, the redundant phrase "which are ones outside of" should be simplified to "which are outside" as originally suggested.

Proposed fix

- CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00. + CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are outside -13:59 and +14:00.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Usage: `convert_tz(timestamp, from_timezone, to_timezone)` constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00.

Usage: `convert_tz(timestamp, from_timezone, to_timezone)` constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are outside -13:59 and +14:00.

🧰 Tools

🪛 LanguageTool

[style] ~140-~140: This phrase is redundant. Consider using “outside”.
Context: ...h and invalid timezones, which are ones outside of -13:59 and +14:00. Argument type: `...

(OUTSIDE_OF)

🤖 Prompt for AI Agents

In @docs/user/ppl/functions/datetime.md at line 140, The sentence describing convert_tz has redundant wording: change "which are ones outside of -13:59 and +14:00" to the simplified phrase "which are outside -13:59 and +14:00" in the documentation entry for convert_tz (the Usage line referencing convert_tz(timestamp, from_timezone, to_timezone)) so the sentence reads clearly and removes the redundant "ones" and "of".

coderabbitai · 2026-01-07T19:16:51Z

docs/user/ppl/functions/datetime.md

-Usage: hour(time) extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.
-Argument type: STRING/TIME/TIMESTAMP
-Return type: INTEGER
+Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.


⚠️ Potential issue | 🟡 Minor

Use hyphens to join compound adjectives.

The phrase "time of day value" should use hyphens to clarify that "time-of-day" is a single compound modifier: "the time value has a large range... so the return value of hour(time) can be also greater than 23."

Proposed fix

- Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23. + Usage: `hour(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.

Usage: `hour(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.

🧰 Tools

🪛 LanguageTool

[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...

(QB_NEW_EN_HYPHEN)

[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents

In @docs/user/ppl/functions/datetime.md at line 1135, Replace the phrase "time of day value" with the hyphenated compound "time-of-day value" in the sentence describing hour(time) so the compound adjective is clear; update the sentence that reads "Different from the time of day value, the time value has a large range..." to use "Different from the time-of-day value..." while keeping the rest of the text intact and preserving the reference to hour(time).

coderabbitai · 2026-01-07T19:16:51Z

docs/user/ppl/functions/datetime.md

-Usage: hour_of_day(time) extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.
-Argument type: STRING/TIME/TIMESTAMP
-Return type: INTEGER
+Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.


⚠️ Potential issue | 🟡 Minor

Apply the same hyphenation fix to the HOUR_OF_DAY description.

For consistency and grammar correctness, apply the same hyphenation to "time-of-day" here as well.

Proposed fix

- Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23. + Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.

Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.

🧰 Tools

🪛 LanguageTool

[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...

(QB_NEW_EN_HYPHEN)

[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents

In @docs/user/ppl/functions/datetime.md at line 1162, Update the HOUR_OF_DAY description to hyphenate "time-of-day" consistently: change the sentence "Different from the time of day value, the time value has a large range..." to use "time-of-day" (e.g., "Different from the time-of-day value, the time value has a large range...") and ensure the function name hour_of_day(time) / HOUR_OF_DAY remains unchanged.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

docs/user/ppl/cmd/search.md (2)
53-56: Reduce repetitive language in the time modifiers introduction.

Lines 55-56 both open with "Time modifiers," creating redundancy. Consolidate or rephrase the second sentence to avoid repetition.
📝 Suggested fix
 ### Time modifiers
 
-Time modifiers filter search results by a time range using the implicit `@timestamp` field. Time modifiers support the following formats.
+Time modifiers filter search results by a time range using the implicit `@timestamp` field. The following formats are supported:
693-703: Add hyphen to compound modifier "Time-rounding".

The heading at line 695 should use a hyphenated form for the compound adjective.
📝 Suggested fix
-### Time rounding
+### Time-rounding

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b7733f and 445b1fb.

📒 Files selected for processing (4)

docs/user/ppl/cmd/rex.md
docs/user/ppl/cmd/search.md
docs/user/ppl/cmd/stats.md
docs/user/ppl/cmd/timechart.md

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

🪛 LanguageTool

docs/user/ppl/cmd/search.md

[style] ~55-~55: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... using the implicit @timestamp field. Time modifiers support the following formats...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[grammar] ~695-~695: Use a hyphen to join words.
Context: ...----+ ``` ### Time rounding Use time rounding expressions to filter events re...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: build-linux (21, unit)
GitHub Check: build-linux (25, integration)
GitHub Check: build-linux (25, doc)
GitHub Check: build-linux (21, integration)
GitHub Check: bwc-tests-full-restart (21)
GitHub Check: build-linux (21, doc)
GitHub Check: build-linux (25, unit)
GitHub Check: bwc-tests-rolling-upgrade (25)
GitHub Check: bwc-tests-rolling-upgrade (21)
GitHub Check: bwc-tests-full-restart (25)
GitHub Check: security-it-linux (25)
GitHub Check: security-it-linux (21)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (macos-14, 25, unit)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
GitHub Check: build-windows-macos (macos-14, 21, doc)
GitHub Check: build-windows-macos (macos-14, 21, integration)
GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
GitHub Check: build-windows-macos (macos-14, 21, unit)
GitHub Check: build-windows-macos (macos-14, 25, doc)
GitHub Check: build-windows-macos (macos-14, 25, integration)
GitHub Check: security-it-windows-macos (windows-latest, 25)
GitHub Check: security-it-windows-macos (macos-14, 25)
GitHub Check: security-it-windows-macos (windows-latest, 21)
GitHub Check: security-it-windows-macos (macos-14, 21)
GitHub Check: CodeQL-Scan (java)

🔇 Additional comments (3)

docs/user/ppl/cmd/stats.md (1)

6-8: Cross-reference link is correct. The section "Comparing stats, eventstats, and streamstats" exists in streamstats.md at the exact anchor path referenced in the link, so the internal documentation link will resolve properly.

docs/user/ppl/cmd/search.md (1)

520-529: Remove this comment—the table alignment is correct.

The table formatting in lines 520-529 shows properly aligned query output. All columns maintain consistent column widths and separator positions across the header, separator, and data rows (11 dashes / 5 dashes / 8 dashes in each segment). The content aligns correctly within these boundaries.

docs/user/ppl/cmd/rex.md (1)

1-298: Documentation is well-structured and pedagogically sound.

The reformat aligns well with the PR objectives. The document follows a clear progression from basic concepts through increasingly complex examples, with explicit coverage of edge cases and error scenarios. The consistent formatting across all nine examples (descriptive headings, explanatory text, PPL queries, and output) enhances readability. Error documentation in Examples 8 and 9 is particularly strong—showing both invalid queries and the exact error messages users will encounter provides immediate, actionable guidance.

The relative path reference to ../../admin/settings.rst correctly resolves to docs/user/admin/settings.rst, and the command is properly linked in the PPL index with appropriate description.

kylehounslow and others added 2 commits December 24, 2025 11:39

Format PPL cmd/ docs for documentation-website.

a3a8f74

Signed-off-by: Kyle Hounslow <kylhouns@amazon.com>

apply feedback from documentation website to docs

05a83e9

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

ritvibhatt and others added 2 commits January 5, 2026 10:25

fix links

826b66d

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

Merge branch 'main' into apply-docs-website-feedback

bac68f3

Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

kylehounslow mentioned this pull request Jan 5, 2026

[DOC] PPL docs formatting for documentation-website #4966

Closed

8 tasks

Swiddis added documentation Improvements or additions to documentation enhancement New feature or request labels Jan 5, 2026

ritvibhatt added 4 commits January 6, 2026 16:30

fix links and example formats

57d82f8

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix top.md example format and patterns wildcard escaping

b129e16

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix jekyll tag conversion

ce93f50

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

update exporter script readme

bac0202

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

ritvibhatt marked this pull request as ready for review January 7, 2026 17:56

ritvibhatt requested review from GumpacG, MaxKsyunz, Swiddis, YANG-DB, Yury-Fridlyand, acarbonetto, anirudha, dai-chen, derek-ho, forestmvey, joshuali925, kavithacm, mengweieric, penghuo, ps48, seankao-az and vamsimanohar as code owners January 7, 2026 17:57

ritvibhatt requested review from LantaoJin, RyanL1997, noCharger, qianheng-aws, ykmr1224 and yuancu as code owners January 7, 2026 17:57

coderabbitai bot reviewed Jan 7, 2026

View reviewed changes

fix join table formatting

eae078b

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai bot reviewed Jan 7, 2026

View reviewed changes

fix typos

7b7733f

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai bot reviewed Jan 7, 2026

View reviewed changes

fix rex note

445b1fb

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

coderabbitai bot reviewed Jan 7, 2026

View reviewed changes

	- `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"
	- `options` is an optional String of comma-separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"

	Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.
	Usage: `hour(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.

	Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.
	Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.

Apply feedback from documentation-website to PPL command docs #4997

Are you sure you want to change the base?

Apply feedback from documentation-website to PPL command docs #4997

Uh oh!

Conversation

ritvibhatt commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Check List

Uh oh!

coderabbitai bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ritvibhatt commented Dec 24, 2025 •

edited

Loading

coderabbitai bot commented Dec 24, 2025 •

edited

Loading