-
Notifications
You must be signed in to change notification settings - Fork 181
Apply feedback from documentation-website to PPL command docs #4997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Apply feedback from documentation-website to PPL command docs #4997
Conversation
Signed-off-by: Kyle Hounslow <kylhouns@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
📝 WalkthroughWalkthroughReformats ~50 PPL command and function docs into consistent Syntax/Parameters/Examples layouts, expands and splits ML/AD docs into time-series vs non-time-series variants, and upgrades the docs exporter tool with directory mappings, title overrides, SQL-table→Markdown conversion, redirect handling, and a CLI for selective exports. Only exporter script contains code changes. Changes
Sequence Diagram(s)sequenceDiagram
participant User as CLI user
participant CLI as export_to_docs_website CLI
participant Exporter as export_to_docs_website.py
participant FS as File System (source/target)
participant Processor as Markdown processing pipeline
User->>CLI: run script (--yes / --only-dirs)
CLI->>Exporter: call export_docs(auto_yes, only_dirs)
Exporter->>FS: read source markdown files
Exporter->>Processor: convert_tables_in_code_blocks(content)
Processor-->>Exporter: transformed content (SQL tables → Markdown)
Exporter->>Processor: process_content(content) (link/path rewrites, Jekyll notes, redirects)
Processor-->>Exporter: processed content
Exporter->>FS: write target files with front-matter and redirects
Exporter->>FS: generate per-dir index and redirects
Exporter-->>CLI: exit status / summary
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/user/ppl/functions/collection.md (1)
629-691: Apply standardized formatting to MVFIND and MVMAP type labels and example headers.Lines 629–691 (MVFIND) and 809–875 (MVMAP) are missing the bold and inline-code formatting consistently applied throughout the file. Update
Argument type:andReturn type:to use**Argument type:**with backtick-wrapped values, and changeExampleheaders to### Exampleto match other functions like MVINDEX (line 710).Changes required for MVFIND (lines 634–641):
Change: Argument type: array: ARRAY, regex: STRING Return type: INTEGER (nullable) Example To: **Argument type:** `array: ARRAY, regex: STRING` **Return type:** `INTEGER (nullable)` ### ExampleApply the same formatting updates to MVMAP (lines 814–820).
🤖 Fix all issues with AI agents
In @docs/user/ppl/cmd/addcoltotals.md:
- Line 27: Change the heading for "Example 1" to match the other examples by
replacing "### Example 1: Basic example" with "## Example 1: Basic example" so
all examples ("Example 1", "Example 2", "Example 3") use the same heading level
for consistency.
In @docs/user/ppl/cmd/eval.md:
- Around line 16-23: The markdown table row for the `<expression>` parameter has
trailing whitespace after the final pipe which can cause inconsistent rendering;
edit the table so the `<expression>` row exactly matches the other rows (remove
the extra spaces after the final `|` and ensure the row is `| `<expression>` |
Required | The expression to evaluate. |`), and verify the header and separator
rows (`| Parameter | Required/Optional | Description |` and `| --- | --- | ---
|`) are properly aligned with no extra trailing spaces.
In @docs/user/ppl/cmd/fillnull.md:
- Around line 191-195: Change the code block language marker for the example
that uses PPL syntax (the block starting with the comment "# This FAILS - same
value for mixed-type fields" and the line "source=accounts | fillnull value=0
firstname, age") from ```sql to ```ppl so the snippet is correctly highlighted
as PPL; leave the block contents and closing backticks unchanged.
In @docs/user/ppl/cmd/join.md:
- Around line 72-80: Update the description for the `max` parameter to
explicitly state that it has no effect for `semi` and `anti` join types; modify
the `<max>` table row (the "max | Optional | ..." entry) to add a sentence like
"Ignored for `semi` and `anti` joins, which only test existence and do not honor
result limits." so readers know `max` applies to other join types but is ignored
for `semi`/`anti`.
- Around line 216-222: The table rows contain inline code blocks with unescaped
pipe characters (e.g., the queries like `source=table1 | join left=t1 right=t2
on t1.id=t2.id table2 | eval a = 1`, `source=table1 | join on
table1.id=table2.id table2 | eval a = 1`, `source=table1 | join on
table1.id=t2.id table2 as t2 | eval a = 1`, and `source=table1 | join right=tt
on table1.id=t2.id [ source=table2 as t2 | eval b = id ] | eval a = 1`) which
break Markdown table parsing; escape each pipe inside the inline code with a
backslash (use `\|`) so the inline code remains a single cell and matches
repository documentation style.
In @docs/user/ppl/cmd/syntax.md:
- Around line 88-94: The markdown table for the columns starting with
"account_number | firstname | address | balance | gender | city | employer |
state | age | email | lastname" has a malformed separator row and missing
trailing pipes on data rows; fix it by adding a leading pipe to the header
separator line (so the ":--- | :--- | ..." row begins with "|") and ensure each
data row ends with a trailing pipe (append "|" to rows containing "1 | Amber
... Duke", "6 | Hattie ... Bond", "13 | Nanette ... Bates", and "18 | Dale ...
Adams") to match the other tables' formatting and satisfy markdownlint.
- Around line 103-106: The Markdown table has an inconsistent separator and row
pipe usage: ensure the separator line matches the header by adding a leading
pipe to the separator (change ":--- | :--- | ..." to "| :--- | :--- | ...") and
add trailing pipes to each data row so every row ends with a pipe (e.g., ensure
the rows matching "| 1 | Amber | 880 Holmes Lane | ... | Duke |" and "| 13 |
Nanette | ..." both end with a trailing "|").
In @docs/user/ppl/cmd/timechart.md:
- Around line 196-203: The example description for "Example 5" is inconsistent:
it says "1-second interval" but the query uses timechart with span=1h; update
the prose above the code block to say "1-hour interval" (or change the query to
use span=1s if the intent was seconds) so the description matches the query
(refer to the Example 5 heading and the timechart line containing "span=1h
count() by category").
In @docs/user/ppl/functions/condition.md:
- Around line 214-218: The usage line for the nested IFNULL example has a
misplaced closing backtick that breaks the code span; update the text so the
entire nested call starting with ifnull(field1, ifnull(field2, ifnull(field3,
default_value))) is wrapped in a single code span (move the closing backtick to
after the final parenthesis) so the IFNULL nesting is rendered as one inline
code block.
In @docs/user/ppl/functions/datetime.md:
- Line 140: The CONVERT_TZ documentation sentence for the convert_tz(timestamp,
from_timezone, to_timezone) function has a typo and awkward/redundant phrasing;
update the text to fix "timeszone" → "timezone" and simplify the redundant
clause so it reads e.g. "CONVERT_TZ returns null when any argument is invalid —
the timestamp is not in format yyyy-MM-dd HH:mm:ss, the timezone is not in
(+/-)HH:mm, or the date/time is invalid (e.g., February 30th) — and invalid
timezones are those outside -13:59 to +14:00." Ensure you edit the CONVERT_TZ
description line accordingly.
In @docs/user/ppl/functions/ip.md:
- Line 65: The list item describing `options` has a leading space before the
hyphen and uses "comma separated" without a hyphen; remove the extra leading
space so the `-` aligns with the other list items and change "comma separated"
to "comma-separated" in the sentence that starts with `- ` and mentions
`options` and the example fields ("country_iso_code", "country_name", ...).
In @docs/user/ppl/functions/json.md:
- Around line 58-59: Remove the trailing spaces inside the inline code spans for
the type annotations—change the backtick-wrapped tokens shown as `STRING ` and
`BOOLEAN ` to `STRING` and `BOOLEAN` respectively (i.e., trim the extra spaces
inside the code spans) so the markdown no longer contains spaces inside the
inline code elements.
In @scripts/docs_exporter/export_to_docs_website.py:
- Around line 321-331: Remove the duplicated only_dirs filtering block: there
are two identical if only_dirs: md_files = [f for f in md_files if
f.relative_to(source_dir).parts[0] in only_dirs] sections; keep a single
instance and delete the redundant copy so md_files is filtered only once using
only_dirs, referencing the variables md_files, only_dirs, and source_dir in the
existing function.
🧹 Nitpick comments (22)
docs/user/ppl/cmd/bin.md (1)
24-24: Align link text with section heading for consistency.Line 24 references "Align options", but the actual section heading (line 50) is "Align time options". While the anchor reference is correct, the link text should match the section title exactly for clarity and consistency.
📝 Proposed fix
-| `aligntime` | Optional | Align the bin times for time-based fields. Valid only for time-based discretization. Valid values are `earliest`, `latest`, or a specific time. See [Align options](#align-time-options).| +| `aligntime` | Optional | Align the bin times for time-based fields. Valid only for time-based discretization. Valid values are `earliest`, `latest`, or a specific time. See [Align time options](#align-time-options).|docs/user/ppl/cmd/fillnull.md (1)
184-195: Redundant explanation in Limitations section.Lines 189–190 repeat the type-matching constraint twice in consecutive sentences. Consider consolidating into a single, clearer statement to improve readability.
♻️ Suggested consolidation
The `fillnull` command has the following limitations: -* When applying the same value to all fields without specifying field names, all fields must be of the same type. For mixed types, use separate `fillnull` commands or explicitly specify fields. -* The replacement value type must match all field types in the field list. When applying the same value to multiple fields, all fields must be of the same type (all strings or all numeric). The following query shows the error that occurs when this rule is violated: +* The replacement value type must match all field types in the field list. When applying the same value to multiple fields, all fields must be of the same type (all strings or all numeric). For mixed types, use separate `fillnull` commands or explicitly specify fields. The following query shows the error that occurs when this rule is violated:docs/user/ppl/cmd/streamstats.md (2)
30-43: Align example-syntax code-block label with eventstats.md.The example-syntax block uses "
ppl ignore" while eventstats.md uses "syntax" for the same purpose. For consistency across related commands, consider aligning the label. If "ppl ignore" is intentional for validation reasons, clarify in comments or standardize usage.
79-79: Standardize link text for aggregation functions reference.The link text differs across files: streamstats and stats use "[Aggregation Functions]" while eventstats uses "[Functions]". Update line 79 to match the standard "[Aggregation Functions]" for consistency.
docs/user/ppl/cmd/eventstats.md (2)
42-42: Standardize parameter default wording across commands.Line 42 states "Default is determined by" for bucket_nullable, while stats.md and streamstats.md use "Default is the value of." Update to match the standard phrasing for consistency.
77-77: Standardize link text for aggregation functions reference.Line 77 uses link text "[Functions]" while stats.md and streamstats.md use "[Aggregation Functions]". Update for consistency across related command documentation.
docs/user/ppl/cmd/reverse.md (1)
8-16: Consider formalizing the Parameters section header.The documentation restructuring introduces formal
Syntax,Parameters, andExamplessections. While the Syntax section has a proper header (line 8), the Parameters section is only a descriptive sentence (line 16) without a## Parametersheader. For consistency with the standardization pattern, consider adding a formal Parameters section header even for commands with no parameters.♻️ Proposed format adjustment
The `reverse` command has the following syntax: ```syntax reverse+## Parameters
+
Thereversecommand takes no parameters.</details> </blockquote></details> <details> <summary>docs/user/ppl/functions/math.md (1)</summary><blockquote> `951-951`: **Consider simplifying wordy phrase for clarity.** Line 951 uses "the seed is initialized prior to execution," which could be more concise. Consider "before execution" as a simpler alternative. Based on static analysis, this is a minor style improvement that improves readability without affecting technical accuracy. </blockquote></details> <details> <summary>docs/user/ppl/functions/string.md (2)</summary><blockquote> `88-94`: **Consider simplifying wordy phrases in LIKE and ILIKE descriptions.** Lines 92 and 123 both use "often used in conjunction with," which could be more concise. Consider using "used with" instead for better readability. These are minor style improvements suggested by static analysis. The current phrasing is clear but slightly verbose. Also applies to: 122-126 --- `527-548`: **Clarify terminology: avoid mixing "regexp" and "regex" in comments.** The REGEXP_REPLACE function documentation mixes terminology: the function name is correct as `regexp_replace`, but comments should consistently use "regex" (the general term) rather than sometimes using "regexp." While the current mixing is not incorrect (PCRE = Perl Compatible Regular Expressions), standardizing on "regex" in explanatory text would improve clarity. For example, the introductory "Usage:" line could clarify that the pattern is a regex, and the note at the end could refer consistently to "regex patterns" rather than mixing terms. </blockquote></details> <details> <summary>docs/user/ppl/cmd/chart.md (1)</summary><blockquote> `10-12`: **Complex syntax block may benefit from visual clarity.** The syntax on line 11 is comprehensive but quite dense, making it challenging to parse at a glance:chart [limit=(top|bottom) ] [useother=] [usenull=] [nullstr=] [otherstr=] <aggregation_function> [ by <row_split> <column_split> ] | [over <row_split> ] [ by <column_split>]
Consider breaking this into multiple lines or adding visual structure to highlight the primary patterns (basic form vs. `by` variant vs. `over/by` variant). This would improve readability for users quickly scanning the syntax. <details> <summary>📖 Suggested syntax block restructuring for clarity</summary>chart [limit=(top|bottom) ] [useother=]
[usenull=] [nullstr=] [otherstr=]
<aggregation_function>
[ by <row_split> [<column_split>] ]
| [over <row_split>] [by <column_split>]This multi-line format makes it clearer that users can either use `by` syntax or `over...by` syntax. </details> </blockquote></details> <details> <summary>docs/user/ppl/cmd/ad.md (1)</summary><blockquote> `27-27`: **Resolve duplicate heading "### Parameters" (MD024).** Both variants have identically named parameter sections at lines 27 and 55. While contextually appropriate (each variant has distinct parameters), markdownlint flags this as MD024 (no-duplicate-heading). Consider restructuring to use hierarchical headings such as `#### Time-Series Parameters` and `#### Non-Time-Series Parameters`, or merge both variants into a single Parameters section with variant-specific subsections. <details> <summary>Example: Using hierarchical headings</summary> ```diff ### Anomaly detection for time-series data ... -### Parameters +#### Time-Series Parameters The fixed-in-time RCF algorithm supports the following parameters. ... ### Anomaly detection for non-time-series data ... -### Parameters +#### Non-Time-Series Parameters The batch RCF algorithm supports the following parameters.Also applies to: 55-55
docs/user/ppl/cmd/explain.md (1)
20-22: Minor style improvement: "able to" → "can".Line 21 uses "able to pushdown" which could be more concise as "can pushdown" per common style guidance.
Proposed edit
-| `<mode>` | Optional | The explain mode. Valid values are: <br> - `standard`: Displays the logical and physical plan along with pushdown information (query domain-specific language [DSL]). Available in both v2 and v3 engines. <br> - `simple`: Displays the logical plan tree without attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). <br> - `cost`: Displays the standard information plus plan cost attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). <br> - `extended`: Displays the standard information plus the generated code. If the whole plan is able to pushdown, it is equal to the standard mode. Requires the v3 engine (`plugins.calcite.enabled` = `true`). <br><br> Default is `standard`. | +| `<mode>` | Optional | The explain mode. Valid values are: <br> - `standard`: Displays the logical and physical plan along with pushdown information (query domain-specific language [DSL]). Available in both v2 and v3 engines. <br> - `simple`: Displays the logical plan tree without attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). <br> - `cost`: Displays the standard information plus plan cost attributes. Requires the v3 engine (`plugins.calcite.enabled` = `true`). <br> - `extended`: Displays the standard information plus the generated code. If the whole plan can pushdown, it is equal to the standard mode. Requires the v3 engine (`plugins.calcite.enabled` = `true`). <br><br> Default is `standard`. |docs/user/ppl/cmd/rare.md (1)
6-8: Fix Markdown blockquote formatting (MD028).Blank lines inside blockquotes must contain the
>character to keep the quote together. Line 7 is a blank line separating two blockquote blocks. Add>to line 7 to maintain blockquote continuity.Proposed fix
> **Note**: The command returns up to 10 results for each distinct combination of values in the group-by fields. - +> > **Note**: The `rare` command is not rewritten to [query domain-specific language (DSL)](https://docs.opensearch.org/latest/query-dsl/). It is only executed on the coordinating node.docs/user/ppl/cmd/search.md (1)
54-56: Minor: Reduce repetition of "Time modifiers" at sentence boundaries.Lines 54–55 contain "Time modifiers filter search results…" immediately followed by "Time modifiers support the following formats." Consider rewording one sentence to avoid the repetition, for example: "Time modifiers filter search results by a time range using the implicit
@timestampfield and support the following formats."scripts/docs_exporter/README.md (2)
3-3: Minor: Capitalize "Markdown" as a proper noun."markdown" should be capitalized to "Markdown" as it refers to the formatting language name.
📝 Suggested fix
-Exports PPL documentation to the OpenSearch documentation website. Auto-injects Jekyll front-matter, converts SQL CLI tables to markdown, fixes relative links, and adds copy buttons. +Exports PPL documentation to the OpenSearch documentation website. Auto-injects Jekyll front-matter, converts SQL CLI tables to Markdown, fixes relative links, and adds copy buttons.
74-75: Minor: Capitalize "Markdown" as a proper noun.Same capitalization issue as above.
📝 Suggested fix
- Injects Jekyll front-matter (title, parent, nav_order, etc.) -- Converts SQL CLI table output to markdown tables +- Converts SQL CLI table output to Markdown tablesdocs/user/ppl/cmd/syntax.md (1)
67-79: Consider using ## headings for examples for consistency.Other command documentation files in this PR (e.g., timechart.md, addcoltotals.md) use
## Example N:headings, while this file uses bold text (**Example N:**). Consider aligning with the heading style for consistency and better navigation.📝 Suggested changes
## Examples -**Example 1: Search through accounts index** +## Example 1: Search through accounts index In the following query, the `search` command refers to an `accounts` index as the source and uses the `fields` and `where` commands for the conditions: -**Example 2: Get all documents** +## Example 2: Get all documents To get all documents from the `accounts` index, specify it as the `source`: -**Example 3: Get documents that match a condition** +## Example 3: Get documents that match a conditiondocs/user/ppl/cmd/top.md (1)
20-25: Vary the sentence structure in the top-options parameter description.The three consecutive parameter definitions in line 23 each begin with "Default is", which reduces readability. Rephrase for better flow.
♻️ Proposed refactor
-| `top-options` | Optional | `showcount`: Whether to create a field in the output that represents a count of the tuple of values. Default is `true`.<br>`countfield`: The name of the field that contains the count. Default is `count`.<br>`usenull`: Whether to output `null` values. Default is the value of `plugins.ppl.syntax.legacy.preferred`. | +| `top-options` | Optional | `showcount`: Whether to create a field in the output that represents a count of the tuple of values. Defaults to `true`.<br>`countfield`: The name of the field that contains the count; defaults to `count`.<br>`usenull`: Whether to output `null` values, with a default of `plugins.ppl.syntax.legacy.preferred`. |docs/user/ppl/cmd/addtotals.md (1)
14-14: Parameter table notation should match syntax block formatting.The syntax block shows
field-list(unquoted), but the parameters table references it as<field-list>(with angle brackets). Similarly, parameters likerow,col,labelfield,label, andfieldnameappear unquoted in the syntax but are backtick-quoted in the table.Standardize the notation across both sections to avoid reader confusion. If angle brackets (
<...>) denote required placeholders and plain names denote options/literals, this distinction should be explicitly clear and consistently applied.Proposed notation alignment
Option 1: Make syntax match table notation:
-addtotals [field-list] [label=<string>] [labelfield=<field>] [row=<boolean>] [col=<boolean>] [fieldname=<field>] +addtotals [<field-list>] [label=<string>] [labelfield=<field>] [row=<boolean>] [col=<boolean>] [fieldname=<field>]Option 2: Make table match syntax notation (if parameters are case-insensitive or literal keywords):
Update table to referencefield-listwithout angle brackets for consistency, or add a legend explaining bracket conventions.Also applies to: 21-28
docs/user/ppl/cmd/describe.md (1)
78-80: Example 3 lacks runnable code and output; reference-only example may feel incomplete.Unlike Examples 1–2, Example 3 provides only a title and a link to external documentation without a query snippet or expected output. This creates an asymmetry in the documentation structure and may leave readers wanting a hands-on illustration.
Consider adding a minimal example query (e.g.,
describe <prometheus-datasource>.metrics) with a sample result, or explicitly document this as a "See also" reference rather than a numbered example.docs/user/ppl/cmd/trendline.md (1)
18-26: Consider escaping pipe characters in code blocks within markdown tables for consistency and to avoid potential parsing issues.The parameter names
[+|-](line 20) and(sma | wma)(line 22) contain pipe characters inside backticks. While standard markdown parsers handle this correctly, escaping the pipes as[+\|-]and(sma \| wma)improves compatibility with tools that parse tables by splitting on pipe delimiters and follows markdown best practices.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (60)
docs/user/ppl/cmd/ad.mddocs/user/ppl/cmd/addcoltotals.mddocs/user/ppl/cmd/addtotals.mddocs/user/ppl/cmd/append.mddocs/user/ppl/cmd/appendcol.mddocs/user/ppl/cmd/appendpipe.mddocs/user/ppl/cmd/bin.mddocs/user/ppl/cmd/chart.mddocs/user/ppl/cmd/dedup.mddocs/user/ppl/cmd/describe.mddocs/user/ppl/cmd/eval.mddocs/user/ppl/cmd/eventstats.mddocs/user/ppl/cmd/expand.mddocs/user/ppl/cmd/explain.mddocs/user/ppl/cmd/fields.mddocs/user/ppl/cmd/fillnull.mddocs/user/ppl/cmd/flatten.mddocs/user/ppl/cmd/grok.mddocs/user/ppl/cmd/head.mddocs/user/ppl/cmd/index.mddocs/user/ppl/cmd/join.mddocs/user/ppl/cmd/kmeans.mddocs/user/ppl/cmd/lookup.mddocs/user/ppl/cmd/ml.mddocs/user/ppl/cmd/multisearch.mddocs/user/ppl/cmd/parse.mddocs/user/ppl/cmd/patterns.mddocs/user/ppl/cmd/rare.mddocs/user/ppl/cmd/regex.mddocs/user/ppl/cmd/rename.mddocs/user/ppl/cmd/replace.mddocs/user/ppl/cmd/reverse.mddocs/user/ppl/cmd/rex.mddocs/user/ppl/cmd/search.mddocs/user/ppl/cmd/showdatasources.mddocs/user/ppl/cmd/sort.mddocs/user/ppl/cmd/spath.mddocs/user/ppl/cmd/stats.mddocs/user/ppl/cmd/streamstats.mddocs/user/ppl/cmd/subquery.mddocs/user/ppl/cmd/syntax.mddocs/user/ppl/cmd/table.mddocs/user/ppl/cmd/timechart.mddocs/user/ppl/cmd/top.mddocs/user/ppl/cmd/trendline.mddocs/user/ppl/cmd/where.mddocs/user/ppl/functions/aggregations.mddocs/user/ppl/functions/collection.mddocs/user/ppl/functions/condition.mddocs/user/ppl/functions/conversion.mddocs/user/ppl/functions/cryptographic.mddocs/user/ppl/functions/datetime.mddocs/user/ppl/functions/ip.mddocs/user/ppl/functions/json.mddocs/user/ppl/functions/math.mddocs/user/ppl/functions/statistical.mddocs/user/ppl/functions/string.mddocs/user/ppl/functions/system.mdscripts/docs_exporter/README.mdscripts/docs_exporter/export_to_docs_website.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
Applied to files:
docs/user/ppl/cmd/index.mdscripts/docs_exporter/README.mddocs/user/ppl/cmd/syntax.md
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
docs/user/ppl/cmd/join.md
🪛 LanguageTool
docs/user/ppl/cmd/explain.md
[style] ~21-~21: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...s the generated code. If the whole plan is able to pushdown, it is equal to the standard m...
(BE_ABLE_TO)
scripts/docs_exporter/README.md
[uncategorized] ~3-~3: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ront-matter, converts SQL CLI tables to markdown, fixes relative links, and adds copy bu...
(MARKDOWN_NNP)
[uncategorized] ~75-~75: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...tc.) - Converts SQL CLI table output to markdown tables - Removes empty tables and their...
(MARKDOWN_NNP)
docs/user/ppl/functions/string.md
[style] ~92-~92: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ... There are two wildcards often used in conjunction with the LIKE operator: * % - The percent ...
(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)
[style] ~123-~123: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...e**. There are two wildcards often used in conjunction with the ILIKE operator: * % - The percent...
(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)
[uncategorized] ~527-~527: Do not mix variants of the same word (‘regexp’ and ‘regex’) within a single text.
Context: ...XP_REPLACE ### Description Usage: regexp_replace(str, pattern, replacement) rep...
(EN_WORD_COHERENCY)
docs/user/ppl/cmd/rename.md
[style] ~10-~10: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...is removed from the search results. * *Renaming an existing field to an existing field...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~29-~29: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...arget-field>` | Required | The name you want to rename to. Must contain the same number...
(REP_WANT_TO_VB)
docs/user/ppl/functions/datetime.md
[grammar] ~140-~140: Ensure spelling is correct
Context: ...n the format yyyy-MM-dd HH:mm:ss or the timeszone is not in (+/-)HH:mm. It also is invali...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~140-~140: This phrase is redundant. Consider using “outside”.
Context: ...h and invalid timezones, which are ones outside of -13:59 and +14:00. Argument type: `...
(OUTSIDE_OF)
[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...
(QB_NEW_EN_HYPHEN)
[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...
(QB_NEW_EN_HYPHEN)
[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...
(QB_NEW_EN_HYPHEN)
[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...
(QB_NEW_EN_HYPHEN)
docs/user/ppl/functions/math.md
[style] ~951-~951: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...N is specified, the seed is initialized prior to execution. One implication of this beha...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
docs/user/ppl/cmd/search.md
[style] ~55-~55: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... using the implicit @timestamp field. Time modifiers support the following formats...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[grammar] ~695-~695: Use a hyphen to join words.
Context: ...----+ ``` ### Time rounding Use time rounding expressions to filter events re...
(QB_NEW_EN_HYPHEN)
docs/user/ppl/cmd/top.md
[style] ~23-~23: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...null: Whether to output nullvalues. Default is the value ofplugins.ppl.syntax.leg...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
docs/user/ppl/functions/ip.md
[grammar] ~65-~65: Use a hyphen to join words.
Context: ...options is an optional String of comma separated fields to output: the selectio...
(QB_NEW_EN_HYPHEN)
docs/user/ppl/cmd/patterns.md
[style] ~68-~68: Opting for a less wordy alternative here may improve the clarity of your writing.
Context: ...bled, the Calcite engine's label mode not only labels the text pattern but also assigns numbered placeholders to variable token...
(NOT_ONLY_ALSO)
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/multisearch.md
41-41: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
docs/user/ppl/cmd/ad.md
55-55: Multiple headings with the same content
(MD024, no-duplicate-heading)
docs/user/ppl/cmd/trendline.md
20-20: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
22-22: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
docs/user/ppl/cmd/table.md
21-21: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
docs/user/ppl/functions/datetime.md
164-164: Multiple headings with the same content
(MD024, no-duplicate-heading)
184-184: Multiple headings with the same content
(MD024, no-duplicate-heading)
204-204: Multiple headings with the same content
(MD024, no-duplicate-heading)
224-224: Multiple headings with the same content
(MD024, no-duplicate-heading)
244-244: Multiple headings with the same content
(MD024, no-duplicate-heading)
264-264: Multiple headings with the same content
(MD024, no-duplicate-heading)
284-284: Multiple headings with the same content
(MD024, no-duplicate-heading)
304-304: Multiple headings with the same content
(MD024, no-duplicate-heading)
324-324: Multiple headings with the same content
(MD024, no-duplicate-heading)
344-344: Multiple headings with the same content
(MD024, no-duplicate-heading)
692-692: Multiple headings with the same content
(MD024, no-duplicate-heading)
docs/user/ppl/cmd/lookup.md
51-51: Multiple headings with the same content
(MD024, no-duplicate-heading)
73-73: Multiple headings with the same content
(MD024, no-duplicate-heading)
docs/user/ppl/cmd/rare.md
7-7: Blank line inside blockquote
(MD028, no-blanks-blockquote)
docs/user/ppl/cmd/join.md
218-218: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
219-219: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
220-220: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
221-221: Table column count
Expected: 2; Actual: 5; Too many cells, extra data will be missing
(MD056, table-column-count)
docs/user/ppl/functions/json.md
58-58: Spaces inside code span elements
(MD038, no-space-in-code)
59-59: Spaces inside code span elements
(MD038, no-space-in-code)
docs/user/ppl/cmd/syntax.md
69-69: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
79-79: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
89-89: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
89-89: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
90-90: Bare URL used
(MD034, no-bare-urls)
90-90: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe
(MD055, table-pipe-style)
91-91: Bare URL used
(MD034, no-bare-urls)
91-91: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe
(MD055, table-pipe-style)
92-92: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe
(MD055, table-pipe-style)
93-93: Bare URL used
(MD034, no-bare-urls)
93-93: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe
(MD055, table-pipe-style)
95-95: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
104-104: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
104-104: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
105-105: Bare URL used
(MD034, no-bare-urls)
docs/user/ppl/cmd/ml.md
51-51: Multiple headings with the same content
(MD024, no-duplicate-heading)
73-73: Multiple headings with the same content
(MD024, no-duplicate-heading)
docs/user/ppl/cmd/fields.md
21-21: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
docs/user/ppl/functions/ip.md
65-65: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
| ## Parameters | ||
|
|
||
| The `eval` command supports the following parameters. | ||
|
|
||
| | Parameter | Required/Optional | Description | | ||
| | --- | --- | --- | | ||
| | `<field>` | Required | The name of the field to create or update. If the field does not exist, a new field is added. If it already exists, its value is overwritten. | | ||
| | `<expression>` | Required | The expression to evaluate. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Parameter table formatting appears incomplete.
The parameter table on lines 20–23 has a trailing space issue on line 23 (<expression> row has extra spaces/content after the Description column). While this may render correctly in Markdown, ensure the table structure is clean:
| Parameter | Required/Optional | Description |
| --- | --- | --- |
| `<field>` | Required | The name of the field to create or update. If the field does not exist, a new field is added. If it already exists, its value is overwritten. |
| `<expression>` | Required | The expression to evaluate. | The trailing spaces on line 23 should be removed for consistency.
🧹 Proposed fix for table formatting
| Parameter | Required/Optional | Description |
| --- | --- | --- |
| `<field>` | Required | The name of the field to create or update. If the field does not exist, a new field is added. If it already exists, its value is overwritten. |
-| `<expression>` | Required | The expression to evaluate. |
+| `<expression>` | Required | The expression to evaluate. |🤖 Prompt for AI Agents
In @docs/user/ppl/cmd/eval.md around lines 16 - 23, The markdown table row for
the `<expression>` parameter has trailing whitespace after the final pipe which
can cause inconsistent rendering; edit the table so the `<expression>` row
exactly matches the other rows (remove the extra spaces after the final `|` and
ensure the row is `| `<expression>` | Required | The expression to evaluate.
|`), and verify the header and separator rows (`| Parameter | Required/Optional
| Description |` and `| --- | --- | --- |`) are properly aligned with no extra
trailing spaces.
| ```sql | ||
| # This FAILS - same value for mixed-type fields | ||
| source=accounts | fillnull value=0 firstname, age | ||
| # ERROR: fillnull failed: replacement value type INTEGER is not compatible with field 'firstname' (type: VARCHAR). The replacement value type must match the field type. | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect language marker for PPL syntax in Limitations example.
Line 191 marks the code block as sql, but the example uses PPL syntax (source=accounts |). This should be marked as ppl.
🐛 Proposed fix
- ```sql
+ ```ppl
# This FAILS - same value for mixed-type fields
source=accounts | fillnull value=0 firstname, age
# ERROR: fillnull failed: replacement value type INTEGER is not compatible with field 'firstname' (type: VARCHAR). The replacement value type must match the field type.
- ```
+ ```🤖 Prompt for AI Agents
In @docs/user/ppl/cmd/fillnull.md around lines 191 - 195, Change the code block
language marker for the example that uses PPL syntax (the block starting with
the comment "# This FAILS - same value for mixed-type fields" and the line
"source=accounts | fillnull value=0 firstname, age") from ```sql to ```ppl so
the snippet is correctly highlighted as PPL; leave the block contents and
closing backticks unchanged.
| | Parameter | Required/Optional | Description | | ||
| | --- | --- | --- | | ||
| | `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. | | ||
| | `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. | | ||
| | `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. | | ||
| | `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. | | ||
| | `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. | | ||
| | `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). | | ||
| | `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify that the max parameter has no effect for semi and anti join types.
The documentation should note that while max is listed as an optional parameter applying to all join types, it has no effect for semi and anti join types. These types only use the right dataset to determine record existence and filter accordingly, regardless of the max value.
Proposed documentation update
| `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). |
+ | `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). **Note:** This parameter has no effect for `semi` and `anti` join types, which only filter records based on existence of matches in the right dataset. |Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @docs/user/ppl/cmd/join.md around lines 72 - 80, Update the description for
the `max` parameter to explicitly state that it has no effect for `semi` and
`anti` join types; modify the `<max>` table row (the "max | Optional | ..."
entry) to add a sentence like "Ignored for `semi` and `anti` joins, which only
test existence and do not honor result limits." so readers know `max` applies to
other join types but is ignored for `semi`/`anti`.
| - `dataSourceName` must be an established dataSource on OpenSearch GeoSpatial plugin, detail of configuration can be found: https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/ | ||
| - `ip` can be an IPv4 or an IPv6 address | ||
| - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" | ||
| - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix markdown indentation and hyphenation.
Line 65 has two issues:
- Markdown indentation: Remove the leading space before the
-to align with the list items above (lines 33–34). - Grammar: Change
comma separatedtocomma-separated(hyphenate the compound adjective).
✏️ Proposed fix
- - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location"
+ - `options` is an optional String of comma-separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" 📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - `options` is an optional String of comma separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" | |
| - `options` is an optional String of comma-separated fields to output: the selection of fields is subject to dataSourceProvider's schema. For example, the list of fields in the provided `geolite2-city` dataset includes: "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" |
🧰 Tools
🪛 LanguageTool
[grammar] ~65-~65: Use a hyphen to join words.
Context: ...options is an optional String of comma separated fields to output: the selectio...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
65-65: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
🤖 Prompt for AI Agents
In @docs/user/ppl/functions/ip.md at line 65, The list item describing `options`
has a leading space before the hyphen and uses "comma separated" without a
hyphen; remove the extra leading space so the `-` aligns with the other list
items and change "comma separated" to "comma-separated" in the sentence that
starts with `- ` and mentions `options` and the example fields
("country_iso_code", "country_name", ...).
| **Argument type:** `STRING ` | ||
| **Return type:** `BOOLEAN ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces inside code spans (MD038).
Lines 58–59 contain extra spaces within the inline code elements. Trim these to comply with markdown linting standards.
🐛 Proposed fix
- **Argument type:** `STRING `
- **Return type:** `BOOLEAN `
+ **Argument type:** `STRING`
+ **Return type:** `BOOLEAN`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| **Argument type:** `STRING ` | |
| **Return type:** `BOOLEAN ` | |
| **Argument type:** `STRING` | |
| **Return type:** `BOOLEAN` |
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
58-58: Spaces inside code span elements
(MD038, no-space-in-code)
59-59: Spaces inside code span elements
(MD038, no-space-in-code)
🤖 Prompt for AI Agents
In @docs/user/ppl/functions/json.md around lines 58 - 59, Remove the trailing
spaces inside the inline code spans for the type annotations—change the
backtick-wrapped tokens shown as `STRING ` and `BOOLEAN ` to `STRING` and
`BOOLEAN` respectively (i.e., trim the extra spaces inside the code spans) so
the markdown no longer contains spaces inside the inline code elements.
| # Filter to only specified directories if provided | ||
| if only_dirs: | ||
| md_files = [ | ||
| f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs | ||
| ] | ||
|
|
||
| # Filter to only specified directories if provided | ||
| if only_dirs: | ||
| md_files = [ | ||
| f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate filtering block should be removed.
The only_dirs filtering logic is duplicated. Lines 322-325 and lines 328-331 perform the exact same filtering operation.
🔧 Suggested fix
# Filter to only specified directories if provided
if only_dirs:
md_files = [
f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs
]
- # Filter to only specified directories if provided
- if only_dirs:
- md_files = [
- f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs
- ]
-
# Group files by directory for local nav_order📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Filter to only specified directories if provided | |
| if only_dirs: | |
| md_files = [ | |
| f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs | |
| ] | |
| # Filter to only specified directories if provided | |
| if only_dirs: | |
| md_files = [ | |
| f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs | |
| ] | |
| # Filter to only specified directories if provided | |
| if only_dirs: | |
| md_files = [ | |
| f for f in md_files if f.relative_to(source_dir).parts[0] in only_dirs | |
| ] | |
🤖 Prompt for AI Agents
In @scripts/docs_exporter/export_to_docs_website.py around lines 321 - 331,
Remove the duplicated only_dirs filtering block: there are two identical if
only_dirs: md_files = [f for f in md_files if f.relative_to(source_dir).parts[0]
in only_dirs] sections; keep a single instance and delete the redundant copy so
md_files is filtered only once using only_dirs, referencing the variables
md_files, only_dirs, and source_dir in the existing function.
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @docs/user/ppl/cmd/join.md:
- Around line 72-81: Update the `type` parameter's valid-values list in the
table so it explicitly includes `inner` (e.g., change the existing "left, outer
(same as left), semi, anti, and performance-sensitive types (right, full, and
cross)" string to also list `inner`), ensuring the documented valid values match
the stated default; edit the row for the `type` parameter in the join table (the
`type` parameter entry) to add `inner` to the comma-separated valid values.
- Around line 41-47: The documentation omits "inner" from the joinType
valid-values list while stating the default is inner; update the joinType row to
explicitly include "inner" among valid values (e.g., `inner`, `left`, `semi`,
`anti`, and performance-sensitive types `right`, `full`, `cross`) or
alternatively change the description to clarify that "inner" is the implicit
default behavior—modify the `joinType` entry in the table to reflect this and
ensure the default remains stated as `inner`.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/user/ppl/cmd/join.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
docs/user/ppl/cmd/join.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
docs/user/ppl/cmd/join.md (3)
218-221: Pipe escaping in table cells is correct.The unescaped pipes that triggered markdownlint MD056 errors in the prior review have been properly escaped with
\|in the table rows, aligning with repository documentation standards. This fix resolves the column count parsing issues.
84-97: Configuration section is well-structured.The new Configuration section clearly documents the
plugins.ppl.join.subsearch_maxoutsetting with appropriate defaults and includes a concrete example for updating it. This aligns with the documentation standards established across the PR.
1-207: Documentation structure aligns well with PR standards.The file has been thoughtfully restructured with clear Syntax sections (Basic and Extended), dedicated parameter tables, a Configuration section, consistently-captioned examples, and a structured Limitations section. This standardized format is consistent with other PPL command documentation updates in this PR.
| | Parameter | Required/Optional | Description | | ||
| | --- | --- | --- | | ||
| | `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. | | ||
| | `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. | | ||
| | `joinType` | Optional | The type of join to perform. Valid values are `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. | | ||
| | `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. | | ||
| | `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add inner to the list of valid join types.
Line 45 states the default is inner but the valid values list omits it. Either inner should be explicitly listed as a valid value, or the default should be clarified if inner is derived behavior.
- | `joinType` | Optional | The type of join to perform. Valid values are `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |
+ | `joinType` | Optional | The type of join to perform. Valid values are `inner`, `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |🤖 Prompt for AI Agents
In @docs/user/ppl/cmd/join.md around lines 41 - 47, The documentation omits
"inner" from the joinType valid-values list while stating the default is inner;
update the joinType row to explicitly include "inner" among valid values (e.g.,
`inner`, `left`, `semi`, `anti`, and performance-sensitive types `right`,
`full`, `cross`) or alternatively change the description to clarify that "inner"
is the implicit default behavior—modify the `joinType` entry in the table to
reflect this and ensure the default remains stated as `inner`.
| | Parameter | Required/Optional | Description | | ||
| | --- | --- | --- | | ||
| | `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. | | ||
| | `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. | | ||
| | `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. | | ||
| | `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. | | ||
| | `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. | | ||
| | `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). | | ||
| | `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. | | ||
| | `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add inner to the list of valid join types in extended syntax.
Line 76 states the default is inner but the valid values list omits it. For consistency with the basic syntax parameters, inner should be explicitly listed.
- | `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |
+ | `type` | Optional | The join type when using extended syntax. Valid values are `inner`, `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | Parameter | Required/Optional | Description | | |
| | --- | --- | --- | | |
| | `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. | | |
| | `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. | | |
| | `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. | | |
| | `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. | | |
| | `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. | | |
| | `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). | | |
| | `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. | | |
| | `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. | | |
| | Parameter | Required/Optional | Description | | |
| | --- | --- | --- | | |
| | `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. | | |
| | `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. | | |
| | `type` | Optional | The join type when using extended syntax. Valid values are `inner`, `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. | | |
| | `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. | | |
| | `overwrite` | Optional | Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`. | | |
| | `max` | Optional | The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited). | | |
| | `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. | | |
| | `right` | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`. | |
🤖 Prompt for AI Agents
In @docs/user/ppl/cmd/join.md around lines 72 - 81, Update the `type`
parameter's valid-values list in the table so it explicitly includes `inner`
(e.g., change the existing "left, outer (same as left), semi, anti, and
performance-sensitive types (right, full, and cross)" string to also list
`inner`), ensuring the documented valid values match the stated default; edit
the row for the `type` parameter in the join table (the `type` parameter entry)
to add `inner` to the comma-separated valid values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @docs/user/ppl/functions/datetime.md:
- Line 140: The sentence describing convert_tz has redundant wording: change
"which are ones outside of -13:59 and +14:00" to the simplified phrase "which
are outside -13:59 and +14:00" in the documentation entry for convert_tz (the
Usage line referencing convert_tz(timestamp, from_timezone, to_timezone)) so the
sentence reads clearly and removes the redundant "ones" and "of".
- Line 1135: Replace the phrase "time of day value" with the hyphenated compound
"time-of-day value" in the sentence describing hour(time) so the compound
adjective is clear; update the sentence that reads "Different from the time of
day value, the time value has a large range..." to use "Different from the
time-of-day value..." while keeping the rest of the text intact and preserving
the reference to hour(time).
- Line 1162: Update the HOUR_OF_DAY description to hyphenate "time-of-day"
consistently: change the sentence "Different from the time of day value, the
time value has a large range..." to use "time-of-day" (e.g., "Different from the
time-of-day value, the time value has a large range...") and ensure the function
name hour_of_day(time) / HOUR_OF_DAY remains unchanged.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/user/ppl/cmd/addcoltotals.mddocs/user/ppl/functions/condition.mddocs/user/ppl/functions/datetime.md
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/user/ppl/cmd/addcoltotals.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
🪛 LanguageTool
docs/user/ppl/functions/datetime.md
[style] ~140-~140: This phrase is redundant. Consider using “outside”.
Context: ...h and invalid timezones, which are ones outside of -13:59 and +14:00. Argument type: `...
(OUTSIDE_OF)
[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...
(QB_NEW_EN_HYPHEN)
[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...
(QB_NEW_EN_HYPHEN)
[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...
(QB_NEW_EN_HYPHEN)
[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/functions/condition.md
218-218: Multiple headings with the same content
(MD024, no-duplicate-heading)
docs/user/ppl/functions/datetime.md
164-164: Multiple headings with the same content
(MD024, no-duplicate-heading)
184-184: Multiple headings with the same content
(MD024, no-duplicate-heading)
204-204: Multiple headings with the same content
(MD024, no-duplicate-heading)
224-224: Multiple headings with the same content
(MD024, no-duplicate-heading)
244-244: Multiple headings with the same content
(MD024, no-duplicate-heading)
264-264: Multiple headings with the same content
(MD024, no-duplicate-heading)
284-284: Multiple headings with the same content
(MD024, no-duplicate-heading)
304-304: Multiple headings with the same content
(MD024, no-duplicate-heading)
324-324: Multiple headings with the same content
(MD024, no-duplicate-heading)
344-344: Multiple headings with the same content
(MD024, no-duplicate-heading)
692-692: Multiple headings with the same content
(MD024, no-duplicate-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (2)
docs/user/ppl/functions/condition.md (2)
214-238: Excellent documentation structure and content quality.The file demonstrates consistent, well-structured documentation across all condition functions. Usage syntax is properly code-formatted, argument/return types are clearly labeled, and examples include expected output tables for clarity. The IFNULL nested pattern section (lines 214–238) is particularly well-documented with contextual notes about OpenSearch version compatibility.
The backtick placement issue at line 217 from the previous review has been correctly resolved.
1-829: Address duplicate heading linting warning.The static analysis tool flagged line 218's
### Exampleheading as a duplicate of earlier heading content in the file (MD024). While repeated### Exampleheadings are standard in technical documentation and functionally harmless, this warning suggests the project's linter is configured to flag them. Consider verifying with project contributors whether this is a style preference to address or an expected linting exception for documentation files.
| Usage: convert_tz(timestamp, from_timezone, to_timezone) constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timeszone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00. | ||
| Argument type: TIMESTAMP/STRING, STRING, STRING | ||
| Return type: TIMESTAMP | ||
| Usage: `convert_tz(timestamp, from_timezone, to_timezone)` constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete the previously requested grammar fix for line 140.
The past review requested simplifying the redundant phrasing in this sentence. While the spelling error ("timeszone" → "timezone") was correctly fixed, the redundant phrase "which are ones outside of" should be simplified to "which are outside" as originally suggested.
Proposed fix
- CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00.
+ CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are outside -13:59 and +14:00.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Usage: `convert_tz(timestamp, from_timezone, to_timezone)` constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are ones outside of -13:59 and +14:00. | |
| Usage: `convert_tz(timestamp, from_timezone, to_timezone)` constructs a local timestamp converted from the from_timezone to the to_timezone. CONVERT_TZ returns null when any of the three function arguments are invalid, i.e. timestamp is not in the format yyyy-MM-dd HH:mm:ss or the timezone is not in (+/-)HH:mm. It also is invalid for invalid dates, such as February 30th and invalid timezones, which are outside -13:59 and +14:00. |
🧰 Tools
🪛 LanguageTool
[style] ~140-~140: This phrase is redundant. Consider using “outside”.
Context: ...h and invalid timezones, which are ones outside of -13:59 and +14:00. Argument type: `...
(OUTSIDE_OF)
🤖 Prompt for AI Agents
In @docs/user/ppl/functions/datetime.md at line 140, The sentence describing
convert_tz has redundant wording: change "which are ones outside of -13:59 and
+14:00" to the simplified phrase "which are outside -13:59 and +14:00" in the
documentation entry for convert_tz (the Usage line referencing
convert_tz(timestamp, from_timezone, to_timezone)) so the sentence reads clearly
and removes the redundant "ones" and "of".
| Usage: hour(time) extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23. | ||
| Argument type: STRING/TIME/TIMESTAMP | ||
| Return type: INTEGER | ||
| Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use hyphens to join compound adjectives.
The phrase "time of day value" should use hyphens to clarify that "time-of-day" is a single compound modifier: "the time value has a large range... so the return value of hour(time) can be also greater than 23."
Proposed fix
- Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.
+ Usage: `hour(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Usage: `hour(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23. | |
| Usage: `hour(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour(time) can be also greater than 23. |
🧰 Tools
🪛 LanguageTool
[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...
(QB_NEW_EN_HYPHEN)
[grammar] ~1135-~1135: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
In @docs/user/ppl/functions/datetime.md at line 1135, Replace the phrase "time
of day value" with the hyphenated compound "time-of-day value" in the sentence
describing hour(time) so the compound adjective is clear; update the sentence
that reads "Different from the time of day value, the time value has a large
range..." to use "Different from the time-of-day value..." while keeping the
rest of the text intact and preserving the reference to hour(time).
| Usage: hour_of_day(time) extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23. | ||
| Argument type: STRING/TIME/TIMESTAMP | ||
| Return type: INTEGER | ||
| Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply the same hyphenation fix to the HOUR_OF_DAY description.
For consistency and grammar correctness, apply the same hyphenation to "time-of-day" here as well.
Proposed fix
- Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.
+ Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time of day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23. | |
| Usage: `hour_of_day(time)` extracts the hour value for time. Different from the time-of-day value, the time value has a large range and can be greater than 23, so the return value of hour_of_day(time) can be also greater than 23. |
🧰 Tools
🪛 LanguageTool
[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ... value for time. Different from the time of day value, the time value has a large...
(QB_NEW_EN_HYPHEN)
[grammar] ~1162-~1162: Use a hyphen to join words.
Context: ...lue for time. Different from the time of day value, the time value has a large ra...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
In @docs/user/ppl/functions/datetime.md at line 1162, Update the HOUR_OF_DAY
description to hyphenate "time-of-day" consistently: change the sentence
"Different from the time of day value, the time value has a large range..." to
use "time-of-day" (e.g., "Different from the time-of-day value, the time value
has a large range...") and ensure the function name hour_of_day(time) /
HOUR_OF_DAY remains unchanged.
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/user/ppl/cmd/search.md (2)
53-56: Reduce repetitive language in the time modifiers introduction.Lines 55-56 both open with "Time modifiers," creating redundancy. Consolidate or rephrase the second sentence to avoid repetition.
📝 Suggested fix
### Time modifiers -Time modifiers filter search results by a time range using the implicit `@timestamp` field. Time modifiers support the following formats. +Time modifiers filter search results by a time range using the implicit `@timestamp` field. The following formats are supported:
693-703: Add hyphen to compound modifier "Time-rounding".The heading at line 695 should use a hyphenated form for the compound adjective.
📝 Suggested fix
-### Time rounding +### Time-rounding
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/user/ppl/cmd/rex.mddocs/user/ppl/cmd/search.mddocs/user/ppl/cmd/stats.mddocs/user/ppl/cmd/timechart.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
🪛 LanguageTool
docs/user/ppl/cmd/search.md
[style] ~55-~55: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... using the implicit @timestamp field. Time modifiers support the following formats...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[grammar] ~695-~695: Use a hyphen to join words.
Context: ...----+ ``` ### Time rounding Use time rounding expressions to filter events re...
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
docs/user/ppl/cmd/stats.md (1)
6-8: Cross-reference link is correct. The section "Comparing stats, eventstats, and streamstats" exists in streamstats.md at the exact anchor path referenced in the link, so the internal documentation link will resolve properly.docs/user/ppl/cmd/search.md (1)
520-529: Remove this comment—the table alignment is correct.The table formatting in lines 520-529 shows properly aligned query output. All columns maintain consistent column widths and separator positions across the header, separator, and data rows (11 dashes / 5 dashes / 8 dashes in each segment). The content aligns correctly within these boundaries.
docs/user/ppl/cmd/rex.md (1)
1-298: Documentation is well-structured and pedagogically sound.The reformat aligns well with the PR objectives. The document follows a clear progression from basic concepts through increasingly complex examples, with explicit coverage of edge cases and error scenarios. The consistent formatting across all nine examples (descriptive headings, explanatory text, PPL queries, and output) enhances readability. Error documentation in Examples 8 and 9 is particularly strong—showing both invalid queries and the exact error messages users will encounter provides immediate, actionable guidance.
The relative path reference to
../../admin/settings.rstcorrectly resolves todocs/user/admin/settings.rst, and the command is properly linked in the PPL index with appropriate description.
Description
When the docs exporter script is applied now, the differences from the docs website are: links to sections not yet ported to documentation-website are commented out in the docs website (functions, admin), whitespace differences, different nav order for some command files, and some differences in escape characters for angle brackets and asterisks. Everything except the commented out admin links will be addressed in the next documentation-website PR (opensearch-project/documentation-website#11747).
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.