Skip to content

Conversation

@rjrudin
Copy link
Contributor

@rjrudin rjrudin commented Jan 23, 2026

Despite the large number of files modified, most of the modifications to each file are small. And a number of files were moved as part of getting rid of the "java17-tests" subproject, as those tests were just moved into the existing "flux-cli" subproject.

rjrudin and others added 30 commits August 7, 2025 11:27
This currently has to be manually tested via the client-project example, where we're able to ensure that the pom file for flux-api does not bring in later versions of Jackson that cause Spark to break.
All connection params can now be set via system properties.  Not documenting this yet, as the main use case is for an internal project.
…em-props

MLE-23506 Exposing connection params as system properties
MLE-22594 Fixing Polaris issue in docker-compose
This is for ragplus, so that a custom Spark session - specifically with a Spark listener - can be used to execute the command.

This also ended up being a better approach for our "options tests" that just need to parse options but not execute the command. So about a dozen tests were modified to use this approach. They also needed a connection string added since the command is being validated now too.
MLE-23618 Added new Main method to return a Command object
Removed OBE config for code coverage, verified it's not needed.
MLE-23618 Bumping Gradle wrapper to latest version
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Adding .json and .xml as ignores
Required for internal "box" app. No tests, as the main use case for this is to disable the Spark UI, which is only really verifiable via log messages.
MLE-23884 Allowing for SparkConf to be provided
This appears to be the better way to apply Spark config params, such that we likely don't need both -B and -C any more. I opened a ticket to figure that one out.
MLE-23884 Better approach for configuring SparkConf
Lot of little changes, but overhaul, a simpler set of dependencies as we don't have to jump through as many hoops to avoid transitive vulnerabilities.

This will likely have a test failure due to the deprecated fileRows.documentType option, as the pipeline for that PR is stuck right now. Will ignore that test failure for now.
Should make Black Duck happy.
We don't need the Java 11 / 17 split anymore.

Also fixed a handful of Sonar warnings. No functional changes though.
Release notes indicate this should be safe, and Copilot was in favor of it as well as a way of minimizing CVEs.

Also fixed a typo in the SSL class.
Bumps [rexml](https://github.com/ruby/rexml) from 3.3.9 to 3.4.2.
- [Release notes](https://github.com/ruby/rexml/releases)
- [Changelog](https://github.com/ruby/rexml/blob/master/NEWS.md)
- [Commits](ruby/rexml@v3.3.9...v3.4.2)

---
updated-dependencies:
- dependency-name: rexml
  dependency-version: 3.4.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Spark 4.0.1 wants 2.18.2, verified all is fine via the example client project.

Also bumped a couple more transitive dependencies to avoid Black Duck warnings.
Eliminating some Black Duck issues
Tweaked Jenkinsfile as well, should only need one mlWaitTillReady call
This also removes a feature from 1.2.0 that was misguided, where the Spark master URL was overridden based on partition count. The master URL defines the number of cores available for processing partitions in parallel. Setting it to the number of partitions wasn't going to be harmful - it was still limited by how many cores are available - but it's misleading to the user. That feature and the associated docs / tests have been removed as well.

Generally, `local[*]` is going to be ideal for all use cases, where Spark uses all available cores. An advanced user does not need docs on how to change that because they'll have a keen understanding of the Spark master URL already.
Also using the correct Jira ID for the previous PR so that I can close that ticket out.
Also bumped up versions for the shadow jar plugin, couple other bits of tidying up
Black Duck is being annoying and cannot figure out what version of the connector is being used, so trying this as a hack.
This reverts commit 12a6ee7. That commit didn't fix the Black Duck problems.
Also removed two public methods in the Main class that were intended to be used in flux-box, but aren't needed anymore.
Removing CopyrightTest too, it's redundant now with the copyright checker.
This isn't 100% reliable, as I don't know that we'll always receive an SdkClientException. But it will help in scenarios where that exception is tucked in a ConnectException.
Applying this to develop branch as the Tika version is slighty different.
Also verified that the modification to Jackson dependencies is no longer needed for ml-gradle to work, as it's using Jackson 2.20.x as well.
Copilot AI review requested due to automatic review settings January 23, 2026 16:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request merges release/2.0.0 into main, upgrading Flux from version 1.4.0 to 2.0.0. The major changes include upgrading to Spark 4.1.1, replacing single-letter command options with more descriptive long-form options, and various API improvements.

Changes:

  • Upgraded Apache Spark from 3.5.6 to 4.1.1 and Gradle from 8.11 to 8.14.3
  • Replaced single-letter options (e.g., -P, -C, -M, -R, -X, -E, -L, -S) with long-form alternatives (--spark-prop, --spark-conf, --doc-metadata, etc.)
  • Updated default behavior for chunk storage in splitting functionality (now defaults to sidecar documents)
  • Added support for AWS profile-based authentication and additional S3 configuration options
  • Removed Java 17-specific test modules and consolidated test infrastructure

Reviewed changes

Copilot reviewed 198 out of 207 changed files in this pull request and generated no comments.

Show a summary per file
File Description
gradle.properties Updated version to 2.0.0 and upgraded dependency versions including Spark, Hadoop, and langchain4j
gradle/wrapper/gradle-wrapper.properties Upgraded Gradle from 8.11 to 8.14.3
settings.gradle Removed flux-tests-api and flux-java17-tests modules
gradlew, gradlew.bat Updated wrapper scripts to use jar execution method
test-app/build.gradle Updated ml-gradle plugin and added buildscript configuration
flux-cli/src/main/java/com/marklogic/flux/impl/* Replaced single-letter options with descriptive alternatives and refactored Spark configuration handling
flux-cli/src/main/java/com/marklogic/flux/api/* Added new API methods for S3 region/profile support and URI template warnings
flux-embedding-model-*/build.gradle Updated shadow plugin version and dependency versions
docs/** Updated documentation to reflect new option names and Spark 4.1.1 changes
flux-cli/src/test/java/** Updated tests to use new option names and removed Java 17-specific tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@rjrudin rjrudin merged commit 284cddf into main Jan 23, 2026
4 checks passed
@rjrudin rjrudin deleted the release/2.0.0 branch January 23, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants