-
Notifications
You must be signed in to change notification settings - Fork 3
MLE-26538 Merge release/2.0.0 into main #580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
MLE-12345 Merge master into dev
This currently has to be manually tested via the client-project example, where we're able to ensure that the pom file for flux-api does not bring in later versions of Jackson that cause Spark to break.
MLE-23420 Fixing classpath issue
All connection params can now be set via system properties. Not documenting this yet, as the main use case is for an internal project.
…em-props MLE-23506 Exposing connection params as system properties
MLE-22594 Fixing Polaris issue in docker-compose
This is for ragplus, so that a custom Spark session - specifically with a Spark listener - can be used to execute the command. This also ended up being a better approach for our "options tests" that just need to parse options but not execute the command. So about a dozen tests were modified to use this approach. They also needed a connection string added since the command is being validated now too.
MLE-23618 Added new Main method to return a Command object
Removed OBE config for code coverage, verified it's not needed.
MLE-23618 Bumping Gradle wrapper to latest version
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
PDP-536: Adding copyright check
Adding .json and .xml as ignores
PDP-536 Testing out copyright check
Required for internal "box" app. No tests, as the main use case for this is to disable the Spark UI, which is only really verifiable via log messages.
MLE-23884 Allowing for SparkConf to be provided
This appears to be the better way to apply Spark config params, such that we likely don't need both -B and -C any more. I opened a ticket to figure that one out.
MLE-23884 Better approach for configuring SparkConf
Lot of little changes, but overhaul, a simpler set of dependencies as we don't have to jump through as many hoops to avoid transitive vulnerabilities. This will likely have a test failure due to the deprecated fileRows.documentType option, as the pipeline for that PR is stuck right now. Will ignore that test failure for now.
Should make Black Duck happy.
We don't need the Java 11 / 17 split anymore. Also fixed a handful of Sonar warnings. No functional changes though.
Release notes indicate this should be safe, and Copilot was in favor of it as well as a way of minimizing CVEs. Also fixed a typo in the SSL class.
Bumps [rexml](https://github.com/ruby/rexml) from 3.3.9 to 3.4.2. - [Release notes](https://github.com/ruby/rexml/releases) - [Changelog](https://github.com/ruby/rexml/blob/master/NEWS.md) - [Commits](ruby/rexml@v3.3.9...v3.4.2) --- updated-dependencies: - dependency-name: rexml dependency-version: 3.4.2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Spark 4.0.1 wants 2.18.2, verified all is fine via the example client project. Also bumped a couple more transitive dependencies to avoid Black Duck warnings.
Eliminating some Black Duck issues
Tweaked Jenkinsfile as well, should only need one mlWaitTillReady call
This also removes a feature from 1.2.0 that was misguided, where the Spark master URL was overridden based on partition count. The master URL defines the number of cores available for processing partitions in parallel. Setting it to the number of partitions wasn't going to be harmful - it was still limited by how many cores are available - but it's misleading to the user. That feature and the associated docs / tests have been removed as well. Generally, `local[*]` is going to be ideal for all use cases, where Spark uses all available cores. An advanced user does not need docs on how to change that because they'll have a keen understanding of the Spark master URL already.
Also using the correct Jira ID for the previous PR so that I can close that ticket out.
Relying on manual testing for this.
Also bumped up versions for the shadow jar plugin, couple other bits of tidying up
Black Duck is being annoying and cannot figure out what version of the connector is being used, so trying this as a hack.
This reverts commit 12a6ee7. That commit didn't fix the Black Duck problems.
Also removed two public methods in the Main class that were intended to be used in flux-box, but aren't needed anymore.
Removing CopyrightTest too, it's redundant now with the copyright checker.
This isn't 100% reliable, as I don't know that we'll always receive an SdkClientException. But it will help in scenarios where that exception is tucked in a ConnectException.
Applying this to develop branch as the Tika version is slighty different.
Also verified that the modification to Jackson dependencies is no longer needed for ml-gradle to work, as it's using Jackson 2.20.x as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request merges release/2.0.0 into main, upgrading Flux from version 1.4.0 to 2.0.0. The major changes include upgrading to Spark 4.1.1, replacing single-letter command options with more descriptive long-form options, and various API improvements.
Changes:
- Upgraded Apache Spark from 3.5.6 to 4.1.1 and Gradle from 8.11 to 8.14.3
- Replaced single-letter options (e.g.,
-P,-C,-M,-R,-X,-E,-L,-S) with long-form alternatives (--spark-prop,--spark-conf,--doc-metadata, etc.) - Updated default behavior for chunk storage in splitting functionality (now defaults to sidecar documents)
- Added support for AWS profile-based authentication and additional S3 configuration options
- Removed Java 17-specific test modules and consolidated test infrastructure
Reviewed changes
Copilot reviewed 198 out of 207 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| gradle.properties | Updated version to 2.0.0 and upgraded dependency versions including Spark, Hadoop, and langchain4j |
| gradle/wrapper/gradle-wrapper.properties | Upgraded Gradle from 8.11 to 8.14.3 |
| settings.gradle | Removed flux-tests-api and flux-java17-tests modules |
| gradlew, gradlew.bat | Updated wrapper scripts to use jar execution method |
| test-app/build.gradle | Updated ml-gradle plugin and added buildscript configuration |
| flux-cli/src/main/java/com/marklogic/flux/impl/* | Replaced single-letter options with descriptive alternatives and refactored Spark configuration handling |
| flux-cli/src/main/java/com/marklogic/flux/api/* | Added new API methods for S3 region/profile support and URI template warnings |
| flux-embedding-model-*/build.gradle | Updated shadow plugin version and dependency versions |
| docs/** | Updated documentation to reflect new option names and Spark 4.1.1 changes |
| flux-cli/src/test/java/** | Updated tests to use new option names and removed Java 17-specific tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Despite the large number of files modified, most of the modifications to each file are small. And a number of files were moved as part of getting rid of the "java17-tests" subproject, as those tests were just moved into the existing "flux-cli" subproject.