@sarvekshayr
Contributor

What changes were proposed in this pull request?

HDDS-11120 (PR #6911) introduced rich rebalancing status info.

This change breaks compatibility when a newer client (with this change) talks to an older server (without it): the protobuf enum value Type.GetContainerBalancerStatusInfo = 44 does not exist on the older server, so the request fails with a "Message missing required fields: cmdType" error.

ozone admin containerbalancer status 

INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(com.google.protobuf.InvalidProtocolBufferException): Message missing required fields: cmdType
at com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:81)
at com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:71)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at org.apache.hadoop.ipc.RpcWritable$ProtobufWrapper.readFrom(RpcWritable.java:125)
at org.apache.hadoop.ipc.RpcWritable$Buffer.getValue(RpcWritable.java:187)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:995)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:923)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2905)
, while invoking $Proxy18.submitRequest over nodeId=node1,nodeAddress=ccycloud-1.quasar-yvyafo.root.comops.site/10.140.178.0:9860 after 3 failover attempts. Trying to failover after sleeping for 2000ms.
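
A toy model of the failure described above may help. It assumes proto2 semantics, where (as far as I understand them) an enum number the receiver does not recognize is kept as an unknown field, so a required field such as cmdType ends up unset and validation fails. The class, the maps, and the placeholder command name are hypothetical; only the value 44 and the error text come from this PR. No protobuf library is involved.

```java
import java.util.HashMap;
import java.util.Map;

final class RequiredEnumDemo {
    private RequiredEnumDemo() { }

    // Command types an older server recognizes. The name and number are
    // illustrative placeholders, not copied from the real .proto file.
    static Map<Integer, String> oldServerTypes() {
        Map<Integer, String> types = new HashMap<>();
        types.put(1, "SomeOlderCommand");
        return types;
    }

    // A newer server additionally knows value 44 (the number quoted in the PR description).
    static Map<Integer, String> newServerTypes() {
        Map<Integer, String> types = oldServerTypes();
        types.put(44, "GetContainerBalancerStatusInfo");
        return types;
    }

    // Toy decode step: an unrecognized number leaves the required field unset,
    // so validation fails with the same text seen in the log above.
    static String decodeCmdType(int wireValue, Map<Integer, String> knownTypes) {
        String cmdType = knownTypes.get(wireValue);
        if (cmdType == null) {
            throw new IllegalStateException("Message missing required fields: cmdType");
        }
        return cmdType;
    }
}
```

Decoding 44 against `newServerTypes()` succeeds, while the same value against `oldServerTypes()` throws, which mirrors the new-client, old-server case.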

What is the link to the Apache JIRA

HDDS-14335

How was this patch tested?

acceptance (compat-old) CI: https://github.com/sarvekshayr/ozone/actions/runs/20707622585/job/59441789415


HDDS-14335. Container Balancer status command fails with Message missing required fields: cmdType error with forward client

HDDS-14335. Compatibility

1.1.0 command does not exist

fix read.robot

fix read.robot - 1

test-fix

fix
Contributor

@adoroszlai adoroszlai left a comment

Thanks @sarvekshayr for the patch.

Comment on lines +62 to 65
if (isRunning && balancerStatusInfo != null) {
Instant startedAtInstant = Instant.ofEpochSecond(balancerStatusInfo.getStartedAt());
LocalDateTime dateTime =
LocalDateTime.ofInstant(startedAtInstant, ZoneId.systemDefault());

nit: Let's move these two statements inside isVerbose() (since values are only used there), and add balancerStatusInfo != null check in that if. Then we don't need a new outer else if and duplicate println.
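
A minimal sketch of the shape suggested here, not the actual patch: the date conversion happens only inside the verbose branch, which also carries the null check, so the non-verbose path needs no extra branch or duplicate print. `StatusInfo`, `render`, and the output strings are hypothetical stand-ins for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

final class BalancerStatusPrinter {
    private BalancerStatusPrinter() { }

    // Hypothetical stand-in for the status info returned by the server;
    // callers pass null when the server did not provide it.
    static final class StatusInfo {
        final long startedAtEpochSeconds;

        StatusInfo(long startedAtEpochSeconds) {
            this.startedAtEpochSeconds = startedAtEpochSeconds;
        }
    }

    static String render(boolean isRunning, boolean verbose, StatusInfo info, ZoneId zone) {
        if (!isRunning) {
            return "ContainerBalancer is Not Running.";
        }
        StringBuilder out = new StringBuilder("ContainerBalancer is Running.");
        // The start time is only needed for verbose output, so it is computed
        // here, guarded by the null check.
        if (verbose && info != null) {
            LocalDateTime startedAt =
                LocalDateTime.ofInstant(Instant.ofEpochSecond(info.startedAtEpochSeconds), zone);
            out.append(System.lineSeparator()).append("Started at: ").append(startedAt);
        }
        return out.toString();
    }
}
```

With this shape, a running balancer with missing status info prints the same single line in both verbose and non-verbose mode.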

// HDDS-11120 - Added a rich rebalancing status info
// Backward compatibility fix - newer clients (2.0 >=) gracefully fallback to the old
// API when connecting to older servers (< 2.0) that don't support the new enum value.
if (e.getMessage() != null && e.getMessage().contains("missing required fields")) {

Can we check for the underlying InvalidProtocolBufferException instead? I'm not sure the text message will always be the same.
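
One way to do a type-based check could look like the sketch below. It walks the cause chain and matches on exception type, or on the class name carried by a remote wrapper, instead of on message text. `ParseFailure` and `RemoteFailure` are hypothetical stand-ins for the protobuf exception and for an RPC exception that reports the server-side class name (the log above shows the class name in parentheses after RemoteException); whether the real classes expose exactly this is an assumption to verify.

```java
import java.io.IOException;

final class ProtoErrorCheck {
    private ProtoErrorCheck() { }

    // Hypothetical stand-in for a protobuf parse failure.
    static class ParseFailure extends IOException {
        ParseFailure(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for an RPC exception that carries the
    // server-side exception class name as a string.
    static class RemoteFailure extends IOException {
        private final String className;

        RemoteFailure(String className, String message) {
            super(message);
            this.className = className;
        }

        String getClassName() {
            return className;
        }
    }

    // Returns true if any exception in the cause chain is a parse failure,
    // either directly or as reported by a remote wrapper. Message text is ignored.
    static boolean isParseFailure(Throwable error) {
        int hops = 0;
        // The hop limit guards against cyclic cause chains.
        for (Throwable t = error; t != null && hops < 16; t = t.getCause(), hops++) {
            if (t instanceof ParseFailure) {
                return true;
            }
            if (t instanceof RemoteFailure
                && ParseFailure.class.getName().equals(((RemoteFailure) t).getClassName())) {
                return true;
            }
        }
        return false;
    }
}
```

A plain IOException whose message happens to contain "missing required fields" is not treated as a parse failure here, which is the point of avoiding the string match.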

Comment on lines +1053 to +1056
ContainerBalancerStatusInfoResponseProto response =
submitRequest(Type.GetContainerBalancerStatusInfo,
builder -> builder.setContainerBalancerStatusInfoRequest(request))
.getContainerBalancerStatusInfoResponse();

nit: please reduce indent of "line continuation" from 8 to 4 while already touching these lines.

@adoroszlai changed the title from "HDDS-14335. Container Balancer status command fails with Message missing required fields: cmdType error with forward client" to "HDDS-14335. Container Balancer status fails due to InvalidProtocolBufferException with old SCM" on Jan 5, 2026
Contributor

@sumitagrawl sumitagrawl left a comment

@sarvekshayr This does not need to be fixed: we support compatibility from a lower-version client to a higher-version server, but not from a higher-version client to a lower-version server.

Contributor

@siddhantsangwan siddhantsangwan left a comment

I think the earlier PR that introduced rich status made some wrong decisions. My suggestion would be to just use the original ContainerBalancerStatusRequestProto and ContainerBalancerStatusResponseProto when neither verbose mode nor the history option is enabled. Otherwise, use the new ContainerBalancerStatusInfoRequestProto and its response proto.

We can change the logic in ContainerBalancerStatusSubcommand to do this. That will be simpler than calling the new, unsupported method and then having to handle the exception.
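
The selection rule proposed here is small enough to sketch. The enum and method names below are hypothetical; the sketch only encodes the decision of which request to send, on the assumption that verbose and history are the only options that need the richer response.

```java
final class StatusRpcChoice {
    private StatusRpcChoice() { }

    // Hypothetical labels for the two request/response pairs named in the comment above.
    enum Rpc {
        LEGACY_STATUS,  // original status request, understood by older servers
        STATUS_INFO     // newer, richer status info request
    }

    // Use the newer RPC only when an option actually needs the richer data.
    static Rpc choose(boolean verbose, boolean history) {
        return (verbose || history) ? Rpc.STATUS_INFO : Rpc.LEGACY_STATUS;
    }
}
```

Under this rule a plain status call from a newer client never sends the new enum value, so it would keep working against an older server; verbose or history requests would still depend on server support.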

@sarvekshayr
Contributor Author

I think the earlier PR that introduced rich status made some wrong decisions. My suggestion would be to just use the original ContainerBalancerStatusRequestProto and ContainerBalancerStatusResponseProto when neither verbose mode nor the history option is enabled. Otherwise, use the new ContainerBalancerStatusInfoRequestProto and its response proto.

We can change the logic in ContainerBalancerStatusSubcommand to do this. That will be simpler than calling the new, unsupported method and then having to handle the exception.

I'll file a separate JIRA for this.

@sarvekshayr sarvekshayr closed this Jan 5, 2026
@adoroszlai
Contributor

adoroszlai commented Jan 5, 2026

This does not need to be fixed: we support compatibility from a lower-version client to a higher-version server, but not from a higher-version client to a lower-version server.

Why do you think so? We have cross-compatibility tests in both directions.

# old cluster with clients: same version and current version
for cluster_version in ${old_versions}; do
export OZONE_VERSION=${cluster_version}
export COMPOSE_FILE=old-cluster.yaml:clients.yaml
test_cross_compatibility ${cluster_version} ${current_version}
