
Conversation


bufdev commented Nov 24, 2025

This would effectively both deprecate usage of google.type.Decimal and actually enforce valid values with Protovalidate.

bufdev changed the title from "Add decimal to StringRules" to "WIP: Add decimal to StringRules" on Nov 24, 2025

github-actions bot commented Nov 24, 2025

The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).

Build: ✅ passed
Format: ✅ passed
Lint: ✅ passed
Breaking: ✅ passed
Updated (UTC): Nov 26, 2025, 7:59 PM


hudlow commented Nov 24, 2025

@bufdev Is it safe to assume that the goal is to align these rules to what could be represented in a SQL DECIMAL/NUMERIC column?

If so (and based on the desire to possibly extend to the full range of google.type.Decimal representations), I would expect that these validations would need to happen on a normalized representation such that 00001.10000, 0.0011e3, 1100e-3 would all validate as precision = 2, scale = 1 — is that right?


bufdev commented Nov 24, 2025

> Is it safe to assume that the goal is to align these rules to what could be represented in a SQL DECIMAL/NUMERIC column?

Yes

> If so (and based on the desire to possibly extend to the full range of google.type.Decimal representations), I would expect that these validations would need to happen on a normalized representation such that 00001.10000, 0.0011e3, 1100e-3 would all validate as precision = 2, scale = 1 — is that right?

Probably not? I don't know; what do SQL databases generally accept? 00001.10000 is an interesting one; I would say this is just invalid - leading zeroes feel like an error. 0.10000 means something different than 0.1 in math, so I'd say the former has more precision. I would want to know what is generally accepted.


hudlow commented Nov 25, 2025

From what I can tell, normalization prior to conversion is typical. Unfortunately, I don't have access to the relevant sections of ISO 9075, but this is runnable against all the engines on sqlfiddle.com and they all normalize the values without losing any magnitudinal information:

CREATE TABLE numbers (
  num NUMERIC(2, 1)
);

INSERT INTO numbers(num) VALUES (1.1);
INSERT INTO numbers(num) VALUES (0001.1000);
INSERT INTO numbers(num) VALUES (0.0011e3);
INSERT INTO numbers(num) VALUES (1100e-3);

SELECT * FROM numbers;
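
-- Expected result, per the note above: all four inserts are accepted and each
-- row comes back normalized as 1.1.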

> 00001.10000 is an interesting one; I would say this is just invalid - leading zeroes feel like an error. 0.10000 means something different than 0.1 in math

Yeah, it's strange. google.type.Decimal isn't really specified, and is of no help here in terms of precedent.

Of course, one or more leading zeroes is a common signifier for octal; for example, ECMAScript and Go both interpret 077 (or 0077, etc) as 63. In ECMAScript, 077.0 is a syntax error; in Go, somewhat appallingly, it's 77.0. Go comes by this honestly though—C behaves the same way. Rust ignores leading zeroes in integer and float literals.

The inconsistencies here are an argument for rejecting numbers with leading zeroes on the grounds that someone might expect you to interpret them as octal and someone else might not.

My own pet peeve in the numeric literal space is the lack of overall limits on digit count. I think any spec that says if I give you an unbounded number of zeroes before or after the decimal place you have to parse it—or you have to select your own limit—is saliently deficient.

As far as treating trailing zeroes after the decimal point as indicators of precision, as is done for empirical quantities in science... it certainly bothers me when software calls things significant digits that aren't, but I can't recall ever encountering a programming language that actually treats numeric literals according to the rules of significant digits (where trailing zeroes after the decimal point are significant and trailing zeroes in a number without a decimal point are insignificant).

One reason for this is probably that you can't truncate digits at each step in a mathematical operation, so a programming language that followed the rules of significant digits would have to track the precision of numeric values in parallel to their magnitude.

In a greenfield context, I wouldn't be opposed to validating the significant digits of numeric literals against a predefined precision, but I think the precedent in the SQL-adjacent space is that decimal literals are about whether a value's magnitude can be expressed, not about inferring anything about its precision from the literal representation.


bufdev commented Nov 25, 2025

Overly restrictive is better than too loose to start if there's a choice - we can always loosen. Time to market is most important.


hudlow commented Nov 25, 2025

> Overly restrictive is better than too loose to start if there's a choice - we can always loosen.

I'd say any change in strictness for a validation rule is a breaking change.


bufdev commented Nov 26, 2025

Loosening is fine.


hudlow commented Nov 26, 2025

I took a swing at an implementation, but I'm hoping someone (@timostamm?) can help fill in some gaps for me:

  1. I'm a little fuzzy on the testing regime.
  2. I'm not at all sure I put the right rules in the right places for the meta-validations of rule values. I couldn't really find much precedent for doing this. @bufdev had employed a pattern that appeared to keep the meta-validation separate, but I couldn't get it to work in practice or correlate it to other precedent.

Comment on lines 3773 to 3787
(predefined).cel = {
id: "string.decimal.precision_minimum"
message: "precision must be at least 1"
expression: "!has(rules.decimal.precision) || rules.decimal.precision > 0"
},
(predefined).cel = {
id: "string.decimal.scale_without_precision"
message: "if scale is set, precision must also be set"
expression: "!has(rules.decimal.scale) || has(rules.decimal.precision)"
},
(predefined).cel = {
id: "string.decimal.scale_less_than_precision"
message: "scale must be less than precision"
expression: "!has(rules.decimal.precision) || !has(rules.decimal.scale) || rules.decimal.scale < rules.decimal.precision"
},

@timostamm Do we have precedent for validations that do not use this and consequently could be checked at compile time?


When this, rule, and rules are declared, implementations set the appropriate CEL type. See https://github.com/bufbuild/protovalidate-go/blob/v1.0.1/cache.go#L160-L162. When the implementation compiles the expression, it can raise type errors.

timostamm commented:

> testing

This repository defines the validation rules and provides a test suite for implementations.
It does not test that validate.proto is structurally sound, or that its CEL expressions compile.

> meta-validations of rule values

As in "if scale is set, precision must also be set"? The rule (buf.validate.message).oneof is a bit similar. It lets the user specify field names. If an unknown field name is specified, implementations must raise a compile error. This validation isn't specified in CEL, it's up to the implementation. Conformance tests.

// // point, and at most four digits before the decimal point.
// // "1000", "100.1", and "100" are valid, whereas "10000" and "1.111" are not.
// string value = 1 [
// (buf.validate.field).decimal.precision = 6,

(buf.validate.field).decimal is not a field.

(buf.validate.field).string is field StringRules string = 14.
(buf.validate.field).string.len is field uint64 len = 19.
(buf.validate.field).string.decimal is field DecimalRules decimal = 36.


hudlow commented Nov 26, 2025

Update:

  • Per @timostamm's advice, I've punted on meta-validation for now.
  • I added conformance tests.
  • (buf.validate.field).string.decimal = {} validates a decimal number is well formed:
    • valid: 0, 12345, 12345.00000, 12345.12345
    • invalid: 00, 01, 01.2, 0x0, 1e2, 1., .1, 1.2.3
  • (buf.validate.field).string.decimal.precision = 4 validates a number has <= 4 digits:
    • valid: 1234, 1.234, 0.000
    • invalid: 12345, 1.2345, 1234.0, 0.0000
  • (buf.validate.field).string.decimal.scale = 2 validates a number has <= 2 digits after the decimal point:
    • valid: 0, 0.1, 12345.67, 0.00
    • invalid: 1.000, 12345.678
  • (buf.validate.field).string.decimal = { precision: 6, scale: 2 } additionally validates that a number has at most (6 - 2 =) 4 digits before the decimal point. This is consistent with SQL limitations for decimal numbers with a defined precision and scale (see the usage sketch after this list):
    • valid: 1234, 0.00, 1234.56
    • invalid: 12345, 0.000
  • Ran the new conformance tests in protovalidate-es and all seems well.
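
Putting those together, a usage sketch (hypothetical message and field names; the rule path is the one @timostamm pointed out above):

syntax = "proto3";

import "buf/validate/validate.proto";

message PriceUpdate {
  // Hypothetical field. With precision = 6 and scale = 2, the value must be a
  // well-formed decimal string with at most 6 digits in total, at most 2 of
  // them after the decimal point, and therefore at most 4 before it.
  // "1234.56", "0.00", and "1234" pass; "12345", "0.000", and "1.2.3" fail.
  string amount = 1 [(buf.validate.field).string.decimal = {
    precision: 6
    scale: 2
  }];
}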

@bufdev out of band, I think you expressed some doubt about using a string format for this. Some thoughts:

  • There is a structural decimal format as a part of the AEP project offering an alternative vision of how you could do this, but it's severely limited in precision because of the choice to use an int64 as the significand — it doesn't seem like a good fit for parity with SQL (where some databases support a maximum precision of 38, and others support precision in the thousands or tens of thousands).
  • You could make a case for a structural decimal number consisting of a pure-digit-string significand and an int32 exponent (roughly sketched after this list), but I think most people would find this much more annoying to consume.
  • There is also an argument for requiring trailing zeroes to indicate the scale, which would have the nice effect of requiring numbers of a given scale to be fully normalized, such that equality of strings would indicate equality of magnitude and scale (albeit not necessarily precision).
    • Or you could require no trailing zeroes which would fully normalize magnitude.
    • In the end, it doesn't seem worth it to me to pursue either, because I think some people will find either annoying, and the benefit is minimal.
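
For illustration only, the structural alternative above might look roughly like this (hypothetical; not proposed here and not part of protovalidate):

syntax = "proto3";

// Hypothetical structural decimal: the value is significand * 10^exponent,
// e.g. significand = "12345" with exponent = -2 represents 123.45.
message StructuralDecimal {
  // Digits only, per the "pure-digit-string" idea above (sign handling left
  // aside), so precision isn't capped by an int64 significand the way the
  // AEP format mentioned earlier is.
  string significand = 1;

  // Power-of-ten exponent applied to the significand.
  int32 exponent = 2;
}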

hudlow requested a review from timostamm on November 26, 2025 at 22:31
Comment on lines +3928 to +3929
//
// TODO: Extend to the possible representations that google.type.Decimal allows?

I'm not certain we'll ever want to do this, but if we do, I think we'll want to move parsing to the libraries, which would increase the scope a lot. I suspect we won't want to do that now in any case.


bufdev commented Dec 1, 2025

Do not merge for now

hudlow marked this pull request as draft on December 1, 2025 at 15:51

hudlow commented Dec 1, 2025

Converting to draft per @bufdev's comment.
