Eliminate caddq intrinsics #905
00b155f to 3819863
CBMC Results (ML-DSA-87): Full Results (174 proofs)
CBMC Results (ML-DSA-44): Full Results (174 proofs)
CBMC Results (ML-DSA-65): Full Results (174 proofs)
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46204 cycles | 46203 cycles | 1.00 |
| ML-DSA-44 sign | 131289 cycles | 131278 cycles | 1.00 |
| ML-DSA-44 verify | 47763 cycles | 47762 cycles | 1.00 |
| ML-DSA-65 keypair | 81015 cycles | 81014 cycles | 1.00 |
| ML-DSA-65 sign | 215763 cycles | 215783 cycles | 1.00 |
| ML-DSA-65 verify | 80054 cycles | 80051 cycles | 1.00 |
| ML-DSA-87 keypair | 132159 cycles | 132161 cycles | 1.00 |
| ML-DSA-87 sign | 276888 cycles | 276854 cycles | 1.00 |
| ML-DSA-87 verify | 130426 cycles | 130402 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
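For reading the Ratio column in these comments: the values are consistent with current cycles divided by previous cycles, shown rounded to two decimals. The sketch below illustrates that arithmetic together with the 1.03 alert threshold quoted in the alert comments of this thread. The helper names `bench_ratio` and `bench_is_regression` are hypothetical, and the strict `>` comparison is an assumption about how the workflow decides to alert.

```c
/* Hypothetical helper: ratio as shown in the "Ratio" column,
 * i.e. current cycle count divided by previous cycle count. */
static double bench_ratio(unsigned long current, unsigned long previous)
{
  return (double)current / (double)previous;
}

/* Hypothetical helper: returns 1 if the unrounded ratio exceeds the
 * alert threshold, 0 otherwise. The strict comparison is an assumption. */
static int bench_is_regression(unsigned long current, unsigned long previous,
                               double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

With the Raspberry Pi 4 (no-opt) ML-DSA-44 keypair numbers reported later in this thread, `bench_ratio(309195, 299195)` is roughly 1.0334. That is displayed as 1.03 but still exceeds the 1.03 threshold, which would explain why an alert lists a ratio that looks equal to the threshold.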
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 114189 cycles | 114160 cycles | 1.00 |
| ML-DSA-44 sign | 418072 cycles | 417949 cycles | 1.00 |
| ML-DSA-44 verify | 122294 cycles | 122254 cycles | 1.00 |
| ML-DSA-65 keypair | 195495 cycles | 195504 cycles | 1.00 |
| ML-DSA-65 sign | 682472 cycles | 682465 cycles | 1.00 |
| ML-DSA-65 verify | 197737 cycles | 197733 cycles | 1.00 |
| ML-DSA-87 keypair | 322648 cycles | 322653 cycles | 1.00 |
| ML-DSA-87 sign | 864619 cycles | 864668 cycles | 1.00 |
| ML-DSA-87 verify | 328624 cycles | 328682 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34471 cycles | 34561 cycles | 1.00 |
| ML-DSA-44 sign | 120257 cycles | 119604 cycles | 1.01 |
| ML-DSA-44 verify | 38102 cycles | 38161 cycles | 1.00 |
| ML-DSA-65 keypair | 61645 cycles | 61342 cycles | 1.00 |
| ML-DSA-65 sign | 202965 cycles | 201886 cycles | 1.01 |
| ML-DSA-65 verify | 62950 cycles | 63038 cycles | 1.00 |
| ML-DSA-87 keypair | 94655 cycles | 93985 cycles | 1.01 |
| ML-DSA-87 sign | 237727 cycles | 239107 cycles | 0.99 |
| ML-DSA-87 verify | 94851 cycles | 96550 cycles | 0.98 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 237495 cycles | 229189 cycles | 1.04 |
| ML-DSA-44 sign | 617962 cycles | 646441 cycles | 0.96 |
| ML-DSA-44 verify | 221067 cycles | 226888 cycles | 0.97 |
| ML-DSA-65 keypair | 392612 cycles | 411260 cycles | 0.95 |
| ML-DSA-65 sign | 1045982 cycles | 1058663 cycles | 0.99 |
| ML-DSA-65 verify | 377333 cycles | 393295 cycles | 0.96 |
| ML-DSA-87 keypair | 648120 cycles | 682350 cycles | 0.95 |
| ML-DSA-87 sign | 1340435 cycles | 1396069 cycles | 0.96 |
| ML-DSA-87 verify | 621700 cycles | 651014 cycles | 0.95 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93617 cycles | 93653 cycles | 1.00 |
| ML-DSA-44 sign | 333332 cycles | 333354 cycles | 1.00 |
| ML-DSA-44 verify | 99744 cycles | 99709 cycles | 1.00 |
| ML-DSA-65 keypair | 160092 cycles | 160242 cycles | 1.00 |
| ML-DSA-65 sign | 545851 cycles | 546031 cycles | 1.00 |
| ML-DSA-65 verify | 160873 cycles | 160833 cycles | 1.00 |
| ML-DSA-87 keypair | 268252 cycles | 267347 cycles | 1.00 |
| ML-DSA-87 sign | 707830 cycles | 706548 cycles | 1.00 |
| ML-DSA-87 verify | 270627 cycles | 270921 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69051 cycles | 69255 cycles | 1.00 |
| ML-DSA-44 sign | 187895 cycles | 188098 cycles | 1.00 |
| ML-DSA-44 verify | 69083 cycles | 69380 cycles | 1.00 |
| ML-DSA-65 keypair | 119654 cycles | 120115 cycles | 1.00 |
| ML-DSA-65 sign | 299489 cycles | 301522 cycles | 0.99 |
| ML-DSA-65 verify | 115283 cycles | 115505 cycles | 1.00 |
| ML-DSA-87 keypair | 203725 cycles | 204908 cycles | 0.99 |
| ML-DSA-87 sign | 392930 cycles | 396816 cycles | 0.99 |
| ML-DSA-87 verify | 195673 cycles | 197182 cycles | 0.99 |
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56503 cycles | 59724 cycles | 0.95 |
| ML-DSA-44 sign | 180796 cycles | 193453 cycles | 0.93 |
| ML-DSA-44 verify | 61187 cycles | 65079 cycles | 0.94 |
| ML-DSA-65 keypair | 98682 cycles | 104370 cycles | 0.95 |
| ML-DSA-65 sign | 298537 cycles | 315933 cycles | 0.94 |
| ML-DSA-65 verify | 100423 cycles | 106176 cycles | 0.95 |
| ML-DSA-87 keypair | 156536 cycles | 162885 cycles | 0.96 |
| ML-DSA-87 sign | 364760 cycles | 379531 cycles | 0.96 |
| ML-DSA-87 verify | 156758 cycles | 164743 cycles | 0.95 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42082 cycles | 41564 cycles | 1.01 |
| ML-DSA-44 sign | 134163 cycles | 133585 cycles | 1.00 |
| ML-DSA-44 verify | 45157 cycles | 44717 cycles | 1.01 |
| ML-DSA-65 keypair | 73088 cycles | 72591 cycles | 1.01 |
| ML-DSA-65 sign | 214510 cycles | 214322 cycles | 1.00 |
| ML-DSA-65 verify | 73546 cycles | 73308 cycles | 1.00 |
| ML-DSA-87 keypair | 108117 cycles | 108001 cycles | 1.00 |
| ML-DSA-87 sign | 252050 cycles | 253603 cycles | 0.99 |
| ML-DSA-87 verify | 111866 cycles | 109742 cycles | 1.02 |
oqs-bot
left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 3819863 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-65 keypair | 75829 cycles | 72591 cycles | 1.04 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135619 cycles | 134864 cycles | 1.01 |
| ML-DSA-44 sign | 526267 cycles | 523274 cycles | 1.01 |
| ML-DSA-44 verify | 148308 cycles | 147587 cycles | 1.00 |
| ML-DSA-65 keypair | 226650 cycles | 226332 cycles | 1.00 |
| ML-DSA-65 sign | 860097 cycles | 860452 cycles | 1.00 |
| ML-DSA-65 verify | 234863 cycles | 234687 cycles | 1.00 |
| ML-DSA-87 keypair | 370322 cycles | 370327 cycles | 1.00 |
| ML-DSA-87 sign | 1078704 cycles | 1078650 cycles | 1.00 |
| ML-DSA-87 verify | 381978 cycles | 382103 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157726 cycles | 166712 cycles | 0.95 |
| ML-DSA-44 sign | 550116 cycles | 582606 cycles | 0.94 |
| ML-DSA-44 verify | 169086 cycles | 179293 cycles | 0.94 |
| ML-DSA-65 keypair | 267836 cycles | 285346 cycles | 0.94 |
| ML-DSA-65 sign | 902166 cycles | 964659 cycles | 0.94 |
| ML-DSA-65 verify | 274146 cycles | 292689 cycles | 0.94 |
| ML-DSA-87 keypair | 447769 cycles | 479766 cycles | 0.93 |
| ML-DSA-87 sign | 1157051 cycles | 1244761 cycles | 0.93 |
| ML-DSA-87 verify | 457856 cycles | 490453 cycles | 0.93 |
Graviton4
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68281 cycles | 68206 cycles | 1.00 |
| ML-DSA-44 sign | 201979 cycles | 201988 cycles | 1.00 |
| ML-DSA-44 verify | 70738 cycles | 70695 cycles | 1.00 |
| ML-DSA-65 keypair | 121375 cycles | 121162 cycles | 1.00 |
| ML-DSA-65 sign | 330717 cycles | 331255 cycles | 1.00 |
| ML-DSA-65 verify | 118005 cycles | 118031 cycles | 1.00 |
| ML-DSA-87 keypair | 198121 cycles | 198330 cycles | 1.00 |
| ML-DSA-87 sign | 426802 cycles | 426779 cycles | 1.00 |
| ML-DSA-87 verify | 194748 cycles | 194253 cycles | 1.00 |
Graviton3
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72334 cycles | 72197 cycles | 1.00 |
| ML-DSA-44 sign | 212135 cycles | 212050 cycles | 1.00 |
| ML-DSA-44 verify | 75716 cycles | 75727 cycles | 1.00 |
| ML-DSA-65 keypair | 127531 cycles | 127412 cycles | 1.00 |
| ML-DSA-65 sign | 350281 cycles | 350180 cycles | 1.00 |
| ML-DSA-65 verify | 125483 cycles | 125339 cycles | 1.00 |
| ML-DSA-87 keypair | 205301 cycles | 208131 cycles | 0.99 |
| ML-DSA-87 sign | 443389 cycles | 448959 cycles | 0.99 |
| ML-DSA-87 verify | 205204 cycles | 205063 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120766 cycles | 120426 cycles | 1.00 |
| ML-DSA-44 sign | 452473 cycles | 449374 cycles | 1.01 |
| ML-DSA-44 verify | 130839 cycles | 131829 cycles | 0.99 |
| ML-DSA-65 keypair | 205544 cycles | 208687 cycles | 0.98 |
| ML-DSA-65 sign | 729796 cycles | 740391 cycles | 0.99 |
| ML-DSA-65 verify | 211042 cycles | 213436 cycles | 0.99 |
| ML-DSA-87 keypair | 338441 cycles | 337635 cycles | 1.00 |
| ML-DSA-87 sign | 929258 cycles | 924886 cycles | 1.00 |
| ML-DSA-87 verify | 348567 cycles | 345917 cycles | 1.01 |
Graviton4 (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128278 cycles | 128259 cycles | 1.00 |
| ML-DSA-44 sign | 447568 cycles | 447651 cycles | 1.00 |
| ML-DSA-44 verify | 138373 cycles | 138315 cycles | 1.00 |
| ML-DSA-65 keypair | 220146 cycles | 220341 cycles | 1.00 |
| ML-DSA-65 sign | 727221 cycles | 727602 cycles | 1.00 |
| ML-DSA-65 verify | 223062 cycles | 223189 cycles | 1.00 |
| ML-DSA-87 keypair | 365113 cycles | 365093 cycles | 1.00 |
| ML-DSA-87 sign | 926622 cycles | 926051 cycles | 1.00 |
| ML-DSA-87 verify | 372778 cycles | 372761 cycles | 1.00 |
Graviton3 (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138533 cycles | 138530 cycles | 1.00 |
| ML-DSA-44 sign | 484119 cycles | 484127 cycles | 1.00 |
| ML-DSA-44 verify | 148714 cycles | 148699 cycles | 1.00 |
| ML-DSA-65 keypair | 242002 cycles | 242316 cycles | 1.00 |
| ML-DSA-65 sign | 792696 cycles | 792717 cycles | 1.00 |
| ML-DSA-65 verify | 241201 cycles | 241180 cycles | 1.00 |
| ML-DSA-87 keypair | 396212 cycles | 396270 cycles | 1.00 |
| ML-DSA-87 sign | 1012825 cycles | 1012390 cycles | 1.00 |
| ML-DSA-87 verify | 402487 cycles | 402495 cycles | 1.00 |
Graviton2
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113890 cycles | 113785 cycles | 1.00 |
| ML-DSA-44 sign | 356860 cycles | 356400 cycles | 1.00 |
| ML-DSA-44 verify | 118531 cycles | 118156 cycles | 1.00 |
| ML-DSA-65 keypair | 197180 cycles | 196636 cycles | 1.00 |
| ML-DSA-65 sign | 589760 cycles | 589236 cycles | 1.00 |
| ML-DSA-65 verify | 194927 cycles | 194738 cycles | 1.00 |
| ML-DSA-87 keypair | 323665 cycles | 323344 cycles | 1.00 |
| ML-DSA-87 sign | 755812 cycles | 754065 cycles | 1.00 |
| ML-DSA-87 verify | 321048 cycles | 320254 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 829325 cycles | 827416 cycles | 1.00 |
| ML-DSA-44 sign | 3239633 cycles | 3233101 cycles | 1.00 |
| ML-DSA-44 verify | 925127 cycles | 922573 cycles | 1.00 |
| ML-DSA-65 keypair | 1413241 cycles | 1410466 cycles | 1.00 |
| ML-DSA-65 sign | 5353017 cycles | 5337064 cycles | 1.00 |
| ML-DSA-65 verify | 1482432 cycles | 1479411 cycles | 1.00 |
| ML-DSA-87 keypair | 2313345 cycles | 2308452 cycles | 1.00 |
| ML-DSA-87 sign | 6671028 cycles | 6657983 cycles | 1.00 |
| ML-DSA-87 verify | 2417189 cycles | 2413172 cycles | 1.00 |
Graviton2 (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213826 cycles | 212836 cycles | 1.00 |
| ML-DSA-44 sign | 762167 cycles | 760705 cycles | 1.00 |
| ML-DSA-44 verify | 241958 cycles | 229196 cycles | 1.06 |
| ML-DSA-65 keypair | 381627 cycles | 380999 cycles | 1.00 |
| ML-DSA-65 sign | 1253488 cycles | 1254188 cycles | 1.00 |
| ML-DSA-65 verify | 372913 cycles | 372030 cycles | 1.00 |
| ML-DSA-87 keypair | 606826 cycles | 604389 cycles | 1.00 |
| ML-DSA-87 sign | 1594099 cycles | 1595105 cycles | 1.00 |
| ML-DSA-87 verify | 618467 cycles | 618551 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309195 cycles | 299195 cycles | 1.03 |
| ML-DSA-44 sign | 1168415 cycles | 1162268 cycles | 1.01 |
| ML-DSA-44 verify | 335171 cycles | 330180 cycles | 1.02 |
| ML-DSA-65 keypair | 561211 cycles | 555502 cycles | 1.01 |
| ML-DSA-65 sign | 1917406 cycles | 1912815 cycles | 1.00 |
| ML-DSA-65 verify | 537657 cycles | 527139 cycles | 1.02 |
| ML-DSA-87 keypair | 863436 cycles | 868917 cycles | 0.99 |
| ML-DSA-87 sign | 2447930 cycles | 2435700 cycles | 1.01 |
| ML-DSA-87 verify | 887441 cycles | 879744 cycles | 1.01 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309195 cycles | 299195 cycles | 1.03 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 272576 cycles | 271747 cycles | 1.00 |
| ML-DSA-44 sign | 801220 cycles | 799220 cycles | 1.00 |
| ML-DSA-44 verify | 273471 cycles | 272494 cycles | 1.00 |
| ML-DSA-65 keypair | 466875 cycles | 469149 cycles | 1.00 |
| ML-DSA-65 sign | 1312938 cycles | 1319018 cycles | 1.00 |
| ML-DSA-65 verify | 449913 cycles | 451950 cycles | 1.00 |
| ML-DSA-87 keypair | 806611 cycles | 805651 cycles | 1.00 |
| ML-DSA-87 sign | 1800985 cycles | 1810381 cycles | 0.99 |
| ML-DSA-87 verify | 782904 cycles | 783507 cycles | 1.00 |
72bc3f8 to d186f5e
This commit replaces the current caddq AVX2 implementation with x86_64 assembly code. Signed-off-by: willieyz <willie.zhao@chelpis.com>
a467f42 to 10f3614
This commit adds mld_poly_caddq to the benchmark components to evaluate the performance impact of replacing the caddq AVX2 intrinsics with x86_64 assembly code. Signed-off-by: willieyz <willie.zhao@chelpis.com>
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113342 cycles | 113370 cycles | 1.00 |
| ML-DSA-44 sign | 356021 cycles | 355986 cycles | 1.00 |
| ML-DSA-44 verify | 117872 cycles | 118036 cycles | 1.00 |
| ML-DSA-65 keypair | 196532 cycles | 196544 cycles | 1.00 |
| ML-DSA-65 sign | 589189 cycles | 589033 cycles | 1.00 |
| ML-DSA-65 verify | 194577 cycles | 194759 cycles | 1.00 |
| ML-DSA-87 keypair | 322408 cycles | 322752 cycles | 1.00 |
| ML-DSA-87 sign | 752104 cycles | 753067 cycles | 1.00 |
| ML-DSA-87 verify | 319915 cycles | 320159 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212629 cycles | 212540 cycles | 1.00 |
| ML-DSA-44 sign | 760055 cycles | 759958 cycles | 1.00 |
| ML-DSA-44 verify | 228848 cycles | 228975 cycles | 1.00 |
| ML-DSA-65 keypair | 380543 cycles | 380692 cycles | 1.00 |
| ML-DSA-65 sign | 1252397 cycles | 1252836 cycles | 1.00 |
| ML-DSA-65 verify | 371721 cycles | 371790 cycles | 1.00 |
| ML-DSA-87 keypair | 604737 cycles | 604270 cycles | 1.00 |
| ML-DSA-87 sign | 1593720 cycles | 1593938 cycles | 1.00 |
| ML-DSA-87 verify | 618504 cycles | 618393 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 241958 cycles | 229196 cycles | 1.06 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 461720 cycles | 461204 cycles | 1.00 |
| ML-DSA-44 sign | 2132007 cycles | 2133615 cycles | 1.00 |
| ML-DSA-44 verify | 546386 cycles | 546329 cycles | 1.00 |
| ML-DSA-65 keypair | 773928 cycles | 774556 cycles | 1.00 |
| ML-DSA-65 sign | 3496455 cycles | 3505243 cycles | 1.00 |
| ML-DSA-65 verify | 849310 cycles | 849774 cycles | 1.00 |
| ML-DSA-87 keypair | 1253417 cycles | 1251282 cycles | 1.00 |
| ML-DSA-87 sign | 4370207 cycles | 4327691 cycles | 1.01 |
| ML-DSA-87 verify | 1368861 cycles | 1367270 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 237495 cycles | 229189 cycles | 1.04 |
6faaac2 to 5b1b8a7
Signed-off-by: willieyz <willie.zhao@chelpis.com>
5b1b8a7 to f9a6d30
poly_caddq with assembly #491
In this PR, we replace the AVX2 intrinsics implementation of poly_caddq with an x86_64 assembly version.
To estimate the performance impact, we compare the results shown in the two tables below.
Overall, for keypair, sign, and verify (opt), the performance difference is below 1%, which is consistent with the no-opt case.
In the component-level benchmark for mld_poly_caddq, the observed performance differences are at least 17%. After unrolling the loop by a factor of 4, the differences are reduced to approximately 10%.
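For readers unfamiliar with the operation, the following is a minimal scalar C sketch of what caddq computes: each coefficient has q added to it if it is negative. It is not the AVX2 or assembly code from this PR. The names `caddq_coeff`, `poly_caddq_ref` and `poly_caddq_unrolled4` and the macros are hypothetical, and the 4x unrolled loop only illustrates the idea of unrolling in C, not the actual assembly layout. The sketch assumes coefficients lie in (-q, q).

```c
#include <stdint.h>

#define MLDSA_Q 8380417 /* ML-DSA modulus q = 2^23 - 2^13 + 1 */
#define MLDSA_N 256     /* number of coefficients per polynomial */

/* Hypothetical helper: branch-free "add q if negative" on one coefficient.
 * Assumes -MLDSA_Q < a < MLDSA_Q. */
static int32_t caddq_coeff(int32_t a)
{
  /* mask is all ones (-1) if a is negative, and zero otherwise */
  int32_t mask = -(int32_t)((uint32_t)a >> 31);
  return a + (mask & MLDSA_Q);
}

/* Hypothetical reference loop: one coefficient per iteration. */
static void poly_caddq_ref(int32_t coeffs[MLDSA_N])
{
  unsigned i;
  for (i = 0; i < MLDSA_N; i++)
  {
    coeffs[i] = caddq_coeff(coeffs[i]);
  }
}

/* Hypothetical variant: same result, four coefficients per iteration.
 * MLDSA_N is a multiple of 4, so no tail handling is needed. */
static void poly_caddq_unrolled4(int32_t coeffs[MLDSA_N])
{
  unsigned i;
  for (i = 0; i < MLDSA_N; i += 4)
  {
    coeffs[i + 0] = caddq_coeff(coeffs[i + 0]);
    coeffs[i + 1] = caddq_coeff(coeffs[i + 1]);
    coeffs[i + 2] = caddq_coeff(coeffs[i + 2]);
    coeffs[i + 3] = caddq_coeff(coeffs[i + 3]);
  }
}
```

Both loops map inputs in (-q, q) to the range [0, q), so either can serve as a functional reference when checking a vectorized or assembly implementation.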