    dotnet run \
      --project src/TaguchiBench.LiveBenchRunner \
      --livebench-scripts-path /path/to/livebench/livebench \
      --llama-server-exe /usr/bin/llama-server \
      --llama-model /models/Qwen3-30B-A3B-128K-UD-Q5_K_XL.gguf \
      --model-name optimized-model \
      --lb-bench-name live_bench \
      --lb-release 2024-11-25 \
      --lb-num-questions 10 \
      --lb-parallel 10 \
      --lb-max-tokens 4096 \
      --lb-system-prompt "/no_think You are an advanced Software Development AI. It is your purpose to create elegant, modular, type safe, and composable solutions. Adhere strictly to your instructions." \
      --llama-host 127.0.0.1 \
      --llama-port 8088 \
      --n-gpu-layers 1000 \
      --flash-attn \
      --cache-type-k q8_0 \
      --cache-type-v q8_0 \
      --env-CUDA_VISIBLE_DEVICES 0
Analysis for Metric: 'AverageScore'
S/N Ratio Type Used: Larger is Better
Analysis Warnings (AverageScore - Initial ANOVA):
⚠ Saturated Design or Model: Error Degrees of Freedom is 0. F-values and P-values may not be calculable or reliable. Significance tests will be impacted.
⚠ Error term is invalid (DF=0, MSE=NaN). F-values and P-values cannot be reliably calculated. This might be due to a saturated design or zero error variance.
⚠ Error DF is ≤ 0. ANOVA F-values and P-values cannot be calculated. Statistical significance is unreliable.
⚠ Saturated Design: All available degrees of freedom assigned to factors. Consider pooling or a larger design.
Analysis Warnings (AverageScore - Pooled ANOVA):
⚠ No significant factors initially; pooled 'PresencePenalty' (smallest F-value/contribution).
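The saturation and pooling warnings above come down to degrees-of-freedom bookkeeping. The sketch below assumes a single-replicate, four-run, two-level orthogonal array, which is consistent with the Total DF of 3 and the three two-level factors in this report. The function name is illustrative and not part of TaguchiBench.

```python
def error_degrees_of_freedom(num_runs: int, factor_levels: dict[str, int]) -> int:
    """Error DF = total DF (runs - 1) minus the DF used by each factor (levels - 1)."""
    total_df = num_runs - 1
    factor_df = sum(levels - 1 for levels in factor_levels.values())
    return total_df - factor_df


# Assumption: 4 runs, three factors at 2 levels each (not stated explicitly in the report).
factors = {"TopP": 2, "Temperature": 2, "PresencePenalty": 2}
saturated_df = error_degrees_of_freedom(4, factors)

# Pooling PresencePenalty into error releases its single degree of freedom.
remaining = {name: lv for name, lv in factors.items() if name != "PresencePenalty"}
pooled_df = error_degrees_of_freedom(4, remaining)

print(saturated_df, pooled_df)  # 0 1
```

With zero error DF there is no error mean square to divide by, which is why the initial ANOVA reports N/A for every F-value and p-value.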
Optimal Configuration (for 'AverageScore')
PresencePenalty: 1.5  # OA Level: 2
Temperature: 0.5  # OA Level: 1
TopP: 0.80  # OA Level: 1
Predicted Performance (for 'AverageScore' at Optimal)
Value: 63.8212 (Original Scale)
95% CI: [59.3805 - 68.5940]
Initial ANOVA Results ('AverageScore')
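| Factor/Interaction | Contrib (%) | SS | DF | MS | F-Value | p-Value | Significant (α=0.05) |
|---|---|---|---|---|---|---|---|
| TopP | 56.78 | 0.0500 | 1 | 0.0500 | N/A | N/A | No |
| Temperature | 32.19 | 0.0284 | 1 | 0.0284 | N/A | N/A | No |
| PresencePenalty | 11.03 | 0.0097 | 1 | 0.0097 | N/A | N/A | No |
| Error | 0.00 | 0.0000 | 0 | N/A | N/A | N/A | N/A |
| Total | | 0.0881 | 3 | | | | |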
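The Contrib (%) column can be cross-checked as each factor's sum of squares divided by the total. This is a minimal sketch using the rounded SS values from the initial ANOVA for 'AverageScore', so it agrees with the reported percentages only to about one decimal place. The variable names are illustrative.

```python
# Rounded sums of squares as printed in the initial ANOVA table.
sum_of_squares = {"TopP": 0.0500, "Temperature": 0.0284, "PresencePenalty": 0.0097}

total_ss = sum(sum_of_squares.values())
contribution = {name: 100.0 * ss / total_ss for name, ss in sum_of_squares.items()}

for name, pct in contribution.items():
    print(f"{name}: {pct:.1f}%")
```

Because the design is saturated, the three contributions necessarily add up to 100% and nothing is left over for error.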
Contribution Percentage Visualization
pie
title Contribution Percentages
"TopP" : 56.8
"Temperature" : 32.2
"PresencePenalty" : 11.0
Pooled ANOVA Results ('AverageScore')
Factors Pooled into Error: PresencePenalty
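| Factor/Interaction | Contrib (%) | SS | DF | MS | F-Value | p-Value | Significant (α=0.05) |
|---|---|---|---|---|---|---|---|
| TopP | 56.78 | 0.0500 | 1 | 0.0500 | 5.15 | 0.2643 | No |
| Temperature | 32.19 | 0.0284 | 1 | 0.0284 | 2.92 | 0.3372 | No |
| Error (Pooled) | 11.03 | 0.0097 | 1 | 0.0097 | N/A | N/A | N/A |
| Total | | 0.0881 | 3 | | | | |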
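The pooled F-values and p-values can be reproduced by hand. Each F-value is the factor mean square divided by the pooled error mean square. With 1 and 1 degrees of freedom, the F statistic is the square of a Cauchy variable, so the upper-tail probability has the closed form 1 - (2/π)·atan(√F). The sketch below uses the rounded MS values from the pooled table, so small differences in the last digit are expected. The helper name is illustrative.

```python
import math


def p_value_f_1_1(f_value: float) -> float:
    """Upper-tail p-value for an F statistic with (1, 1) degrees of freedom."""
    return 1.0 - (2.0 / math.pi) * math.atan(math.sqrt(f_value))


ms_error = 0.0097  # pooled error mean square (PresencePenalty pooled into error)
mean_squares = {"TopP": 0.0500, "Temperature": 0.0284}

results = {}
for name, ms in mean_squares.items():
    f_value = ms / ms_error
    results[name] = (f_value, p_value_f_1_1(f_value))
    print(f"{name}: F = {f_value:.2f}, p = {results[name][1]:.4f}")
```

With only one error degree of freedom, even an F-value above 5 stays far from the 0.05 threshold, which is why no factor is flagged as significant.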
Contribution Percentage Visualization
pie
title Contribution Percentages
"TopP" : 56.8
"Temperature" : 32.2
Main Effects (for 'AverageScore')
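| Parameter | Level Value | Avg S/N Ratio | Avg Raw Metric |
|---|---|---|---|
| PresencePenalty | 0.5 | 36.0500 | 63.4761 |
| PresencePenalty | 1.5 | 36.1486 | 64.1848 |
| Temperature | 0.5 | 36.1835 | 64.4446 |
| Temperature | 1.0 | 36.0151 | 63.2163 |
| TopP | 0.80 | 36.2111 | 64.6488 |
| TopP | 0.95 | 35.9875 | 63.0121 |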
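The S/N values in the main effects for 'AverageScore' are consistent with the standard Taguchi larger-is-better transform, S/N = -10·log10(mean(1/y²)), whose inverse for a single value is y = 10^(S/N / 20). The sketch below also shows that back-transforming the grand mean of the S/N values gives the reported predicted value of 63.8212. That the tool falls back to the grand mean when no factor is significant is an inference from these numbers, not something the report states. The function names are illustrative.

```python
import math


def sn_larger_is_better(values: list[float]) -> float:
    """Taguchi larger-is-better signal-to-noise ratio in dB."""
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in values) / len(values))


def back_transform_larger_is_better(sn: float) -> float:
    """Map an S/N value back to the original scale (inverse for a single value)."""
    return 10.0 ** (sn / 20.0)


# Average S/N per PresencePenalty level, as printed in the main effects.
# In a balanced design, the two level means of any factor average to the grand mean.
level_sn = [36.0500, 36.1486]
grand_mean_sn = sum(level_sn) / len(level_sn)

predicted = back_transform_larger_is_better(grand_mean_sn)
print(f"grand mean S/N = {grand_mean_sn:.4f} dB, back-transformed = {predicted:.2f}")
```

The same grand mean results from the Temperature or TopP level averages, which is a quick sanity check that the design is balanced.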
Main Effect Chart: PresencePenalty
xychart-beta
title "Main Effect: PresencePenalty"
x-axis ["0.5", "1.5"]
y-axis "Raw Metric Value"
line [63.4761, 64.1848]
Main Effect Chart: Temperature
xychart-beta
title "Main Effect: Temperature"
x-axis ["0.5", "1.0"]
y-axis "Raw Metric Value"
line [64.4446, 63.2163]
Main Effect Chart: TopP
xychart-beta
title "Main Effect: TopP"
x-axis ["0.80", "0.95"]
y-axis "Raw Metric Value"
line [64.6488, 63.0121]
Effect Estimates (S/N Scale - for 'AverageScore')
| Source | Effect Est. | Abs(Effect) |
|---|---|---|
| TopP | -0.2237 | 0.2237 |
| Temperature | -0.1684 | 0.1684 |
| PresencePenalty | 0.0986 | 0.0986 |
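These effect estimates match the difference between the level-2 and level-1 average S/N ratios from the main effects for 'AverageScore'. For a two-level factor in a balanced design with N runs, the sum of squares is N·effect²/4. Assuming N = 4 (consistent with the Total DF of 3, but not stated explicitly in the report), this reduces to effect², which reproduces the SS column of the ANOVA. The sketch below works from the rounded values, so the last digit may differ.

```python
# (level 1 avg S/N, level 2 avg S/N) as printed in the main effects.
main_effects_sn = {
    "TopP": (36.2111, 35.9875),
    "Temperature": (36.1835, 36.0151),
    "PresencePenalty": (36.0500, 36.1486),
}

NUM_RUNS = 4  # assumption: four-run, two-level orthogonal array

effects = {name: level2 - level1 for name, (level1, level2) in main_effects_sn.items()}
sum_of_squares = {name: NUM_RUNS * effect**2 / 4.0 for name, effect in effects.items()}

# Rank factors by absolute effect, largest first.
for name in sorted(effects, key=lambda n: abs(effects[n]), reverse=True):
    print(f"{name}: effect = {effects[name]:+.4f}, SS = {sum_of_squares[name]:.4f}")
```

A negative effect means the S/N ratio drops when moving from level 1 to level 2, so level 1 is preferred for TopP and Temperature, while level 2 is preferred for PresencePenalty.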
Analysis for Metric: 'Time'
S/N Ratio Type Used: Smaller is Better
Analysis Warnings (Time - Initial ANOVA):
⚠ Saturated Design or Model: Error Degrees of Freedom is 0. F-values and P-values may not be calculable or reliable. Significance tests will be impacted.
⚠ Error term is invalid (DF=0, MSE=NaN). F-values and P-values cannot be reliably calculated. This might be due to a saturated design or zero error variance.
⚠ Error DF is ≤ 0. ANOVA F-values and P-values cannot be calculated. Statistical significance is unreliable.
⚠ Saturated Design: All available degrees of freedom assigned to factors. Consider pooling or a larger design.
Analysis Warnings (Time - Pooled ANOVA):
⚠ No significant factors initially; pooled 'PresencePenalty' (smallest F-value/contribution).
Optimal Configuration (for 'Time')
PresencePenalty: 1.5  # OA Level: 2
Temperature: 1.0  # OA Level: 2
TopP: 0.80  # OA Level: 1
Predicted Performance (for 'Time' at Optimal)
Value: 765.2483 (Original Scale)
95% CI: [743.7558 - 787.3618]
Initial ANOVA Results ('Time')
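| Factor/Interaction | Contrib (%) | SS | DF | MS | F-Value | p-Value | Significant (α=0.05) |
|---|---|---|---|---|---|---|---|
| TopP | 49.49 | 0.0113 | 1 | 0.0113 | N/A | N/A | No |
| Temperature | 43.88 | 0.0100 | 1 | 0.0100 | N/A | N/A | No |
| PresencePenalty | 6.62 | 0.0015 | 1 | 0.0015 | N/A | N/A | No |
| Error | 0.00 | 0.0000 | 0 | N/A | N/A | N/A | N/A |
| Total | | 0.0229 | 3 | | | | |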
Contribution Percentage Visualization
pie
title Contribution Percentages
"TopP" : 49.5
"Temperature" : 43.9
"PresencePenalty" : 6.6
Pooled ANOVA Results ('Time')
Factors Pooled into Error: PresencePenalty
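| Factor/Interaction | Contrib (%) | SS | DF | MS | F-Value | p-Value | Significant (α=0.05) |
|---|---|---|---|---|---|---|---|
| TopP | 49.49 | 0.0113 | 1 | 0.0113 | 7.47 | 0.2233 | No |
| Temperature | 43.88 | 0.0100 | 1 | 0.0100 | 6.62 | 0.2359 | No |
| Error (Pooled) | 6.62 | 0.0015 | 1 | 0.0015 | N/A | N/A | N/A |
| Total | | 0.0229 | 3 | | | | |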
Contribution Percentage Visualization
pie
title Contribution Percentages
"TopP" : 49.5
"Temperature" : 43.9
Main Effects (for 'Time')
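| Parameter | Level Value | Avg S/N Ratio | Avg Raw Metric |
|---|---|---|---|
| PresencePenalty | 0.5 | -57.6955 | 766.9660 |
| PresencePenalty | 1.5 | -57.6566 | 763.5886 |
| Temperature | 0.5 | -57.7262 | 769.6829 |
| Temperature | 1.0 | -57.6259 | 760.8717 |
| TopP | 0.80 | -57.6228 | 760.5971 |
| TopP | 0.95 | -57.7293 | 769.9575 |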
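For 'Time' the report uses the smaller-is-better transform, S/N = -10·log10(mean(y²)), so the inverse for a single value is y = 10^(-S/N / 20). Two things follow, as the sketch below illustrates. First, back-transforming the grand mean of the S/N values gives roughly the reported predicted value of 765.2483; as with 'AverageScore', reading this as a grand-mean fallback is an inference from the numbers. Second, the inverse is decreasing, so an interval computed on the S/N scale swaps its endpoints when mapped back to the original scale. The half-width used here is purely illustrative and not taken from the report.

```python
import math


def sn_smaller_is_better(values: list[float]) -> float:
    """Taguchi smaller-is-better signal-to-noise ratio in dB."""
    return -10.0 * math.log10(sum(y * y for y in values) / len(values))


def back_transform_smaller_is_better(sn: float) -> float:
    """Map an S/N value back to the original scale (inverse for a single value)."""
    return 10.0 ** (-sn / 20.0)


# Average S/N per PresencePenalty level, as printed in the main effects for 'Time'.
level_sn = [-57.6955, -57.6566]
grand_mean_sn = sum(level_sn) / len(level_sn)
predicted = back_transform_smaller_is_better(grand_mean_sn)
print(f"grand mean S/N = {grand_mean_sn:.4f} dB, back-transformed = {predicted:.2f}")

# Illustrative half-width (hypothetical): the lower S/N bound maps to the larger time.
half_width = 0.25
time_at_low_sn = back_transform_smaller_is_better(grand_mean_sn - half_width)
time_at_high_sn = back_transform_smaller_is_better(grand_mean_sn + half_width)
print(f"S/N - {half_width}: {time_at_low_sn:.1f}, S/N + {half_width}: {time_at_high_sn:.1f}")
```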
Main Effect Chart: PresencePenalty
xychart-beta
title "Main Effect: PresencePenalty"
x-axis ["0.5", "1.5"]
y-axis "Raw Metric Value"
line [766.9660, 763.5886]
Main Effect Chart: Temperature
xychart-beta
title "Main Effect: Temperature"
x-axis ["0.5", "1.0"]
y-axis "Raw Metric Value"
line [769.6829, 760.8717]
Main Effect Chart: TopP
xychart-beta
title "Main Effect: TopP"
x-axis ["0.80", "0.95"]
y-axis "Raw Metric Value"
line [760.5971, 769.9575]