---
iip:
title: Score-based Reward Distribution
description: New reward distribution algorithms based on validator scores.
author: Olivia Saa (@oliviasaa) <olivia.saa@iota.org>
discussions-to:
status: Draft
type:
layer:
created:
requires:
---


# Motivation

We proposed in IIP ADD LINK an **automated and standardized system** for monitoring validator behavior, culminating in a commonly agreed score that reflects each validator's performance during an epoch. In the present IIP, we propose how these scores are computed and how they directly influence the rewards distributed at the epoch's end.

Distributed protocols frequently depend on validator-reported metrics that cannot be objectively verified. These *unprovable metrics* introduce a significant vulnerability: malicious validators may strategically distort their reports to influence the aggregated scores used for rewards or penalties.

In a validator set of size $n = 3f + 1$ with up to $f$ Byzantine actors, these challenges require an aggregation method that:

1. Produces consistent and stable scores for honest validators.
2. Limits the influence of malicious or noisy outliers without requiring explicit provability.
3. Remains simple, deterministic, and compatible with existing consensus fault assumptions.

This IIP introduces a robust scoring mechanism based on **median order statistics**, ensuring that aggregated scores reflect the honest cluster even in adversarial conditions.

## Specification

### Metrics

During an epoch, each validator monitors its peers, collecting performance metrics for every other validator. Regardless of the exact set of metrics used, they fall into two categories: **provable** and **unprovable** metrics.

While provable metrics allow for objective verification and the submission of proofs for misbehavior, unprovable metrics naturally lack such formal accountability mechanisms. At the end of the epoch, validators will share a common view of provable metrics. Based on IIP ADD LINK, validators will also share a common view of the metrics that all validators report about each other. We begin by defining how such reports can be deterministically aggregated.

### Aggregation of metrics

Let $m$ be a metric of interest. Let $m(i, j)$ denote validator $i$’s subjective report about validator $j$'s metric $m$. For each validator $j$ and metric $m$, collect all received reports:

$R(j) = \{\, m(i, j) \mid i \text{ is a validator that sent a report} \,\}$

The aggregated score for validator $j$ is computed using the **stake weighted median** of the report set:

$m_a(j) = \text{stake weighted median}( R(j) )$

This median-based aggregation forms the basis of the entire scoring pipeline. No additional filtering, misbehavior detection, or penalization logic is proposed.
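As an illustration, the stake-weighted median can be sketched as follows. This is a minimal Python sketch, not the reference implementation; the `(value, stake)` pair representation is an assumption made for illustration.

```python
def stake_weighted_median(reports):
    """Stake-weighted median of a set of subjective reports.

    reports: list of (value, stake) pairs, one per reporting validator.
    Returns the smallest reported value v such that the reports with
    value <= v account for at least half of the total reporting stake.
    """
    if not reports:
        raise ValueError("empty report set")
    total_stake = sum(stake for _, stake in reports)
    cumulative = 0
    for value, stake in sorted(reports):
        cumulative += stake
        if 2 * cumulative >= total_stake:
            return value
```

Note that a low-stake reporter submitting an extreme value cannot move the result away from the values backed by a majority of the reporting stake.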

### Computing the score

- **Step 1:** For all unprovable metrics, we perform the median-based aggregation defined above, resulting in a single value $m_a(j)$ for each metric and validator.

Each value $m_a(j)$ is then normalized into a value $m_\text{score}$ between 0 and 1, reflecting how close a validator’s behavior remains to the acceptable range. Note that this normalization is not directly suitable for fixed-point arithmetic and must be appropriately scaled in the implementation.

- If $m_a(i) \le m_{\text{allowance}}$, then $m_{\text{score}}(i) = 1$,
- If $m_a(i) \ge m_{\text{max}}$, then $m_{\text{score}}(i) = 0$,
- Otherwise, $m_{\text{score}}(i) = (m_{\text{max}} - m_a(i))/(m_{\text{max}} - m_{\text{allowance}})$

where:
- $m_a(i)$ is the aggregated value of metric $m$ for validator $i$,
- $m_{\text{allowance}}$ defines the number of occurrences tolerated without penalty,
- $m_{\text{max}}$ is the threshold at or beyond which the score for that metric becomes zero.
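The piecewise normalization above can be sketched as follows. Floating point is used here for clarity; as noted, a real implementation would use appropriately scaled fixed-point arithmetic.

```python
def metric_score(m_a, allowance, m_max):
    """Map an aggregated metric value into [0, 1].

    Values up to `allowance` are fully tolerated (score 1); values at or
    above `m_max` zero out the score; in between, the score decays
    linearly from 1 to 0.
    """
    if m_a <= allowance:
        return 1.0
    if m_a >= m_max:
        return 0.0
    return (m_max - m_a) / (m_max - allowance)
```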


- **Step 2:** Once all individual metric scores are obtained, they are combined into a single final score for each validator. We do so by first dividing the metrics into two groups: those corresponding to "hard punishable" misbehaviors and those that are not.

Given the set of hard punishable metrics $m^1, \ldots, m^N$ and the set of non-hard-punishable metrics $m^{N+1}, \ldots, m^{N+n}$, we combine the individual metric scores as:

$\text{score}(i) = \text{max score} \cdot \prod_{j=1}^{N} m^j_{\text{score}}(i) \cdot \dfrac{a + \sum_{j=N+1}^{N+n} w_j \, m^j_{\text{score}}(i)}{a + \sum_{j=N+1}^{N+n} w_j}$

where
- $w_j$ are the weights of the non-hard-punishable metrics,
- $\text{max score}$ is a normalization factor determining the upper bound of the score,
- $a$ is a baseline term ensuring the weighted component remains positive even when all non-hard-punishable metrics are fully degraded.

The piecewise normalization of each metric ensures that no metric can contribute a negative value, effectively capping penalties once the threshold is exceeded. This formulation provides a flexible and deterministic scoring system: all validators can independently compute identical scores given the same inputs. The weighting and allowance parameters can be tuned through protocol updates to maintain the desired game-theoretic properties, balancing fairness, robustness, and incentives for network performance.
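Putting both steps together, the combined score can be sketched as follows. The function and parameter names are illustrative assumptions, not taken from the reference implementation.

```python
def combined_score(hard_scores, soft_scores, weights, a, max_score):
    """Combine per-metric scores into a single validator score.

    hard_scores: scores in [0, 1] for hard punishable metrics; these
        enter multiplicatively, so any zero nullifies the final score.
    soft_scores, weights: parallel lists for the remaining metrics,
        combined as a weighted average shifted by the baseline `a`.
    """
    product = 1.0
    for s in hard_scores:
        product *= s
    weighted = a + sum(w * s for w, s in zip(weights, soft_scores))
    return max_score * product * weighted / (a + sum(weights))
```

A single hard punishable metric scored at zero wipes out the whole score, while degraded non-hard-punishable metrics only scale it down toward the baseline determined by $a$.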


## Rationale

### Justification for Median as the Order Statistic

In a system with at most $f$ Byzantine actors among $3f+1$ validators:

- Honest reports about honest validators are relatively similar.
- Malicious actors may submit extreme or adversarial reports.
- Order statistics naturally suppress the influence of such outliers.

The **median** possesses the following desirable properties:

1. **Byzantine Robustness**
The median is unaffected by up to $f$ arbitrarily incorrect high or low reports, provided the honest cluster constitutes a majority.
In a $3f+1$ validator model, the median always lies within the honest range.

2. **Determinism and Simplicity**
The aggregation is easily verifiable, does not require iterative filtering, and introduces no additional complexity into scoring.

3. **Resilience to Noise**
Natural variance in honest subjective reports does not distort the median significantly, preserving stable and predictable output.

The median is therefore the minimal and sufficient mechanism for achieving robust aggregation of unprovable metrics under standard Byzantine assumptions.
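A small numeric illustration of this robustness, assuming equal stakes so that the stake-weighted median reduces to the plain median:

```python
def median(values):
    """Plain median (midpoint convention for an even number of reports);
    with equal stakes this coincides with the stake-weighted median."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

# n = 3f + 1 = 4 validators, f = 1 Byzantine reporter: however extreme
# the single adversarial report, the median stays in the honest range.
honest_reports = [10, 11, 12]
all_reports = honest_reports + [10**6]
assert min(honest_reports) <= median(all_reports) <= max(honest_reports)
```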

### Justification for the scoring formula

- Penalizing severe misbehaviors through multiplicative scoring: by structuring the score as a multiplicative function, where the hard punishable metrics act as a scaling factor on the auxiliary score, we ensure that any hard punishable incident proportionally suppresses all other positive contributions.
For example, a validator that equivocates repeatedly should not be able to compensate for this behavior through otherwise good performance; its overall score should be significantly reduced regardless of other metrics. This design introduces non-linear penalties, discouraging validators from engaging in behavior that could compromise network integrity, even if other performance aspects remain satisfactory.
- Linearly Weighted Auxiliary Components: the second term in the formula aggregates less severe but still important aspects of validator behavior through a weighted linear combination.
Each component contributes proportionally to the total, with penalties that increase smoothly as the corresponding count grows beyond its allowance threshold.
This linear design simplifies interpretation and configuration, allowing us to tune the sensitivity of each metric independently.
- Allowance Parameters: Each metric includes an allowance term, defining the level of deviation tolerated before a penalty applies. This mechanism acknowledges the realities of distributed operation—temporary network issues, minor configuration errors, or maintenance downtime—and ensures validators are not unduly penalized for rare, non-malicious events. It also makes the system adaptable: as the network matures or validator requirements evolve, allowances can be adjusted to reflect new reliability expectations.
- Tunable Weights: Finally, the weighting parameters $w_i$ provide a straightforward method for the protocol to rebalance incentives over time. By adjusting these values, we can emphasize specific aspects of validator performance—such as uptime, accuracy, or consensus integrity—based on empirical observations and evolving network priorities.

# Reference Implementation

An initial set of metrics has already been implemented in the iota repository, along with a simple scoring function that serves as a placeholder for a more complete version. This reference implementation is available in the (already merged) [PR#7604](https://github.com/iotaledger/iota/pull/7604) and [PR#7921](https://github.com/iotaledger/iota/pull/7921). A more complete scoring formula following this IIP is implemented in the (not yet merged) [PR#8521](https://github.com/iotaledger/iota/pull/8521).

# Backwards Compatibility