
Conversation

GitGab19 (Member) commented Dec 10, 2025

This PR introduces a new module, monitoring, to the stratum-apps crate.

The module defines the structures used to represent each channel's data, which is already retrievable through the channels_sv2 getter methods.

It also defines traits whose methods serve this data through the HTTP APIs implemented by our applications.

Finally, it adds Prometheus support by exporting metrics at the /metrics endpoint.

Closes #105
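
As a rough illustration of the kind of per-channel structure such a module can define (the name and fields below are hypothetical, not taken from the PR; serde is assumed for serializing the data over the HTTP API):

// Hypothetical sketch only; not the actual types introduced by the module.
use serde::Serialize;

/// Snapshot of a single channel, as it could be served over the HTTP API.
#[derive(Debug, Clone, Serialize)]
pub struct ChannelData {
    pub channel_id: u32,
    pub user_identity: String,
    pub nominal_hashrate: f32,
    pub shares_accepted: u64,
}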

GitGab19 force-pushed the monitoring-apis branch 5 times, most recently from 29f48b0 to 9940716, on December 10, 2025 at 22:01.
]

# Monitoring HTTP server address for exposing channel data
monitoring_address = "0.0.0.0:9091"
Contributor

I may be overthinking it, but I’d suggest binding this to 127.0.0.1 by default. Since we don’t have rate limiting or authentication in place yet, exposing it more broadly could make some endpoints vulnerable to abuse.

Member

JDC is expected to run in "trusted" LAN environments,

but I guess the Pool is indeed somewhat vulnerable to this.

Member Author

I made the Pool's config examples use 127.0.0.1 in 3a030ff


GitGab19 marked this pull request as a draft on December 12, 2025 at 16:23.
GitGab19 (Member Author) commented Dec 12, 2025

Turned this PR into a draft, since I noticed the apps hang when I try to Ctrl+C them.

Furthermore, I'm considering using Axum as the web framework instead of the manual approach I initially took.

GitGab19 force-pushed the monitoring-apis branch 3 times, most recently from 1b0ff12 to 4a3c1f6, on December 12, 2025 at 19:38.
GitGab19 marked this pull request as ready for review on December 12, 2025 at 19:38.
GitGab19 (Member Author)

@lucasbalieiro @gimballock this PR is ready for review again.

I refactored the HTTP server to use Axum, and I added Prometheus support by exporting metrics at the /metrics endpoint.
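
For context, a minimal sketch of how an Axum server with a /metrics route and clean Ctrl+C shutdown can be wired up. This is not the PR's actual code: it assumes axum 0.7 and tokio, and ServerState here is just a placeholder for the real shared monitoring state.

// Minimal sketch of an Axum-based monitoring server; not the PR's actual code.
use axum::{extract::State, response::{IntoResponse, Response}, routing::get, Router};

#[derive(Clone)]
struct ServerState;

async fn handle_prometheus_metrics(State(_state): State<ServerState>) -> Response {
    // The real handler collects fresh data from the monitoring sources and
    // encodes it in the Prometheus text exposition format.
    String::new().into_response()
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let app = Router::new()
        .route("/metrics", get(handle_prometheus_metrics))
        .with_state(ServerState);

    let listener = tokio::net::TcpListener::bind("127.0.0.1:9091").await?;
    axum::serve(listener, app)
        // Resolve the shutdown future on Ctrl+C so the process exits cleanly.
        .with_graceful_shutdown(async {
            let _ = tokio::signal::ctrl_c().await;
        })
        .await
}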

/// Handler for Prometheus metrics endpoint
///
/// Collects fresh data from all monitoring sources and exports as Prometheus metrics
async fn handle_prometheus_metrics(State(state): State<ServerState>) -> Response {
Contributor

I’m not sure whether this is intentional, as I’m not fully familiar with how Prometheus handles this data. However, this endpoint does not appear to be cleaned up properly: it continues to retain and serve data for disconnected channels.

This behavior also makes the endpoint susceptible to abuse.

Reviewer

These metrics retain per-channel data (id, user) that should be cleaned up with metric.remove_label_values() when a channel is closed:

  • sv2_client_channel_hashrate
  • sv2_client_shares_accepted_total
  • sv2_client_channel_shares_per_minute

You could collect that info in ChannelManager.remove_downstream() but how do you pass it to the monitoring layer?
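
For illustration, a rough sketch of the suggested cleanup using the prometheus crate's remove_label_values(); the metric handle, label values, and function name below are illustrative, not taken from the PR:

// Rough sketch of per-channel label cleanup with the `prometheus` crate.
use prometheus::{opts, GaugeVec};

fn close_channel_example() -> prometheus::Result<()> {
    let channel_hashrate = GaugeVec::new(
        opts!("sv2_client_channel_hashrate", "Nominal hashrate per channel"),
        &["id", "user"],
    )?;

    // While the channel is open, its labelled series gets updated as usual.
    channel_hashrate.with_label_values(&["42", "miner1"]).set(13.5);

    // When the channel is closed (e.g. in ChannelManager.remove_downstream()),
    // drop its series so the /metrics endpoint stops serving stale data.
    channel_hashrate.remove_label_values(&["42", "miner1"])?;
    Ok(())
}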

Member Author

I pushed improvements for this.

I basically added a reset of the three metrics mentioned above at the beginning of handle_prometheus_metrics().

This way, the Prometheus endpoint only ever exposes data for channels that are currently open.
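
As an illustration of that reset-then-repopulate approach, a minimal sketch assuming the metrics are prometheus GaugeVec handles; the names and the shape of open_channels are assumptions, not the PR's actual code:

// Illustrative sketch of resetting per-channel series before re-exporting them.
use prometheus::GaugeVec;

fn refresh_channel_metrics(
    channel_hashrate: &GaugeVec,
    shares_per_minute: &GaugeVec,
    open_channels: &[(u32, String, f32, f32)], // (channel_id, user, hashrate, shares/min)
) {
    // Drop every previously recorded per-channel series first, so channels
    // that disconnected since the last scrape are no longer exported.
    channel_hashrate.reset();
    shares_per_minute.reset();

    // Re-populate only from the channels that are currently open.
    for (channel_id, user, hashrate, spm) in open_channels {
        let id = channel_id.to_string();
        channel_hashrate
            .with_label_values(&[id.as_str(), user.as_str()])
            .set(f64::from(*hashrate));
        shares_per_minute
            .with_label_values(&[id.as_str(), user.as_str()])
            .set(f64::from(*spm));
    }
}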

Comment on lines 116 to 117
let extended = self.get_client_extended_channels();
let standard = self.get_client_standard_channels();

Reviewer

Both of these functions loop over the downstream clients to count channels and hashrate.
You could combine them to collect these stats in a single pass.
Additionally, if you collect all the clients in a first pass and then gather the stats in a second pass, you can avoid the nested locks as well.

Member Author

I think that adding further helper functions, called internally by these two, could become too convoluted for this purpose.

If you have a concrete suggestion to improve this, I'm open to it!

Reviewer

The idea was to replace those helpers:

fn get_client_summary(&self) -> ChannelsWithClientSummary {
    // Collect client references
    let clients: Vec<_> = self
        .channel_manager_data
        .safe_lock(|d| {
            d.downstream
                .values()
                .map(|client| client.downstream_data.clone())
                .collect()
        })
        .unwrap_or_default();
        
    // Count channels and hashrate
    let mut extended_channels = 0usize;
    let mut standard_channels = 0usize;
    let mut total_hashrate = 0.0f32;
    for downstream_data in clients {
        downstream_data
            .safe_lock(|dd| {
                for (_channel_id, extended_channel) in dd.extended_channels.iter() {
                    extended_channels += 1;
                    total_hashrate += extended_channel.get_nominal_hashrate();
                }
                for (_channel_id, standard_channel) in dd.standard_channels.iter() {
                    standard_channels += 1;
                    total_hashrate += standard_channel.get_nominal_hashrate();
                }
            })
            .unwrap_or_default();
    }
    ChannelsWithClientSummary {
        total_channels: extended_channels + standard_channels,
        extended_channels,
        standard_channels,
        total_hashrate,
    }
}

Member Author

If the problem is the nested lock, we should probably remove it also from the other two method implementations (?)

Member Author

I refactored the trait, and now it has only one method to implement, called get_clients().

There's also a provided method, get_clients_summary(), which internally calls get_clients(), so the collection work isn't duplicated.

Let me know your thoughts about this new structure.
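
For illustration, a rough sketch of that trait shape: one required get_clients() plus a provided get_clients_summary(). Only those two method names come from the discussion; the structs and fields below are assumptions:

// Sketch only; the real trait and types in the monitoring module may differ.
pub struct ClientInfo {
    pub extended_channels: usize,
    pub standard_channels: usize,
    pub total_hashrate: f32,
}

pub struct ClientsSummary {
    pub total_channels: usize,
    pub extended_channels: usize,
    pub standard_channels: usize,
    pub total_hashrate: f32,
}

pub trait Monitoring {
    /// The only method an application has to implement: a snapshot of all
    /// currently connected clients.
    fn get_clients(&self) -> Vec<ClientInfo>;

    /// Provided method: aggregates the snapshot returned by get_clients(),
    /// so the per-client collection work happens in a single place.
    fn get_clients_summary(&self) -> ClientsSummary {
        let clients = self.get_clients();
        let extended_channels: usize = clients.iter().map(|c| c.extended_channels).sum();
        let standard_channels: usize = clients.iter().map(|c| c.standard_channels).sum();
        let total_hashrate: f32 = clients.iter().map(|c| c.total_hashrate).sum();
        ClientsSummary {
            total_channels: extended_channels + standard_channels,
            extended_channels,
            standard_channels,
            total_hashrate,
        }
    }
}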

GitGab19 force-pushed the monitoring-apis branch 2 times, most recently from bf17aef to a855e0b, on December 15, 2025 at 12:05.
plebhash (Member)

Would it make sense to add some integration tests to assert the correctness of these new monitoring APIs?

Deploy some apps, make HTTP requests against their endpoints, then parse the responses and assert against the data.

I'd say this suggestion is not necessarily a blocker for this PR, since I'm doing some manual curl testing and the APIs seem sane (although I can't say I've comprehensively tested everything).

But it might be a good idea for a follow-up: we might unintentionally break these new APIs in the future, so having some CI coverage feels like a good long-term strategy.

GitGab19 (Member Author) commented Jan 4, 2026

Would it make sense to add some integration tests to assert the correctness of these new monitoring APIs?

Deploy some apps, make HTTP requests against their endpoints, then parse the responses and assert against the data.

I'd say this suggestion is not necessarily a blocker for this PR, since I'm doing some manual curl testing and the APIs seem sane (although I can't say I've comprehensively tested everything).

But it might be a good idea for a follow-up: we might unintentionally break these new APIs in the future, so having some CI coverage feels like a good long-term strategy.

Opened an issue for it: #161
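
For illustration, a rough sketch of what such a follow-up integration test could look like; it assumes a running app exposing the monitoring server on 127.0.0.1:9091 and reqwest/tokio as dev-dependencies, none of which comes from this PR:

// Hypothetical follow-up integration test sketch (not part of this PR).
#[tokio::test]
#[ignore = "requires a running app exposing the monitoring HTTP server"]
async fn metrics_endpoint_serves_prometheus_text() {
    let body = reqwest::get("http://127.0.0.1:9091/metrics")
        .await
        .expect("request failed")
        .text()
        .await
        .expect("body was not valid text");

    // The exact metric set depends on the app, but the exporter should at
    // least produce non-empty Prometheus text exposition output.
    assert!(!body.is_empty());
}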

GitGab19 added a commit to GitGab19/sv2-apps that referenced this pull request Jan 5, 2026
GitGab19 (Member Author) commented Jan 5, 2026

@lucasbalieiro I adapted the Docker configs in c26abb5.

Please check that everything is correct there.


