Skip to content

Conversation

@Shourya742
Copy link
Contributor

I closed earlier PR by mistake and removed all the commits: #86.

@Shourya742
Copy link
Contributor Author

@plebhash I’ve adapted the pool according to what we discussed. Let me know what you think. I’ll handle the JDC and Tproxy updates next in this PR.

@Shourya742 Shourya742 force-pushed the 2025-12-12-add-contextual-error branch from 0c2036e to a898fa1 Compare December 13, 2025 11:48
@plebhash
Copy link
Member

@plebhash I’ve adapted the pool according to what we discussed. Let me know what you think. I’ll handle the JDC and Tproxy updates next in this PR.

looks good to me

@Shourya742 Shourya742 marked this pull request as ready for review December 14, 2025 15:50
@Shourya742 Shourya742 force-pushed the 2025-12-12-add-contextual-error branch 2 times, most recently from 90d442c to 31504fc Compare December 17, 2025 10:15
Copy link
Member

@GitGab19 GitGab19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just started the review, but I prefer to have some replies from you before proceeding with the rest, to be sure I got the idea behind the Source introduced here.

@lucasbalieiro
Copy link
Contributor

Started a testing session with the changes of this PR.

msg.channel_id, downstream_id
);
return Err(JDCError::DownstreamNotFound(downstream_id));
return Err(JDCError::fatal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have some sort of guideline where we can say what is fatal or not? Or we are on a feeling right now? I think most of them can be deduced from the context, but, just to know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s more like this: if an error is considered fatal, then when it occurs, the associated subsystem will definitely be shut down. It’s up to the implementer to decide whether an error should be treated as fatal, and if so, which subsystem it should affect.

@Shourya742 Shourya742 force-pushed the 2025-12-12-add-contextual-error branch 2 times, most recently from 579ae19 to 7446ef0 Compare December 23, 2025 10:14
return Err(JDCError::LastNewPrevhashNotFound);
return Err(JDCError::fatal(
JDCErrorKind::LastNewPrevhashNotFound,
Impact::Downstream(downstream_id),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact::ChannelManager?

If we don't get any last_new_prev_hash, it means there's something wrong in the ChannelManager.

let Some(last_future_template) = channel_manager_data.last_future_template.clone() else {
error!("No template to share");
return Err(JDCError::FutureTemplateNotPresent);
return Err(JDCError::fatal(JDCErrorKind::FutureTemplateNotPresent, Impact::Downstream(downstream_id)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact::ChannelManager?

If we don't get any future template, it means there's something wrong in the ChannelManager.

let Some(last_new_prev_hash) = channel_manager_data.last_new_prev_hash.clone() else {
error!("No prevhash in system");
return Err(JDCError::LastNewPrevhashNotFound);
return Err(JDCError::fatal(JDCErrorKind::LastNewPrevhashNotFound, Impact::Downstream(downstream_id)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact::ChannelManager?

If we don't get any last_new_prev_hash, it means there's something wrong in the ChannelManager.

let Some(prev_hash) = channel_manager_data.last_new_prev_hash.as_ref() else {
warn!("No prev_hash available yet, ignoring share");
return Err(JDCError::LastNewPrevhashNotFound);
return Err(JDCError::fatal(JDCErrorKind::LastNewPrevhashNotFound, Impact::Downstream(downstream_id)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact::ChannelManager?

If we don't get any last_new_prev_hash, it means there's something wrong in the ChannelManager.

AnyMessage::Extensions(new_require_extensions.into_static().into()).try_into()?;
AnyMessage::Extensions(new_require_extensions.into_static().into())
.try_into()
.map_err(|error| JDCError::fatal(error, Impact::ChannelManager))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have fatal and Impact::ChannelManager here?

Comment on lines 173 to 176
return Err(JDCError::not_fatal(
JDCErrorKind::LastDeclareJobNotFound(msg.request_id),
Impact::ChannelManager,
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not_fatal?

Comment on lines 181 to 184
return Err(JDCError::not_fatal(
JDCErrorKind::LastNewPrevhashNotFound,
Impact::ChannelManager,
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not_fatal?

Comment on lines 215 to 218
return Err(JDCError::not_fatal(
JDCErrorKind::FailedToCreateCustomJob,
Impact::ChannelManager,
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not_fatal?

Comment on lines 221 to 226
let custom_job = custom_job.map_err(|_e| {
JDCError::not_fatal(
JDCErrorKind::FailedToCreateCustomJob,
Impact::ChannelManager,
)
})?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not_fatal?

return Ok(());
}
Err(JDCError::TxDataError)
Err(JDCError::not_fatal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not_fatal?

let Some(token) = token else {
error!("Token not found, template id: {}", msg.template_id);
return Err(JDCError::TokenNotFound);
return Err(JDCError::not_fatal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not_fatal?

let Some(template_message) = template_message else {
error!("Template not found, template id: {}", msg.template_id);
return Err(JDCError::TemplateNotFound(msg.template_id));
return Err(JDCError::not_fatal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not_fatal?

.status_sender
.send(Status {
state: State::UpstreamShutdownFallback(JDCError::Shutdown),
state: State::UpstreamShutdownFallback(JDCErrorKind::Shutdown),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why in this case we are not using the new contextual error system?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed it thanks.

None,
}

#[derive(Debug)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some docs which explain the most common use cases for this values.

For example: when fatal is true and the impact is X, we do this, etc.

pub struct JDCError {
kind: JDCErrorKind,
fatal: bool,
source: Impact,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still have source here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some old commit might be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shoot, I forgot to rename the internal type. Let me do that. Thanks.

Comment on lines 79 to 81
pub fn source(&self) -> Impact {
self.source
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to use impact, right?

self.kind
}

pub fn is_fatal_for_source(&self, source: Impact) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same typo here

Copy link
Member

@GitGab19 GitGab19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just finished to look at JDC changes, and I left many comments.

My hot take is that this PR is introducing more granularity in the error handling for sure, but I still think this entire error management is quite convoluted and hard to reason about.

For sure I would add docs to explain how this system entirely works for the sake of maintainability.

Copy link
Member

@GitGab19 GitGab19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just finished to review changes on tProxy, and left many comments on it.

While doing my review, I realized that the use cases for giving context can be resumed in two:

  • take action on a specific component of our apps (fallback to another upstream, restart a component, disconnect a downstream, etc.)
  • don't take action and just log the error

Given this, I think the current approach is not so easy to use (many times I would doubt on the correct Impact value to use, or fatal/not_fatal option to choose) and it should be re-designed a little bit to be more intuitive and clear for everyone.

Here some examples of what I think could be a more clever way to add context to the errors:

  • return Err(TproxyError::restart(Upstream/ChannelManager/etc, error)
  • return Err(TproxyError::fallback(Upstream, error)
  • return Err(TproxyError::disconnect(downstream_id, error)
  • return Err(TproxyError::shutdown(tProxy, error)
  • return Err(TproxyError::log(error)

e
);
TproxyError::ChannelErrorSender
TproxyError::fatal(TproxyErrorKind::ChannelErrorSender, Impact::Downstream(downstream_id.unwrap_or(0)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this unwrap_or(0)?

.map_err(|e| {
error!("Down: Failed to send mining.notify to downstream: {:?}", e);
TproxyError::ChannelErrorSender
TproxyError::fatal(TproxyErrorKind::ChannelErrorSender, Impact::Downstream(downstream_id.unwrap_or(0)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this unwrap_or(0)?

TproxyError::ChannelErrorSender
TproxyError::fatal(
TproxyErrorKind::ChannelErrorSender,
Impact::Downstream(downstream_id.unwrap_or(0)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this unwrap_or(0)?

TproxyError::ChannelErrorSender
TproxyError::fatal(
TproxyErrorKind::ChannelErrorSender,
Impact::Downstream(downstream_id.unwrap_or(0)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this unwrap_or(0)?

.increment_shares_since_last_update();
}
})
.map_err(|error| TproxyError::fatal(error, Impact::Sv1Server))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact::Downstream?

Don't we want to close the connection with the Downstream in this case?

.recv()
.await
.map_err(TproxyError::ChannelErrorReceiver)?;
.map_err(|error| TproxyError::fatal(error, Impact::ChannelManager))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact:Upstream?

let header = sv2_frame.get_header().ok_or_else(|| {
error!("SV2 frame missing header");
TproxyError::FramingSv2(framing_sv2::Error::MissingHeader)
TproxyError::fatal(framing_sv2::Error::MissingHeader, Impact::ChannelManager)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact:Upstream?

Comment on lines 231 to 233
return Err(TproxyError::fatal(
TproxyErrorKind::UnexpectedMessage(header.ext_type(), header.msg_type()),
Impact::Upstream,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weren't we just ignoring this in the similar case on JDC?

.recv()
.await
.map_err(TproxyError::ChannelErrorReceiver)?;
.map_err(|error| TproxyError::fatal(error, Impact::ChannelManager))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this Impact::Sv1Server?

.status_sender
.send(Status {
state: State::UpstreamShutdown(TproxyError::Shutdown),
state: State::UpstreamShutdown(TproxyErrorKind::Shutdown),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't we using this new contextual error system here?

@Shourya742 Shourya742 force-pushed the 2025-12-12-add-contextual-error branch from 4c7cffd to d3dc098 Compare January 4, 2026 08:11
Comment on lines 74 to 77
pub trait CanLog {}
pub trait CanDisconnect {}
pub trait CanFallback {}
pub trait CanShutdown {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few questions about these traits:

  • why are declaring them inside jd-client crate and not stratum-apps? are we going to re-declare them over and over again them on every app?
  • why are the traits empty? shouldn't they have some methods?
  • what's the rationale for enforcing these traits at compile time? why not just review the code and make sure we don't write anything bad?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Yeah, we can declare these inside stratum-apps. This commit was mainly about explaining the approach. I’ll move these marker traits into stratum-apps.

  • These are marker traits, very similar to Send or Sync. They don’t define any methods, they just express a capability or property that the implementor follow.

  • We already have different shutdown semantics across subsystems, some can fallback, some can disconnect, and some can trigger a full application shutdown. From the last dev sync, it was clear there’s still some confusion around subsystem lifecycles. With these traits, we can make those rules explicit in the type system. That way, even if someone writes incorrect code, the compiler will stop them. Relying purely on reviews makes it easy for subtle, unintended behavior to slip in, and this helps us avoid that class of errors altogether.

@Shourya742 Shourya742 force-pushed the 2025-12-12-add-contextual-error branch 5 times, most recently from 9ab00f2 to 1031be4 Compare January 6, 2026 07:02
@Shourya742 Shourya742 force-pushed the 2025-12-12-add-contextual-error branch from 1031be4 to 4cd6f63 Compare January 6, 2026 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants