-
Notifications
You must be signed in to change notification settings - Fork 12
Add contextual error #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add contextual error #132
Conversation
|
@plebhash I’ve adapted the pool according to what we discussed. Let me know what you think. I’ll handle the JDC and Tproxy updates next in this PR. |
0c2036e to
a898fa1
Compare
looks good to me |
90d442c to
31504fc
Compare
GitGab19
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just started the review, but I prefer to have some replies from you before proceeding with the rest, to be sure I got the idea behind the Source introduced here.
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
|
Started a testing session with the changes of this PR. |
| msg.channel_id, downstream_id | ||
| ); | ||
| return Err(JDCError::DownstreamNotFound(downstream_id)); | ||
| return Err(JDCError::fatal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have some sort of guideline where we can say what is fatal or not? Or we are on a feeling right now? I think most of them can be deduced from the context, but, just to know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s more like this: if an error is considered fatal, then when it occurs, the associated subsystem will definitely be shut down. It’s up to the implementer to decide whether an error should be treated as fatal, and if so, which subsystem it should affect.
579ae19 to
7446ef0
Compare
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
| return Err(JDCError::LastNewPrevhashNotFound); | ||
| return Err(JDCError::fatal( | ||
| JDCErrorKind::LastNewPrevhashNotFound, | ||
| Impact::Downstream(downstream_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact::ChannelManager?
If we don't get any last_new_prev_hash, it means there's something wrong in the ChannelManager.
| let Some(last_future_template) = channel_manager_data.last_future_template.clone() else { | ||
| error!("No template to share"); | ||
| return Err(JDCError::FutureTemplateNotPresent); | ||
| return Err(JDCError::fatal(JDCErrorKind::FutureTemplateNotPresent, Impact::Downstream(downstream_id))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact::ChannelManager?
If we don't get any future template, it means there's something wrong in the ChannelManager.
| let Some(last_new_prev_hash) = channel_manager_data.last_new_prev_hash.clone() else { | ||
| error!("No prevhash in system"); | ||
| return Err(JDCError::LastNewPrevhashNotFound); | ||
| return Err(JDCError::fatal(JDCErrorKind::LastNewPrevhashNotFound, Impact::Downstream(downstream_id))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact::ChannelManager?
If we don't get any last_new_prev_hash, it means there's something wrong in the ChannelManager.
| let Some(prev_hash) = channel_manager_data.last_new_prev_hash.as_ref() else { | ||
| warn!("No prev_hash available yet, ignoring share"); | ||
| return Err(JDCError::LastNewPrevhashNotFound); | ||
| return Err(JDCError::fatal(JDCErrorKind::LastNewPrevhashNotFound, Impact::Downstream(downstream_id))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact::ChannelManager?
If we don't get any last_new_prev_hash, it means there's something wrong in the ChannelManager.
miner-apps/jd-client/src/lib/channel_manager/downstream_message_handler.rs
Outdated
Show resolved
Hide resolved
| AnyMessage::Extensions(new_require_extensions.into_static().into()).try_into()?; | ||
| AnyMessage::Extensions(new_require_extensions.into_static().into()) | ||
| .try_into() | ||
| .map_err(|error| JDCError::fatal(error, Impact::ChannelManager))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have fatal and Impact::ChannelManager here?
miner-apps/jd-client/src/lib/channel_manager/jd_message_handler.rs
Outdated
Show resolved
Hide resolved
| return Err(JDCError::not_fatal( | ||
| JDCErrorKind::LastDeclareJobNotFound(msg.request_id), | ||
| Impact::ChannelManager, | ||
| )); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not_fatal?
| return Err(JDCError::not_fatal( | ||
| JDCErrorKind::LastNewPrevhashNotFound, | ||
| Impact::ChannelManager, | ||
| )); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not_fatal?
| return Err(JDCError::not_fatal( | ||
| JDCErrorKind::FailedToCreateCustomJob, | ||
| Impact::ChannelManager, | ||
| )); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not_fatal?
| let custom_job = custom_job.map_err(|_e| { | ||
| JDCError::not_fatal( | ||
| JDCErrorKind::FailedToCreateCustomJob, | ||
| Impact::ChannelManager, | ||
| ) | ||
| })?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not_fatal?
| return Ok(()); | ||
| } | ||
| Err(JDCError::TxDataError) | ||
| Err(JDCError::not_fatal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not_fatal?
| let Some(token) = token else { | ||
| error!("Token not found, template id: {}", msg.template_id); | ||
| return Err(JDCError::TokenNotFound); | ||
| return Err(JDCError::not_fatal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not_fatal?
| let Some(template_message) = template_message else { | ||
| error!("Template not found, template id: {}", msg.template_id); | ||
| return Err(JDCError::TemplateNotFound(msg.template_id)); | ||
| return Err(JDCError::not_fatal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not_fatal?
| .status_sender | ||
| .send(Status { | ||
| state: State::UpstreamShutdownFallback(JDCError::Shutdown), | ||
| state: State::UpstreamShutdownFallback(JDCErrorKind::Shutdown), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why in this case we are not using the new contextual error system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed it thanks.
| None, | ||
| } | ||
|
|
||
| #[derive(Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some docs which explain the most common use cases for this values.
For example: when fatal is true and the impact is X, we do this, etc.
| pub struct JDCError { | ||
| kind: JDCErrorKind, | ||
| fatal: bool, | ||
| source: Impact, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still have source here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some old commit might be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shoot, I forgot to rename the internal type. Let me do that. Thanks.
| pub fn source(&self) -> Impact { | ||
| self.source | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to use impact, right?
| self.kind | ||
| } | ||
|
|
||
| pub fn is_fatal_for_source(&self, source: Impact) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same typo here
GitGab19
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just finished to look at JDC changes, and I left many comments.
My hot take is that this PR is introducing more granularity in the error handling for sure, but I still think this entire error management is quite convoluted and hard to reason about.
For sure I would add docs to explain how this system entirely works for the sake of maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just finished to review changes on tProxy, and left many comments on it.
While doing my review, I realized that the use cases for giving context can be resumed in two:
- take action on a specific component of our apps (fallback to another upstream, restart a component, disconnect a downstream, etc.)
- don't take action and just log the error
Given this, I think the current approach is not so easy to use (many times I would doubt on the correct Impact value to use, or fatal/not_fatal option to choose) and it should be re-designed a little bit to be more intuitive and clear for everyone.
Here some examples of what I think could be a more clever way to add context to the errors:
return Err(TproxyError::restart(Upstream/ChannelManager/etc, error)return Err(TproxyError::fallback(Upstream, error)return Err(TproxyError::disconnect(downstream_id, error)return Err(TproxyError::shutdown(tProxy, error)return Err(TproxyError::log(error)
| e | ||
| ); | ||
| TproxyError::ChannelErrorSender | ||
| TproxyError::fatal(TproxyErrorKind::ChannelErrorSender, Impact::Downstream(downstream_id.unwrap_or(0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this unwrap_or(0)?
| .map_err(|e| { | ||
| error!("Down: Failed to send mining.notify to downstream: {:?}", e); | ||
| TproxyError::ChannelErrorSender | ||
| TproxyError::fatal(TproxyErrorKind::ChannelErrorSender, Impact::Downstream(downstream_id.unwrap_or(0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this unwrap_or(0)?
| TproxyError::ChannelErrorSender | ||
| TproxyError::fatal( | ||
| TproxyErrorKind::ChannelErrorSender, | ||
| Impact::Downstream(downstream_id.unwrap_or(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this unwrap_or(0)?
| TproxyError::ChannelErrorSender | ||
| TproxyError::fatal( | ||
| TproxyErrorKind::ChannelErrorSender, | ||
| Impact::Downstream(downstream_id.unwrap_or(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this unwrap_or(0)?
| .increment_shares_since_last_update(); | ||
| } | ||
| }) | ||
| .map_err(|error| TproxyError::fatal(error, Impact::Sv1Server))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact::Downstream?
Don't we want to close the connection with the Downstream in this case?
| .recv() | ||
| .await | ||
| .map_err(TproxyError::ChannelErrorReceiver)?; | ||
| .map_err(|error| TproxyError::fatal(error, Impact::ChannelManager))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact:Upstream?
| let header = sv2_frame.get_header().ok_or_else(|| { | ||
| error!("SV2 frame missing header"); | ||
| TproxyError::FramingSv2(framing_sv2::Error::MissingHeader) | ||
| TproxyError::fatal(framing_sv2::Error::MissingHeader, Impact::ChannelManager) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact:Upstream?
| return Err(TproxyError::fatal( | ||
| TproxyErrorKind::UnexpectedMessage(header.ext_type(), header.msg_type()), | ||
| Impact::Upstream, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weren't we just ignoring this in the similar case on JDC?
| .recv() | ||
| .await | ||
| .map_err(TproxyError::ChannelErrorReceiver)?; | ||
| .map_err(|error| TproxyError::fatal(error, Impact::ChannelManager))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this Impact::Sv1Server?
| .status_sender | ||
| .send(Status { | ||
| state: State::UpstreamShutdown(TproxyError::Shutdown), | ||
| state: State::UpstreamShutdown(TproxyErrorKind::Shutdown), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we using this new contextual error system here?
4c7cffd to
d3dc098
Compare
| pub trait CanLog {} | ||
| pub trait CanDisconnect {} | ||
| pub trait CanFallback {} | ||
| pub trait CanShutdown {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few questions about these traits:
- why are declaring them inside
jd-clientcrate and notstratum-apps? are we going to re-declare them over and over again them on every app? - why are the traits empty? shouldn't they have some methods?
- what's the rationale for enforcing these traits at compile time? why not just review the code and make sure we don't write anything bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Yeah, we can declare these inside
stratum-apps. This commit was mainly about explaining the approach. I’ll move these marker traits intostratum-apps. -
These are marker traits, very similar to
SendorSync. They don’t define any methods, they just express a capability or property that the implementor follow. -
We already have different shutdown semantics across subsystems, some can fallback, some can disconnect, and some can trigger a full application shutdown. From the last dev sync, it was clear there’s still some confusion around subsystem lifecycles. With these traits, we can make those rules explicit in the type system. That way, even if someone writes incorrect code, the compiler will stop them. Relying purely on reviews makes it easy for subtle, unintended behavior to slip in, and this helps us avoid that class of errors altogether.
9ab00f2 to
1031be4
Compare
1031be4 to
4cd6f63
Compare
I closed earlier PR by mistake and removed all the commits: #86.