---
layout: post
title: "eharbor: The Advanced Load-Shedding and Request Deduplication Solution"
date: 2023-02-28 01:23:45 +0500
categories: "Load Shedding" "Request Deduplication" "Distributed Systems" "Server Optimization" "Scalability" "High Availability" "Performance Optimization" "System Architecture" "Backend Development"
---

![](/assets/harbor-emoji.png)
# `eharbor`: The Advanced Load-Shedding and Request Deduplication Solution

With the rise of the cloud and the ever-increasing popularity of online content, server systems are often subjected to sudden spikes in traffic. These spikes can occur when content goes viral or when other unexpected events generate a high volume of requests in a short time. Such an influx can easily overload server systems, leading to decreased performance, reduced reliability, and even system failure. To address this problem, many organizations have turned to overprovisioning resources and to rate-limiting and load-shedding systems that manage and regulate the flow of incoming requests.

However, managing the flow of requests is not enough on its own. It is also essential to ensure that requests are not duplicated, since duplicated work wastes resources and puts additional strain on the server; this is where `eharbor` comes in.

`eharbor` is a powerful solution that combines the benefits of load-shedding with request deduplication. The result is an efficient, reliable, and scalable system that can handle even the most demanding workloads.
## How `eharbor` Works

![](/assets/eharbor_diagram.png)

### Deduplication

The innovative solution offered by `eharbor` is rooted in its approach to request deduplication. `eharbor` assigns each unique incoming request to a designated conductor process, which processes the request and returns the result to the client. When a new request arrives, `eharbor` checks whether a conductor is already processing a similar or identical request. If so, the new request is attached as a follower to the existing conductor. The follower simply waits for the conductor's result; the work is already underway, so there is no need to repeat it. This dramatically improves the speed at which viral content is served. In a distributed setup, further deduplication can be achieved, which is covered in greater detail later in this post. The work of assigning roles and identifying duplicate queries is handled by the coordinator process.

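To make the conductor/follower idea concrete, here is a minimal sketch in plain OTP. This is not `eharbor`'s implementation — the module name, the state layout, and the `run/2` helper are invented for illustration — but it shows how a coordinator-style process can make the first caller for a key the conductor and let later callers for the same key wait as followers:

```erlang
%% dedup_sketch.erl -- illustrative only, not eharbor's internals.
-module(dedup_sketch).
-behaviour(gen_server).
-export([start_link/0, run/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, #{}, []).

%% Run Fun once per Key; concurrent callers with the same Key share the result.
run(Key, Fun) -> gen_server:call(?MODULE, {run, Key, Fun}, infinity).

init(State) -> {ok, State}.

handle_call({run, Key, Fun}, From, State) ->
    case State of
        #{Key := Waiters} ->
            %% A conductor is already working on this key; join as a follower.
            {noreply, State#{Key := [From | Waiters]}};
        _ ->
            %% No conductor yet; become one by spawning the actual work.
            Server = self(),
            spawn_link(fun() -> Server ! {done, Key, Fun()} end),
            {noreply, State#{Key => [From]}}
    end.

handle_info({done, Key, Result}, State) ->
    %% Fan the single result out to the conductor and every follower.
    [gen_server:reply(W, Result) || W <- maps:get(Key, State, [])],
    {noreply, maps:remove(Key, State)};
handle_info(_Msg, State) ->
    {noreply, State}.

handle_cast(_Msg, State) -> {noreply, State}.
```

`eharbor` layers timeouts, buffering, and error handling on top of this basic pattern, but the role assignment works roughly along these lines.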
### Load-Shedding

`eharbor`'s load-shedding capabilities are implemented through its use of a `pobox`, which acts as a buffer between incoming requests and whatever backend the conductor is communicating with (typically a database or an API). The `pobox` ensures that the number of incoming requests does not exceed the defined capacity of the system, allowing the server's resources to be used efficiently and effectively.

When the volume of incoming requests exceeds the configured capacity, the `pobox` begins shedding load: it first delays requests and, in extreme circumstances, discards them once its buffer reaches maximum capacity. This safeguards against overloading the system, keeping it responsive and reliable while preserving the effective utilization of its resources.

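The sketch below illustrates the bounded-buffer idea in isolation. It does not use `pobox`'s actual API; the module and function names are invented, and it only shows the behaviour described above — accept work up to a fixed capacity, then reject immediately with `{error, full}` instead of letting callers pile up and time out:

```erlang
%% bounded_buffer_sketch.erl -- illustrative only, not pobox's API.
-module(bounded_buffer_sketch).
-export([new/1, push/2, pop/1]).

-record(buf, {queue = queue:new(), size = 0, capacity}).

%% Create a buffer that holds at most Capacity items.
new(Capacity) -> #buf{capacity = Capacity}.

%% Accept the item if there is room, otherwise shed it right away.
push(Item, #buf{size = S, capacity = C} = Buf) when S < C ->
    {ok, Buf#buf{queue = queue:in(Item, Buf#buf.queue), size = S + 1}};
push(_Item, #buf{}) ->
    {error, full}.

%% Hand the oldest buffered item to a worker, if any.
pop(#buf{queue = Q, size = S} = Buf) ->
    case queue:out(Q) of
        {{value, Item}, Q2} -> {ok, Item, Buf#buf{queue = Q2, size = S - 1}};
        {empty, _} -> empty
    end.
```

Rejecting early is the key design choice: a fast `{error, full}` keeps memory flat and lets the caller decide how to degrade, rather than accumulating processes that will eventually time out anyway.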
## Illustrating the Benefits of `eharbor`

#### System Without `eharbor`

![](/assets/eharbor_before.png)

In this scenario, Endpoint A is encountering a spike in traffic that overwhelms the system's ability to process requests in time. The database cannot cope with the volume of requests, leading to timeouts and a growing number of waiting processes. The accumulation of waiting processes drives a rapid increase in memory utilization, which can end in an Out-of-Memory (OOM) error.

Moreover, the saturation caused by Endpoint A has a cascading effect on Endpoint B, which can no longer perform its own queries against the database. This cascade exacerbates the memory growth, further deteriorating system performance.

Note that this scenario does not consider the use of error kernels, which are essential for keeping a system usable during periods of overload. Without them, overload results in poor performance and, in many cases, complete system failure.

One way to improve this scenario is to create a separate connection pool so that Endpoint A cannot hinder the operations of Endpoint B. However, the sheer number of endpoints, multiplied by the number of servers in a cluster, can produce an overwhelming number of connections to the database and eventually degrade its performance. Middleware such as `pgpool` can offer some relief, but it requires deploying additional servers and a heightened focus on DevOps effort. Fortunately, `eharbor` presents a compelling alternative to this challenge.

#### System With `eharbor` as a Bulkhead

![](/assets/eharbor_after.png)

With `eharbor` deployed as a bulkhead, the system can effectively manage the flow of incoming requests, limiting the utilization of the connection pool and allowing other connections to complete efficiently. During a traffic spike, the internal `pobox` within `eharbor` acts as a buffer and prevents excessive memory growth. If the buffer reaches capacity, the `pobox` rejects traffic immediately, avoiding timeouts and the memory expansion that follows them. In this way, `eharbor` provides a reliable and efficient way to handle traffic spikes and keep the system stable.

Furthermore, when deduplication is turned on, `eharbor` can absorb viral traffic spikes and send far fewer queries to the database.
## Distributed Deduplication

![](/assets/eharbor_distributed.png)

`eharbor` does not ship code for building a distributed system, but building one on top of `eharbor` is straightforward. You start by creating two coordinators, referred to as stage 1 and stage 2. Stage 1 serves as the distributed `eharbor` and routes requests with [`erpc:call/5`](https://www.erlang.org/doc/man/erpc.html#call-5) using a consistent hashing function, such as the one provided by [libring](https://hex.pm/packages/libring). To ensure nodes can find each other, a solution such as [libcluster](https://hex.pm/packages/libcluster) can be employed. To control the deployment process, a feature flag or a DNS TXT record is used; its value allows for a gradual rollout of the distributed system and avoids errors during deployment.

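A stage-1 router might look roughly like the sketch below. It is only an illustration: the `route_sketch` module and `route/2` function are hypothetical, and `erlang:phash2/2` stands in for a proper consistent-hash ring such as libring (a real ring avoids reshuffling most keys when cluster membership changes, which plain hashing does not):

```erlang
%% route_sketch.erl -- hypothetical stage-1 router, not part of eharbor.
-module(route_sketch).
-export([route/2]).

%% Pick the node that "owns" Key and run {M, F, A} there via erpc.
route(Key, {M, F, A}) ->
    Nodes = lists:sort([node() | nodes()]),
    Owner = lists:nth(1 + erlang:phash2(Key, length(Nodes)), Nodes),
    case Owner =:= node() of
        true  -> erlang:apply(M, F, A);            %% we own the key: run locally
        false -> erpc:call(Owner, M, F, A, 5000)   %% otherwise run on the owner node
    end.
```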
For the distributed path to function properly, every node must have both `eharbor` and the stage 2 coordinator installed. To coordinate the rollout, a DNS TXT record is polled and its value is stored in [persistent_term](https://www.erlang.org/doc/man/persistent_term.html). This value is interpreted as a percentage that determines how much traffic is routed via [`erpc:call/5`](https://www.erlang.org/doc/man/erpc.html#call-5) and how much is processed locally. At the start, the percentage is set to `0`, effectively acting as a canary and preventing any [`erpc:call/5`](https://www.erlang.org/doc/man/erpc.html#call-5) calls from being made. Once all nodes contain the code for both stages, the percentage can be increased gradually to 100%, with performance monitoring for unexpected behavior.

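A minimal version of that gate, continuing the hypothetical `route_sketch` module above, could look like this; the poller, the key name, and both function names are invented — only `persistent_term` and `rand` are real OTP APIs:

```erlang
%% Called by whatever process polls the DNS TXT record.
set_rollout(Percent) when Percent >= 0, Percent =< 100 ->
    persistent_term:put({route_sketch, rollout_percent}, Percent).

%% Called per request: may this request take the remote path yet?
use_remote_path() ->
    Percent = persistent_term:get({route_sketch, rollout_percent}, 0),
    rand:uniform(100) =< Percent.
```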
Upon completion, all unique requests will be fully deduplicated across the entire cluster, significantly reducing the load on the database. If an `erpc:call/5` call fails or times out, the operation can be redone locally and processing can continue.

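The local fallback is a simple wrapper around the remote call; `route_with_fallback/4` below is again a hypothetical helper in the same sketch, not part of `eharbor`:

```erlang
%% Try the owner node first; if the remote call fails or times out,
%% redo the work locally so the request still completes.
route_with_fallback(Owner, M, F, A) ->
    try
        erpc:call(Owner, M, F, A, 5000)
    catch
        _Class:_Reason -> erlang:apply(M, F, A)
    end.
```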
## Code Samples

Here is an example of how to use `eharbor`.

Suppose I have some module `some_module` with a `very_slow/1` function that is causing overload and could benefit from deduplication or load-shedding. The function takes a single parameter, a unique ID (`timer:sleep/1` is used here to simulate a slow function):

```erlang
-module(some_module).
-export([very_slow/1]).

-spec very_slow(UniqueId :: term()) -> ok.
very_slow(_UniqueId) ->
    timer:sleep(200).
```
We could rewrite it with `eharbor` like this to achieve better performance:

```erlang
-module(some_module).
-export([very_slow/1]).

%% Note that we can now also signal that we are at capacity by returning {error, full}.
-spec very_slow(UniqueId :: term()) -> ok | {error, full}.
very_slow(UniqueId) ->
    eharbor:run(
        fun(_UID) ->
            timer:sleep(200)
        end,
        [UniqueId],
        some_module_very_slow1
    ).
```
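With `dedup => true` (the default), a burst of concurrent calls for the same unique ID should collapse into a single execution of the slow function, with every caller receiving the shared result. A quick way to observe this from a shell, assuming the coordinator above is configured and running:

```erlang
%% 1000 concurrent requests for the same unique ID -- with deduplication on,
%% only one of them should actually take the slow path.
[spawn(fun() -> some_module:very_slow(42) end) || _ <- lists:seq(1, 1000)].
```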
Here are the default parameters of `eharbor`:

```erlang
#{
    assign_role_timeout => infinity,
    backlog => 1000,
    breakwater_limit => 2000,
    buffer_insert_timeout => 5000,
    conductor_wait_for_coordinator_followers_timeout => infinity,
    dedup => true,
    error_type => value,
    error_value => {error, full},
    follower_wait_for_conductor_timeout => infinity,
    group_by_key_fun => fun(X) -> X end,
    name => default,
    ordered => true,
    piers => 100
}
```
For `eharbor` to function correctly, it must have an appropriate configuration in `app.config`:

```erlang
[
    {eharbor, [
        {coordinators, [
            %% Don't care about causal ordering between the receiver and sender.
            #{name => some_module_very_slow1, ordered => false}
        ]}
    ]}
].
```
Or in Elixir `config.exs`:

```elixir
config :eharbor, :coordinators, [
  # Don't care about causal ordering between the receiver and sender.
  %{name: :some_module_very_slow1, ordered: false}
]
```

That's it!
## Conclusion

`eharbor` is a new solution for request deduplication and load-shedding. By combining the benefits of these two critical technologies, `eharbor` provides organizations with a robust and scalable system that can handle even the most demanding workloads. Whether you are looking to improve the performance of your existing server systems or to build new, high-performance applications, `eharbor` is the solution for you.