Member

@opsiff opsiff commented Jan 16, 2026

Update kernel base to 6.18.5.

git log --oneline v6.18.4..v6.18.5 | wc
6 35 345

Summary by Sourcery

Update to Linux kernel 6.18.5 while refining scheduler newidle balancing heuristics, fixing NFS local I/O credential handling, and improving MPTCP fast close state tracking.

New Features:

  • Introduce randomized newidle balancing based on historical success rate in the scheduler.

Bug Fixes:

  • Fix NFS local read/write paths to override and revert file credentials for each iteration, avoiding stale cred usage.
  • Ensure MPTCP fastclose state is tracked explicitly on the MPTCP socket and used when disconnecting subflows.

Enhancements:

  • Adjust scheduler newidle cost tracking to maintain per-domain success statistics and more precise decay timing.
  • Initialize per-CPU scheduler random state and hook it into SMP scheduler initialization for use by new features.

Build:

  • Bump kernel sublevel from 6.18.4 to 6.18.5.

Paolo Abeni and others added 6 commits January 16, 2026 16:49
[ Upstream commit 86730ac255b0497a272704de9a1df559f5d6602e ]

After the blamed commit below, if the MPC subflow is already in TCP_CLOSE
status or has fallback to TCP at mptcp_disconnect() time,
mptcp_do_fastclose() skips setting the `send_fastclose` flag and the later
__mptcp_close_ssk() does not reset anymore the related subflow context.

Any later connection will be created with both the `request_mptcp` flag
and the msk-level fallback status off (it is unconditionally cleared at
MPTCP disconnect time), leading to a warning in subflow_data_ready():

  WARNING: CPU: 26 PID: 8996 at net/mptcp/subflow.c:1519 subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13))
  Modules linked in:
  CPU: 26 UID: 0 PID: 8996 Comm: syz.22.39 Not tainted 6.18.0-rc7-05427-g11fc074f6c36 #1 PREEMPT(voluntary)
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  RIP: 0010:subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13))
  Code: 90 0f 0b 90 90 e9 04 fe ff ff e8 b7 1e f5 fe 89 ee bf 07 00 00 00 e8 db 19 f5 fe 83 fd 07 0f 84 35 ff ff ff e8 9d 1e f5 fe 90 <0f> 0b 90 e9 27 ff ff ff e8 8f 1e f5 fe 4c 89 e7 48 89 de e8 14 09
  RSP: 0018:ffffc9002646fb30 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff88813b218000 RCX: ffffffff825c8435
  RDX: ffff8881300b3580 RSI: ffffffff825c8443 RDI: 0000000000000005
  RBP: 000000000000000b R08: ffffffff825c8435 R09: 000000000000000b
  R10: 0000000000000005 R11: 0000000000000007 R12: ffff888131ac0000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f88330af6c0(0000) GS:ffff888a93dd2000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f88330aefe8 CR3: 000000010ff59000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   tcp_data_ready (net/ipv4/tcp_input.c:5356)
   tcp_data_queue (net/ipv4/tcp_input.c:5445)
   tcp_rcv_state_process (net/ipv4/tcp_input.c:7165)
   tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1955)
   __release_sock (include/net/sock.h:1158 (discriminator 6) net/core/sock.c:3180 (discriminator 6))
   release_sock (net/core/sock.c:3737)
   mptcp_sendmsg (net/mptcp/protocol.c:1763 net/mptcp/protocol.c:1857)
   inet_sendmsg (net/ipv4/af_inet.c:853 (discriminator 7))
   __sys_sendto (net/socket.c:727 (discriminator 15) net/socket.c:742 (discriminator 15) net/socket.c:2244 (discriminator 15))
   __x64_sys_sendto (net/socket.c:2247)
   do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  RIP: 0033:0x7f883326702d

Address the issue setting an explicit `fastclosing` flag at fastclose
time, and checking such flag after mptcp_do_fastclose().

Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-2-d1f9fd1c36c8@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f1a77dfc3b045c3dd5f6e64189b9f52b90399f07)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
commit e78e70dbf603c1425f15f32b455ca148c932f6c1 upstream.

Pull out the !sd check to simplify code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Chris Mason <clm@meta.com>
Link: https://patch.msgid.link/20251107161739.525916173@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c7ca7e0ff6f0f55ef57c1596286076492f199f9a)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
commit 08d473dd8718e4a4d698b1113a14a40ad64a909b upstream.

Simplify code by adding a few variables.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Chris Mason <clm@meta.com>
Link: https://patch.msgid.link/20251107161739.655208666@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d4ffb9ce8e6501bebbf833cd7ae3f34eab1f76ba)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
commit 33cf66d88306663d16e4759e9d24766b0aaa2e17 upstream.

Add a randomized algorithm that runs newidle balancing proportional to
its success rate.

This improves schbench significantly:

 6.18-rc4:			2.22 Mrps/s
 6.18-rc4+revert:		2.04 Mrps/s
 6.18-rc4+revert+random:	2.18 Mrps/s

Conversely, per Adam Li this affects SpecJBB slightly, reducing it by 1%:

 6.17:			-6%
 6.17+revert:		 0%
 6.17+revert+random:	-1%

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Chris Mason <clm@meta.com>
Link: https://lkml.kernel.org/r/6825c50d-7fa7-45d8-9b81-c6e7e25738e2@meta.com
Link: https://patch.msgid.link/20251107161739.770122091@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 98a26893fad4180d8ea210d8749392790dfddc81)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
commit 3af870aedbff10bfed220e280b57a405e972229f upstream.

Commit f2060bd ("nfs/localio: add refcounting for each iocb IO
associated with NFS pgio header") inadvertently reintroduced the same
potential for __put_cred() triggering BUG_ON(cred == current->cred) that
commit 992203a ("nfs/localio: restore creds before releasing pageio
data") fixed.

Fix this by saving and restoring the cred around each {read,write}_iter
call within the respective for loop of nfs_local_call_{read,write} using
scoped_with_creds().

NOTE: this fix started by first reverting the following commits:

 94afb627dfc2 ("nfs: use credential guards in nfs_local_call_read()")
 bff3c841f7bd ("nfs: use credential guards in nfs_local_call_write()")
 1d18101a644e ("Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")

followed by narrowly fixing the cred lifetime issue by using
scoped_with_creds().  In doing so, this commit's changes appear more
extensive than they really are (as evidenced by comparing to v6.18's
fs/nfs/localio.c).

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/linux-next/20251205111942.4150b06f@canb.auug.org.au/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7a28d65e4beb7627738b75c6a23f36ae54470f93)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Link: https://lore.kernel.org/r/20260109111950.344681501@linuxfoundation.org
Tested-by: Ronald Warsow <rwarsow@gmx.de>
Tested-by: Slade Watkins <sr@sladewatkins.com>
Tested-by: Achill Gilgenast <achill@achill.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Brett A C Sheffield <bacs@librecast.net>
Tested-by: Brett Mastbergen <bmastbergen@ciq.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Jeffrin Jose T <jeffrin@rajagiritech.edu.in>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dc554c8fb361f13580da3f5a98ad8b494a788666)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>

sourcery-ai bot commented Jan 16, 2026

Reviewer's Guide

Updates the kernel base from 6.18.4 to 6.18.5, pulling in upstream scheduler randomness-based newidle balancing, fixing NFS local I/O credential scoping, and refining MPTCP fastclose handling and state tracking.

Sequence diagram for scheduler newidle balancing with NI_RANDOM

sequenceDiagram
    participant CPU
    participant this_rq as rq_this
    participant sd as sched_domain
    participant rng as sched_rng

    CPU->>this_rq: sched_balance_newidle(rq_this, rf)
    this_rq->>sd: rcu_dereference_check_sched_domain(this_rq->sd)
    alt no_sched_domain
        this_rq-->>CPU: return (no balancing)
    else has_sched_domain
        this_rq->>sd: check get_rd_overloaded && avg_idle < max_newidle_lb_cost
        alt not_overloaded_or_too_short_idle
            this_rq->>sd: update_next_balance(sd, next_balance)
            this_rq-->>CPU: return
        else overloaded_and_idle_long_enough
            loop over_domains
                this_rq->>sd: check SD_BALANCE_NEWIDLE
                alt SD_BALANCE_NEWIDLE_set
                    sd->>sd: weight = 1
                    alt NI_RANDOM_enabled
                        this_rq->>rng: sched_rng()
                        rng-->>this_rq: random_u32
                        this_rq->>sd: d1k = random_u32 % 1024
                        this_rq->>sd: weight = 1 + sd.newidle_ratio
                        alt d1k > weight
                            this_rq->>sd: update_newidle_stats(sd, success=0)
                            this_rq-->>this_rq: continue (skip balance)
                        else d1k <= weight
                            this_rq->>sd: weight = (1024 + weight/2) / weight
                        end
                    end
                    this_rq->>this_rq: t0 = sched_clock_cpu(this_cpu)
                    this_rq->>this_rq: pulled_task = sched_balance_rq(...)
                    this_rq->>this_rq: t1 = sched_clock_cpu(this_cpu)
                    this_rq->>sd: domain_cost = t1 - t0
                    this_rq->>sd: update_newidle_cost(sd, domain_cost, weight * !!pulled_task)
                end
            end
            this_rq-->>CPU: return
        end
    end

Sequence diagram for NFS local I/O per-iteration credential override

sequenceDiagram
    participant Worker as nfs_local_call_read
    participant Filp as struct_file
    participant Ops as file_operations

    loop for_each_iter_i
        Worker->>Worker: configure iocb->kiocb.ki_flags (DIRECT or not)
        Worker->>Worker: save_cred = override_creds(Filp.f_cred)
        Worker->>Ops: read_iter(&iocb->kiocb, &iocb->iters[i])
        Ops-->>Worker: status
        Worker->>Worker: revert_creds(save_cred)
        alt status != -EIOCBQUEUED
            Worker->>Worker: handle partial read or errors
        end
    end
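The per-iteration pairing in the diagram above can be sketched as a small user-space model. This is only an illustration, not the code in fs/nfs/localio.c: `toy_cred`, `toy_current_cred`, `toy_override_creds()`, `toy_revert_creds()` and `toy_call_read()` are hypothetical stand-ins for `struct cred`, `current->cred`, `override_creds()`, `revert_creds()` and the read loop.

```c
#include <assert.h>

/* Toy credential object; the real kernel type is struct cred. */
struct toy_cred {
	int uid;
};

/* Stand-in for current->cred in this single-threaded model. */
static const struct toy_cred *toy_current_cred;

/* Install new creds and return the previous ones, like override_creds(). */
static const struct toy_cred *toy_override_creds(const struct toy_cred *new_cred)
{
	const struct toy_cred *old = toy_current_cred;

	toy_current_cred = new_cred;
	return old;
}

/* Restore previously saved creds, like revert_creds(). */
static void toy_revert_creds(const struct toy_cred *old)
{
	toy_current_cred = old;
}

/*
 * Run n_iters fake I/O submissions, overriding creds only around each one.
 * Returns how many submissions ran with file_cred installed.
 */
static int toy_call_read(const struct toy_cred *file_cred, int n_iters)
{
	int ran_with_file_cred = 0;

	for (int i = 0; i < n_iters; i++) {
		const struct toy_cred *saved = toy_override_creds(file_cred);

		/* The read_iter() call would happen here. */
		if (toy_current_cred == file_cred)
			ran_with_file_cred++;

		toy_revert_creds(saved);
		/* Anything after this point sees the caller's creds again. */
		assert(toy_current_cred == saved);
	}
	return ran_with_file_cred;
}
```

The point of the model is that the file credentials are never left installed once an iteration finishes, so code that runs after a submission (for example, releasing I/O data) always sees the caller's original credentials.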

Sequence diagram for MPTCP fastclose and subflow disconnect

sequenceDiagram
    participant App as Application
    participant Msk as mptcp_sock
    participant MPTCP as mptcp_core
    participant Ssk as subflow_sock

    App->>MPTCP: mptcp_do_fastclose(sk)
    MPTCP->>Msk: mptcp_set_state(sk, TCP_CLOSE)
    MPTCP->>Msk: Msk.fastclosing = 1
    MPTCP-->>App: return

    App->>MPTCP: __mptcp_close_ssk(sk, ssk, flags)
    MPTCP->>MPTCP: need_push = compute_push(flags)
    alt !dispose_it
        MPTCP->>MPTCP: __mptcp_retransmit_pending_data(sk)
        MPTCP->>MPTCP: __mptcp_subflow_disconnect(ssk, subflow, Msk.fastclosing)
        MPTCP-->>App: return
    else dispose_it
        MPTCP->>MPTCP: close subflow and cleanup
        MPTCP-->>App: return
    end
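As a rough user-space model of the flow above: the fastclose intent is recorded once on the MPTCP socket and then passed to the subflow disconnect helper, so the subflow context is reset even when the subflow was already closed. All `toy_*` names are hypothetical; only the `fastclosing` bit corresponds to a field named in this PR, and the real helpers take kernel socket types and do more work.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; only the fastclosing bit mirrors a field named above. */
struct toy_msk {
	unsigned int fastclosing : 1;
};

struct toy_subflow {
	bool ctx_reset;      /* set once the subflow context has been reset */
	bool already_closed; /* subflow was closed before fastclose; ignored below */
};

/* Fastclose: record the intent on the msk regardless of subflow state. */
static void toy_do_fastclose(struct toy_msk *msk)
{
	msk->fastclosing = 1;
}

/* Disconnect one subflow; reset its context when fastclosing is set. */
static void toy_subflow_disconnect(struct toy_subflow *sf, bool fastclosing)
{
	if (fastclosing)
		sf->ctx_reset = true;
}

/* Disconnect path: fastclose, tear down the subflow, then clear the flag. */
static void toy_disconnect(struct toy_msk *msk, struct toy_subflow *sf)
{
	assert(msk && sf);
	toy_do_fastclose(msk);
	toy_subflow_disconnect(sf, msk->fastclosing);
	msk->fastclosing = 0;
}
```

Because the decision depends on msk-level state rather than on a per-subflow flag, a subflow that was closed before the fastclose still gets its context reset, which is the case the commit message describes as previously missed.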

Class diagram for updated scheduler domain and RNG state

classDiagram
    class sched_domain {
        +unsigned int newidle_call
        +unsigned int newidle_success
        +unsigned int newidle_ratio
        +u64 max_newidle_lb_cost
        +unsigned long last_decay_max_lb_cost
        +unsigned long last_balance
        +unsigned int balance_interval
        +unsigned int nr_balance_failed
    }

    class rnd_state {
        <<per_cpu>>
    }

    class rq {
        <<per_cpu_shared_aligned>>
    }

    class sched_features {
        +bool WA_BIAS
        +bool UTIL_EST
        +bool LATENCY_WARN
        +bool NI_RANDOM
    }

    class sched_core_helpers {
        +bool update_newidle_cost(sched_domain sd, u64 cost, unsigned int success)
        +void update_newidle_stats(sched_domain sd, unsigned int success)
        +u32 sched_rng()
        +void sched_init_smp()
    }

    sched_domain --> sched_core_helpers : used_by
    rnd_state --> sched_core_helpers : provides_state_for
    rq --> sched_core_helpers : used_by
    sched_features --> sched_core_helpers : controls_behavior

Class diagram for updated MPTCP socket state

classDiagram
    class mptcp_sock {
        +unsigned int fastopening : 1
        +unsigned int in_accept_queue : 1
        +unsigned int free_first : 1
        +unsigned int rcvspace_init : 1
        +unsigned int fastclosing : 1
        +u32 notsent_lowat
        +int keepalive_cnt
        +int keepalive_idle
    }

    class mptcp_protocol_helpers {
        +void mptcp_do_fastclose(sock sk)
        +int mptcp_disconnect(sock sk, int flags)
        +void __mptcp_close_ssk(sock sk, sock ssk, int flags)
        +void __mptcp_subflow_disconnect(sock ssk, mptcp_subflow_context subflow, bool fastclosing)
    }

    mptcp_sock <-- mptcp_protocol_helpers : operates_on

File-Level Changes

Change: Introduce randomized, success-rate-aware newidle load-balancing in the scheduler and track per-domain newidle statistics.
Details:
  • Add per-sched-domain counters for newidle calls, successes, and a rolling success ratio and initialize them with a default 50% success rate
  • Refactor newidle cost update logic to also maintain newidle stats and to use cached jiffies and precomputed decay deadlines
  • Gate newidle balance runs on a new NI_RANDOM scheduler feature that uses a per-CPU PRNG and the domain success ratio to probabilistically skip expensive, low-yield newidle balancing
  • Add a per-CPU random state for the scheduler, initialize it at SMP sched init, and expose a sched_rng helper
Files: kernel/sched/fair.c, kernel/sched/sched.h, kernel/sched/topology.c, kernel/sched/features.h, include/linux/sched/topology.h, kernel/sched/core.c

Change: Fix NFS local I/O to apply file credentials per-iteration instead of once, ensuring correct creds for multi-iter read/write operations.
Details:
  • Move override_creds calls from outside the iteration loops to inside, wrapping each read_iter/write_iter call individually
  • Ensure revert_creds is called after each per-iteration I/O submission and remove the final single revert_creds calls after the loops
Files: fs/nfs/localio.c

Change: Adjust MPTCP fastclose handling so that subflow disconnect uses an explicit fastclosing state flag instead of relying on transient flags.
Details:
  • Extend mptcp_sock with a fastclosing bit and set it when entering fastclose, clearing it on disconnect reset
  • Change subflow disconnect helper to take a boolean fastclosing parameter and use it to drive fast-close behavior
  • Update callers to use msk->fastclosing when disconnecting subflows instead of passing generic flags
Files: net/mptcp/protocol.c, net/mptcp/protocol.h

Change: Bump kernel sublevel from 6.18.4 to 6.18.5.
Details:
  • Update SUBLEVEL in the top-level Makefile from 4 to 5
Files: Makefile

@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The new per-CPU random state initialization in sched_init_smp() uses prandom_init_once(&sched_rnd_state), but prandom_init_once is designed for a single struct rnd_state, not a __percpu variable; consider explicitly seeding each CPU’s sched_rnd_state instance instead of passing the percpu symbol address.
  • In mptcp, __mptcp_subflow_disconnect() no longer uses the 'flags' argument and only consumes the 'fastclosing' boolean, so you might simplify the signature and call sites further by dropping the unused 'subflow' argument or clarifying why it must be retained.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `kernel/sched/core.c:124-127` </location>
<code_context>
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);

 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU(struct rnd_state, sched_rnd_state);

 #ifdef CONFIG_SCHED_PROXY_EXEC
</code_context>

<issue_to_address>
**issue (bug_risk):** Per-CPU rnd_state is only initialized once and likely only for a single CPU instance.

`DEFINE_PER_CPU(struct rnd_state, sched_rnd_state);` combined with a single `prandom_init_once(&sched_rnd_state);` in `sched_init_smp()` initializes only one per-CPU instance (effectively CPU0). Other CPUs will call `prandom_u32_state(this_cpu_ptr(&sched_rnd_state))` with an uninitialized `rnd_state`, leading to undefined behaviour and poor randomness on non-boot CPUs.

Please ensure each CPU’s `sched_rnd_state` is explicitly initialized (e.g. iterate with `for_each_possible_cpu()` and use `prandom_seed()` / `prandom_u32_state_init()`, or give each CPU its own `prandom_init_once` guard) rather than initializing the per-CPU symbol only once globally.
</issue_to_address>

### Comment 2
<location> `kernel/sched/fair.c:12125` </location>
<code_context>
 }

-static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
+static inline void update_newidle_stats(struct sched_domain *sd, unsigned int success)
 {
+	sd->newidle_call++;
</code_context>

<issue_to_address>
**issue (complexity):** Consider separating newidle cost tracking from NI_RANDOM statistics and encapsulating the probability logic in a helper so the code’s responsibilities and control flow are more explicit and easier to follow.

You can reduce the added complexity without changing behavior by:

1. **Decoupling cost/decay from NI_RANDOM stats.**
2. **Encapsulating the NI_RANDOM probability logic into a helper.**
3. **Giving `weight` a single clear meaning at the call site.**

### 1. Split stats from cost tracking

Right now `update_newidle_cost()` both updates stats and cost/decay. You can keep its original responsibility and move NI_RANDOM stats to a separate helper:

```c
static inline void update_newidle_stats(struct sched_domain *sd,
                                        unsigned int success)
{
	sd->newidle_call++;
	sd->newidle_success += success;

	if (sd->newidle_call >= 1024) {
		sd->newidle_ratio   = sd->newidle_success;
		sd->newidle_call   /= 2;
		sd->newidle_success /= 2;
	}
}

static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
{
	unsigned long next_decay = sd->last_decay_max_lb_cost + HZ;
	unsigned long now        = jiffies;

	if (cost > sd->max_newidle_lb_cost) {
		sd->max_newidle_lb_cost   = cost;
		sd->last_decay_max_lb_cost = now;
	} else if (time_after(now, next_decay)) {
		sd->max_newidle_lb_cost    =
			(sd->max_newidle_lb_cost * 253) / 256;
		sd->last_decay_max_lb_cost = now;
		return true;
	}

	return false;
}
```

Call sites then become explicit about *why* stats are updated:

```c
/* decay-only path */
need_decay = update_newidle_cost(sd, 0);

/* newidle balance path */
domain_cost = t1 - t0;
curr_cost  += domain_cost;
t0          = t1;

update_newidle_stats(sd, success_weight * !!pulled_task);
update_newidle_cost(sd, domain_cost);
```

This removes the hidden “if (cost) update_newidle_stats(..)” coupling and makes both responsibilities straightforward.

### 2. Encapsulate NI_RANDOM dice + weight logic

The inlined NI_RANDOM block both decides whether to run and computes the success multiplier, while mutating `weight` in two roles. You can hide that behind a helper with a clear contract:

```c
static inline bool newidle_should_run(struct sched_domain *sd,
				      unsigned int *success_weight)
{
	*success_weight = 1;

	if (!sched_feat(NI_RANDOM))
		return true;

	/*
	 * Throw a 1k sided dice; and only run newidle_balance according
	 * to the observed success rate.
	 */
	u32 d1k         = sched_rng() % 1024;
	unsigned int w  = 1 + sd->newidle_ratio;

	if (d1k > w) {
		update_newidle_stats(sd, 0);
		return false; /* skip balance */
	}

	*success_weight = (1024 + w / 2) / w;
	return true; /* run balance */
}
```

Then the main loop becomes linear and `weight` has a single meaning (“success multiplier”):

```c
for_each_domain(this_cpu, sd) {
	u64 domain_cost;
	unsigned int success_weight = 1;

	update_next_balance(sd, &next_balance);
	if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
		break;

	if (sd->flags & SD_BALANCE_NEWIDLE) {
		if (!newidle_should_run(sd, &success_weight))
			continue;

		pulled_task = sched_balance_rq(this_cpu, this_rq, sd,
					       CPU_NEWLY_IDLE,
					       &continue_balancing);

		t1          = sched_clock_cpu(this_cpu);
		domain_cost = t1 - t0;
		curr_cost  += domain_cost;
		t0          = t1;

		update_newidle_stats(sd, success_weight * !!pulled_task);
		update_newidle_cost(sd, domain_cost);
	}

	if (pulled_task || !continue_balancing)
		break;
}
```

This preserves all existing behavior (same dice, same `newidle_ratio` usage, same success scaling) while:

- Removing mixed responsibilities from `update_newidle_cost`.
- Making the NI_RANDOM behavior self-contained and easier to reason about.
- Avoiding `weight` being reused in two unrelated roles.
</issue_to_address>

Copilot AI left a comment

Pull request overview

This PR updates the Linux kernel from version 6.18.4 to 6.18.5, incorporating 6 upstream commits that address scheduler optimizations, NFS credential handling, and MPTCP protocol state tracking.

Changes:

  • Introduces randomized newidle balancing in the scheduler based on historical success rates to optimize load balancing decisions
  • Fixes NFS local I/O to properly scope credential overrides per iteration rather than across the entire loop
  • Adds explicit fastclose state tracking in MPTCP to fix subflow disconnect behavior

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Makefile | Bumps kernel SUBLEVEL from 4 to 5 |
| include/linux/sched/topology.h | Adds newidle balancing statistics fields to sched_domain |
| kernel/sched/sched.h | Introduces per-CPU RNG state and helper for scheduler randomization |
| kernel/sched/core.c | Initializes per-CPU RNG state during SMP initialization |
| kernel/sched/features.h | Adds NI_RANDOM feature flag for randomized newidle balancing |
| kernel/sched/topology.c | Initializes newidle statistics to 50% success rate baseline |
| kernel/sched/fair.c | Implements randomized newidle balancing logic and statistics tracking |
| net/mptcp/protocol.h | Adds fastclosing bit field to mptcp_sock structure |
| net/mptcp/protocol.c | Updates fastclose tracking and subflow disconnect logic |
| fs/nfs/localio.c | Moves credential override/revert inside iteration loops |

* rate.
*/
u32 d1k = sched_rng() % 1024;
weight = 1 + sd->newidle_ratio;

Copilot AI Jan 16, 2026

The weight calculation adds 1 to newidle_ratio, which can produce values up to 1025 (1 + a maximum of 1024). The comparison on line 12848 checks if (d1k > weight), where d1k is sched_rng() % 1024 (0-1023). When newidle_ratio is at its maximum (1024), weight becomes 1025 and the condition is always false, since d1k can never exceed 1023, so the balance is never skipped. This appears inconsistent with the intended probabilistic behavior. The logic should likely be if (d1k >= weight) to handle the boundary case properly, or the weight calculation needs adjustment.

Suggested change
weight = 1 + sd->newidle_ratio;
weight = 1 + sd->newidle_ratio;
if (weight > 1023)
weight = 1023;
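To make the boundary behaviour concrete, here is a small stand-alone sketch that enumerates every possible dice value instead of sampling. It assumes only what is quoted in this thread (d1k in 0..1023, weight = 1 + newidle_ratio, skip when d1k > weight); `runs_per_1024()` is a hypothetical helper and not part of the patch.

```c
#include <assert.h>

/*
 * Count how many of the 1024 possible dice values (0..1023) would let
 * newidle balancing run for a given newidle_ratio, using the quoted test
 * "skip if d1k > weight" with weight = 1 + newidle_ratio.
 */
static unsigned int runs_per_1024(unsigned int newidle_ratio)
{
	unsigned int weight = 1 + newidle_ratio;
	unsigned int runs = 0;

	for (unsigned int d1k = 0; d1k < 1024; d1k++) {
		if (!(d1k > weight))
			runs++;
	}
	assert(runs <= 1024);
	return runs;
}
```

Under these assumptions a ratio of 0 still runs for 2 of the 1024 values, a ratio of 512 runs for 514, and any ratio of 1022 or above runs for all 1024, so the skip branch is unreachable at the top of the range.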

.last_balance = jiffies,
.balance_interval = sd_weight,

/* 50% success rate */

Copilot AI Jan 16, 2026

The initialization values represent a 50% success rate as noted in the comment, but the relationship between these values isn't immediately clear. Consider adding a comment explaining that newidle_ratio = (newidle_success * 1024) / newidle_call, which equals 512 for the 50% baseline, to help future maintainers understand the invariant.

Suggested change
/* 50% success rate */
/* 50% success rate:
* newidle_ratio = (newidle_success * 1024) / newidle_call
* (256 * 1024) / 512 = 512
*/
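A worked sketch of how the baseline evolves may help here. It mirrors the `update_newidle_stats()` body quoted earlier in this conversation and assumes the baseline values stated in the comment above (512 calls, 256 successes, ratio 512); `struct newidle_stats`, `stats_init()`, `stats_update()` and `stats_feed()` are hypothetical names for illustration. Since the ratio is only refreshed when the call counter reaches 1024, the formula success * 1024 / call reduces to the success counter itself at that moment.

```c
#include <assert.h>

/* Hypothetical mirror of the three counters discussed above. */
struct newidle_stats {
	unsigned int call;
	unsigned int success;
	unsigned int ratio;
};

/* Baseline quoted above: 256 successes in 512 calls, ratio 512. */
static void stats_init(struct newidle_stats *s)
{
	s->call = 512;
	s->success = 256;
	s->ratio = 512;
}

/* Record one attempt; refresh the ratio and halve history at 1024 calls. */
static void stats_update(struct newidle_stats *s, unsigned int success)
{
	s->call++;
	s->success += success;

	if (s->call >= 1024) {
		s->ratio = s->success;
		s->call /= 2;
		s->success /= 2;
	}
}

/* Feed n attempts that all have the same outcome; return the final ratio. */
static unsigned int stats_feed(struct newidle_stats *s, unsigned int n,
			       unsigned int success)
{
	assert(s != 0);
	for (unsigned int i = 0; i < n; i++)
		stats_update(s, success);
	return s->ratio;
}
```

Starting from the baseline, 512 consecutive failures drop the ratio to 256 and another 512 drop it to 128, while 512 consecutive unweighted successes raise it to 768. The halving step is what makes older history count for less.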

if (d1k > weight) {
update_newidle_stats(sd, 0);
continue;
}

Copilot AI Jan 16, 2026

This weight calculation implements rounding division but the purpose of the weight transformation is unclear. Consider adding a comment explaining that this converts the success probability into a scaling factor for the statistics update, helping maintainers understand the randomization algorithm.

Suggested change
}
}
/*
* Convert the (1..1024) success probability into a
* scaling factor for the statistics update. We ran
* newidle_balance with probability weight/1024, so
* use ~1024/weight (rounded) to keep the expected
* newidle cost comparable to the non-randomized case.
*/
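The effect of the rounded division can be checked with a short enumeration. This sketch assumes the expressions quoted in this thread (weight = 1 + newidle_ratio, skip when d1k > weight, scale = (1024 + weight/2) / weight); `success_scale()` and `credited_per_1024()` are hypothetical helpers, and the model pessimistically assumes every balance that runs also pulls a task.

```c
#include <assert.h>

/* Rounded 1024/weight, as in the quoted (1024 + weight/2) / weight. */
static unsigned int success_scale(unsigned int weight)
{
	assert(weight > 0);
	return (1024 + weight / 2) / weight;
}

/*
 * Credit accumulated over all 1024 dice values if every balance that runs
 * also succeeds. A value near 1024 means the scaling offsets the skipping.
 */
static unsigned int credited_per_1024(unsigned int newidle_ratio)
{
	unsigned int weight = 1 + newidle_ratio;
	unsigned int credit = 0;

	for (unsigned int d1k = 0; d1k < 1024; d1k++) {
		if (d1k > weight)
			continue; /* skipped: recorded as a call with 0 success */
		credit += success_scale(weight);
	}
	return credit;
}
```

With a ratio of 511 the balance runs for 513 dice values at a scale of 2, giving 1026 credited successes per 1024 attempts; with a ratio of 1023 it runs every time at a scale of 1, giving exactly 1024. In both cases the scaled credit stays close to what an unconditional, always-successful balance would record.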
