Skip to content

Comments

RDKB-63415 : Move SelfHeal Scripts to Cron#91

Open
aj970-crypto wants to merge 1 commit intodevelopfrom
topic/RDKB-63415
Open

RDKB-63415 : Move SelfHeal Scripts to Cron#91
aj970-crypto wants to merge 1 commit intodevelopfrom
topic/RDKB-63415

Conversation

@aj970-crypto
Copy link

RDKB-63415 : Move SelfHeal Scripts to Cron

Reason for change: Running the Selfheal scripts as cron jobs instead of background processes when the cron is enabled and running as normal background process when the cron is disabled.
Test Procedure: Check scripts are running in crontab when the cron is enabled.
Risks: Low
Priority: P0

Copilot AI review requested due to automatic review settings February 18, 2026 13:27
@aj970-crypto aj970-crypto requested review from a team as code owners February 18, 2026 13:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the SelfHeal scripts execution mechanism to use cron jobs instead of background processes. The change introduces new functions to start/stop scripts, manage cron entries, and toggle between cron-based and legacy execution modes based on configuration.

Changes:

  • Added utility functions for command execution, script lifecycle management, and cron configuration
  • Introduced manage_self_heal_cron_state function to dynamically switch between cron-based and legacy script execution
  • Added DHCP-related fields to DHCP_MGR_IPV4_MSG structure (unrelated to main PR purpose)

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 16 comments.

File Description
source/util_api/ccsp_msg_bus/ccsp_base_api.c Implements core functions for managing self-heal scripts via cron or background processes
source/ccsp/include/ccsp_base_api.h Declares new function prototypes for self-heal script management
source/ccsp/components/include/ipc_msg.h Adds DHCP option fields (unrelated change)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

CcspTraceWarning(("%s: %s is not running\n", __FUNCTION__, SELF_HEAL_SCRIPTS[i]));
} else {
CcspTraceWarning(("%s: Stopping %s \n", __FUNCTION__, SELF_HEAL_SCRIPTS[i]));
v_secure_system("kill -9 %s", buf);
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using kill -9 with PIDs obtained from pidof is a potential security risk. The PID in 'buf' comes from shell command output without validation, and if 'pidof' returns multiple PIDs (space-separated), only the first will be killed. Additionally, if a PID is reused between the pidof call and the kill command, an unintended process could be terminated. Consider validating the PID format (digits only) and handling multiple PIDs properly, or use pkill/killall with proper signal handling instead.

Copilot uses AI. Check for mistakes.
Comment on lines +3821 to +3823
{"self_heal_connectivity_test.sh", "ConnTest_PingInterval", 60},
{"resource_monitor.sh", "resource_monitor_interval", 15},
{"selfheal_aggressive.sh", "AggressiveInterval", 5}
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of scripts in the manage_self_heal_cron_state arrays (lines 3820-3823) differs from the order in the SELF_HEAL_SCRIPTS global array (lines 76-80). While not strictly a bug, this inconsistency could lead to confusion during maintenance. Consider using the same order in both locations for clarity.

Suggested change
{"self_heal_connectivity_test.sh", "ConnTest_PingInterval", 60},
{"resource_monitor.sh", "resource_monitor_interval", 15},
{"selfheal_aggressive.sh", "AggressiveInterval", 5}
{"selfheal_aggressive.sh", "AggressiveInterval", 5},
{"self_heal_connectivity_test.sh", "ConnTest_PingInterval", 60},
{"resource_monitor.sh", "resource_monitor_interval", 15}

Copilot uses AI. Check for mistakes.
Comment on lines 3798 to 3801
if (fgets(buf, sizeof(buf), fp) != NULL && buf[0] != '\0' && buf[0] != '\n') {
result = (unsigned long)atol(buf);
}
pclose(fp);
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent indentation detected: lines 3798-3801 use tabs while surrounding code uses spaces. This should follow the consistent indentation style used in the rest of the file.

Suggested change
if (fgets(buf, sizeof(buf), fp) != NULL && buf[0] != '\0' && buf[0] != '\n') {
result = (unsigned long)atol(buf);
}
pclose(fp);
if (fgets(buf, sizeof(buf), fp) != NULL && buf[0] != '\0' && buf[0] != '\n') {
result = (unsigned long)atol(buf);
}
pclose(fp);

Copilot uses AI. Check for mistakes.
Comment on lines +3798 to +3799
if (fgets(buf, sizeof(buf), fp) != NULL && buf[0] != '\0' && buf[0] != '\n') {
result = (unsigned long)atol(buf);
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_interval function uses atol() without validating that the syscfg output is a valid number. If syscfg returns non-numeric data or an error message, atol() will return 0, which might not be a valid interval. Consider adding validation to check if the buffer contains only digits (and optional whitespace) before calling atol(), and log a warning if invalid data is detected.

Copilot uses AI. Check for mistakes.
v_secure_system("/usr/ccsp/tad/%s &", SELF_HEAL_SCRIPTS[i]);
}
}
/* Stop all selfeal scripts */
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment contains a spelling error: "selfeal" should be "selfheal" (missing 'h').

Suggested change
/* Stop all selfeal scripts */
/* Stop all selfheal scripts */

Copilot uses AI. Check for mistakes.
Comment on lines +3809 to +3814
// We always remove the old entry to avoid duplicates
v_secure_system("crontab -l 2>/dev/null | sed '/%s/d' | crontab -", script);
// If should_run is true, we append the new timing
if (should_run) {
v_secure_system("(crontab -l 2>/dev/null; echo '*/%lu * * * * /usr/ccsp/tad/%s') | crontab -",
interval, script);
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update_cron_entry function performs crontab operations without any error checking. If crontab operations fail (e.g., due to permissions, corrupted crontab, or cron daemon not running), the function silently continues. Consider checking the return value of v_secure_system or implementing some form of error handling to detect and log failures.

Suggested change
// We always remove the old entry to avoid duplicates
v_secure_system("crontab -l 2>/dev/null | sed '/%s/d' | crontab -", script);
// If should_run is true, we append the new timing
if (should_run) {
v_secure_system("(crontab -l 2>/dev/null; echo '*/%lu * * * * /usr/ccsp/tad/%s') | crontab -",
interval, script);
int ret;
/* We always remove the old entry to avoid duplicates */
ret = v_secure_system("crontab -l 2>/dev/null | sed '/%s/d' | crontab -", script);
if (ret != 0) {
CcspTraceError(("update_cron_entry: failed to remove cron entry for script '%s', ret=%d\n",
script, ret));
}
/* If should_run is true, we append the new timing */
if (should_run) {
ret = v_secure_system("(crontab -l 2>/dev/null; echo '*/%lu * * * * /usr/ccsp/tad/%s') | crontab -",
interval, script);
if (ret != 0) {
CcspTraceError(("update_cron_entry: failed to add cron entry for script '%s' "
"(interval=%lu), ret=%d\n",
script, interval, ret));
}

Copilot uses AI. Check for mistakes.
CcspTraceWarning(("%s: %s is not running\n", __FUNCTION__, SELF_HEAL_SCRIPTS[i]));
} else {
CcspTraceWarning(("%s: Stopping %s \n", __FUNCTION__, SELF_HEAL_SCRIPTS[i]));
v_secure_system("kill -9 %s", buf);
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using SIGKILL (kill -9) to terminate scripts is overly aggressive and doesn't allow for graceful shutdown. Scripts cannot catch SIGKILL to clean up resources, close files, or remove lock files. Consider using SIGTERM first and only escalating to SIGKILL if the process doesn't terminate after a timeout period.

Suggested change
v_secure_system("kill -9 %s", buf);
v_secure_system("kill %s; sleep 5; pidof %s >/dev/null 2>&1 && kill -9 $(pidof %s)", buf, SELF_HEAL_SCRIPTS[i], SELF_HEAL_SCRIPTS[i]);

Copilot uses AI. Check for mistakes.
Comment on lines +76 to +80
char *SELF_HEAL_SCRIPTS[] = {
"selfheal_aggressive.sh",
"self_heal_connectivity_test.sh",
"resource_monitor.sh"
};
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SELF_HEAL_SCRIPTS array is declared as 'char *SELF_HEAL_SCRIPTS[]' which creates mutable string pointers pointing to string literals. This should be 'const char *SELF_HEAL_SCRIPTS[]' to properly indicate that the strings themselves are immutable and to prevent potential undefined behavior from attempts to modify string literals.

Copilot uses AI. Check for mistakes.
/* To Update Crontab (Removes old, Adds new) */
void update_cron_entry(const char *script, unsigned long interval, bool should_run) {
// We always remove the old entry to avoid duplicates
v_secure_system("crontab -l 2>/dev/null | sed '/%s/d' | crontab -", script);
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed pattern uses the script name without proper escaping. If a script name contains special regex characters (like dots in the extensions), this could fail to match correctly. Consider using '\Q...\E' or equivalent to escape the script name, or use a more robust approach like 'sed "//usr/ccsp/tad/$script/d"' with proper escaping of forward slashes.

Suggested change
v_secure_system("crontab -l 2>/dev/null | sed '/%s/d' | crontab -", script);
v_secure_system("crontab -l 2>/dev/null | grep -vF '/usr/ccsp/tad/%s' | crontab -", script);

Copilot uses AI. Check for mistakes.
Comment on lines +3811 to +3814
// If should_run is true, we append the new timing
if (should_run) {
v_secure_system("(crontab -l 2>/dev/null; echo '*/%lu * * * * /usr/ccsp/tad/%s') | crontab -",
interval, script);
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cron entry format '*/%lu * * * ' may not work correctly for all interval values. The '/N' syntax in cron means "every Nth unit" starting from 0, so intervals don't align with wall-clock times consistently across restarts. For example, */60 in the minute field will only run at minute 0 (once per hour), not every 60 minutes as might be expected. Consider documenting this behavior or using a different scheduling approach if continuous interval-based execution is required.

Suggested change
// If should_run is true, we append the new timing
if (should_run) {
v_secure_system("(crontab -l 2>/dev/null; echo '*/%lu * * * * /usr/ccsp/tad/%s') | crontab -",
interval, script);
// If should_run is true, we append the new timing
if (should_run) {
/*
* Note on cron semantics:
* - "*/N" in the minute field means "every Nth minute starting from 0",
* not "every N minutes from now". For values >= 60 this can be
* surprising (e.g., "*/60 * * * *" effectively runs at minute 0
* of every hour).
*
* We therefore:
* - Use "*/N * * * *" only for 1 <= N <= 59 (minute-based).
* - For N that are multiples of 60, convert to hour/day based
* expressions where appropriate.
*/
if (interval < 1) {
// Non-positive intervals are treated as "do not schedule".
return;
} else if (interval < 60) {
// Run every <interval> minutes.
v_secure_system("(crontab -l 2>/dev/null; echo '*/%lu * * * * /usr/ccsp/tad/%s') | crontab -",
interval, script);
} else if (interval % 60 == 0 && interval < 24 * 60) {
// Run every <hours> hours at minute 0.
unsigned long hours = interval / 60;
v_secure_system("(crontab -l 2>/dev/null; echo '0 */%lu * * * /usr/ccsp/tad/%s') | crontab -",
hours, script);
} else if (interval % (24 * 60) == 0) {
// Run every <days> days at midnight.
unsigned long days = interval / (24 * 60);
v_secure_system("(crontab -l 2>/dev/null; echo '0 0 */%lu * * /usr/ccsp/tad/%s') | crontab -",
days, script);
} else {
// Fallback: keep minute-based "*/N" semantics, with the caveat that
// cron counts from minute 0, not from the time this entry is added.
v_secure_system("(crontab -l 2>/dev/null; echo '*/%lu * * * * /usr/ccsp/tad/%s') | crontab -",
interval, script);
}

Copilot uses AI. Check for mistakes.
Signed-off-by: aj970 <akshaya_j@comcast.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant