RDKB-61882: WebUI - HTML Injection in wifi_spectrum_analyzer.jst#101
RDKB-61882: WebUI - HTML Injection in wifi_spectrum_analyzer.jst#101GoutamD2905 merged 1 commit intodevelopfrom
Conversation
Reason for change: WebUI - HTML Injection in wifi_spectrum_analyzer.jst Test Procedure: Test for HTML Injection in wifi_spectrum_analyzer.jst Risks:low Priority: P0 Signed-off-by: pavankumarreddy_balireddy@comcast.com
There was a problem hiding this comment.
Pull request overview
This pull request addresses an HTML injection vulnerability (RDKB-61882) in the WiFi Spectrum Analyzer feature. The PR adds a custom HTML sanitization function to filter user-provided HTML content before writing it to a downloadable file.
Changes:
- Added a
sanitize_html()function that whitelists specific HTML tags (H2, DIV, TABLE, TBODY, TR, TH, TD) and filters dangerous attributes - Applied sanitization to
$_POST['configInfo']before writing to the Wifi_Spectrum_Analyzer_Table.html file
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| function filterAttributes(attrText) { | ||
| // Parse attributes in a conservative way: name[=value] | ||
| // Keeps spacing as minimal as possible when reconstructing. | ||
| var out = []; | ||
| var re = /([^\s=\/"'>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/g; | ||
| var m; | ||
| while ((m = re.exec(attrText)) !== null) { | ||
| var name = (m[1] || '').toLowerCase(); | ||
| var val = (m[2] != null) ? m[2] : (m[3] != null ? m[3] : (m[4] != null ? m[4] : '')); | ||
|
|
||
| // Drop obvious dangerous attributes | ||
| if (name.indexOf('on') === 0) continue; // onclick, onload, ... | ||
| if (name === 'style') continue; // inline CSS often abused | ||
|
|
||
| // Disallow javascript: and data: in URLish attributes | ||
| if (name === 'href' || name === 'src' || name === 'xlink:href') { | ||
| var v = String(val).replace(/^\s+|\s+$/g, '').toLowerCase(); | ||
| if (!v || v.indexOf('javascript:') === 0 || v.indexOf('data:') === 0) continue; | ||
| } | ||
|
|
||
| // Reconstruct attribute (quote with double-quotes) | ||
| if (val === '') out.push(name); | ||
| else out.push(name + '="' + val.replace(/"/g, '"') + '"'); | ||
| } | ||
| return out.length ? ' ' + out.join(' ') : ''; |
There was a problem hiding this comment.
The attribute filtering regex pattern has a potential issue. The pattern uses a greedy quantifier in the outer loop but doesn't properly handle edge cases where attribute values contain special characters. Specifically, unquoted attribute values matching ([^\s"'=<>`]+) could include characters that should be escaped or rejected. Additionally, the regex doesn't handle HTML entity-encoded attribute values, which could be used to bypass the javascript: and data: URL checks.
| var name = (m[1] || '').toLowerCase(); | ||
| var val = (m[2] != null) ? m[2] : (m[3] != null ? m[3] : (m[4] != null ? m[4] : '')); | ||
|
|
||
| // Drop obvious dangerous attributes | ||
| if (name.indexOf('on') === 0) continue; // onclick, onload, ... | ||
| if (name === 'style') continue; // inline CSS often abused | ||
|
|
||
| // Disallow javascript: and data: in URLish attributes | ||
| if (name === 'href' || name === 'src' || name === 'xlink:href') { |
There was a problem hiding this comment.
The dangerous attribute filtering only checks if attribute names start with 'on' (line 52) and blocks 'style', but this misses several attack vectors. Missing checks include: 1) formaction attribute that can redirect form submissions, 2) attributes like 'srcdoc' on iframes that can execute scripts, 3) namespace-prefixed event handlers like 'xml:onload', and 4) the check for event handlers could be bypassed with tab or other whitespace characters in the attribute name before 'on'.
| var name = (m[1] || '').toLowerCase(); | |
| var val = (m[2] != null) ? m[2] : (m[3] != null ? m[3] : (m[4] != null ? m[4] : '')); | |
| // Drop obvious dangerous attributes | |
| if (name.indexOf('on') === 0) continue; // onclick, onload, ... | |
| if (name === 'style') continue; // inline CSS often abused | |
| // Disallow javascript: and data: in URLish attributes | |
| if (name === 'href' || name === 'src' || name === 'xlink:href') { | |
| // Normalize attribute name: | |
| // - strip leading control/whitespace chars | |
| // - lowercase | |
| // - extract local name (after any namespace prefix, e.g. xml:onload -> onload) | |
| var rawName = m[1] || ''; | |
| var name = rawName.replace(/^[\s\x00-\x1F]+/, '').toLowerCase(); | |
| var nameNoNs = (name.indexOf(':') !== -1) ? name.split(':').pop() : name; | |
| var val = (m[2] != null) ? m[2] : (m[3] != null ? m[3] : (m[4] != null ? m[4] : '')); | |
| // Drop obvious dangerous attributes | |
| if (nameNoNs.indexOf('on') === 0) continue; // onclick, onload, xml:onload, ... | |
| if (nameNoNs === 'style') continue; // inline CSS often abused | |
| if (nameNoNs === 'srcdoc') continue; // iframe-like HTML execution | |
| if (nameNoNs === 'formaction') continue; // redirect/override form target | |
| // Disallow javascript: and data: in URLish attributes | |
| if (name === 'href' || name === 'src' || name === 'xlink:href' || nameNoNs === 'formaction') { |
| // Strip the entire element including its content until the matching closing tag | ||
| // (simple depth-1 removal; good enough for this whitelist) | ||
| var closing = "</" + tagName.toLowerCase() + ">"; | ||
| var nextClosing = lowerInput.indexOf(closing, end); | ||
| if (nextClosing !== -1) { | ||
| i = nextClosing + closing.length; |
There was a problem hiding this comment.
The tag stripping logic for disallowed tags has an issue. When a disallowed opening tag is found, the code searches for its closing tag in the lowercase version (lowerInput), but it uses simple string matching which doesn't account for nested tags of the same name. For example, if the input contains nested script tags or comments, the code might match the wrong closing tag, allowing malicious content to pass through. A proper HTML parser or stack-based matching should be used instead.
| // Strip the entire element including its content until the matching closing tag | |
| // (simple depth-1 removal; good enough for this whitelist) | |
| var closing = "</" + tagName.toLowerCase() + ">"; | |
| var nextClosing = lowerInput.indexOf(closing, end); | |
| if (nextClosing !== -1) { | |
| i = nextClosing + closing.length; | |
| // Strip the entire element including its content until the matching closing tag. | |
| // Use simple depth-based matching to correctly handle nested disallowed tags. | |
| var closing = "</" + tagName.toLowerCase() + ">"; | |
| var lower = input.toLowerCase(); | |
| var openTagPattern = "<" + tagName.toLowerCase(); | |
| var depth = 1; | |
| var searchPos = end + 1; | |
| while (depth > 0) { | |
| var nextOpen = lower.indexOf(openTagPattern, searchPos); | |
| var nextClose = lower.indexOf(closing, searchPos); | |
| if (nextClose === -1) { | |
| // No matching closing tag found; give up on depth-based removal | |
| break; | |
| } | |
| if (nextOpen !== -1 && nextOpen < nextClose) { | |
| // Found a nested opening tag of the same disallowed type | |
| depth++; | |
| searchPos = nextOpen + openTagPattern.length; | |
| } else { | |
| // Found a closing tag | |
| depth--; | |
| searchPos = nextClose + closing.length; | |
| } | |
| } | |
| if (depth === 0) { | |
| i = searchPos; |
| // Disallow javascript: and data: in URLish attributes | ||
| if (name === 'href' || name === 'src' || name === 'xlink:href') { | ||
| var v = String(val).replace(/^\s+|\s+$/g, '').toLowerCase(); | ||
| if (!v || v.indexOf('javascript:') === 0 || v.indexOf('data:') === 0) continue; | ||
| } |
There was a problem hiding this comment.
The URL validation in the filterAttributes function only checks if URLs start with 'javascript:' or 'data:' after trimming and converting to lowercase. However, this check can be bypassed using: 1) URL encoding (e.g., 'j%61vascript:'), 2) Unicode/HTML entities, 3) NULL bytes, 4) Alternative protocols like 'vbscript:', or 5) tab/newline characters within the protocol name. A more robust URL validation that handles encoded characters and validates against a whitelist of safe protocols (like http/https) would be more secure.
| var nextClosing = lowerInput.indexOf(closing, end); | ||
| if (nextClosing !== -1) { |
There was a problem hiding this comment.
The sanitization creates a lowercase version of the entire input string for tag matching (line 70), but this approach has a critical flaw. When searching for closing tags in the lowerInput string, the indices found will be used to slice the original input string. However, if the input contains multi-byte characters or certain Unicode characters, the byte offsets in the lowercase version may not match those in the original string, potentially causing incorrect slicing and bypasses of the sanitization logic.
| var nextClosing = lowerInput.indexOf(closing, end); | |
| if (nextClosing !== -1) { | |
| var searchStart = end; | |
| var lowerSlice = input.slice(searchStart).toLowerCase(); | |
| var relativeIndex = lowerSlice.indexOf(closing); | |
| if (relativeIndex !== -1) { | |
| var nextClosing = searchStart + relativeIndex; |
| // Drop obvious dangerous attributes | ||
| if (name.indexOf('on') === 0) continue; // onclick, onload, ... | ||
| if (name === 'style') continue; // inline CSS often abused | ||
|
|
||
| // Disallow javascript: and data: in URLish attributes | ||
| if (name === 'href' || name === 'src' || name === 'xlink:href') { | ||
| var v = String(val).replace(/^\s+|\s+$/g, '').toLowerCase(); | ||
| if (!v || v.indexOf('javascript:') === 0 || v.indexOf('data:') === 0) continue; | ||
| } | ||
|
|
||
| // Reconstruct attribute (quote with double-quotes) | ||
| if (val === '') out.push(name); | ||
| else out.push(name + '="' + val.replace(/"/g, '"') + '"'); | ||
| } | ||
| return out.length ? ' ' + out.join(' ') : ''; |
There was a problem hiding this comment.
The sanitizer allows 'class' and 'id' attributes without any validation (since they don't fall into the filtered categories). Attackers could potentially inject malicious class or id values that could be exploited through existing CSS rules or JavaScript selectors in the application. Consider adding validation or sanitization for these attributes, especially if the application has any CSS or JavaScript that targets elements by class or id.
| while (i < input.length) { | ||
| if (input[i] === '<') { | ||
| var start = i; | ||
| var end = input.indexOf('>', start); | ||
| if (end === -1) { | ||
| // no closing '>' — append the rest and stop | ||
| result += input.slice(i); | ||
| break; | ||
| } | ||
|
|
||
| // Raw tag content between '<' and '>' | ||
| var raw = input.substring(start + 1, end); | ||
| var tagContent = raw.replace(/^\s+|\s+$/g, ''); | ||
| var isClosing = tagContent.charAt(0) === '/'; | ||
|
|
||
| // Separate tag name and attributes (for opening tags) | ||
| var namePart = isClosing ? tagContent.slice(1) : tagContent; | ||
| var spaceIdx = namePart.indexOf(' '); | ||
| var tagName = (spaceIdx === -1 ? namePart : namePart.slice(0, spaceIdx)).toUpperCase(); | ||
| var attrsPart = (spaceIdx === -1 || isClosing) ? '' : namePart.slice(spaceIdx); | ||
|
|
||
| // Detect self-closing "/>" (rare for your table tags but harmless to support) | ||
| var selfClosing = /\/\s*$/.test(tagContent) && !isClosing; | ||
|
|
||
| if (isAllowed(tagName)) { | ||
| var tn = tagName.toLowerCase(); | ||
| if (isClosing) { | ||
| result += "</" + tn + ">"; | ||
| } else { | ||
| var attrsOut = ''; | ||
| if (KEEP_ATTRS) { | ||
| attrsOut = STRIP_DANGEROUS ? filterAttributes(attrsPart) : (attrsPart || ''); | ||
| } | ||
| // Normalize: ensure a leading space before attributes when present and not already spaced | ||
| if (attrsOut && !STRIP_DANGEROUS) { | ||
| // If original attrsPart doesn't start with space, add one | ||
| if (!/^\s/.test(attrsOut)) attrsOut = ' ' + attrsOut; | ||
| } | ||
| result += "<" + tn + (attrsOut || '') + (selfClosing ? "/>" : ">"); | ||
| } | ||
| i = end + 1; | ||
| continue; | ||
| } | ||
|
|
||
| // Not allowed tag | ||
| if (!isClosing) { | ||
| // Strip the entire element including its content until the matching closing tag | ||
| // (simple depth-1 removal; good enough for this whitelist) | ||
| var closing = "</" + tagName.toLowerCase() + ">"; | ||
| var nextClosing = lowerInput.indexOf(closing, end); | ||
| if (nextClosing !== -1) { | ||
| i = nextClosing + closing.length; | ||
| continue; | ||
| } | ||
| } | ||
|
|
||
| // For disallowed closing tags or unmatched structures, just skip the tag itself | ||
| i = end + 1; | ||
| } else { | ||
| result += input[i++]; | ||
| } | ||
| } |
There was a problem hiding this comment.
The sanitizer doesn't handle HTML comments, CDATA sections, DOCTYPE declarations, or processing instructions. An attacker could potentially inject malicious content wrapped in HTML comments or use other HTML constructs to bypass the tag-based filtering. These constructs should either be explicitly stripped or properly validated.
| // For disallowed closing tags or unmatched structures, just skip the tag itself | ||
| i = end + 1; | ||
| } else { | ||
| result += input[i++]; |
There was a problem hiding this comment.
Text content between tags is passed through without any escaping or entity encoding. If user-controlled data exists in the text content and contains characters like <, >, &, or quotes, these should be HTML-entity encoded to prevent interpretation as HTML markup. Currently, only the tags themselves are filtered, not the text content between them.
| result += input[i++]; | |
| var ch = input[i++]; | |
| if (ch === '&') { | |
| result += '&'; | |
| } else if (ch === '<') { | |
| // This should not normally occur here because '<' starts the tag branch above, | |
| // but we encode it defensively. | |
| result += '<'; | |
| } else if (ch === '>') { | |
| result += '>'; | |
| } else if (ch === '"') { | |
| result += '"'; | |
| } else if (ch === "'") { | |
| result += '''; | |
| } else { | |
| result += ch; | |
| } |
| // keepAttrs: true -> keep attributes for allowed tags (default) | ||
| // stripDangerous: true -> remove on* handlers and javascript: urls | ||
| var KEEP_ATTRS = true; | ||
| var STRIP_DANGEROUS = true; |
There was a problem hiding this comment.
The function uses constants KEEP_ATTRS and STRIP_DANGEROUS that are always set to true but never actually used as configuration. These appear to be residual configuration flags that aren't actually configurable. Either make these proper function parameters if they need to be configurable, or remove them and simplify the logic since they're always true.
Reason for change: WebUI - HTML Injection in wifi_spectrum_analyzer.jst
Test Procedure: Test for HTML Injection in wifi_spectrum_analyzer.jst
Risks:low
Priority: P0
Signed-off-by: pavankumarreddy_balireddy@comcast.com