http: trim whitespace from header field name#5844
http: trim whitespace from header field name#5844jasnell wants to merge 1 commit intonodejs:masterfrom
Conversation
The http-parser tolerates extraneous whitespace in header field names on incoming messages. This whitespace should be trimmed when those header fields are passed on to IncomingMessage.
| // always joined. | ||
| IncomingMessage.prototype._addHeaderLine = function(field, value, dest) { | ||
| field = field.toLowerCase(); | ||
| field = field.toLowerCase().trim(); |
There was a problem hiding this comment.
I think the trim() should be moved to the default: case to avoid unnecessary perf hits for the common header names.
There was a problem hiding this comment.
+1 for trimming in the default case.
There was a problem hiding this comment.
Unfortunately we can't. Consider, Content-Type for instance. If I move the trim() into the default: clause then send the following request:
POST / HTTP/1.1
Content-Type: text/plain
Content-Type : text/foo
The headers that end up reported on the server side are:
{
'content-type': ['text/plain', 'text/foo']
}However, if I send the second content type without the whitespace, it comes to:
POST / HTTP/1.1
Content-Type: text/plain
Content-Type: text/foo
{ 'content-type': 'text/plain' }One thing I could do is a quick charAt check on the field name is see if it's \s or \t and trim only if it is.
Modifying the http-parser to do the trimming turns out to be quite a bit more involved. I'll keep investigating that change. @indutny may have some ideas.
There was a problem hiding this comment.
@jasnell why is it hard to do it in http-parser? It seems to me that it is just a matter of adding two extra states.
There was a problem hiding this comment.
The parser state model is fairly complicated. I get a bit hesitant when it
comes to messing with it. It's the right approach, for sure tho. I just
wanted to make sure that someone more familiar with that bit of code
weighed in on it.
On Mar 25, 2016 6:24 AM, "Fedor Indutny" notifications@github.com wrote:
In lib/_http_incoming.js
#5844 (comment):@@ -121,7 +121,7 @@ IncomingMessage.prototype._addHeaderLines = function(headers, n) {
// and drop the second. Extended header fields (those beginning with 'x-') are
// always joined.
IncomingMessage.prototype._addHeaderLine = function(field, value, dest) {
- field = field.toLowerCase();
- field = field.toLowerCase().trim();
@jasnell https://github.com/jasnell why is it hard to do it in
http-parser? It seems to me that it is just a matter of adding two extra
states.—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/pull/5844/files/2f45d2eda4f16435bf844ebf4f28e84bf159e2cb#r57443293
|
I wonder if it would be possible to have the http parser do the trimming to avoid having the perf hit at all? |
|
Generally looks good, but I'm not sure about something.
Why? |
|
@mscdex ... it potentially can be moved into http-parser, yes, I'll investigate further. @benjamingr ... because technically whitespace between the field name and |
|
I can see a scenario where this would be a breaking change for someone but it does sound convoluted. Makes sense, and I see how it could be causing bugs - +1 with the trim only when needed nit. |
|
@indutny ... do you have any thoughts on how we can modify http-parser to strip the header field name whitespace without it being too complicated of a change? |
|
Agreed with @mscdex that trimming should probably happen in the parser, should be much more efficient. If not, please trim before toLowerCase (so that toLowerCase may operate on less characters). |
|
@jasnell I'm slightly curious about reasoning behind nodejs/http-parser@48a4364f . Looks like it should be reverted. |
|
@jasnell sorry, I meant part of it that allows whitespace in the middle of header field. |
Skip whitespace from the left and the right sides of the header field. See: nodejs/node#5844
|
Alternative fix for http-parser: https://github.com/nodejs/http-parser/pull/295/files . On our benchmarks it seems that parser becomes 4% less efficient in a presence of leading/trailing whitespace in the field. |
|
Without the whitespace performance drop is about 10%, so I guess we should benchmark this PR too and see which one will perform better. |
|
Or maybe just skip whitespace in |
|
It's also going to be a breaking change for some. Trimming leading
|
|
Isn't this against RFC7230 which says whitespace is optional (e.g. not forbidden): |
|
@silverwind ... this deals with whitespace before the colon, not after. RFC7230 forbids whitespace before the colon. leading and trailing whitespace in the header field value is ok. @indutny ... btw, I did see your other proposed approach but haven't had time to review it. I'll be returning to this later this week. |
|
@jasnell yeah, just noticed my error. Still I'd be cautious about this perf hit. I'll re-label as major because it's definitely a breaking one. |
|
Has anyone actually done perf testing on this yet? |
|
@Fishrock123 .. not yet. I'll be returning to this likely tomorrow. That said, the change is likely better done within the http-parser as @indutny has suggested so I will likely close this PR in favor of that other change. |
|
Closing this in favor of making the change in http_parser. That said, the proposed change in http_parser had some issues of it's own. Will still need to work those out but it's not likely to land before v6 |
|
@jasnell what's the rationale behind this? As I mentioned it seems that the performance hit in http-parser is quite high. Is JS variant even slower than that? |
Pull Request check-list
make -j8 test(UNIX) orvcbuild test nosign(Windows) pass withthis change (including linting)?
test (or a benchmark) included?
existing APIs, or introduces new ones)?
Affected core subsystem(s)
http
Description of change
The http-parser tolerates extraneous whitespace in header field
names on incoming messages. This whitespace should be trimmed
when those header fields are passed on to IncomingMessage.
/cc @indutny @nodejs/http