Add parser for flawfinder by davidCarlos · Pull Request #31 · fedora-static-analysis/firehose

davidCarlos · 2017-03-30T16:47:37Z

I'm still working on this pull request (missing tests). soon i will remove the 'WIP' tag.

davidmalcolm · 2017-03-30T17:20:49Z

firehose/parsers/flawfinder.py

+            weakness_messages.append(message)
+            # TODO: Flawfinder can returns more than one CWE,
+            # when this rappends, get_cwe cannot return a valid value
+            # Fix this.


Do we need to support more than one CWE per issue in the schema? Currently <cwe> is optional within <issue>; should it be zeroOrMore?

I was going to ask this. I don't know what the best pratice in this case. When flawfinder returns more than one CWE, what should i do?

Can you post some examples of what the output looks like?

My current opinion is that we should change the schema and model.py to support multiple CWEs per issue.

./docs/examples/rtsp.c:170: [4] (buffer) sscanf:
The scanf() family's %s operation, without a limit specification, permits
buffer overflows (CWE-120, CWE-20). Specify a limit to %s, or use a
different input function.
./docs/examples/synctime.c:155: [4] (buffer) sscanf:
The scanf() family's %s operation, without a limit specification, permits
buffer overflows (CWE-120, CWE-20). Specify a limit to %s, or use a
different input function.

In my opinion, Add multiple CWEs would generate a more realistic output, once one warning could be related to more than one CWE.

I can send another PR doing this.

Thanks. I'll update the schema accordingly.

I will first make this change in models, then i finish the flawfinder parser.

@davidmalcolm I have a doubt. Projects that already use firehose, and pass to Issue model a single cwe using a integer value. Should we maintain the compatibility? (Allows de user to pass a integer or a list of integers). Or we keep only a list of integers?

This change impacts the others parsers too.

davidmalcolm · 2017-03-31T13:06:34Z

As noted in #33 we don't have a good design yet for how to handle multiple CWEs in one issue.

For now, maybe have the parser capture the first CWE (if any), and note the limitation as a bug, marking it as a dependency of #33.

Thanks.

davidCarlos · 2017-04-05T03:00:52Z

Hey @davidmalcolm , i have updated this PR. I added some tests, and have included flawfinder in the documentation. Now the parser capture the first CWE, as we had discussed. I created the issue #35, to map the multiple cwes problem.

davidmalcolm

Thanks; this is looking good. I have some concerns about version-handling, and would prefer it if the parser captured the severity of issues (and maybe categories also).

Sorry if this seems nit-picky.

davidmalcolm · 2017-04-05T12:52:35Z

firehose/parsers/flawfinder.py

+    """
+
+    generator = Generator(name='flawfinder',
+                          version=flawfinder())


Note that a Generator's version can be None. It looks like the actual version is embedded in the first line of the output:

Flawfinder version 1.31, (C) 2001-2014 David A. Wheeler.

so presumably parse_file should be looking for lines like that and parsing them.

Fixed. Now i get the version from the Flawfinder output.

davidmalcolm · 2017-04-05T12:54:20Z

firehose/parsers/flawfinder.py

+def get_cwe(line):
+    """Extract CWE from flawfinder alarms
+    It's possible that a alarm not have a correlated CWE
+


Should have a comment here referencing issue #35, and that it's just the first CWE if there's more than one.

davidmalcolm · 2017-04-05T12:58:17Z

firehose/parsers/flawfinder.py

+    try:
+        return check_output(["flawfinder", "--version"]).strip()
+    except Exception:
+        return '1.31'


As noted above, I'd much prefer it if parse_file parsed the version header in the output.

If the output file doesn't contain a version string, then I think it's better for the parser to leave the Generator's version as None, rather than trying to scrape it from flawfinder --version (or guessing). Rationale: I use firehose in mock-with-analysis, which does the analysis in a chroot. There might be a different version of flawfinder in the chroot compared to what's in the host $PATH.

So I think it's better to get rid of this function and do the version-handling in parse_file.

davidmalcolm · 2017-04-05T13:05:53Z

firehose/parsers/flawfinder.py

+            line = loaded_file.readline()
+
+    counter = 0
+    for weakness in weakness_paths:


Rather than manually incrementing counter, could this code use Python's enumerate builtin? (and maybe use idx rather then counter?)

davidmalcolm · 2017-04-05T13:11:17Z

firehose/parsers/flawfinder.py

+
+        issue = Issue(cwe, None, location,
+                      Message(text=weakness_messages[counter]), notes=None,
+                      trace=None, severity=None, customfields=None)


Re severity: Flawfinder's docs say:

The risk level is shown inside square brackets and varies from 0, very little risk, to 5, great risk.

It would be good to capture that number as Issue.severity (as a string, without the square brackets):
http://firehose.readthedocs.io/en/latest/data-model.html#firehose.model.severity

davidmalcolm · 2017-04-05T13:15:47Z

firehose/parsers/flawfinder.py

+        else:
+            cwe = int(weakness_cwes[counter])
+
+        issue = Issue(cwe, None, location,


The None here is the Issue.testid:
http://firehose.readthedocs.io/en/latest/data-model.html#firehose.model.testid

Perhaps the (shell) system: stuff should be captured here. If I'm reading the Flawfinder docs correctly:

Each entry has a filename, a colon, a line number, a
risk level in brackets (where 5 is the most risky), a category, the name of the function, and a description of

....then Flawfinder's "category" here is "shell" and the name of the function is "system". I'm not yet sure how that should map to a firehose Issue.testid.

@davidmalcolm how should we handle this? Apprently flawfinder does not have a testid value.

Can we use flawfinder's "category" here for the FIrehose "testid"? Looking at the sample output, that would give values like "shell", "format", "buffer", "race", which seems sane.

davidCarlos · 2017-04-14T15:27:05Z

Hey @davidmalcolm , i updated this PR with the last revision.

davidmalcolm

Thanks for the updates.

I've got a few more nitpicks (sorry), covering code I had trouble following.

davidmalcolm · 2017-04-14T16:28:14Z

firehose/parsers/flawfinder.py

+
+    :arg1: file passed by cli
+    :returns: file loaded in memory
+    """


This doesn't actually load a file into memory; it merely returns a file object representing a file opened for reading. The code would be clearer if you got rid of this function and replace users with just the open builtin.

davidmalcolm · 2017-04-14T16:31:03Z

firehose/parsers/flawfinder.py

+    """ Parser loaded flawfinder output
+
+    :loaded_file: flawfinder report, loaded in memory
+    :returns: Firehose Analysis object, representing the final XML.


"loaded_file" is actually a file-like object, not an in-memory string. Please rename to "infile", or similar ("input" is a builtint IIRC)

davidmalcolm · 2017-04-14T16:35:46Z

firehose/parsers/flawfinder.py

+            data["message"] = message;
+            line = message_line
+            flawfinder_data.append(data)
+        else:


I'm finding it hard to follow this two-stage parsing. If I'm reading things right, the code first reads the lines, building up a list of dicts (flawfinder_data). There's then a second loop over that list, creating Issue objects and adding them to analysis.results.

Wouldn't it be much simpler to have one loop, and append Issue objects directly to analysis.results, without the data dict i.e. work directly with an Issue object instead.

Hope that makes sense; I may have misread this code.

davidmalcolm · 2017-04-14T16:39:25Z

firehose/parsers/flawfinder.py

+                int(line[cwe_first_int_char])
+                cwe_first_int_char += 1
+        except Exception:
+            cwe = line[(cwe_init_str_pos + 4):(cwe_first_int_char)]


Wouldn't this be much simpler with a regex e.g.

re.findall(line, "CWE-([0-9]+)")

or somesuch? (which I think would give a list of tuples of strings).

Could then rename the function to find_cwes and have it return a list of ints, and do the dropping of multiples in the caller.

Can also write a unit test for the CWE parser that way.

davidmalcolm · 2017-04-14T16:40:15Z

firehose/parsers/flawfinder.py

+
+
+def flawfinder(first_line):
+    """Retrieve flawfinder version from report.


I find the current name to be unclear; please rename to parse_first_line (as that's what it does).

Maybe rename to get_flawfinder_version should be better, what do you think @davidmalcolm ?

davidmalcolm · 2017-04-14T16:42:26Z

tests/parsers/test_flawfinder_parser.py

+
+class TestParseXml(unittest.TestCase):
+    def parse_example(self, filename):
+        report = load_file(os.path.join(os.path.dirname(__file__),


Given that it's a file-like object, maybe turn into:

path = os.path.join(...) with open(path) as f: return parse_file (f)

davidCarlos · 2017-04-14T23:44:52Z

@davidmalcolm thanks for the great revision, i had updated the PR.

davidmalcolm

Thanks for the updated version.

This is looking much better, but I have a few more nitpicks (sorry).

davidmalcolm · 2017-04-15T12:05:17Z

firehose/parsers/flawfinder.py

+        arg_file = sys.argv[1]
+        report = open(arg_file, 'r')
+    except IOError:
+        print("File note found")


Typo: "note" should be "not".
But an IOError is not necessarily "file not found"; it could be a permissions error, or an attempt to open a directory rather than a file, etc. Maybe just lose this clause?

davidmalcolm · 2017-04-15T12:19:03Z

firehose/parsers/flawfinder.py

+            (prog.search(message_line) and prog2.search(message_line)):
+                issue_message += " " + message_line
+                try:
+                    cwe = re.findall("CWE-([0-9]+)", message_line)[0]


I don't like using try/except for parsing; I prefer to use non-exception program flow (rationale: sometimes we end up catching a different exception than the one we thought we were, so it can conceal bugs).

Anything that matches the ([0-9]+) is known to be convertible to int, so how about:

# list of ints cwes = []

before the "while message_line" loop, and:

cwes.extend([int(m.group(0)) for m in re.findall("CWE-([0-9]+)", message_line)])

here, and then after the "message_line" loop:

if cwes: first_cwe = cwes[0] else first_cwe = None

to avoid the two try/except clauses.

Please reinstate the comment about issue #35

davidmalcolm · 2017-04-15T12:23:11Z

firehose/parsers/flawfinder.py

+        try:
+            testid = re.findall(r"\(([a-z]+)\)", line)[0]
+        except IndexError:
+            pass


If we just want the first match, just use re.search, which returns None for the "not-found" case ( https://docs.python.org/3.4/library/re.html#re.search ).

davidCarlos · 2017-04-15T23:35:32Z

@davidmalcolm I had updated the PR

davidmalcolm

Thanks for your improvements here.

Ironically, with the code cleanups you've done, you've simplified things enough that I can now see a couple of other issues that were hiding.

davidmalcolm · 2017-04-17T15:11:12Z

firehose/parsers/flawfinder.py

+            testid = re.search("\(([a-z]+)\)", line).group(1)
+            issue_path = line.split(":")[0]
+            issue_line = line.split(":")[1]
+


Thanks for doing the earlier cleanups; the code is clearer now.

With that, the handling of prog, prog2 and the testid looks a bit unusual; presumably these always appear on the same line, in a consistent way. Given that, I think it would be improved by using a single regular expression; something like:

# We want to match a line like: # ./docs/examples/asiohiper.cpp:78: [4] (shell) system: filename_group = "(.+)" linenum_group = "([0-9]+) severity_group = "\[([0-9])\]" category_group = "$(.+)$" funcname_group = "(\S+)" ws = '\s+' PAT= (filename_group + ':' + linenum_group + ':' + ws + severity_group + ws + category_group + ws + funcname_group)

It should then be possible to do:

m = re.match(PAT, message_line) if m: # do stuff with m.groups()

davidmalcolm · 2017-04-17T15:12:24Z

firehose/parsers/flawfinder.py

+
+            issue = Issue(first_cwe, testid, location,
+                          Message(text=issue_message), notes=None,
+                          trace=None, severity=None, customfields=None)


Please set the severity field to the number in square parens (keep it as a string, rather than calling int on it).

davidmalcolm · 2017-04-17T15:12:50Z

firehose/parsers/flawfinder.py

+            cwes = []
+            while message_line != "\n" and not \
+            (prog.search(message_line) and prog2.search(message_line)):
+                issue_message += " " + message_line


You could reuse the PAT from above here.

- Add only one cwe in Firehose report - Retrieve flawfinder version from report. - Use enumerate instead of counter. - Add tests - Add comment about multiple cwes. - Fix regex.

davidCarlos · 2017-04-18T00:28:26Z

@davidmalcolm I had updated the PR

davidmalcolm

Thanks for all the updates; this looks good.

davidmalcolm reviewed Mar 30, 2017

View reviewed changes

davidCarlos mentioned this pull request Mar 31, 2017

WIP: Include in Firehose models multipe CWEs #33

Open

davidCarlos force-pushed the flawfinder branch 2 times, most recently from 76666dd to d10b185 Compare April 5, 2017 02:27

davidCarlos changed the title ~~[WIP]: Add parser to flawfinder~~ Add parser to flawfinder Apr 5, 2017

davidCarlos force-pushed the flawfinder branch from 71f6eb9 to 4b812a5 Compare April 5, 2017 02:52

davidCarlos force-pushed the flawfinder branch from 4b812a5 to 262832e Compare April 5, 2017 03:19

davidmalcolm reviewed Apr 5, 2017

View reviewed changes

davidmalcolm changed the title ~~Add parser to flawfinder~~ Add parser for flawfinder Apr 5, 2017

davidCarlos force-pushed the flawfinder branch from eb9f8a3 to 8386eb3 Compare April 14, 2017 15:25

davidmalcolm requested changes Apr 14, 2017

View reviewed changes

davidCarlos force-pushed the flawfinder branch 2 times, most recently from 863d9a5 to 8bf2e8a Compare April 14, 2017 23:42

davidCarlos force-pushed the flawfinder branch 3 times, most recently from 4c64931 to 65ac128 Compare April 15, 2017 00:12

davidmalcolm requested changes Apr 15, 2017

View reviewed changes

davidCarlos force-pushed the flawfinder branch from 65ac128 to 888cac4 Compare April 15, 2017 23:34

davidmalcolm requested changes Apr 17, 2017

View reviewed changes

Add parser to flawfinder

e90ebf5

- Add only one cwe in Firehose report - Retrieve flawfinder version from report. - Use enumerate instead of counter. - Add tests - Add comment about multiple cwes. - Fix regex.

davidCarlos force-pushed the flawfinder branch from 888cac4 to e90ebf5 Compare April 18, 2017 00:28

davidmalcolm approved these changes Apr 18, 2017

View reviewed changes

davidmalcolm merged commit b912b9f into fedora-static-analysis:master Apr 18, 2017



		def flawfinder(first_line):
		"""Retrieve flawfinder version from report.

Comments

Conversation

davidCarlos commented Mar 30, 2017

Uh oh!

davidmalcolm Mar 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidCarlos Mar 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidmalcolm commented Mar 31, 2017

Uh oh!

davidCarlos commented Apr 5, 2017

Uh oh!

davidmalcolm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidCarlos commented Apr 14, 2017

Uh oh!

davidmalcolm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidCarlos commented Apr 14, 2017

Uh oh!

davidmalcolm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

davidmalcolm Mar 30, 2017 •

edited

Loading

davidCarlos Mar 30, 2017 •

edited

Loading