VDGraph is a queryable graph construction and analysis tool to reveal vulnerability paths in maven based Java projects.

nislab/VDGraph

VDGraph

Starting repository for building a knowledge graph (KG) from an SBOM and OSV-Scanner output. The prerequisites are outlined in detail in SETUP.md.

Clone and build projects

The following steps generate, and eventually query, a set of KGs from a specified list of projects:

  1. For each project you want to work with, put its GitHub link on a new line in project_list.csv. Delete any entries from project_list.csv that you're not interested in cloning.

  2. Run clone.sh, which will clone the projects of interest into a directory named projects using the entries from project_list.csv.

    • You only need to repeat this step when there are new projects you want to clone. Note: run chmod 755 clone.sh in a terminal if you get a permission denied error. The same applies to all shell scripts.
  3. Build all projects to generate their SBOMs from their pom.xml files. Run the following command inside a project's root directory to generate an SBOM: mvn org.cyclonedx:cyclonedx-maven-plugin:2.7.8:makeAggregateBom. Note that for this to work, every repository inside projects must be buildable with Maven.

  4. Next, scan your projects for vulnerabilities using OSV-Scanner.

    • The command osv-scanner --sbom /path/to/SBOM --format json > projectName_osv_scan-tag.json generates an SCA output using the SBOM as a reference. The command osv-scanner -r --format json /path/to/project/ > projectName_osv_tag.json instead has OSV-Scanner itself scan the directories for dependencies.
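
The build and scan steps above can be scripted per project. The following is a minimal sketch and not part of the repository: the helper names are hypothetical, and deriving the project directory name from the last segment of the GitHub link is an assumption about how clone.sh lays out the projects directory. The Maven goal and the osv-scanner flags are copied from the steps above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Maven goal quoted from step 3 above.
SBOM_GOAL = "org.cyclonedx:cyclonedx-maven-plugin:2.7.8:makeAggregateBom"


def project_name(github_url: str) -> str:
    """Derive a directory-style name from a GitHub link (hypothetical helper).

    Assumption: the clone directory is named after the last URL segment.
    """
    path = urlparse(github_url.strip()).path.rstrip("/")
    name = PurePosixPath(path).name
    return name[:-4] if name.endswith(".git") else name


def sbom_command() -> list[str]:
    """Command to run from inside a project's root directory."""
    return ["mvn", SBOM_GOAL]


def osv_sbom_command(sbom_path: str) -> list[str]:
    """OSV-Scanner invocation that uses an existing SBOM as its reference."""
    return ["osv-scanner", "--sbom", sbom_path, "--format", "json"]


def osv_recursive_command(project_dir: str) -> list[str]:
    """OSV-Scanner invocation that scans the project directories itself."""
    return ["osv-scanner", "-r", "--format", "json", project_dir]
```

Each returned list can be passed to `subprocess.run`, with `cwd` set to the project root for the Maven command and `stdout` pointed at an open file to reproduce the `>` redirection.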

For each project, the /docs folder should contain, under the project's name, the SBOM file and the OSV-Scanner report, both in JSON format. Next, we generate the Neo4J import files (nodes and relationships) from these two documents. The code for parsing these files into Neo4J will be made public upon publication of our work.
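
Since the parser itself is not yet public, the following is only an illustrative sketch of what the two JSON documents offer for graph construction. It assumes the CycloneDX JSON layout (a `dependencies` array whose entries carry `ref` and `dependsOn`) and the OSV-Scanner JSON layout (`results`, then `packages`, then `vulnerabilities`). The function names and the returned tuples are hypothetical and do not reflect the node or relationship schema used by VDGraph.

```python
import json


def load_json(path: str) -> dict:
    """Read one JSON document (SBOM or OSV-Scanner report) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def dependency_edges(sbom: dict) -> list[tuple[str, str]]:
    """Return (dependent, dependency) pairs from a CycloneDX-style SBOM."""
    edges = []
    for entry in sbom.get("dependencies", []):
        for target in entry.get("dependsOn", []):
            edges.append((entry["ref"], target))
    return edges


def vulnerable_packages(osv_report: dict) -> list[tuple[str, str, str]]:
    """Return (package name, version, vulnerability id) triples."""
    rows = []
    for result in osv_report.get("results", []):
        for pkg in result.get("packages", []):
            info = pkg.get("package", {})
            for vuln in pkg.get("vulnerabilities", []):
                rows.append(
                    (info.get("name", ""), info.get("version", ""), vuln.get("id", ""))
                )
    return rows
```

Joining the two results on the package identity would give candidate paths from a project to a vulnerability through its dependencies, which is the kind of relationship the generated import files are meant to capture.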

At this time, you may access the contents of the project_csv folder, which are the generated Neo4J import files. For each project in the test set, the files include node and relationship information.

  1. Put whatever queries you want to run on the project KGs in query_list.csv. Delete any entries from query_list.csv you're not interested in running.

    • Make sure to format your queries in query_list.csv correctly. The format for the file is as follows:
    query_name1,"query1"
    query_name2,"query2"
    
    • It's recommended to first test and experiment with queries on a sample project in Neo4J Desktop to make sure they work as intended before executing them on all the projects.
  2. Run query.sh with one command-line argument, either "bom-tag" or "scan-tag". The bom-tag option uses the OSV-Scanner output generated from the SBOM, and the scan-tag option uses the OSV-Scanner output generated from the project build (pom.xml).

    • For each project, query.sh will import the .csv files in project_csvs, start the database, run all the queries in query_list.csv, output the results into query_results, and stop the database. It saves the import logs, runtimes, and actual query results.
    • Note that in our implementation the database is wiped clean for each project.
    • You only need to repeat this step when there are new projects you want to process or new queries you want to execute.
    • If for any reason your process gets interrupted or stuck, remove the import logs belonging to the last project from the /query_results/bom-tag/import_logs/ directory, and re-run the script. The process will continue from the last project.
  3. To update projects to the most recent version without re-cloning, use ./update.sh. Be aware that your modifications in the projects folder will be reset.
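
Before running query.sh on every project, it can help to check that query_list.csv parses the way you expect. The sketch below is a hypothetical pre-flight check, not the logic query.sh uses: it reads the `query_name,"query"` format with Python's standard CSV rules (so commas inside the quoted query are fine) and validates the single command-line argument.

```python
import csv
import io

# The two argument values accepted by query.sh, as described above.
VALID_MODES = ("bom-tag", "scan-tag")


def read_queries(text: str) -> list[tuple[str, str]]:
    """Parse the contents of query_list.csv into (name, query) pairs."""
    queries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f'expected name,"query" but got {row!r}')
        queries.append((row[0].strip(), row[1].strip()))
    return queries


def check_mode(mode: str) -> str:
    """Reject anything other than the two documented argument values."""
    if mode not in VALID_MODES:
        raise ValueError(f"argument must be one of {VALID_MODES}, got {mode!r}")
    return mode
```

A row that does not split into exactly two fields usually means the query was not wrapped in double quotes, which is the formatting mistake the check is meant to catch.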

Neo4J Terminal

Once Neo4J has been installed, there will be a default Neo4J database which we will use. To import data and wipe the previous data inside, run:

sudo /usr/bin/neo4j-admin database import full --nodes=<file1.csv> --nodes=<file2.csv> --relationships=<file3.csv> --relationships=<file4.csv> --multiline-fields=true --overwrite-destination 

The number of --nodes and --relationships files may vary.
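
Because the number of files varies per project, the import command is easiest to assemble programmatically. This is a minimal sketch with a hypothetical helper name; the flags are the ones shown in the command above, and the file paths are supplied by the caller.

```python
import shlex


def import_command(
    node_files: list[str],
    relationship_files: list[str],
    admin: str = "/usr/bin/neo4j-admin",
) -> list[str]:
    """Build the full-import command for any number of CSV files."""
    if not node_files:
        raise ValueError("at least one node file is required")
    cmd = ["sudo", admin, "database", "import", "full"]
    cmd += [f"--nodes={path}" for path in node_files]
    cmd += [f"--relationships={path}" for path in relationship_files]
    cmd += ["--multiline-fields=true", "--overwrite-destination"]
    return cmd


def as_shell_line(cmd: list[str]) -> str:
    """Render the command as a single, safely quoted shell line."""
    return shlex.join(cmd)
```

Passing the list form to `subprocess.run` avoids quoting problems when a file path contains spaces.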

To start the server, run:

sudo /usr/bin/neo4j-admin server start

To do a query, run:

cypher-shell -u neo4j -p neo4jpassword "<your_query>"

To stop the server, run:

sudo /usr/bin/neo4j-admin server stop
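
The start, query, and stop commands above can be chained so that the server is stopped even when a query fails. The following is a sketch under stated assumptions: the function names are hypothetical, and the `run` callable is injected by the caller (for example a thin wrapper around `subprocess.run`), so the sequence can be inspected without a Neo4J installation. The user name and password are the placeholders used above.

```python
from typing import Callable, Iterable


def session_commands(
    queries: Iterable[str],
    user: str = "neo4j",
    password: str = "neo4jpassword",
    admin: str = "/usr/bin/neo4j-admin",
) -> list[list[str]]:
    """Return the start command, one cypher-shell call per query, then stop."""
    commands = [["sudo", admin, "server", "start"]]
    for query in queries:
        commands.append(["cypher-shell", "-u", user, "-p", password, query])
    commands.append(["sudo", admin, "server", "stop"])
    return commands


def run_session(queries: Iterable[str], run: Callable[[list[str]], object]) -> None:
    """Run the whole session, always issuing the stop command at the end."""
    start, *query_commands, stop = session_commands(queries)
    run(start)
    try:
        for command in query_commands:
            run(command)
    finally:
        run(stop)
```

In practice the server may need a moment after starting before cypher-shell can connect, so a real runner would likely add a wait or a retry before the first query.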

Neo4J Desktop

Alternatively, you can use the Neo4J Desktop application. If you ran the SBOM script with the -f flag, all the files needed to create a project's KG in Neo4J should be in its output directory. The .csv files contain the data, the apoc.conf file is a configuration file, and the import_commands.txt file contains the Neo4J commands to import the data.

To create the KG in Neo4J Desktop, we follow a process similar to ZhenPeng's tutorial.

To summarize:

  1. Download and launch Neo4J Desktop
  2. Create a new project and a "Local DBMS"
  3. Install the APOC plugin for the DBMS
  4. Open the DBMS conf directory and copy the apoc.conf file into it.
  5. Open the DBMS import directory and copy all the .csv files into it.
  6. Start and open the DBMS
  7. Run the first block of commands from import_commands.txt to create the constraints (technically optional, but it helps ensure there are no duplicate nodes)
  8. Run the second block of commands to import the .csv files.
  9. You're done! The last deletion command is only there for debugging purposes.
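
Steps 4 and 5 above are plain file copies, so they can be scripted once you know where Neo4J Desktop placed the DBMS directories. This is a minimal sketch with a hypothetical helper name; all three directory paths are supplied by the caller, since their locations depend on your installation.

```python
import shutil
from pathlib import Path


def stage_files(output_dir: str, dbms_conf_dir: str, dbms_import_dir: str) -> list[Path]:
    """Copy apoc.conf into the conf directory and every .csv into import.

    Both destination directories must already exist. Returns the copied paths.
    """
    output = Path(output_dir)
    conf_file = output / "apoc.conf"
    if not conf_file.is_file():
        raise FileNotFoundError(f"no apoc.conf found in {output}")

    copied = [Path(shutil.copy2(conf_file, Path(dbms_conf_dir) / conf_file.name))]
    for csv_file in sorted(output.glob("*.csv")):
        copied.append(Path(shutil.copy2(csv_file, Path(dbms_import_dir) / csv_file.name)))
    return copied
```

The import_commands.txt file is deliberately left in place, because its contents are pasted into the Neo4J query editor rather than read from disk.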
