Starting repository for building a knowledge graph (KG) from an SBOM and OSV-Scanner output. The prerequisites are outlined in detail in SETUP.md.
The following steps generate, and then query, one KG per project for a specified list of projects:
- For each project you want to work with, put its GitHub link on a new line in `project_list.csv`. Delete any entries from `project_list.csv` that you're not interested in cloning.
- Run `clone.sh`, which will clone the projects of interest into a directory named `projects` using the entries from `project_list.csv`.
  - You should only need to do this step if there are new projects you want to clone.
    Note: run `chmod 755 clone.sh` in a terminal if you get a permission denied error. The same applies to all shell scripts.
- Build all projects to generate their SBOMs from their pom.xml files. Run the following command inside a project's root directory to generate an SBOM:
  `mvn org.cyclonedx:cyclonedx-maven-plugin:2.7.8:makeAggregateBom`
  Note that for this to work, all repositories inside `projects` need to be buildable with Maven.
- Next, scan your projects for vulnerabilities using OSV-Scanner.
  - The command `osv-scanner --sbom /path/to/SBOM --format json > projectName_osv_scan-tag.json` will generate an SCA output using the SBOM as a reference, while the command `osv-scanner -r --format json /path/to/project/ > projectName_osv_tag.json` will have OSV-Scanner itself scan the directories for dependencies.
For each project, the /docs folder should contain, under the project's name, the SBOM file and the OSV-Scanner report, both in JSON format. Next, we generate the Neo4J import files (nodes and relationships) from these two documents. The code for parsing these files into Neo4J will be made public upon publication of our work.
At this time, you may access the contents of the project_csv folder, which are the generated Neo4J import files. For each of the projects in the test set, the files include node and relationship information.
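The SBOM and scan steps above can be scripted per project. The following is a minimal sketch, not the repository's own tooling: it only prints the commands it would run so they can be reviewed first. The function names, the `docs/<project>/` output layout, and the `_sbom.json` file name are assumptions made for illustration. It also assumes the CycloneDX plugin writes its JSON output to `target/bom.json`, and that paths contain no spaces.

```shell
#!/usr/bin/env bash
# Sketch: print the SBOM and OSV-Scanner commands for each cloned project.
# Assumptions (hypothetical, not taken from the repository):
#   - outputs are collected under docs/<project>/
#   - the SBOM copy is named <project>_sbom.json
#   - the CycloneDX plugin writes target/bom.json
#   - paths contain no spaces

# Print the four commands for a single project directory.
plan_project() {
  local project_dir="${1%/}"          # drop a trailing slash, if any
  local name
  name="$(basename "$project_dir")"
  local out_dir="docs/${name}"

  echo "(cd ${project_dir} && mvn org.cyclonedx:cyclonedx-maven-plugin:2.7.8:makeAggregateBom)"
  echo "mkdir -p ${out_dir}"
  echo "cp ${project_dir}/target/bom.json ${out_dir}/${name}_sbom.json"
  echo "osv-scanner --sbom ${out_dir}/${name}_sbom.json --format json > ${out_dir}/${name}_osv_scan-tag.json"
}

# Print the commands for every directory under the projects root.
plan_all() {
  local dir
  for dir in "${1:-projects}"/*/; do
    [ -d "$dir" ] || continue         # skip when the glob matched nothing
    plan_project "$dir"
  done
}
```

After sourcing the file, calling `plan_all projects` prints the plan. Once it looks right, it can be saved to a file and executed with `bash`.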
- Put the queries you want to run on the project KGs in `query_list.csv`. Delete any entries from `query_list.csv` that you're not interested in running.
  - Make sure to format your queries in `query_list.csv` correctly. The format for the file is as follows:
    query_name1,"query1"
    query_name2,"query2"
  - It's recommended to first test and experiment with queries on a sample project in Neo4J Desktop to make sure they work as intended before executing them on all the projects.
- Run `query.sh` with one command line argument, either "bom-tag" or "scan-tag". The bom-tag option uses the OSV-Scanner output generated from the SBOM, and the scan-tag option uses the OSV-Scanner output generated from the project build (pom.xml).
  - For each project, `query.sh` will import the .csv files in `project_csvs`, start the database, run all the queries in `query_list.csv`, output the results into `query_results`, and stop the database. It saves the import logs, runtimes, and actual query results.
  - Note that the database gets wiped clean each time for our implementation.
  - You should only need to do this step if there are new projects you want to process or new queries you want to execute.
  - If for any reason your process gets interrupted or stuck, remove the import logs belonging to the last project from the /query_results/bom-tag/import_logs/ directory, and re-run the script. The process will continue from the last project.
- To update projects to the most recent version without re-cloning, use ./update.sh. Be aware that your modifications in the projects folder will be reset.
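To illustrate the `query_list.csv` format described above, here is a minimal sketch of how one `name,"query"` line can be split and passed to `cypher-shell`. It is not the actual `query.sh`. The function names and the one-file-per-query output naming are hypothetical. The sketch assumes each entry sits on a single line and that the query text contains no embedded double quotes.

```shell
#!/usr/bin/env bash
# Sketch: split query_list.csv entries of the form  name,"query"
# Assumptions (hypothetical): one entry per line, no embedded double quotes.

# Everything before the first comma is the query name.
query_name() {
  printf '%s\n' "${1%%,*}"
}

# Everything after the first comma is the query; strip the outer quotes.
query_text() {
  local q="${1#*,}"
  q="${q#\"}"
  q="${q%\"}"
  printf '%s\n' "$q"
}

# Run every query in the CSV file and write one result file per query.
# The <out_dir>/<name>.txt naming is an assumption for illustration.
run_queries() {
  local csv="$1" out_dir="$2" line
  mkdir -p "$out_dir"
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue        # skip blank lines
    # </dev/null keeps cypher-shell from consuming the CSV on stdin
    cypher-shell -u neo4j -p neo4jpassword "$(query_text "$line")" \
      < /dev/null > "${out_dir}/$(query_name "$line").txt"
  done < "$csv"
}
```

Because only the first comma is treated as the separator, commas inside the query text are preserved. With the server running, `run_queries query_list.csv query_results` would execute each entry in turn.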
Once Neo4J has been installed, there will be a default Neo4J database which we will use. To import data and wipe the previous data inside, run:
sudo /usr/bin/neo4j-admin database import full --nodes=<file1.csv> --nodes=<file2.csv> --relationships=<file3.csv> --relationships=<file4.csv> --multiline-fields=true --overwrite-destination
The number of files may vary.
To start the server, run:
sudo /usr/bin/neo4j-admin server start
To run a query, run:
cypher-shell -u neo4j -p neo4jpassword "<your_query>"
To stop the server, run:
sudo /usr/bin/neo4j-admin server stop
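Since the number of import files varies per project, the import command can be assembled from whatever .csv files a directory contains. The sketch below is illustrative only. It assumes a hypothetical naming convention in which node files end in `_nodes.csv` and relationship files end in `_relationships.csv`; adjust the patterns to match the actual file names. The function prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble the neo4j-admin import command for one project directory.
# Assumption (hypothetical): node files are named *_nodes.csv and
# relationship files are named *_relationships.csv.

build_import_cmd() {
  local dir="${1%/}" f
  local cmd="sudo /usr/bin/neo4j-admin database import full"

  for f in "$dir"/*_nodes.csv; do
    if [ -e "$f" ]; then              # guard against an unmatched glob
      cmd="$cmd --nodes=$f"
    fi
  done

  for f in "$dir"/*_relationships.csv; do
    if [ -e "$f" ]; then
      cmd="$cmd --relationships=$f"
    fi
  done

  printf '%s\n' "$cmd --multiline-fields=true --overwrite-destination"
}
```

Calling `build_import_cmd /path/to/project/csvs` prints a single command line that can be inspected before it is executed.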
Alternatively, you can use the Neo4J Desktop application. If you ran the SBOM script with the -f flag, all the files needed to create a project's KG in Neo4J should be in its output directory. The .csv files contain the data, the apoc.conf file is a configuration file, and the import_commands.txt file contains the Neo4J commands to import the data.
To create the KG in Neo4J Desktop, we follow a process similar to ZhenPeng's tutorial.
To summarize:
- Download and launch Neo4J Desktop
- Create a new project and "Local DBMS"
- Install the APOC plugin for the DBMS
- Open the DBMS conf directory and copy the `apoc.conf` file into it.
- Open the DBMS import directory and copy all the `.csv` files into it.
- Start and open the DBMS
- Run the first block of commands to create the constraints (technically optional, but it helps ensure we don't have duplicate nodes)
- Run the second block of commands to import the `.csv` files.
- You're done! The last deletion command is just there for debugging purposes.