DLinker is an RDF data linking tool.
- Depends on four hyperparameters (measure_level, alpha_predicate, alpha and phi):
- 'measure_level': the strength of the similarity search measure
- 'alpha_predicate': acceptance threshold for similar predicates
- 'alpha': acceptance threshold for similar literals
- 'phi': number of accepted similarity pairs
- 'validation': path to the validation file; pass this parameter once the file has been placed in the validation path
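To make the parameter list concrete, here is a minimal sketch of a command-line parser for these flags. The flag names come from the commands in this README; the types, defaults and the `build_parser` helper are assumptions for illustration and may differ from what the DLinker scripts actually do.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags used in the README commands (illustrative only)."""
    parser = argparse.ArgumentParser(description="DLinker hyperparameters (sketch)")
    parser.add_argument("--input_path", required=True,
                        help="directory holding the source and target files")
    parser.add_argument("--output", required=True,
                        help="directory where results are written")
    # Types and defaults below are assumptions, not taken from the DLinker code.
    parser.add_argument("--alpha_predicate", type=float, default=1.0,
                        help="acceptance threshold for similar predicates")
    parser.add_argument("--alpha", type=float, default=1.0,
                        help="acceptance threshold for similar literals")
    parser.add_argument("--phi", type=int, default=1,
                        help="number of accepted similarity pairs")
    parser.add_argument("--measure_level", type=int, default=0,
                        help="strength of the similarity search measure")
    parser.add_argument("--validation", default=None,
                        help="path to the file of valid owl:sameAs pairs")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--input_path", "./inputs/doremus/",
    "--output", "./outputs/doremus/",
    "--alpha", "0.88", "--phi", "2", "--measure_level", "2",
])
print(args.alpha, args.phi, args.measure_level)
```

Parsing an explicit list keeps the example independent of how the script is launched; a real entry point would call `parse_args()` with no argument.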
- HOBBIT and SPATEN (url):
sh ./job.sh --input_path ./inputs/spaten_hobbit/ --output ./outputs/spaten_hobbit --alpha_predicate 1 --alpha 0.3 --phi 1 --measure_level 0 --validation ./validations/spaten_hobbit/valid_same_as.nt
- Precision: 1.0
- Recall: 1.0
- F-measure: 1.0
- Doremus data (url):
sh ./job.sh --input_path ./inputs/doremus/ --output ./outputs/doremus/ --alpha_predicate 1 --alpha 0.88 --phi 2 --measure_level 2 --validation ./validations/doremus/valid_same_as.ttl
- Precision: 0.966
- Recall: 1.0
- F-measure: 0.983
- SPIMBENCH data (url):
sh ./job.sh --input_path ./inputs/spimbench/ --output ./outputs/spimbench/ --alpha_predicate 1 --alpha 1 --phi 1 --measure_level 1 --validation ./validations/spimbench/valid_same_as.ttl
- Precision: 0.786
- Recall: 1.0
- F-measure: 0.880
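The scores above can be read with the standard definitions of precision, recall and F-measure over sets of linked pairs. The sketch below assumes those standard definitions; the function name and the toy identifiers are hypothetical, and this is not the code of './score_computation.py'. The reported figures are consistent with the harmonic-mean formula: for Doremus, 2 × 0.966 × 1.0 / (0.966 + 1.0) ≈ 0.983.

```python
def precision_recall_f1(predicted, valid):
    """Return (precision, recall, F-measure) for two collections of (source, target) pairs."""
    predicted, valid = set(predicted), set(valid)
    true_positives = len(predicted & valid)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(valid) if valid else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data (hypothetical identifiers): 3 valid links, 4 predicted links.
valid = {("s1", "t1"), ("s2", "t2"), ("s3", "t3")}
predicted = valid | {("s4", "t9")}
p, r, f = precision_recall_f1(predicted, valid)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.75 1.0 0.857
```

A recall of 1.0 with a precision below 1.0, as in the Doremus and SPIMBENCH runs, means every valid pair was found and some extra pairs were also proposed.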
Make sure that './outputs/spimbench/similars_predicates.csv' contains the content below:
predicate_1,value_1,predicate_2,value_2,similarities
<http://www.bbc.co.uk/ontologies/creativework/title>,http://www.bbc.co.uk/ontologies/creativework/title,<http://www.bbc.co.uk/ontologies/creativework/title>,http://www.bbc.co.uk/ontologies/creativework/title,1.0
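One way to check this file programmatically is sketched below. It uses only the standard `csv` module and an in-memory copy of the two lines shown above; the helper name `read_similar_predicates` is hypothetical and is not part of DLinker.

```python
import csv
import io

EXPECTED_HEADER = ["predicate_1", "value_1", "predicate_2", "value_2", "similarities"]


def read_similar_predicates(handle):
    """Parse the CSV and return its rows as dictionaries, checking the header first."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


# In-memory copy of the content shown above. For the real file, pass
# open("./outputs/spimbench/similars_predicates.csv", newline="") instead.
sample = io.StringIO(
    "predicate_1,value_1,predicate_2,value_2,similarities\n"
    "<http://www.bbc.co.uk/ontologies/creativework/title>,"
    "http://www.bbc.co.uk/ontologies/creativework/title,"
    "<http://www.bbc.co.uk/ontologies/creativework/title>,"
    "http://www.bbc.co.uk/ontologies/creativework/title,1.0\n"
)
rows = read_similar_predicates(sample)
print(len(rows), rows[0]["predicate_1"], float(rows[0]["similarities"]))
```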
- The input path ('./inputs/') must contain a pair of files; any file names are accepted. Example: 'source.ttl' and 'target.ttl'
- Compute the pairs of predicates
- Compute the similar literals
- Once the data is in place, the whole pipeline can be launched with the sbatch file ('./job.sh'); do not forget to pass the hyperparameters
- Results are placed in the output path ('./outputs')
- Place the valid pairs in the file '/validations/valid_same_as.ttl' and pass it as an argument with 'validation'
- Compute the similarity score with the Python script ('./score_computation.py')
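To illustrate how the 'alpha' and 'phi' hyperparameters could interact in the literal-comparison step, here is a rough sketch. It rests on two loudly stated assumptions: the linking rule (two entities become a candidate pair when at least `phi` literal pairs reach a similarity of `alpha`) is one plausible reading of the parameter descriptions, and `difflib.SequenceMatcher` is a stand-in for DLinker's own similarity measure, which this README does not show. All names and sample literals are hypothetical.

```python
from difflib import SequenceMatcher


def literal_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; DLinker's own measure is not shown in this README."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_candidate_link(source_literals, target_literals, alpha, phi):
    """Assumed rule: link two entities when at least `phi` literal pairs score >= `alpha`."""
    accepted = sum(
        1
        for s in source_literals
        for t in target_literals
        if literal_similarity(s, t) >= alpha
    )
    return accepted >= phi


# Hypothetical literals attached to one source entity and one target entity.
source = ["Symphony No. 5", "Ludwig van Beethoven"]
target = ["Symphony no. 5", "L. van Beethoven"]
print(is_candidate_link(source, target, alpha=0.88, phi=1))
print(is_candidate_link(source, target, alpha=0.88, phi=2))
```

Under this reading, raising `alpha` makes each literal match stricter, while raising `phi` demands more matching literals before two entities are linked.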
DLinker relies on the following elements to work properly:
- [Python >=3.8] - an interpreted, multi-paradigm, multi-platform programming language.
- [Visual Studio Code Editor] - awesome text editor
- [markdown-it] - Markdown parser done right. Fast and easy to extend.
DLinker requires Python (>=3.8) and spaCy (>=3.4.1) to run.
Install the dependencies:
pip install spacy
The launch script ('./job.sh') forwards its arguments to the two Python scripts:
#!/bin/bash
# Forward every command-line argument, unchanged, to both pipeline stages.
# Quoting "$@" keeps arguments that contain spaces intact.
python3.8 ./candidate_entities_pairs.py "$@"
python3.8 ./score_computation.py "$@"
Expected output after running on the HOBBIT and SPATEN datasets:
sh ./job.sh --input_path ./inputs/spaten_hobbit/ --output ./outputs/spaten_hobbit --alpha_predicate 1 --alpha 0.3 --phi 1 --measure_level 0 --validation ./validations/spaten_hobbit/valid_same_as.nt
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://www.spaten.com/trace-data#162531> owl:sameAs <http://www.hobbit.e79638702-1458-413d-a054-06ba82203597> .
<http://www.spaten.com/trace-data#207815> owl:sameAs <http://www.hobbit.ea985b39b-df18-43d2-aac1-21e41e04c910> .
<http://www.spaten.com/trace-data#21948> owl:sameAs <http://www.hobbit.ea0e27f34-3e0a-48ff-9a27-b4f3052187e4> .
<http://www.spaten.com/trace-data#332418> owl:sameAs <http://www.hobbit.e6ec56f99-f954-431c-b644-49e3f52c4608> .
<http://www.spaten.com/trace-data#402929> owl:sameAs <http://www.hobbit.e7dd07dff-791f-4611-af13-d2ab783b8880> .
<http://www.spaten.com/trace-data#44152> owl:sameAs <http://www.hobbit.e0cadb061-dfc0-4992-b9ec-77b03e869c19> .
...
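The links in this output can be loaded for inspection with a few lines of Python. The sketch below is a simple regular-expression reader for lines of the exact shape shown above, not a full Turtle parser; the helper name `parse_same_as` is hypothetical, and the sample reuses the first two links from the output.

```python
import re

# Matches lines of the form: <subject> owl:sameAs <object> .
SAME_AS = re.compile(r"^<([^>]+)>\s+owl:sameAs\s+<([^>]+)>\s*\.\s*$")


def parse_same_as(text):
    """Return the (source, target) IRI pairs found in `text`, skipping other lines."""
    pairs = []
    for line in text.splitlines():
        match = SAME_AS.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


sample = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://www.spaten.com/trace-data#162531> owl:sameAs <http://www.hobbit.e79638702-1458-413d-a054-06ba82203597> .
<http://www.spaten.com/trace-data#207815> owl:sameAs <http://www.hobbit.ea985b39b-df18-43d2-aac1-21e41e04c910> .
"""
links = parse_same_as(sample)
print(len(links))  # 2
```

For files that use other Turtle features (multi-line statements, prefixed subjects), a dedicated RDF library would be the safer choice.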