diff --git a/README.md b/README.md
index 1bc6aa74c..7c40e9e30 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@
DiffDetective is an open-source Java library for variability-aware source code differencing and the **analysis of version histories of software product lines**. This means that DiffDetective can **turn a generic differencer into a variability-aware differencer** by means of a pre- or post-processing. DiffDetective is centered around **formally verified** data structures for variability (variation trees) and variability-aware diffs (variation diffs). These data structures are **generic**, and DiffDetective currently implements **C preprocessor support** to parse respective annotations when used to implement variability. The picture below depicts the process of variability-aware differencing.
-
+
Given two states of a C-preprocessor annotated source code file (left), for example before and after a commit, DiffDetective constructs a variability-aware diff (right) that distinguishes changes to source code from changes to variability annotations. DiffDetective can construct such a variation diff either, by first using a generic differencer, and separating the information (center path), or by first parsing both input versions to an abstract representation, a variation tree (center top and bottom), and constructing a variation diff using a tree differencing algorithm in a second step.
@@ -80,6 +80,18 @@ Additionally, there is a screencast available on YouTube, guiding you through th
[](https://www.youtube.com/watch?v=q6ight5EDQY)
+## Supported Differencing Algorithms
+
+In principle, any generic differencing algorithm (i.e., any algorithm that operates on text or trees) can be made variability-aware with DiffDetective, as explained in our demo paper (see below). Some algorithms are integrated directly into the DiffDetective library, while others come as additional Maven projects.
+
+### Shipped with DiffDetective
+- Git Diff as implemented by [JGit](https://github.com/eclipse-jgit/jgit)
+- [GumTree](https://github.com/GumTreeDiff/gumtree), and all algorithms and matching engines supported by the GumTree library
+
+### Extra Modules
+- [TrueDiff](https://gitlab.rlp.net/plmz/truediff): Support for TrueDiff comes as [a separate Maven project](https://github.com/VariantSync/TrueDiffDetective).
+
+
## Publications
### Variability-Aware Differencing with DiffDetective (FSE 2024, ⭐ [Best Demo Paper](https://2024.esec-fse.org/info/awards) ⭐)
@@ -167,9 +179,10 @@ Edge-typed variation diffs and the replication package are implemented in a fork
DiffDetective was extended and used within bachelor's and master's theses:
+- _Unparsing von Datenstrukturen zur Analyse von C-Präprozessor-Variabilität_, Eugen Shulimov, Bachelor's Thesis, 2025, [DOI 10.17619/UNIPB/1-2385](http://doi.org/10.17619/UNIPB/1-2385) (in German): Eugen added an unparser for variation trees, essentially inverting the horizontal arrows in our commuting diagram at the top of this README file. The unparser for variation diffs reuses the unparser for variation trees: it projects a variation diff to its two variation trees (before and after the change), unparses both trees, and then diffs the resulting text files to compute a text-based diff.
- _Constructing Variation Diffs Using Tree Diffing Algorithms_, Benjamin Moosherr, Bachelor's Thesis, 2023, [DOI 10.18725/OPARU-50108](https://dx.doi.org/10.18725/OPARU-50108): Benjamin added support for tree-differencing and integrated the GumTree differencer ([Github](https://github.com/GumTreeDiff/gumtree), [Paper](https://doi.org/10.1145/2642937.2642982)). In his thesis, Benjamin also reviewed a range of quality metrics for tree-diffs with focus on their applicability for rating variability-aware diffs. The [org.variantsync.diffdetective.experiments.thesis_bm](src/main/java/org/variantsync/diffdetective/experiments/thesis_bm) package implements the corresponding empirical study and may serve as an example on how to use the tree-differencing.
- _Reverse Engineering Feature-Aware Commits From Software Product-Line Repositories_, Lukas Bormann, Bachelor's Thesis, 2023, [10.18725/OPARU-47892](https://dx.doi.org/10.18725/OPARU-47892): Lukas implemented an algorithm for feature-based commit-untangling, which turns variation diff into a series of smaller diffs, each of which contains an edit to a single feature or feature formula. This work was later refined in our publication _Views on Edits to Variational Software_ illustrated above.
-- _Inspecting the Evolution of Feature Annotations in Configurable Software_, Lukas Güthing, Master's Thesis, 2023: Lukas implemented different edge-types for associating variability annotations within variation diffs. He published his work later at VaMoS 2024 under the title _Explaining Edits to Variability Annotations in Evolving Software Product Lines_, illustrated above.
+- _Inspecting the Evolution of Feature Annotations in Configurable Software_, Lukas Güthing, Master's Thesis, 2023: Lukas implemented different edge-types for associating variability annotations within variation diffs. He published his work later at VaMoS 2024 under the title _Explaining Edits to Variability Annotations in Evolving Software Product Lines_, illustrated above. His work can be found in a [fork][forklg] of DiffDetective.
- _Empirical Evaluation of Feature Trace Recording on the Edit History of Marlin_, Sören Viegener, Bachelor's Thesis, 2021, [DOI 10.18725/OPARU-38603](http://dx.doi.org/10.18725/OPARU-38603): In his thesis, Sören started the DiffDetective project and implemented the first version of an algorithm, which parses text-based diffs to C-preprocessor files to variation diffs. He also came up with an initial classification of edits, which we wanted to reuse to evaluate [Feature Trace Recording](https://variantsync.github.io/FeatureTraceRecording/), a method for deriving variability annotations from annotated patches.
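
The final step of the variation-diff unparser described in the first entry above (diffing the two unparsed text files) can be illustrated with the standard `diff` tool. This is only a sketch under assumptions: the file names and contents are made up, and the thesis implementation may use a different text differencer.

```shell
# Toy stand-ins for the text obtained by unparsing the "before" and "after"
# variation trees (hypothetical contents, not taken from the thesis).
workdir="$(mktemp -d)"
printf '#ifdef A\nfoo();\n#endif\n' > "$workdir/before.c"
printf '#ifdef A\nbar();\n#endif\n' > "$workdir/after.c"

# diff exits with status 1 when the files differ, so guard it with '|| true'.
text_diff="$(diff -u "$workdir/before.c" "$workdir/after.c" || true)"
printf '%s\n' "$text_diff"

# Clean up the temporary files.
rm -r "$workdir"
```
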
[documentation]: https://variantsync.github.io/DiffDetective/docs/javadoc
diff --git a/default.nix b/default.nix
index 9503cc2f2..17640a7d1 100644
--- a/default.nix
+++ b/default.nix
@@ -26,7 +26,7 @@
},
doCheck ? true,
buildGitHubPages ? true,
- dependenciesHash ? "sha256-OdagSk6jYCkkw/kPoOJlma9yEK7hMBcNkuxE6qt0ra8=",
+ dependenciesHash ? "sha256-xQG7IjBROSXfMIe7kvU8fXfKShdqKwVaJR0y97jsZWU=",
}:
pkgs.stdenvNoCC.mkDerivation rec {
pname = "DiffDetective";
diff --git a/docs/datasets/eugen-bachelor-thesis.md b/docs/datasets/eugen-bachelor-thesis.md
new file mode 100644
index 000000000..52f415472
--- /dev/null
+++ b/docs/datasets/eugen-bachelor-thesis.md
@@ -0,0 +1,5 @@
+Project name | Domain | Source code available (\*\*y\*\*es/\*\*n\*\*o)? | Is it a git repository (\*\*y\*\*es/\*\*n\*\*o)? | Repository URL | Clone URL | Estimated number of commits
+-------------------|-------------------------|-------------------------------------------------|--------------------------------------------------|--------------------------------------------------------------|----------------------------------------------------|---------------------------------
+berkeley-db-libdb | database system | y | y | https://github.com/berkeleydb/libdb | https://github.com/berkeleydb/libdb.git | 7
+sylpheed | e-mail client | y | y | https://github.com/jan0sch/sylpheed | https://github.com/jan0sch/sylpheed.git | 2,682
+vim | text editor | y | y | https://github.com/vim/vim | https://github.com/vim/vim.git | 17,109
diff --git a/docs/teaser.png b/docs/teaser.png
deleted file mode 100644
index 24c43deb2..000000000
Binary files a/docs/teaser.png and /dev/null differ
diff --git a/pom.xml b/pom.xml
index a598614bd..50c0f52b1 100644
--- a/pom.xml
+++ b/pom.xml
@@ -7,7 +7,7 @@
 <groupId>org.variantsync</groupId>
 <artifactId>diffdetective</artifactId>
- <version>2.3.0</version>
+ <version>2.4.0</version>
 UTF-8
@@ -163,7 +163,7 @@
 <groupId>org.apache.commons</groupId>
 <artifactId>commons-lang3</artifactId>
- <version>3.17.0</version>
+ <version>3.18.0</version>
diff --git a/replication/thesis-es/Dockerfile b/replication/thesis-es/Dockerfile
new file mode 100644
index 000000000..4520b8042
--- /dev/null
+++ b/replication/thesis-es/Dockerfile
@@ -0,0 +1,57 @@
+# syntax=docker/dockerfile:1
+
+FROM alpine:3.15
+# PACKAGE STAGE
+
+# Prepare the compile environment. JDK is automatically installed
+RUN apk add maven
+
+# Create and navigate to a working directory
+WORKDIR /home/user
+
+COPY local-maven-repo ./local-maven-repo
+
+# Copy the source code
+COPY src ./src
+# Copy the pom.xml if Maven is used
+COPY pom.xml .
+# Execute the maven package process
+RUN mvn package || exit
+
+FROM alpine:3.15
+
+# Create a user
+RUN adduser --disabled-password --home /home/sherlock --gecos '' sherlock
+
+RUN apk add --no-cache --upgrade bash
+RUN apk add --update openjdk17
+
+# Change into the home directory
+WORKDIR /home/sherlock
+
+# Copy the compiled JAR file from the first stage into the second stage
+# Syntax: COPY --from=STAGE_ID SOURCE_PATH TARGET_PATH
+WORKDIR /home/sherlock/holmes
+COPY --from=0 /home/user/target/diffdetective-*-jar-with-dependencies.jar ./DiffDetective.jar
+WORKDIR /home/sherlock
+RUN mkdir results
+
+# Copy the setup
+COPY docs holmes/docs
+
+# Copy the docker resources
+COPY docker/* ./
+COPY replication/thesis-es/docker/* ./
+RUN mkdir DiffDetectiveMining
+
+# Adjust permissions
+RUN chown sherlock:sherlock /home/sherlock -R
+RUN chmod +x execute.sh
+RUN chmod +x entrypoint.sh
+RUN chmod +x fix-perms.sh
+
+# Set the entrypoint
+ENTRYPOINT ["./entrypoint.sh", "./execute.sh"]
+
+# Set the user
+USER sherlock
diff --git a/replication/thesis-es/README.md b/replication/thesis-es/README.md
new file mode 100644
index 000000000..dbac2923f
--- /dev/null
+++ b/replication/thesis-es/README.md
@@ -0,0 +1,49 @@
+
+[][documentation]
+[](INSTALL.md)
+[][website]
+[](../../LICENSE.LGPL3)
+
+# Unparsing Experiment
+This is an experiment for the bachelor thesis by Eugen Shulimov which tests the unparser for variation trees and diffs.
+
+### Prerequisite
+All following commands assume that the working directory of your terminal is the `thesis-es` directory. Please switch directories if this is not the case:
+```shell
+cd DiffDetective/replication/thesis-es
+```
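
The Windows scripts in this directory check this precondition themselves. For bash, a comparable check might look like the following sketch; the function name `is_thesis_es_dir` is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): succeeds only if the
# last component of the given path is 'thesis-es'.
is_thesis_es_dir() {
    [ "$(basename "$1")" = "thesis-es" ]
}

# Check the current working directory before running any of the scripts.
if is_thesis_es_dir "$PWD"; then
    echo "ok: running from thesis-es"
else
    echo "warning: please cd into DiffDetective/replication/thesis-es first"
fi
```

The check only inspects the last path component, mirroring what `build.bat` and `execute.bat` do; it does not verify that the directory belongs to a DiffDetective clone.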
+
+### Build the Docker container
+Start the Docker daemon.
+Clone this repository.
+Open a terminal and navigate to the root directory of this repository.
+To build the Docker container you can run the `build` script corresponding to your operating system.
+#### Windows:
+`.\build.bat`
+#### Linux/Mac (bash):
+`./build.sh`
+
+### Start the experiment
+To execute the experiment you can run the `execute` script corresponding to your operating system.
+
+#### Windows:
+`.\execute.bat`
+#### Linux/Mac (bash):
+`./execute.sh`
+
+> If you want to stop the execution, you can call the provided script for stopping the container in a separate terminal.
+> When restarted, the execution will continue processing by restarting at the last unfinished repository.
+> #### Windows:
+> `.\stop-execution.bat`
+> #### Linux/Mac (bash):
+> `./stop-execution.sh`
+
+You might see warnings or errors reported from SLF4J like `Failed to load class "org.slf4j.impl.StaticLoggerBinder"` which you can safely ignore.
+
+### View the results in the [results][resultsdir] directory
+All raw results are stored in the [results][resultsdir] directory.
+
+[documentation]: https://variantsync.github.io/DiffDetective/docs/javadoc/
+[website]: https://variantsync.github.io/DiffDetective/
+
+[resultsdir]: results
diff --git a/replication/thesis-es/build.bat b/replication/thesis-es/build.bat
new file mode 100644
index 000000000..73bc8d9ab
--- /dev/null
+++ b/replication/thesis-es/build.bat
@@ -0,0 +1,19 @@
+@echo off
+setlocal
+
+set "targetSubPath=thesis-es"
+
+rem Get the current directory
+for %%A in ("%CD%") do set "currentDir=%%~nxA"
+
+rem Check if the current directory ends with the target sub-path
+
+if "%currentDir:~-9%"=="%targetSubPath%" (
+ cd ..\..
+ docker build -t diff-detective-unparse -f replication\thesis-es\Dockerfile .
+ @pause
+) else (
+ echo error: the script must be run from inside the thesis-es directory, i.e., DiffDetective\replication\%targetSubPath%
+)
+endlocal
+
diff --git a/replication/thesis-es/build.sh b/replication/thesis-es/build.sh
new file mode 100644
index 000000000..19bb7b741
--- /dev/null
+++ b/replication/thesis-es/build.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+# We have to switch to the root directory of the project and build the Docker image from there,
+# because Docker only allows access to the files in the current file system subtree (i.e., no access to ancestors).
+# We have to do this to get access to 'src', 'docker', 'local-maven-repo', etc.
+# For resiliency against different working directories during execution of this
+# script we calculate the correct path using the special bash variable
+# BASH_SOURCE.
+cd "$(dirname "${BASH_SOURCE[0]}")/../.." || exit
+
+docker build -t diff-detective-unparse -f replication/thesis-es/Dockerfile .
diff --git a/replication/thesis-es/docker/DOCKER.md b/replication/thesis-es/docker/DOCKER.md
new file mode 100644
index 000000000..966ebc0d5
--- /dev/null
+++ b/replication/thesis-es/docker/DOCKER.md
@@ -0,0 +1,6 @@
+# Docker Files
+
+This directory contains the files that are required to run the Docker container.
+
+## Execution
+The [`execute.sh`](execute.sh) script can be adjusted to run the program that should be executed by the Docker container.
\ No newline at end of file
diff --git a/replication/thesis-es/docker/execute.sh b/replication/thesis-es/docker/execute.sh
new file mode 100644
index 000000000..a4e2ce4dc
--- /dev/null
+++ b/replication/thesis-es/docker/execute.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+cd /home/sherlock/holmes || exit
+
+echo "Running the experiment."
+java -cp DiffDetective.jar org.variantsync.diffdetective.experiments.thesis_es.Main
+
+echo "Collecting results."
+cp -r results/* ../results/
+echo "The results are located in the 'results' directory."
+
diff --git a/replication/thesis-es/execute.bat b/replication/thesis-es/execute.bat
new file mode 100644
index 000000000..2467d7379
--- /dev/null
+++ b/replication/thesis-es/execute.bat
@@ -0,0 +1,15 @@
+@echo off
+setlocal
+
+set "targetSubPath=thesis-es"
+
+rem Get the current directory
+for %%A in ("%CD%") do set "currentDir=%%~nxA"
+
+rem Check if the current directory ends with the target sub-path
+if "%currentDir:~-9%"=="%targetSubPath%" (
+docker run --rm -v "%cd%\results":"/home/sherlock/results" diff-detective-unparse %*
+) else (
+ echo error: the script must be run from inside the thesis-es directory, i.e., DiffDetective\replication\%targetSubPath%
+)
+endlocal
diff --git a/replication/thesis-es/execute.sh b/replication/thesis-es/execute.sh
new file mode 100644
index 000000000..c0b2fba9b
--- /dev/null
+++ b/replication/thesis-es/execute.sh
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+# Ensure that the script runs from the folder containing this script
+cd "$(dirname "${BASH_SOURCE[0]}")" || exit
+
+if [[ $# -gt 0 ]]; then
+echo "Executing $1"
+fi
+docker run --rm -v "$(pwd)/results":"/home/sherlock/results" diff-detective-unparse "$@"
diff --git a/replication/thesis-es/results/.gitignore b/replication/thesis-es/results/.gitignore
new file mode 100644
index 000000000..86d0cb272
--- /dev/null
+++ b/replication/thesis-es/results/.gitignore
@@ -0,0 +1,4 @@
+# Ignore everything in this directory
+*
+# Except this file
+!.gitignore
\ No newline at end of file
diff --git a/replication/thesis-es/stop-execution.bat b/replication/thesis-es/stop-execution.bat
new file mode 100644
index 000000000..009494d8f
--- /dev/null
+++ b/replication/thesis-es/stop-execution.bat
@@ -0,0 +1,3 @@
+@echo "Stopping all running simulations. This will take a moment..."
+@FOR /f "tokens=*" %%i IN ('docker ps -a -q --filter "ancestor=diff-detective-unparse"') DO docker stop %%i
+@echo "...done."
\ No newline at end of file
diff --git a/replication/thesis-es/stop-execution.sh b/replication/thesis-es/stop-execution.sh
new file mode 100644
index 000000000..0ebe6a83a
--- /dev/null
+++ b/replication/thesis-es/stop-execution.sh
@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+echo "Stopping Docker container. This will take a moment..."
+docker stop $(docker ps -a -q --filter "ancestor=diff-detective-unparse")
+echo "...done."
diff --git a/src/main/java/org/variantsync/diffdetective/analysis/PreprocessingAnalysis.java b/src/main/java/org/variantsync/diffdetective/analysis/PreprocessingAnalysis.java
index 1ea16ece0..75c967adb 100644
--- a/src/main/java/org/variantsync/diffdetective/analysis/PreprocessingAnalysis.java
+++ b/src/main/java/org/variantsync/diffdetective/analysis/PreprocessingAnalysis.java
@@ -3,24 +3,25 @@
import java.util.Arrays;
import java.util.List;
-import org.variantsync.diffdetective.variation.diff.transform.VariationDiffTransformer;
+import org.variantsync.diffdetective.variation.diff.VariationDiff;
+import org.variantsync.diffdetective.variation.diff.transform.Transformer;
import org.variantsync.diffdetective.variation.DiffLinesLabel;
public class PreprocessingAnalysis implements Analysis.Hooks {
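- private final List<VariationDiffTransformer<DiffLinesLabel>> preprocessors;
+ private final List<Transformer<VariationDiff<DiffLinesLabel>>> preprocessors;
- public PreprocessingAnalysis(List<VariationDiffTransformer<DiffLinesLabel>> preprocessors) {
+ public PreprocessingAnalysis(List<Transformer<VariationDiff<DiffLinesLabel>>> preprocessors) {
this.preprocessors = preprocessors;
}
@SafeVarargs
- public PreprocessingAnalysis(VariationDiffTransformer<DiffLinesLabel>... preprocessors) {
+ public PreprocessingAnalysis(Transformer<VariationDiff<DiffLinesLabel>>... preprocessors) {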
this.preprocessors = Arrays.asList(preprocessors);
}
@Override
public boolean analyzeVariationDiff(Analysis analysis) {
- VariationDiffTransformer.apply(preprocessors, analysis.getCurrentVariationDiff());
+ Transformer.apply(preprocessors, analysis.getCurrentVariationDiff());
analysis.getCurrentVariationDiff().assertConsistency();
return true;
}
diff --git a/src/main/java/org/variantsync/diffdetective/diff/git/GitDiffer.java b/src/main/java/org/variantsync/diffdetective/diff/git/GitDiffer.java
index a79f8936a..d1b2523fe 100644
--- a/src/main/java/org/variantsync/diffdetective/diff/git/GitDiffer.java
+++ b/src/main/java/org/variantsync/diffdetective/diff/git/GitDiffer.java
@@ -21,9 +21,11 @@
import org.variantsync.diffdetective.variation.DiffLinesLabel;
import org.variantsync.diffdetective.variation.diff.VariationDiff;
import org.variantsync.diffdetective.variation.diff.parse.VariationDiffParser;
+import org.variantsync.diffdetective.variation.tree.source.GitSource;
import java.io.*;
import java.nio.charset.StandardCharsets;
+import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
@@ -111,14 +113,12 @@ public static CommitDiffResult createCommitDiff(
final CanonicalTreeParser currentTreeParser = new CanonicalTreeParser();
final CanonicalTreeParser prevTreeParser = new CanonicalTreeParser();
try (ObjectReader reader = repository.getGitRepo().getRepository().newObjectReader()) {
- try {
- currentTreeParser.reset(reader, childCommit.getTree());
- if (parentCommit != null) {
- prevTreeParser.reset(reader, parentCommit.getTree());
- }
- } catch (IOException e) {
- return CommitDiffResult.Failure(DiffError.JGIT_ERROR, e.toString());
+ currentTreeParser.reset(reader, childCommit.getTree());
+ if (parentCommit != null) {
+ prevTreeParser.reset(reader, parentCommit.getTree());
}
+ } catch (IOException e) {
+ return CommitDiffResult.Failure(DiffError.JGIT_ERROR, e.toString());
}
final AbstractTreeIterator parentTreeIterator;
@@ -254,6 +254,7 @@ private static CommitDiffResult getPatchDiffs(
final VariationDiff<DiffLinesLabel> variationDiff = VariationDiffParser.createVariationDiff(
fullDiff,
+ new GitSource(repository, childCommit.getId().name(), Path.of(filename)),
repository.getParseOptions().variationDiffParseOptions()
);
diff --git a/src/main/java/org/variantsync/diffdetective/diff/git/GitPatch.java b/src/main/java/org/variantsync/diffdetective/diff/git/GitPatch.java
index 6bdef202e..e0ca5d9b8 100644
--- a/src/main/java/org/variantsync/diffdetective/diff/git/GitPatch.java
+++ b/src/main/java/org/variantsync/diffdetective/diff/git/GitPatch.java
@@ -1,17 +1,19 @@
package org.variantsync.diffdetective.diff.git;
+import java.util.List;
+
import org.eclipse.jgit.diff.DiffEntry;
import org.variantsync.diffdetective.diff.text.TextBasedDiff;
+import org.variantsync.diffdetective.util.Source;
import org.variantsync.diffdetective.variation.diff.Time;
import org.variantsync.diffdetective.variation.diff.VariationDiff; // For Javadoc
-import org.variantsync.diffdetective.variation.diff.source.VariationDiffSource;
/**
* Interface for patches from a git repository.
* A git patch is a {@link TextBasedDiff} from which {@link VariationDiff}s can be created.
*
*/
-public interface GitPatch extends VariationDiffSource, TextBasedDiff {
+public interface GitPatch extends Source, TextBasedDiff {
/**
* Minimal default implementation of {@link GitPatch}
* @param getDiff The diff in text form.
@@ -41,6 +43,16 @@ public GitPatch shallowClone() {
public String toString() {
return oldFileName + "@ " + getParentCommitHash + " (parent) to " + newFileName + " @ " + getCommitHash + " (child)";
}
+
+ @Override
+ public String getSourceExplanation() {
+ return "SimpleGitPatch";
+ }
+
+ @Override
+ public List