A follow-up project is now available at https://github.com/MaRDI4NFDI/MetaExtractIng

ExtractIng

ExtractIng is a tool for automated metadata extraction. It was developed to extract metadata from simulation code outputs in high-performance computing and computational engineering environments. Please be aware that the tool is a prototypical implementation, developed to show which types of metadata are extractable. A high-level description of the tool and the outcomes of the research can be found in [3].

Prerequisites

For compiling:

Maven
Java 8 (JDK)

For running:

Java 8 (JRE)
Bash
Spark (only if the parallel version is used)

If a newer version of Java causes problems, you can install Java 8 and select it as the default, for example on Ubuntu:

```
sudo apt-get install openjdk-8-jre
```
```
sudo update-alternatives --config java
```

Build

To build the extractor:

mvn clean package

Or:

mvn clean test-compile install package

Configuration

There are several levels of configuration for the tool.

Configuration of the metadata model.

ExtractIng is not bound to a particular metadata model. The metadata model defines the keys that can be parsed. As it stands, ExtractIng is implemented with EngMeta, a metadata model for computational engineering ([1], [2]). To use a different metadata model, its XSD can be transformed automatically into Java classes with the JAXB framework, which was used here. Usually the user won't need to do this. If it must be done, all output classes have to be rewritten for the newly generated Java classes.
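As a rough illustration of the JAXB step, the following sketch wraps `xjc`, the schema compiler shipped with the Java 8 JDK. The function name `generate_model_classes`, the output directory, and the package name `org.example.engmeta` are assumptions made for this example; the way ExtractIng's own classes were generated may differ.

```shell
#!/usr/bin/env bash
# Sketch: generate JAXB classes from a metadata schema with xjc.
# Function name, output directory and package name are hypothetical.

generate_model_classes() {
  local xsd=$1 out_dir=$2 package=$3
  if ! command -v xjc >/dev/null 2>&1; then
    echo "xjc not found; it is part of the Java 8 JDK" >&2
    return 1
  fi
  if [ ! -f "$xsd" ]; then
    echo "schema not found: $xsd" >&2
    return 1
  fi
  mkdir -p "$out_dir"
  # -d: target directory for generated sources, -p: Java package name
  xjc -d "$out_dir" -p "$package" "$xsd"
}

# Example call (commented out so nothing is generated by accident):
# generate_model_classes EngMeta_v1.1.xsd src/main/java org.example.engmeta
```

The generated classes only cover the model itself; the output classes mentioned above would still need to be adapted by hand.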

Configuration of the simulation code.

ExtractIng is not bound to a specific simulation code. It is designed in a generic way, so that users specify what to parse and where to find it in a configuration file. A sample file, fdm.conf, is delivered with the package. The configuration file is basically a lookup table; each entry is made up of the following fields (see fdm.conf for the exact syntax):

<EngMetaKey> is the metadata key, according to the EngMeta schema, that should be extracted.

<filename> is the name (and path) of the file where the information can be found.

<searchKey> is the search key that marks where the metadata information is found.

<delimiter> specifies the delimiter that separates the key from the value.

<semantics> is needed when there are multiple occurrences of the same key.

The user has to specify, once per simulation code, each piece of information that should be extracted.

In essence, the tool can extract any metadata that appears in the output files as a search key followed by a delimiter and a value.
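The search key and delimiter lookup can be illustrated with a small shell sketch. This is not ExtractIng's parser (which is written in Java); the function `extract_value` and the sample log content are made up for this example and only mimic the idea of finding a key and taking the text after the delimiter.

```shell
#!/usr/bin/env bash
# Illustration only: find a search key in a file and return the text
# that follows the delimiter. Not ExtractIng's actual code.

# extract_value <file> <searchKey> <delimiter>
# Prints the value from the first line in <file> that contains <searchKey>.
extract_value() {
  local file=$1 key=$2 delim=$3 line value
  # -F: treat the key as a fixed string, -m 1: stop at the first match
  line=$(grep -F -m 1 -- "$key" "$file") || return 1
  # Keep what follows the search key, then what follows the first delimiter
  value=${line#*"$key"}
  value=${value#*"$delim"}
  # Trim leading and trailing whitespace
  printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Hypothetical simulation log, created only for this example
sample_log=$(mktemp)
printf 'nsteps = 5000\ntimestep : 0.002\n' > "$sample_log"

steps=$(extract_value "$sample_log" "nsteps" "=")
dt=$(extract_value "$sample_log" "timestep" ":")
echo "steps=$steps dt=$dt"

rm -f "$sample_log"
```

Only the first matching line is returned here; handling repeated keys is what the <semantics> field is for in the real configuration.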

Configuration of the wrapping script.

The user might need to adjust some paths in the fdm.sh wrapper script. In particular, jarPath has to be set carefully to the actual directory of the ExtractIng jar file.
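Before running the wrapper, it can help to check that the directory intended for jarPath really contains a jar. The sketch below assumes a hypothetical checkout location; the helper `has_jar` is not part of fdm.sh.

```shell
#!/usr/bin/env bash
# Sketch: check that a directory intended for jarPath contains a jar file.
# has_jar is a hypothetical helper, not part of fdm.sh.

has_jar() {
  local dir=$1 f
  for f in "$dir"/*.jar; do
    # The glob stays unexpanded if nothing matches, so test for a real file
    [ -f "$f" ] && return 0
  done
  return 1
}

# Hypothetical location; Maven writes build artifacts to target/ by default
jarPath="$HOME/ExtractIng/target"

if has_jar "$jarPath"; then
  echo "jar found in $jarPath"
else
  echo "no jar in $jarPath; run the Maven build or fix jarPath" >&2
fi
```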

Usage

ExtractIng is wrapped in a script that performs some preparatory tasks before running the extraction. The syntax of the script is as follows:

./fdm.sh -c <configFile> -p <directoryToParse>|"<dir1> <dir2> ..." -m [scanner|spark] [-e <executorCores>]

<configFile> is the location of the configuration file.

<directoryToParse> specifies the directory containing the simulation code outputs from which the information should be extracted. To parse multiple directories, pass them as one quoted, space-separated string.

[scanner|spark] specifies the mode: scanner is the native, non-parallel mode, whereas spark is the parallel execution mode, which requires the Spark data analytics framework.

<executorCores> specifies the number of cores used for processing when the parallel (Spark) version is used.
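The following sketch shows a few invocations that follow the syntax above. All directory paths are hypothetical placeholders, and the `run_fdm` helper is an assumption added for this example: it only prints the command when the wrapper script is not found, so the sketch can be tried without ExtractIng installed.

```shell
#!/usr/bin/env bash
# Example invocations of the wrapper script. All paths are hypothetical.

FDM=./fdm.sh     # location of the wrapper script (assumption)
CONF=fdm.conf    # sample configuration delivered with the package

# Run the wrapper if it is present and executable; otherwise just print
# the command that would have been run.
run_fdm() {
  if [ -x "$FDM" ]; then
    "$FDM" "$@"
  else
    echo "would run: $FDM $*"
  fi
}

# One output directory, scanner mode
run_fdm -c "$CONF" -p /data/sim/run1 -m scanner

# Several output directories: passed as ONE quoted, space-separated argument
run_fdm -c "$CONF" -p "/data/sim/run1 /data/sim/run2" -m scanner

# Spark mode with 4 executor cores
run_fdm -c "$CONF" -p /data/sim/run1 -m spark -e 4
```

Note that the printed dry-run line does not show the quotes around the directory list; when calling fdm.sh directly, the quotes are required.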

Version History / Change Log

March 20th 2020, v0.8 Initial release of ExtractIng

September 2nd 2020, v0.82

Multiple data files can reside in the directory to parse
Syntax changes in the wrapper script
Multiple directories can now be parsed in one program call

Development Roadmap

Since this is a prototypical implementation, we try to continuously improve the code and add features.

Planned:

Improved cutting function to capture values that are marshalled with leading and trailing characters, such as key="value".

References

[1] Schembera, Björn, and Dorothea Iglezakis. "The Genesis of EngMeta-A Metadata Model for Research Data in Computational Engineering." Research Conference on Metadata and Semantics Research. Springer, Cham, 2018. https://link.springer.com/chapter/10.1007/978-3-030-14401-2_12

[2] https://www.izus.uni-stuttgart.de/fokus/engmeta/

[3] Schembera, Björn. "Like a rainbow in the dark: metadata annotation for HPC applications in the age of dark data." The Journal of Supercomputing (2021): 1-21. https://link.springer.com/article/10.1007/s11227-020-03602-6
