Ready-to-use benchmarking distribution based on the VLO importer and a SOLR instance running inside Tomcat.
The dataset can be obtained from this b2drop share.
- Make a directory /var/vlo
- Copy or link the 'data' directory with VLO benchmark data to that directory so that we have
  - /var/vlo/data/clarin-others
  - /var/vlo/data/hathi
  - /var/vlo/data/KB
  - /var/vlo/data/test
- Make a directory /var/vlo/solrdata and make sure it is writable by the user that will be running the Tomcat server. This directory will be populated with the SOLR index data, so make sure there is enough disk space (at least several gigabytes)
- Start the Tomcat instance with SOLR inside using start-solr.sh
  - The Tomcat will run on port 9080; make sure it is available before starting (it will also occupy ports 9005, 9009 and 9443)
  - Check that it's running at http://localhost:9080/vlo-solr-3.1
  - The Tomcat can be stopped again by running stop-solr.sh
- Start the import by running time-import.sh
  - The import will fail with an exception if the SOLR Tomcat is not running or cannot be found at the expected location (see above)
  - This will create a file import-time.out.${timestamp} with timing information
  - Detailed importer output is available at vlo/log/vlo-importer.log
  - The import can take a long time so you may want to run it detached from any terminal session
  - A quick test import can be carried out by running time-import.sh vlo/config/VloConfig-test.xml
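The directory preparation and the detached run described above can be sketched as a few shell functions. This is a minimal sketch, not part of the distribution: the function names are hypothetical, and only the directory names and the default root /var/vlo come from the steps above.

```shell
#!/bin/sh
# Sketch only: helper functions for the setup steps. The function names are
# made up for illustration; the directory names come from the steps above.

# prepare_dirs ROOT DATA_SRC: create ROOT and ROOT/solrdata, then link the
# benchmark data directory DATA_SRC (use an absolute path) as ROOT/data.
prepare_dirs() {
    root=$1
    data_src=$2
    mkdir -p "$root/solrdata" || return 1
    # Link rather than copy, so the benchmark data is not duplicated.
    if [ ! -e "$root/data" ]; then
        ln -s "$data_src" "$root/data" || return 1
    fi
}

# check_layout ROOT: print every expected directory that is missing or not
# writable; the return status is non-zero if anything was reported.
check_layout() {
    root=$1
    missing=0
    for name in clarin-others hathi KB test; do
        if [ ! -d "$root/data/$name" ]; then
            echo "missing: $root/data/$name"
            missing=1
        fi
    done
    if [ ! -w "$root/solrdata" ]; then
        echo "not writable: $root/solrdata"
        missing=1
    fi
    return "$missing"
}

# run_detached LOG CMD...: start CMD in the background so that it survives
# the end of the terminal session, with all output collected in LOG.
run_detached() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 < /dev/null &
}
```

After sourcing these functions, a run could look like `prepare_dirs /var/vlo /path/to/data`, then `check_layout /var/vlo`, and, once the Tomcat is up, `run_detached import.nohup.log ./time-import.sh` from the distribution directory. The path /path/to/data and the log file name are placeholders.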
In a scheduled setup, the following should happen periodically (assuming that the SOLR Tomcat is running already):
- Download a fresh copy of the data set
- Unpack the data set into the import location (/var/vlo/data)
- Run the import via time-import.sh
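A periodic refresh along these lines might be sketched as follows. Several things here are assumptions rather than facts from this document: the data set is taken to be a gzipped tar archive that has already been downloaded, the function names are hypothetical, and the lock directory is an addition meant to keep a long-running import from overlapping with the next scheduled one.

```shell
#!/bin/sh
# Sketch of a periodic refresh. Assumptions (not from the README): the data
# set arrives as a gzipped tar archive, and the helper names are made up.

# unpack_data ARCHIVE DEST: unpack the archive into the import location.
unpack_data() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# with_lock LOCKDIR CMD...: run CMD unless a previous run still holds the
# lock. mkdir either creates the directory or fails, so it serves as a
# simple lock; the exit status of CMD is passed through.
with_lock() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous run still active: $lock" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$lock"
    return "$status"
}

# run_import DIST [CONFIG]: call time-import.sh from the distribution
# directory DIST, optionally with an alternative configuration file.
run_import() {
    dist=$1
    shift
    ( cd "$dist" && ./time-import.sh "$@" )
}
```

A scheduled job could then call `unpack_data /tmp/vlo-data.tar.gz /var/vlo/data` followed by `with_lock /tmp/vlo-import.lock run_import /opt/vlo-benchmark`. The archive name, the lock path and /opt/vlo-benchmark are placeholders for wherever the download and the distribution actually live.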
- The import locations (defaults are in /var/vlo/data) are configured in vlo/config/VloConfig.xml
- The SOLR data directory location (default is /var/vlo/solrdata) is configured in vlo/config/solr/collection1/conf/solrconfig.xml
- If you wish to change the Tomcat port(s), change the following:
  - The actual port configurations in tomcat/conf/server.xml
  - The SOLR URL for the importer to connect to in vlo/config/VloConfig.xml
- Do NOT try to start the Tomcat from any other location using the Tomcat startup script, as the location of the SOLR configuration is defined with a relative path (in tomcat/webapps/vlo-solr-3.1/META-INF/context.xml)
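After a port change, a quick consistency check between the two files can catch a mismatch before the import fails. The sketch below is only a text search and does not parse the XML. It assumes that server.xml writes the port as an attribute of the form port="NNNN" and that the SOLR URL in VloConfig.xml contains ":NNNN/". The function names are hypothetical. The second helper assumes that the scripts are meant to be started with the distribution directory as the working directory, which this document implies but does not state outright.

```shell
#!/bin/sh
# Sketch: sanity checks after changing the Tomcat port. The checks only look
# for the port number as text; they do not parse the XML files.

# port_configured PORT DIST: succeed if PORT appears both as a port
# attribute in tomcat/conf/server.xml and inside a URL in
# vlo/config/VloConfig.xml; otherwise name the first file that lacks it.
port_configured() {
    port=$1
    dist=$2
    grep -q "port=\"$port\"" "$dist/tomcat/conf/server.xml" || {
        echo "port $port not found in tomcat/conf/server.xml"
        return 1
    }
    grep -q ":$port/" "$dist/vlo/config/VloConfig.xml" || {
        echo "port $port not found in vlo/config/VloConfig.xml"
        return 1
    }
}

# run_from_dist DIST CMD...: run CMD in a subshell with DIST as the working
# directory, leaving the caller's working directory unchanged.
run_from_dist() {
    dist=$1
    shift
    ( cd "$dist" && "$@" )
}
```

For example, `port_configured 9080 /opt/vlo-benchmark` followed by `run_from_dist /opt/vlo-benchmark ./start-solr.sh`, where /opt/vlo-benchmark stands in for the actual location of the distribution.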