1. Get the HDT Library
The Jena wrapper is included in the binary distribution of HDT-java. You can also check out the latest source code of the Java HDT library from its GitHub repository.
Acknowledgements: If you use our tools in your research, please acknowledge them by citing the related papers.
2. Using HDT Files from Apache Jena
The Jena wrapper provides a Graph implementation on top of the HDT library, so that HDT files can be accessed as a normal read-only Jena Model. To use it, include hdt-jena.jar in addition to hdt-lib.jar in your application's classpath (see the manual of the Java HDT library).
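A quick way to confirm that both jars are actually visible to the application is to probe for one class from each at startup. The sketch below is only an illustration: the helper class `ClasspathCheck` is hypothetical, and it assumes that `org.rdfhdt.hdt.hdt.HDTManager` ships in hdt-lib.jar and `org.rdfhdt.hdtjena.HDTGraph` in hdt-jena.jar, which may differ between releases.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the subset of the given class names that cannot be found. */
    public static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(
                "org.rdfhdt.hdt.hdt.HDTManager",  // assumed to come from hdt-lib.jar
                "org.rdfhdt.hdtjena.HDTGraph");   // assumed to come from hdt-jena.jar
        if (missing.isEmpty()) {
            System.out.println("HDT and HDT-Jena classes found on the classpath.");
        } else {
            System.out.println("Missing classes (check the classpath): " + missing);
        }
    }
}
```

Running this with the same `-classpath` value as the real application shows up a missing jar early, before it surfaces as a `NoClassDefFoundError` at the first HDT call.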
Then, you can use this fragment of code to get the Jena Model (hdt-jena/examples/HDTSparql.java):

    // Load HDT file using the hdt-java library
    HDT hdt = HDTManager.mapIndexedHDT("path/to/file.hdt", null);

    // Create Jena Model on top of HDT.
    HDTGraph graph = new HDTGraph(hdt);
    Model model = ModelFactory.createModelForGraph(graph);

    // Use Jena Model as read-only data storage, e.g. using Jena ARQ for SPARQL.

This Jena Model can also be used with Jena ARQ to answer SPARQL queries.
3. Create a SPARQL Endpoint of HDT files using Jena Fuseki
Thanks to the Jena integration, you can use Jena Fuseki to set up a public endpoint for one or more HDT files in minutes. You just need to specify which HDT files you want to publish by following these steps:
- Download Jena Fuseki. You will need to download and extract the file named jena-fuseki-XXX.zip or .tar.gz.
- Create a Fuseki configuration file (see an example). You will need to initialize the HDT Assembler class using the property ja:loadClass on the fuseki:Server instance, and define the HDT-related classes:

      [] rdf:type fuseki:Server ;
         ja:loadClass "org.rdfhdt.hdtjena.HDTGraphAssembler" .

      hdt:DatasetHDT rdfs:subClassOf ja:RDFDataset .
      hdt:HDTGraph rdfs:subClassOf ja:Graph .

  Then, you create a Service with query and read capabilities. This service contains one or more datasets, which in turn are associated with one or more graphs. Each graph can be associated with an HDT file or any other Jena data source.

      <#service1> rdf:type fuseki:Service ;
          fuseki:name "hdtservice" ;
          fuseki:serviceQuery "query" ;
          fuseki:serviceReadGraphStore "get" ;
          fuseki:dataset <#dataset> .

      <#dataset> rdf:type ja:RDFDataset ;
          rdfs:label "Dataset" ;
          ja:defaultGraph <#graph1> ;
          ja:namedGraph [ ja:graphName <http://example.org/name1> ;
                          ja:graph <#graph2> ] .

      <#graph1> rdfs:label "RDF Graph1 from HDT file" ;
          rdf:type hdt:HDTGraph ;
          hdt:fileName "file1.hdt" .

      <#graph2> rdfs:label "RDF Graph2 from HDT file" ;
          rdf:type hdt:HDTGraph ;
          hdt:fileName "file2.hdt" .

- Then, you need to edit the Fuseki launch script to add the hdt-lib.jar and hdt-jena.jar libraries to the classpath. By default Fuseki uses the -jar option to launch the program, which causes any additional -classpath directive to be ignored. You will therefore need to remove the -jar option, add fuseki-server.jar and the HDT jars to the classpath, and append the fully qualified name of the Fuseki launcher class. After these changes, the line that calls java in Fuseki's launch script fuseki-server.bat for Windows will look something like:

      java -Xmx1200M -classpath "hdt-lib.jar;hdt-jena.jar;fuseki-server.jar" org.apache.jena.fuseki.FusekiCmd %*

  On Mac/Linux, in fuseki-server:

      java $JVM_ARGS -classpath "hdt-lib.jar:hdt-jena.jar:$JAR" \
          org.apache.jena.fuseki.FusekiCmd "$@"

- Finally, launch Fuseki using your custom config file:

      $ ./fuseki-server --config=fuseki_example.ttl

- Try your SPARQL endpoint in a web browser: http://localhost:3030.
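
Besides the browser, the endpoint can also be queried programmatically over the standard SPARQL protocol. The sketch below uses only the JDK's `java.net.http` client (Java 11 or later). It assumes the service name `hdtservice` and the query endpoint `query` from the example configuration, which Fuseki would expose at `http://localhost:3030/hdtservice/query`. The class name `EndpointQuery` and the sample query are illustrative only.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class EndpointQuery {

    /** Builds the GET URL for a SPARQL query, encoding the query text for use as a URL parameter. */
    public static URI buildQueryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) throws Exception {
        // Assumed URL: service "hdtservice" with query endpoint "query" on the default port
        String endpoint = "http://localhost:3030/hdtservice/query";
        String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        // Ask for the standard SPARQL JSON results format
        HttpRequest request = HttpRequest.newBuilder(buildQueryUri(endpoint, sparql))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("HTTP status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

The `LIMIT 10` keeps the response small, which is a sensible default when probing a large HDT file for the first time. The program needs the Fuseki server from the previous step to be running.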