1. Get the HDT Library
2. Compiling the Java Library / Tools
3. Using the Java Command Line tools
4. Generating and Searching HDT files programmatically from Java
1. Get the HDT Library
You can download the Java HDT Library both as a binary distribution and as source code from the GitHub repository.
Acknowledgements: If you use our tools in your research, please acknowledge them by citing the following papers:
2. Compiling the Java Library / Tools
The Java library can be compiled using Apache Ant. Just download the source distribution and run ant jar to generate a jar package with the HDT library. Loading HDT files requires no additional dependencies: just add hdt-lib.jar to your application's classpath. However, there are some optional dependencies:
- Jena RIOT for parsing files in formats other than NTriples when generating HDT files.
- JCommander to use the Java Command Line tools.
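Because the optional dependencies only matter for some features, it can be useful to check at runtime whether they are actually on the classpath. The following is a minimal sketch using plain reflection. The probe class names (org.apache.jena.riot.RDFDataMgr for Jena RIOT, com.beust.jcommander.JCommander for JCommander) are assumptions based on current releases of those libraries and may differ in the versions your build uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: report whether the optional dependencies appear to be on the classpath.
// ASSUMPTION: the probe class names below come from current Jena and JCommander releases;
// older releases may use different packages.
class OptionalDependencyCheck {

    // Returns true if the named class can be loaded (without running its static initializers).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("Jena RIOT (non-NTriples input)", "org.apache.jena.riot.RDFDataMgr");
        probes.put("JCommander (command line tools)", "com.beust.jcommander.JCommander");
        for (Map.Entry<String, String> e : probes.entrySet()) {
            String status = isOnClasspath(e.getValue()) ? "found" : "missing";
            System.out.println(e.getKey() + ": " + status);
        }
    }
}
```

A missing entry only means that feature is unavailable; loading existing HDT files still works with hdt-lib.jar alone.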
3. Using the Java Command Line Tools
Once you have compiled the library or downloaded the binary distribution, you can use the command line tools to convert and browse HDT files. You can use the convenient launch scripts (*.sh or *.bat, depending on your OS) to execute them. These are the typical operations you will probably want to perform:
- Convert your RDF data to the HDT representation. Since this process is memory-intensive, you might need to increase the memory available to Java (the -Xmx1G option) inside the script to generate very big files. Please note that the tool accepts input files compressed with GZIP (e.g. .nt.gz). To convert a file, just run:
$ ./rdf2hdt.sh data/test.nt data/test.hdt
- Convert an HDT to another serialization format, such as NTriples:
$ ./hdt2rdf.sh data/test.hdt data/test.hdtexport.nt
- Open a terminal to search triple patterns within an HDT file:
```
$ ./hdtSearch.sh data/test.hdt
>> ? ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
http://example.org/uri4 http://example.org/predicate4 http://example.org/uri5
http://example.org/uri1 http://example.org/predicate1 "literal1"
http://example.org/uri1 http://example.org/predicate1 "literalA"
http://example.org/uri1 http://example.org/predicate1 "literalB"
http://example.org/uri1 http://example.org/predicate1 "literalC"
http://example.org/uri1 http://example.org/predicate2 http://example.org/uri3
http://example.org/uri1 http://example.org/predicate2 http://example.org/uriA3
http://example.org/uri2 http://example.org/predicate1 "literal1"
9 results shown.
>> http://example.org/uri3 ? ?
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
2 results shown.
>> exit
```
- Extract the Header of an HDT file:
$ ./hdtInfo.sh data/test.hdt > header.nt
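Before converting a very big file, you may want to confirm how much heap the JVM actually receives from the -Xmx setting in the launch script. The sketch below is plain Java and is not part of the HDT tools; it only reports Runtime.getRuntime().maxMemory().

```java
// Minimal sketch (plain Java, not part of the HDT tools): report the maximum heap this JVM
// may use, i.e. the effect of the -Xmx option set in the launch script.
class HeapCheck {

    // Maximum heap size in bytes as seen by the running JVM (-Xmx, or the JVM default).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long mib = maxHeapBytes() / (1024L * 1024L);
        System.out.println("Max heap available to this JVM: " + mib + " MiB");
    }
}
```

Running it with the same -Xmx value used in rdf2hdt.sh (for example java -Xmx1G HeapCheck) shows roughly the heap limit the converter would get; the reported value may be somewhat below the nominal -Xmx, depending on the garbage collector.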
4. Generating and Searching HDT files programmatically from Java
- Generating an HDT file (available at examples/ExampleGenerate.java):
```java
import org.rdfhdt.hdt.enums.RDFNotation;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTSpecification;

public class ExampleGenerate {
    public static void main(String[] args) throws Exception {
        // Configuration variables
        String baseURI = "http://example.com/mydataset";
        String rdfInput = "/path/to/dataset.nt";
        String inputType = "ntriples";
        String hdtOutput = "/path/to/dataset.hdt";

        // Create HDT from RDF file
        HDT hdt = HDTManager.generateHDT(
                rdfInput,                      // Input RDF File
                baseURI,                       // Base URI
                RDFNotation.parse(inputType),  // Input Type
                new HDTSpecification(),        // HDT Options
                null                           // Progress Listener
        );

        // OPTIONAL: Add additional domain-specific properties to the header:
        // Header header = hdt.getHeader();
        // header.insert("myResource1", "property", "value");

        // Save generated HDT to a file
        hdt.saveToHDT(hdtOutput, null);
    }
}
```
- Searching triple patterns inside an HDT file (available at examples/ExampleSearch.java):
```java
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleString;
import org.rdfhdt.hdt.triples.TripleString;

// Load an HDT and perform a search. (examples/ExampleSearch.java)
public class ExampleSearch {
    public static void main(String[] args) throws Exception {
        // Load HDT file.
        // NOTE: Use loadIndexedHDT() for ?P?, ?PO or ??O queries
        HDT hdt = HDTManager.loadHDT("data/example.hdt", null);

        // Search pattern: Empty string means "any"
        IteratorTripleString it = hdt.search("", "", "");
        while (it.hasNext()) {
            TripleString ts = it.next();
            System.out.println(ts);
        }
    }
}
```
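The command line tool writes "?" for an unbound position, while hdt.search() expects an empty string. The following is a minimal sketch of a hypothetical helper (PatternSketch is not part of hdt-java) that translates a pattern typed at the hdtSearch prompt into the three search arguments, and flags the ?P?, ?PO and ??O shapes that, per the note in ExampleSearch, call for loadIndexedHDT(). It assumes terms are separated by whitespace, so literals containing spaces are not handled.

```java
import java.util.Arrays;

// Hypothetical helper (not part of hdt-java): convert an hdtSearch-style pattern such as
// "http://example.org/uri3 ? ?" into {subject, predicate, object} for hdt.search().
// ASSUMPTION: terms are whitespace-separated; literals containing spaces are not supported.
class PatternSketch {

    // Returns {subject, predicate, object}, mapping "?" to "" (the "any" wildcard).
    static String[] toSearchArgs(String pattern) {
        String[] terms = pattern.trim().split("\\s+");
        if (terms.length != 3) {
            throw new IllegalArgumentException("Expected 3 terms, got " + terms.length);
        }
        String[] args = new String[3];
        for (int i = 0; i < 3; i++) {
            args[i] = terms[i].equals("?") ? "" : terms[i];
        }
        return args;
    }

    // ?P?, ?PO and ??O: the subject is unbound while the predicate or the object is bound.
    static boolean needsIndexedHDT(String s, String p, String o) {
        return s.isEmpty() && (!p.isEmpty() || !o.isEmpty());
    }

    public static void main(String[] args) {
        String[] q = toSearchArgs("? http://example.org/predicate3 ?");
        System.out.println(Arrays.toString(q) + " needs indexed HDT: " + needsIndexedHDT(q[0], q[1], q[2]));
    }
}
```

The resulting array can then be passed on as hdt.search(q[0], q[1], q[2]).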
You can also use HDTManager.mapHDT() and HDTManager.mapIndexedHDT() to map the file instead of loading everything into main memory. The main advantage is that mapping requires much less memory, since data is read from disk on demand; this even allows loading files bigger than the machine's main memory. It also lets the operating system keep the fragments cached after the application closes, which results in faster initial loading times and allows several processes to access the same HDT file without holding multiple copies in memory. The disadvantage is that searches can be slower, especially the first ones and/or when the system does not have much free memory to cache the read blocks.
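To illustrate what mapping means at the operating-system level, here is a minimal sketch using plain java.nio memory mapping. It is not the hdt-java implementation of mapHDT(), only the general mechanism: the file is mapped into the process's address space and its pages are typically read from disk only when they are accessed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of memory mapping with plain java.nio (not the hdt-java code).
class MapSketch {

    // Maps the whole file read-only and returns the byte at the given offset.
    static byte readByteMapped(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // No file contents are copied here; the OS loads pages lazily on access.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Touching this byte typically faults in only the page that contains it.
            return buf.get((int) offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("map-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});
        System.out.println(readByteMapped(tmp, 2));
    }
}
```

Note that a single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes (about 2 GiB), so a file larger than that has to be mapped in several segments.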
Please note that the HDT object is thread-safe. You can share a single HDT instance between multiple threads of your application. HDT is very concurrency-friendly, so you will notice a significant performance improvement when doing so. However, the iterator returned by hdt.search() is not thread-safe and should only be used from the thread that called hdt.search().
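The threading rule can be sketched as follows: one read-only object is shared by all worker threads, and each task creates and consumes its own iterator. To keep the example self-contained, TripleStore is a hypothetical stand-in for an HDT instance (it is not hdt-java code); with the real library you would share the HDT object and call hdt.search(...) inside each task instead. Requires Java 10 or later for List.copyOf.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: share ONE read-only store across threads, but give each thread its OWN iterator.
// ASSUMPTION: TripleStore is a hypothetical stand-in for a shared HDT instance.
class SharedStoreSketch {

    // Never modified after construction, so it can be shared; search() returns a fresh iterator per call.
    static final class TripleStore {
        private final List<String[]> triples;

        TripleStore(List<String[]> triples) {
            this.triples = List.copyOf(triples);
        }

        // "" means "any", mirroring the convention of hdt.search().
        Iterator<String[]> search(String s, String p, String o) {
            return triples.stream()
                    .filter(t -> (s.isEmpty() || t[0].equals(s))
                            && (p.isEmpty() || t[1].equals(p))
                            && (o.isEmpty() || t[2].equals(o)))
                    .iterator();
        }
    }

    // Runs one search per subject on a thread pool and returns the match counts in input order.
    static List<Integer> countPerSubject(TripleStore store, List<String> subjects) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String subject : subjects) {
                futures.add(pool.submit(() -> {
                    // The iterator is created and consumed inside this task's thread only.
                    Iterator<String[]> it = store.search(subject, "", "");
                    int n = 0;
                    while (it.hasNext()) {
                        it.next();
                        n++;
                    }
                    return n;
                }));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) {
                counts.add(f.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        TripleStore store = new TripleStore(List.of(
                new String[] {"uri1", "predicate1", "\"literal1\""},
                new String[] {"uri1", "predicate2", "uri3"},
                new String[] {"uri3", "predicate3", "uri4"}));
        System.out.println(countPerSubject(store, List.of("uri1", "uri3", "uri2")));
    }
}
```

The design choice is simply to never pass an iterator between tasks: each task receives the shared store plus its own query terms, and only plain results (here, counts) cross thread boundaries.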