We provide some of the most useful and popular datasets from the LOD cloud in HDT format so that you can use them easily. If the dataset you need is not available here, you can create your own (see the conversion sketch below) or kindly ask the data provider to publish their datasets in HDT format for the whole community to enjoy.
We serve more than 15 billion triples here in HDT files. You can find more than 38 billion triples in LOD-a-lot, the HDT dataset that collects all triples from LOD Laundromat, a project serving a crawl of a large subset of the Linked Open Data Cloud.
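If you want to convert your own RDF dump to HDT, you can use the command-line tools shipped with hdt-cpp and hdt-java, or do it programmatically. Below is a minimal sketch using the hdt-java library; the file names, base URI, and class name are placeholders, and the exact API may differ slightly between releases.

```java
import org.rdfhdt.hdt.enums.RDFNotation;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.options.HDTSpecification;

public class CreateHDT {
    public static void main(String[] args) throws Exception {
        // Convert an N-Triples dump into a single HDT file.
        // "dataset.nt", "dataset.hdt" and the base URI are placeholders.
        HDT hdt = HDTManager.generateHDT(
                "dataset.nt",                  // input RDF file
                "http://example.org/dataset",  // base URI
                RDFNotation.NTRIPLES,          // input serialization
                new HDTSpecification(),        // default HDT options
                null);                         // no progress listener
        try {
            hdt.saveToHDT("dataset.hdt", null);
        } finally {
            hdt.close();
        }
    }
}
```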
Important Note (12 April 2022): We are experiencing some technical problems on our “gaia” server, so unfortunately some datasets may be unavailable (e.g. Wikidata). We hope to resolve this issue soon; thanks for your understanding!
Dataset | Size | Triples | Details | Provenance |
--- | --- | --- | --- | --- |
Latest Wikidata (3 March 2021) | 53GB (149GB uncompressed) | 14.6B Triples | The additional “.index” HDT file (required to speed up all queries) is also available for download (64GB compressed, 97GB uncompressed). This dataset corresponds to the 3 March 2021 Wikidata dump. You should first unzip the HDT dataset and the additional index to make use of them (see the usage sketch after this table). | Wikidata dumps. (Special thanks to Axel Polleres and the Institute for Data, Process and Knowledge Management at WU Vienna for their infrastructure) |
Latest Wikidata (9 March 2020) | 50GB (119GB uncompressed) | 12B Triples | The additional “.index” HDT file (required to speed up all queries) is also available for download (55GB compressed, 77GB uncompressed). This dataset corresponds to the 2020-03-09 Wikidata dump. You should first unzip the HDT dataset and the additional index to make use of them. | Wikidata dumps. |
DBpedia 2016-10 English | 34GB | 1.8B Triples | The additional “.index” HDT file (required to speed up all queries) is also available for download (19GB). This dataset corresponds to the DBpedia 2016-10 release, disregarding NIF data. | Official DBpedia 2016-10 release |
DBLP 2017 | 1GB | 882M Triples | The additional “.index” HDT file (required to speed up all queries) is also available for download (0.5GB). This dataset corresponds to the dump of the DBLP Computer Science Bibliography. | Kindly provided and hosted by: |
Freebase | 11GB | 2B Triples | 2013-12-01 Dump. | From the official Freebase Dump in RDF. |
YAGO2s Knowledge Base | 903MB | 159M Triples | 2013-05-08 Dump. | TTL dump of Max Planck Institut Informatik. |
LinkedGeoData | 461MB | 129M Triples | 2012-10-09 Dump | LinkedGeoData download page. |
Geonames | 344MB | 123M Triples | 2012-11-11 Dump | Geonames official dump. |
Wiktionary English | 212MB | 64M Triples | | Wiktionary Download Page. |
Wiktionary French | 124MB | 32M Triples | | |
Wiktionary German | 23MB | 5M Triples | | |
Wiktionary Russian | 40MB | 12M Triples | | |
WordNet 3.1 | 23MB | 5.5M Triples | Generated from the 3.1 NTriples dump on 2014-04-16. | Princeton Wordnet 3.1 in RDF. |
Semantic Web Dog Food | 2.3MB | 242K Triples | 2012-11-28 Dump | SWDF. |
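As the table above notes, each large dataset is distributed as a gzipped HDT file plus an optional “.index” file that speeds up triple-pattern queries; unzip both into the same directory before use. The sketch below shows how such a file could be queried with the hdt-java library; the file name is a placeholder, and loadIndexedHDT should build the index on the fly if it is not found next to the HDT file.

```java
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleString;
import org.rdfhdt.hdt.triples.TripleString;

public class QueryHDT {
    public static void main(String[] args) throws Exception {
        // Load the HDT file together with its additional ".index" file
        // ("dataset.hdt" is a placeholder for any file from the table above).
        HDT hdt = HDTManager.loadIndexedHDT("dataset.hdt", null);
        try {
            // Empty strings act as wildcards: this pattern matches every triple.
            IteratorTripleString it = hdt.search("", "", "");
            System.out.println("Estimated results: " + it.estimatedNumResults());
            while (it.hasNext()) {
                TripleString ts = it.next();
                System.out.println(ts);
            }
        } finally {
            hdt.close();
        }
    }
}
```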
Other (older) datasets:
Dataset | Size | Triples | Details | Provenance |
--- | --- | --- | --- | --- |
Wikidata (2019-06-19) | 37GB (88GB uncompressed) | 9.3B Triples | The additional “.index” HDT file (required to speed up all queries) is also available for download (45GB compressed, 58GB uncompressed). This dataset corresponds to the 2019-06-19 Wikidata dump. You should first unzip the HDT dataset and the additional index to make use of them. | Wikidata dumps. |
Wikidata (2018-09-11) | 28GB (69GB uncompressed) | 7.2B Triples | The additional “.index” HDT file (required to speed up all queries) is also available for download (45GB). This dataset corresponds to the 2018-09-11 Wikidata dump. | Wikidata dumps. |
DBpedia 2016-04 English | 13GB | 1B Triples | The additional “.index” HDT file (required to speed up all queries) is also available for download (5GB). This dataset corresponds to the DBpedia 2016-04 release. | Kindly provided and hosted by: |
Wikidata 2017-03-13 | 7GB | 2262M Triples | 2017-03-13 Dump. | Wikidata dumps. |
DBpedia 3.9 English | 2.4GB | 474M Triples | All canonicalized datasets together in one big file | Official DBpedia Web Site. |
DBpedia 3.8 English | 2.8GB | 431M Triples | All canonicalized datasets together in one big file | Official DBpedia Web Site. |
DBpedia 3.8 English No Long Abstracts | 2.4GB | 427M Triples | A reduced version without long abstracts | |
DBpedia 3.8 English By Section | | | One HDT file per section | |
DBpedia 3.8 English External Links | | | Links to other datasets | |
DBLP Computer Science Bibliography | 286MB | 55M Triples | 2012-11-28 Dump | Faceted DBLP project. |
WordNet 3.0 | 26MB | 8M Triples | All the turtle files found in the Git repository as of 2013-03-20. | Princeton Wordnet Page. |
Disclaimer: These datasets were downloaded from the public websites of their respective maintainers and converted to HDT. We are not responsible for the accuracy of their content. Please also check the respective licenses on the maintainers' own sites for details on usage rights.