Querying Archives of Dynamic Linked Open Data
Disciplines
Computer Sciences (75%); Economics (25%)
Keywords
Linked Data, Archiving, Indexing, Semantic Web, Compression, Temporal Queries
The Linked Data paradigm promotes the use of the RDF data model to publish structured data on the Web and to create data-to-data links between different data sources. As a result, a continuously growing Web of data, consisting of typed hyperlinks between interconnected resources and documents, has emerged over the past years and has attracted the attention of several research areas, such as indexing and querying, reasoning, and visualization of RDF data. However, the structured, interlinked datasets in this Web of data are not static but continuously evolving, which calls for approaches to observe and preserve Linked Data across time. This project tackles the problem of archiving and querying Semantic Web data. Although traditional techniques for crawling and archiving collections of Web documents, together with preservation policies for such collections, could ensure the availability and traceability of datasets over time, current approaches are seriously compromised by scalability limits relative to the scale and growth of the Web: in contrast to the scale-free nature of the Web itself, (a) archiving infrastructures are typically centralized; in addition, (b) these infrastructures offer only very basic search capabilities, whereas archives of structured Web data face an emerging demand for structured and time-traversing queries. As for (a), we propose a decentralized, federated architecture for compressed and queryable archiving of Semantic Web data. To that end, we will develop a novel representation technique based on succinct data structures, leading to compressed archives of Linked Data that can be distributed in a modular fashion. As for (b), we will investigate and extend suitable query languages for archived data that capture the expressiveness required for archives of evolving interlinked information, making it possible not only to query data backwards in time but also to ask how data developed over time.
Our architecture shall support querying for evolution patterns, time-traversing queries and ontological reasoning within and across such archives. Finally, we will validate all our steps on real data by archiving governmental Open Data. We will crawl and archive a large corpus of evolving interlinked governmental data and evaluate our theoretical results on this corpus, ensuring the feasibility and sustainability of the approach in a real-world domain of societal interest. We aim to advance several research fields (e.g. compact representations and indexing of evolving data at large scale, and efficient query models and services enabling queries across time). The proposed work complements existing expertise at the host institution in data extraction and pattern recognition, semantic crawling, live queries, and the foundations of query languages and reasoning. We have also established several cooperation agreements that support the project goals and strengthen the international collaborations of the host institution.
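To make the query types mentioned above more concrete, the following is a minimal in-memory sketch, not the project's actual system. It assumes that an archive is a sequence of numbered RDF snapshots and that each triple is annotated with the versions in which it holds. The class `TripleArchive`, its method names, and the `ex:` example triples are all hypothetical and chosen only for illustration; no compression or succinct data structure is involved.

```python
from dataclasses import dataclass, field

# A triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


@dataclass
class TripleArchive:
    """Toy archive (hypothetical): maps each triple to the versions containing it."""
    versions: dict[Triple, set[int]] = field(default_factory=dict)
    latest: int = -1

    def add_version(self, triples):
        """Store a full snapshot as the next version and return its number."""
        self.latest += 1
        for t in triples:
            self.versions.setdefault(t, set()).add(self.latest)
        return self.latest

    def materialize(self, version):
        """Query backwards in time: all triples that hold at `version`."""
        return {t for t, vs in self.versions.items() if version in vs}

    def delta(self, old, new):
        """How the data developed: (added, deleted) triples between two versions."""
        before, after = self.materialize(old), self.materialize(new)
        return after - before, before - after

    def versions_of(self, pattern):
        """Time-traversing query: versions of every triple matching the
        pattern, where None acts as a wildcard in any position."""
        return {
            t: sorted(vs)
            for t, vs in self.versions.items()
            if all(p is None or p == x for p, x in zip(pattern, t))
        }


# Usage with two invented snapshots of the same resource.
archive = TripleArchive()
archive.add_version({
    ("ex:Vienna", "ex:population", "1800000"),
    ("ex:Vienna", "ex:country", "ex:Austria"),
})
archive.add_version({
    ("ex:Vienna", "ex:population", "1900000"),
    ("ex:Vienna", "ex:country", "ex:Austria"),
})
added, deleted = archive.delta(0, 1)
history = archive.versions_of(("ex:Vienna", "ex:population", None))
```

Annotating triples with version sets is only one of several possible storage strategies; storing independent snapshots or change sets between versions would trade space against query time differently.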
The Semantic Web is an open system for organizing how people publish data, in order to foster the reusability, integrability and discoverability of datasets as well as their automatic querying and processing. However, in the absence of a central control mechanism, this huge knowledge base is ephemeral: datasets constantly appear, change and disappear. This project tackled the problem of efficiently archiving and querying Semantic Web data.
In our project we have developed, to the best of our knowledge, (i) the first system able to archive and query large amounts of Semantic Web data in compressed form (called v-RDFCSA), (ii) the first benchmark (called BEAR) to evaluate the storage space efficiency of archives, the retrieval functionality they offer, and the performance of various retrieval operations, and (iii) the first practical on-demand archive (called DBpedia Wayback Machine) of DBpedia, a semantic conversion of Wikipedia. In turn, (iv) we have laid theoretical foundations for understanding the complexity of archives and of queries over time, (v) we have identified the different types of queries that arise when querying archives and provided practical query languages to model and resolve them, and (vi) we have advanced scalable Semantic Web representation, compression and indexing, which provides the baseline for managing the different versions of semantic archives. Finally, we have also examined (vii) the structures and commonalities in real-world datasets and (viii) a practical way to perform updates on semantic data such as DBpedia.
Overall, this project has significantly advanced the management of the evolution and preservation of the emerging Big Semantic Data: in addition to multiple publications in the main international venues of the Semantic Web and data compression fields, the project organized two international workshops (MEPDaW) and is currently managing a journal special issue on managing semantic data archives.
The project, hosted at WU (Vienna University of Economics and Business, Austria), established novel collaborations with the University of Valladolid and the University of A Coruña (Spain), VU Amsterdam (The Netherlands), the University of Chile (Chile), the University of Bonn and Fraunhofer IAIS (Germany), and the Italian National Research Council (Italy).
- Wirtschaftsuniversität Wien - 100%
Research Output
- 115 Citations
- 23 Publications
2017
Title Characterising RDF data sets DOI 10.1177/0165551516677945 Type Journal Article Author Fernández J Journal Journal of Information Science Pages 203-229 -
2017
Title Report on the 2nd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW 2016) DOI 10.1145/3053408.3053423 Type Journal Article Author Debattista J Journal ACM SIGIR Forum Pages 82-88 -
2016
Title Evaluating Query and Storage Strategies for RDF Archives DOI 10.1145/2993318.2993333 Type Conference Proceeding Abstract Author Fernández J Pages 41-48 Link Publication -
2015
Title BEAR: Benchmarking the Efficiency of RDF Archiving. Type Journal Article Author Fernández Jd Journal Technical Report 02/2015, Department für Informationsverarbeitung und Prozessmanagement, WU Vienna University of Economics and Business -
2015
Title Serializing RDF in Compressed Space (Research funded by Ministerio de Economía y Competitividad, Spain: TIN2013-46238-C4-3-R, and Austrian Science Fund (FWF): M1720-G11) DOI 10.1109/dcc.2015.16 Type Conference Proceeding Abstract Author Hernández-Illera A Pages 363-372 -
0
DOI 10.1145/3132218 Type Other -
0
Title Compresión de Big Semantic Data basada en HDT y MapReduce. Type Other Author Fernández Jd Et Al -
2017
Title V- iHDT++: un Autoíndice Semántico para la Resolución de Triple Patterns SPARQL. Type Conference Proceeding Abstract Author Fernández Jd Et Al Conference XXII Jornadas de Ingeniería del Software y Bases de Datos (JISBD) -
2016
Title V-RDFCSA: Compresión e Indexación de Colecciones de Versiones RDF. Type Conference Proceeding Abstract Author Cerdeira-Pena A Conference XXI Jornadas de Ingeniería del Software y Bases de Datos (JISBD) -
2016
Title Report on the 2nd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW 2016). Type Conference Proceeding Abstract Author Debattista J Conference SIGIR Forum, December 2016 -
2016
Title Self-Indexing RDF Archives. Type Conference Proceeding Abstract Author Cerdeira-Pena A Conference Data Compression Conference 2016. -
2016
Title Towards Updating Wikipedia via DBpedia Mappings and SPARQL. Type Conference Proceeding Abstract Author Ahmeti A Conference 10th Alberto Mendelzon Workshop on Foundations of Data Management -
2016
Title Self-Indexing RDF Archives (Funded by MINECO (PGE and FEDER) grants TIN2013-46238-C4-3-R, TIN2013-47090-C3-3-P, and TIN2015-69951-R; CDTI, MINECO grant ITC-20151247; ICT COST Action IC1302; Xunta de Galicia (co-founded with FEDER) grant GRC2013/053) DOI 10.1109/dcc.2016.40 Type Conference Proceeding Abstract Author Cerdeira-Pena A Pages 526-535 -
2015
Title Improving the usability of Open Data portals from a business process perspective. Type Conference Proceeding Abstract Author Di Ciccio C Conference ODQ2015: Open Data Quality: from Theory to Practice Workshop. -
2015
Title On the Road to the Evaluation of RDF Stream Compression Techniques. Type Conference Proceeding Abstract Author Arias J Conference RDF Stream Processing Workshop, co-located with 12th European Semantic Web Conference (ESWC 2015) -
2015
Title Ontology-Based Search of Genomic Metadata DOI 10.1109/tcbb.2015.2495179 Type Journal Article Author Fernandez J Journal IEEE/ACM Transactions on Computational Biology and Bioinformatics Pages 233-247 Link Publication -
2017
Title Self-Enforcing Access Control for Encrypted RDF DOI 10.1007/978-3-319-58068-5_37 Type Book Chapter Author Fernández J Publisher Springer Nature Pages 607-622 -
2017
Title Updating Wikipedia via DBpedia Mappings and SPARQL DOI 10.1007/978-3-319-58068-5_30 Type Book Chapter Author Ahmeti A Publisher Springer Nature Pages 485-501 -
2017
Title LOD-a-lot DOI 10.1145/3132218.3132241 Type Conference Proceeding Abstract Author Beek W Pages 181-184 Link Publication -
2015
Title HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce DOI 10.1007/978-3-319-18818-8_16 Type Book Chapter Author Giménez-García J Publisher Springer Nature Pages 253-268 -
2015
Title Towards Efficient Archiving of Dynamic Linked Open Data. Type Conference Proceeding Abstract Author Fernández Jd Conference DIACHRON Workshop on Managing the Evolution and Preservation of the Data Web co-located with 12th European Semantic Web Conference (ESWC 2015) -
2015
Title The DBpedia wayback machine DOI 10.1145/2814864.2814889 Type Conference Proceeding Abstract Author Fernández J Pages 192-195 -
2016
Title Compresión de Big Semantic Data basada en HDT y MapReduce. Type Conference Proceeding Abstract Author Fernández Jd Et Al Conference XXI Jornadas de Ingeniería del Software y Bases de Datos (JISBD)