Grant DOI 10.55776/M1720
Funding program Lise Meitner
Status ended
Start January 1, 2015
End April 30, 2017
Funding amount € 125,000

Disciplines

Computer Sciences (75%); Economics (25%)

Keywords

Linked Data,
Archiving,
Indexing,
Semantic Web,
Compression,
Temporal Queries

Abstract

The Linked Data paradigm promotes the use of the RDF data model to publish structured data on the Web and to create data-to-data links between different data sources. As a result, a continuously growing Web of data, consisting of typed hyperlinks between interconnected resources and documents, has emerged over the past years and has attracted the attention of several research areas, such as indexing and querying, reasoning, and visualization of RDF data. However, the structured, interlinked datasets in this Web of data are not static but continuously evolving, which calls for approaches to observe and preserve Linked Data across time.

This project tackles the problem of archiving and querying Semantic Web data. Traditional techniques for crawling and archiving collections of Web documents, together with preservation policies for such collections, could assure the availability and traceability of datasets over time, but current approaches are seriously limited in scalability relative to the scale and growth of the Web. In contrast to the scale-free nature of the Web itself, (a) archiving infrastructures are typically centralized; in addition, (b) these infrastructures offer only very basic search capabilities, whereas archives of structured data on the Web increasingly demand structured and time-traversing queries.

As for (a), we propose a decentralized, federated architecture for compressed and queryable archiving of Semantic Web data. To that end, we will develop a novel representation technique based on succinct data structures, leading to compressed archives of Linked Data that can be distributed in a modular fashion. As for (b), we will investigate and extend suitable query languages for archived data that capture the expressiveness required for archives of evolving interlinked information, making it possible not only to query past states of the data but also to ask how the data developed over time. Our architecture shall support querying for evolution patterns, time-traversing queries and ontological reasoning within and across such archives.

Finally, we will validate all our steps on real data by archiving governmental Open Data. We will crawl and archive a large corpus of evolving interlinked governmental data and evaluate our theoretical results on this corpus, assuring the feasibility and sustainability of the approach in a real-world domain of societal interest. We aim to advance several research fields (e.g. compact representations and indexing of evolving data at large scale, and efficient query models and services enabling queries across time). The proposed work complements existing expertise at the host institution in data extraction and pattern recognition, semantic crawling, live queries, and foundations of query languages and reasoning. We have also established several cooperation agreements that support the project goals and strengthen international collaborations at the host institution.

Final report

The Semantic Web is an open system for organizing how people publish data, fostering the reusability, integrability and discoverability of datasets as well as their automatic querying and processing. However, in the absence of a central control mechanism, this huge knowledge base is ephemeral: datasets constantly appear, change and disappear. This project tackled the problem of efficiently archiving and querying Semantic Web data.

In our project we have developed, to the best of our knowledge, (i) the first system able to archive and query large amounts of Semantic Web data in compressed form (called v-RDFCSA), (ii) the first benchmark (called BEAR) to evaluate the storage space efficiency of archives, the retrieval functionality they serve, and the performance of various retrieval operations, and (iii) the first practical on-demand archive of DBpedia, a semantic conversion of Wikipedia (called DBpedia Wayback Machine). In addition, (iv) we have laid theoretical foundations for understanding the complexity of archives and queries over time, (v) we have identified the different kinds of queries that arise when querying archives and provided practical query languages to model and resolve them, and (vi) we have advanced scalable Semantic Web representation, compression and indexing, which provide the baseline for managing the different versions of semantic archives. Finally, we have also examined (vii) the structures and commonalities in real-world datasets and (viii) a practical way to perform updates on semantic data such as DBpedia.

Overall, this project has significantly advanced the management of the evolution and preservation of the emerging Big Semantic Data: in addition to multiple publications in the main international venues of the Semantic Web and data compression fields, the project organized two international workshops (MEPDaW) and is currently managing a journal special issue on managing semantic data archives.

The project, hosted at WU (Vienna University of Economics and Business, Austria), established novel collaborations with the University of Valladolid and University of A Coruña (Spain), VU Amsterdam (The Netherlands), the University of Chile (Chile), the University of Bonn and Fraunhofer IAIS (Germany) and the Italian National Research Council (Italy).
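
To make the notion of querying an archive across time more concrete, the following is a minimal sketch of three kinds of archive queries: retrieving matches at one version, computing the changes between two versions, and listing the versions in which each match holds. It is an illustration only and does not reproduce v-RDFCSA, BEAR, or the query languages developed in the project. The class ToyArchive, its method names, and the ex: identifiers are hypothetical, and the archive simply stores one full, uncompressed copy of the triples per version.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A triple pattern: None acts as a wildcard for that position.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class ToyArchive:
    """Hypothetical archive keeping one full copy of the triples per version."""

    def __init__(self) -> None:
        self.versions: List[FrozenSet[Triple]] = []

    def commit(self, triples: Set[Triple]) -> int:
        """Store a new version and return its index (0, 1, 2, ...)."""
        self.versions.append(frozenset(triples))
        return len(self.versions) - 1

    @staticmethod
    def _matches(triple: Triple, pattern: Pattern) -> bool:
        return all(p is None or p == t for t, p in zip(triple, pattern))

    def at_version(self, pattern: Pattern, version: int) -> Set[Triple]:
        """Triples matching the pattern in one given version."""
        return {t for t in self.versions[version] if self._matches(t, pattern)}

    def delta(self, pattern: Pattern, v_from: int, v_to: int) -> Tuple[Set[Triple], Set[Triple]]:
        """Matching triples (added, removed) between two versions."""
        before = self.at_version(pattern, v_from)
        after = self.at_version(pattern, v_to)
        return after - before, before - after

    def versions_of(self, pattern: Pattern) -> Dict[Triple, List[int]]:
        """For each matching triple, the versions in which it is present."""
        result: Dict[Triple, List[int]] = {}
        for v, snapshot in enumerate(self.versions):
            for t in snapshot:
                if self._matches(t, pattern):
                    result.setdefault(t, []).append(v)
        return result


# Made-up example data: a dataset description that changes between versions.
archive = ToyArchive()
archive.commit({("ex:dataset1", "ex:status", "draft")})
archive.commit({
    ("ex:dataset1", "ex:status", "published"),
    ("ex:dataset1", "ex:license", "ex:cc-by"),
})

added, removed = archive.delta(("ex:dataset1", None, None), 0, 1)
history = archive.versions_of((None, "ex:status", None))
```

Storing full copies keeps the sketch short but uses space proportional to the total size of all versions; reducing that cost while keeping such queries answerable is the kind of problem that compressed, indexed archives address.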

Research institution(s)

Wirtschaftsuniversität Wien - 100%

International project participants

Claudio Gutierrez, Universidad de Santiago de Chile - Chile
Sören Auer, Leibniz Universität Hannover - Germany
Pablo De La Fuente, Universidad de Valladolid - Spain

Research Output

115 Citations
23 Publications

Publications

Title	Towards Updating Wikipedia via DBpedia Mappings and SPARQL.
Type	Conference Proceeding Abstract
Author	Ahmeti A
Conference	10th Alberto Mendelzon Workshop on Foundations of Data Management

Title	Self-Indexing RDF Archives (funded by MINECO (PGE and FEDER) grants TIN2013-46238-C4-3-R, TIN2013-47090-C3-3-P, and TIN2015-69951-R; CDTI, MINECO grant ITC-20151247; ICT COST Action IC1302; Xunta de Galicia (co-funded with FEDER) grant GRC2013/053)
DOI	10.1109/dcc.2016.40
Type	Conference Proceeding Abstract
Author	Cerdeira-Pena A
Pages	526-535

Title	Evaluating Query and Storage Strategies for RDF Archives
DOI	10.1145/2993318.2993333
Type	Conference Proceeding Abstract
Author	Fernández J
Pages	41-48

Title	Report on the 2nd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW 2016)
DOI	10.1145/3053408.3053423
Type	Journal Article
Author	Debattista J
Journal	ACM SIGIR Forum
Pages	82-88

Title	Characterising RDF data sets
DOI	10.1177/0165551516677945
Type	Journal Article
Author	Fernández J
Journal	Journal of Information Science
Pages	203-229

Title	LOD-a-lot
DOI	10.1145/3132218.3132241
Type	Conference Proceeding Abstract
Author	Beek W
Pages	181-184

Title	Updating Wikipedia via DBpedia Mappings and SPARQL
DOI	10.1007/978-3-319-58068-5_30
Type	Book Chapter
Author	Ahmeti A
Publisher	Springer Nature
Pages	485-501

Title	Self-Enforcing Access Control for Encrypted RDF
DOI	10.1007/978-3-319-58068-5_37
Type	Book Chapter
Author	Fernández J
Publisher	Springer Nature
Pages	607-622

Title	V- iHDT++: un Autoíndice Semántico para la Resolución de Triple Patterns SPARQL.
Type	Conference Proceeding Abstract
Author	Fernández Jd et al.
Conference	XXII Jornadas de Ingeniería del Software y Bases de Datos (JISBD)

Title	Ontology-Based Search of Genomic Metadata
DOI	10.1109/tcbb.2015.2495179
Type	Journal Article
Author	Fernandez J
Journal	IEEE/ACM Transactions on Computational Biology and Bioinformatics
Pages	233-247

Title	Serializing RDF in Compressed Space (research funded by Ministerio de Economía y Competitividad, Spain: TIN2013-46238-C4-3-R, and Austrian Science Fund (FWF): M1720-G11)
DOI	10.1109/dcc.2015.16
Type	Conference Proceeding Abstract
Author	Hernández-Illera A
Pages	363-372

Title	On the Road to the Evaluation of RDF Stream Compression Techniques.
Type	Conference Proceeding Abstract
Author	Arias J
Conference	RDF Stream Processing Workshop, co-located with 12th European Semantic Web Conference (ESWC 2015)

Title	Compresión de Big Semantic Data basada en HDT y MapReduce.
Type	Conference Proceeding Abstract
Author	Fernández Jd et al.
Conference	XXI Jornadas de Ingeniería del Software y Bases de Datos (JISBD)

Title	V-RDFCSA: Compresión e Indexación de Colecciones de Versiones RDF.
Type	Conference Proceeding Abstract
Author	Cerdeira-Pena A
Conference	XXI Jornadas de Ingeniería del Software y Bases de Datos (JISBD)

Title	Self-Indexing RDF Archives.
Type	Conference Proceeding Abstract
Author	Cerdeira-Pena A
Conference	Data Compression Conference 2016.

Title	Report on the 2nd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW 2016).
Type	Conference Proceeding Abstract
Author	Debattista J
Conference	SIGIR Forum, December 2016

Title	The DBpedia wayback machine
DOI	10.1145/2814864.2814889
Type	Conference Proceeding Abstract
Author	Fernández J
Pages	192-195

Title	BEAR: Benchmarking the Efficiency of RDF Archiving.
Type	Journal Article
Author	Fernández Jd
Journal	Technical Report 02/2015, Department für Informationsverarbeitung und Prozessmanagement, WU Vienna University of Economics and Business

Title	HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce
DOI	10.1007/978-3-319-18818-8_16
Type	Book Chapter
Author	Giménez-García J
Publisher	Springer Nature
Pages	253-268

Title	Towards Efficient Archiving of Dynamic Linked Open Data.
Type	Conference Proceeding Abstract
Author	Fernández Jd
Conference	DIACHRON Workshop on Managing the Evolution and Preservation of the Data Web co-located with 12th European Semantic Web Conference (ESWC 2015)

Title	Improving the usability of Open Data portals from a business process perspective.
Type	Conference Proceeding Abstract
Author	Di Ciccio C
Conference	ODQ2015: Open Data Quality: from Theory to Practice Workshop.

DOI	10.1145/3132218
Type	Other

Title	Compresión de Big Semantic Data basada en HDT y MapReduce.
Type	Other
Author	Fernández Jd et al.

Querying Archives of Dynamic Linked Open Data