Project detail

Grant DOI 10.55776/E113
Funding program Impulse Projects
Status Ended
Start September 1, 2006
End August 31, 2008
Funding amount € 95,000

Disciplines

Computer Sciences (100%)

Keywords

Data Cleaning,
Record Linkage,
Normalization,
Web Data Extraction,
Similarity Measures,
Data Fusion

Abstract

The LAUNDRY project designs and implements a prototype data cleaning framework with a special focus on data extracted from the World Wide Web. It develops components for structural, semantic, and syntactic normalization, tokenization, de-duplication, resolution of inconsistencies, and data fusion. Existing techniques, methods, and tools for data cleaning are studied with respect to efficiency and performance and applied accordingly; in addition, new techniques are developed for particular components. The LAUNDRY system has three main advantages: its open, pluggable, and modular framework; the interactive generation of cleaning components; and data cleaning extensions that plug directly into the Lixto Suite, a sophisticated software package for web data extraction and processing. The system covers all phases of data cleaning with efficient algorithms and can be extended with new algorithms.

The LAUNDRY framework will primarily be used to clean web data extracted with the Lixto Suite, which provides an interactive configuration and runtime environment for data extraction from the web. Numerous application scenarios, such as competitive intelligence, have shown that web data are very heterogeneous. Hence, besides sophisticated techniques for extraction from semi-structured data, methods for data cleaning are also required, in particular normalization, record linkage, and data fusion. LAUNDRY addresses these problems and thereby makes web data easier and more efficient to use in enterprise applications such as competitive intelligence scenarios (price comparison, product comparison).
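The cleaning phases named in the abstract (normalization, tokenization, de-duplication, data fusion) and the pluggable design can be illustrated with a minimal sketch. This is not the LAUNDRY or Lixto API: all names here (`Pipeline`, `plug`, `normalize`, `deduplicate`, `jaccard`) are hypothetical, and token-based Jaccard similarity is simply one common similarity measure chosen for illustration, not necessarily the one used in the project.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not part of LAUNDRY or Lixto.
Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def normalize(records: List[Record]) -> List[Record]:
    """Syntactic normalization: collapse whitespace and lowercase every value."""
    return [{k: " ".join(v.split()).lower() for k, v in r.items()} for r in records]


def jaccard(a: str, b: str) -> float:
    """Token-based Jaccard similarity between two strings (assumed measure)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def deduplicate(key: str, threshold: float) -> Step:
    """Build a step that links records on `key` and fuses the matches."""
    def step(records: List[Record]) -> List[Record]:
        merged: List[Record] = []
        for rec in records:
            for kept in merged:
                if jaccard(rec[key], kept[key]) >= threshold:
                    # Data fusion: fill fields that are empty in the kept record.
                    for k, v in rec.items():
                        if not kept.get(k):
                            kept[k] = v
                    break
            else:
                # No sufficiently similar record seen so far: keep a copy.
                merged.append(dict(rec))
        return merged
    return step


@dataclass
class Pipeline:
    """Pluggable pipeline: each cleaning component is a function over records."""
    steps: List[Step] = field(default_factory=list)

    def plug(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self

    def run(self, records: List[Record]) -> List[Record]:
        for step in self.steps:
            records = step(records)
        return records


def demo() -> List[Record]:
    # Made-up sample data in the spirit of a price comparison scenario.
    raw = [
        {"product": "  ACME  Laser Printer X200", "price": ""},
        {"product": "acme laser printer x200", "price": "199.00"},
        {"product": "Other Brand Scanner", "price": "89.50"},
    ]
    pipeline = Pipeline().plug(normalize).plug(deduplicate("product", 0.8))
    return pipeline.run(raw)
```

In this sketch, calling `demo()` links the two printer records after normalization and fuses them into one record that carries the price from the second; the scanner record stays separate. A new component is added by passing any function from a list of records to a list of records to `plug`, which mirrors the extensibility the abstract describes only in a loose sense.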

Project participants

Gilbert Hödl, associated research partner

LAUNDRY - The Lixto Web Data Cleaning Framework
