Advanced Data Analysis on Computational Grids
Disciplines
Computer Sciences (100%)
Keywords
Computational grids,
OLAP,
Parallel and distributed query evaluation,
Virtual Data Cubes,
Grand-challenge applications,
Data Analysis
Computational Grids, federations of geographically distributed heterogeneous hardware, software, databases, and other resources, are emerging in academia, between international research labs, and within commercial organizations. Built on the Internet and the World Wide Web, the Grid is regarded as a new class of infrastructure for 21st-century science and business. In its first phase, Grid research has been driven almost exclusively by grand-challenge scientific applications. The focus is now shifting to more general domains, closer to everyday life. The proposed project aims to extend state-of-the-art Grid technology to a completely new and societally important category of applications. It will develop and thoroughly evaluate novel concepts for knowledge discovery in databases and other large data sets attached to the Grid. The project concentrates on data mining and On-Line Analytical Processing (OLAP), two complementary technologies which, applied in conjunction, can provide a highly efficient and powerful data analysis and knowledge discovery solution on the Grid. Both technologies will be investigated and experimentally implemented within a novel infrastructure called GridMiner, which will be built on top of services developed by other Grid projects. The project will also develop new Grid architecture levels that integrate information from heterogeneous data sources, manage OLAP data structures, and accommodate knowledge discovery and knowledge presentation components. Parallel and distributed query evaluation and optimization techniques, such as OLAP aggregation and query result caching, will be extended and adapted to this environment to ensure high performance and good scalability even for the very large data sets that commonly occur in large-scale scientific and commercial applications.
The resulting methodology and new Grid services will be validated and tested thoroughly on the project's Grid testbed, which comprises computational resources in several countries. Two distributed applications will primarily be addressed: the treatment of traumatic brain injury victims, and weather forecasting and air pollution modeling.
The Grid, a federation of geographically distributed heterogeneous hardware, software, databases, and other resources working together to reach common goals, has been identified as a crucial and revolutionary technology for the 21st century across a remarkable breadth of scientific, engineering, commercial, and industrial fields. It has already motivated the emergence of many novel grand-challenge applications that could not be tackled without its support. This project investigated the issues associated with knowledge discovery in databases integrated into the Grid. Across a wide variety of fields, for example in high-performance scientific simulations and in experiments performed on a new generation of high-resolution scientific instruments, enormous amounts of data are generated, often well into the multi-terabyte range. Gaining insights and extracting latent knowledge from these large volumes of data, which can be geographically distributed, requires advanced data management, intelligent data reduction, preprocessing and integration, high-performance data mining methods, and novel software mechanisms for efficient specification and coordination of data analysis processes. The project investigated all these aspects and, building on the algorithms, workflow specification mechanisms, techniques, and methods it developed, implemented an advanced research infrastructure called GridMiner. GridMiner includes software services for sequential, parallel, and distributed data and text mining, On-Line Analytical Processing (OLAP), data integration based on mediator/wrapper principles, data quality monitoring and improvement based on advanced data statistics, and visualization of the results of data analysis and surveying tasks. A special research effort dealt with integrating all the required services into an interactive workflow, which is executed by a dedicated engine service.
GridMiner users now have a graphical user interface and other supporting mechanisms that allow them to compose and run workflows according to their individual needs. A portal to GridMiner is available on the Web. All of this research and development was conducted in cooperation with some of the world's leading Grid research and application groups. The pilot applications that benefit directly from the project results cover the medical domain (cancer research, traumatic brain injuries, neurological diseases, and brain informatics) and the ecological domain (environmental monitoring and event prediction).
- Technische Universität Wien - 100%
- Kurt Stockinger, CERN - Switzerland
Research Output
- 2 Citations
- 1 Publication
2004
Title: Towards Service Collaboration Model in Grid-based Zero Latency Data Stream Warehouse (GZLDSWH)
DOI: 10.1109/scc.2004.1358025
Type: Conference Proceeding Abstract
Author: Nguyen T
Pages: 357-365