Visual Analysis of Heterogeneous Data using Semantic Subsets
Disciplines
Computer Sciences (100%)
Keywords
Visualization, Visual Analytics, Human Computer Interaction, Genetics, Information Visualization
Analyzing and understanding very large and heterogeneous datasets is a fundamental challenge researchers face in many scientific domains. Disciplines such as astronomy, physics, and biology have to deal with datasets of an unprecedented scale and complexity. While analyzing these datasets is challenging, they also have the potential to revolutionize our understanding of the underlying processes. To realize this potential, novel analysis approaches have to be developed in all fields of the data sciences. In this proposal for an Erwin Schrödinger fellowship, I introduce semantic subsets as a novel method for the visual analysis of large, heterogeneous, and multiple datasets. I propose to leverage machine learning, statistical, and other methods to first partition datasets into meaningful subsets, and then use a tight integration of computational and visualization methods to support experts in choosing subsets relevant to a task. These subsets and their relationships are then visualized, facilitating an open, exploratory analysis of the data. The core research challenges addressed in this proposal are how to efficiently and effectively find suitable subsets, manage multiple subsets, and visualize the relationships between them. I argue that this approach is suitable to address the problems posed by the analysis of multiple large and heterogeneous datasets, as it scales well, is highly flexible, and naturally integrates multiple datasets. I intend to develop prototypes realizing the semantic subsets concept for the analysis of biomolecular data in design studies. These applications will be the product of a user-centered design process involving close collaboration with domain experts. The applications will address the domain experts' data analysis problems and aid them in their scientific discovery process.
The formal evaluation of the utility of the approach will be conducted using case studies based on longitudinal observations of the deployed applications, in addition to controlled user studies. I plan to conduct this research at the Visual Computing Group at Harvard University, led by Professor Hanspeter Pfister. Professor Pfister and his group have considerable expertise in developing visualization methods for molecular biology. In addition, the greater Boston area is home to many top-tier molecular biology research labs, including the Harvard Medical School and the Broad Institute of MIT and Harvard, to which Professor Pfister and I have established ties. This environment is therefore uniquely suited to the proposed kind of research. During the planned return phase at the Institute for Computer Graphics and Vision at Graz University of Technology, I will not only be able to pass on the knowledge I have gained to my peers and to students, but will also be able to support Professor Schmalstieg in his agenda of building a strong data visualization group in Graz and thereby strengthen the already sizable Austrian visualization research community.
In this project we investigated how the concept of semantic subsets can be employed to visualize and analyze large and complex data. Semantic subsets are a method that visualizes small subsets of a large dataset, instead of showing a global overview first. The benefit of this approach is its suitability for very large and complex datasets; the challenges lie in identifying interesting subsets in the first place and, once these are identified, in finding and showing related subsets. To overcome these challenges we have developed methods that take users on a smart tour of the dataset, instead of showing all of the data at once.
Methods to query large datasets and identify interesting subsets. We developed two techniques to rank and slice subsets of datasets. First, we developed a method to interactively rank multivariate datasets. Ranking is essential in identifying important items, yet, due to the complex combination of attributes and potential biases, it is impossible to develop an objective ranking function. Our technique remedies this by letting users dynamically define weights and thus create custom rankings. Another technique lets users divide datasets based on combinations of set attributes, so users can slice and dice a dataset according to data-driven criteria.
Methods to visualize and explore subsets. We developed various techniques to jointly visualize multiple subsets. We distinguish techniques for two fundamental data types: tabular and graph data. For tabular data, we developed methods that work in concert with the selection techniques discussed above and let users dynamically choose, position, and connect subsets on a canvas. In addition, users can define appropriate visualization techniques to use for the subsets and choose the degree of relationship with other subsets. We introduce a formal classification of how two subsets can interact based on shared data types and the desired strength of relationship.
A realization of this work is now, for example, used in cancer subtype analysis to explore the properties and interactions of patient classifications. The second data type we investigated is graph data. Here we introduced methods that use a focus-and-context approach to present a core subset of a graph and automatically pull in related subsets of the graph. The system is highly interactive: as a user updates a selection, interesting related subsets are suggested and visualized. In addition to displaying subsets of the graph topology, we show relevant attributes for selected parts of the displayed graphs. We apply these methods to biological pathways, where our techniques have been used, for example, to study why certain cell lines respond to a drug while others do not.
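The two subset-selection ideas described in the report, user-weighted ranking and slicing by combinations of set attributes, can be illustrated with a minimal sketch. This is not the implementation used in the project's tools; the function names `rank_items` and `slice_by_membership`, the min-max normalization, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_items(items, weights):
    """Rank items by a weighted sum of min-max normalized attributes.

    items:   dict mapping item name -> dict of numeric attributes
    weights: dict mapping attribute name -> user-chosen weight
    (Hypothetical helper; normalization scheme is an assumption.)
    """
    # Record the value range of each weighted attribute so that
    # attributes on different scales become comparable.
    bounds = {}
    for attr in weights:
        values = [attrs[attr] for attrs in items.values()]
        bounds[attr] = (min(values), max(values))

    def score(attrs):
        total = 0.0
        for attr, weight in weights.items():
            low, high = bounds[attr]
            normalized = 0.0 if high == low else (attrs[attr] - low) / (high - low)
            total += weight * normalized
        return total

    # Highest combined score first.
    return sorted(items, key=lambda name: score(items[name]), reverse=True)


def slice_by_membership(memberships):
    """Group items by the exact combination of sets they belong to.

    memberships: dict mapping item name -> iterable of set names
    Returns a dict mapping frozenset of set names -> sorted item names.
    (Hypothetical helper.)
    """
    groups = defaultdict(list)
    for item, sets in memberships.items():
        groups[frozenset(sets)].append(item)
    return {combo: sorted(members) for combo, members in groups.items()}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    items = {
        "A": {"x": 100, "y": 10},
        "B": {"x": 50, "y": 30},
        "C": {"x": 0, "y": 20},
    }
    # Changing the weights changes the ranking, mirroring the idea of
    # letting users define custom rankings interactively.
    print(rank_items(items, {"x": 1.0, "y": 0.0}))
    print(rank_items(items, {"x": 0.2, "y": 0.8}))

    memberships = {"g1": ["s1", "s2"], "g2": ["s1"], "g3": ["s1", "s2"]}
    print(slice_by_membership(memberships))
```

Because the weights are plain inputs, an interactive front end could recompute the ranking whenever a user adjusts them. Grouping by the exact combination of sets yields one subset per intersection pattern.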
- Harvard University - 100%
Research Output
- 2353 Citations
- 14 Publications
- Gratzl S (2013). LineUp: Visual Analysis of Multi-Attribute Rankings. IEEE Transactions on Visualization and Computer Graphics, pp. 2277-2286. Journal Article. DOI: 10.1109/tvcg.2013.173
- Lex A (2013). Entourage: Visualizing Relationships between Biological Pathways using Contextual Subsets. IEEE Transactions on Visualization and Computer Graphics, pp. 2536-2545. Journal Article. DOI: 10.1109/tvcg.2013.154
- Strobelt H (2015). Vials: Visualizing Alternative Splicing of Genes. IEEE Transactions on Visualization and Computer Graphics, pp. 399-408. Journal Article. DOI: 10.1109/tvcg.2015.2467911
- Geymayer T (2014). Show me the invisible. pp. 3705-3714. Conference Proceeding Abstract. DOI: 10.1145/2556288.2557032
- Partl C (2013). enRoute: dynamic path extraction from biological pathway maps for exploring heterogeneous experimental datasets. BMC Bioinformatics. Journal Article. DOI: 10.1186/1471-2105-14-s19-s3
- Gratzl S (2014). Domino: Extracting, Comparing, and Manipulating Subsets Across Multiple Tabular Datasets. IEEE Transactions on Visualization and Computer Graphics, pp. 2023-2032. Journal Article. DOI: 10.1109/tvcg.2014.2346260
- Geymayer T (2014). Show me the invisible: visualizing hidden Content. CHI '14 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI 2014. Conference Proceeding Abstract.
- Streit M (2014). Guided visual exploration of genomic stratifications in cancer. Nature Methods, pp. 884-885. Journal Article. DOI: 10.1038/nmeth.3088
- Partl C (2014). ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery. IEEE Transactions on Visualization and Computer Graphics, pp. 1883-1892. Journal Article. DOI: 10.1109/tvcg.2014.2346752
- Lex A (2015). OceanPaths: Visualizing Multivariate Oceanography Data. Proceedings of the Eurographics Conference on Visualization (EuroVis '15). Conference Proceeding Abstract.
- Turkay C (2014). Characterizing Cancer Subtypes Using Dual Analysis in Caleydo StratomeX. IEEE Computer Graphics and Applications, pp. 38-47. Journal Article. DOI: 10.1109/mcg.2014.1
- Lex A (2014). Sets and intersections. Nature Methods, p. 779. Journal Article. DOI: 10.1038/nmeth.3033
- Lex A (2014). UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics, pp. 1983-1992. Journal Article. DOI: 10.1109/tvcg.2014.2346248
- Mercer J (2014). Mu-8: visualizing differences between proteins and their families. BMC Proceedings. Journal Article. DOI: 10.1186/1753-6561-8-s2-s5