BMFacts: Knowledge acquisition for a biomedical fact reposito
Disciplines
Other Human Medicine, Health Sciences (30%); Other Social Sciences (20%); Computer Sciences (50%)
Keywords
Biomedical facts,
Biomedical linked data,
Knowledge acquisition,
Question answer system,
Co-Occurrences Analysis,
Biomedical terminologies
Information management in biomedical research, health care, and translational medicine would greatly benefit from structured repositories that represent and connect generally accepted biomedical facts. Such fact stores could serve as a knowledge resource to support document retrieval, question answering, and decision support systems, complementing the already established biomedical terminologies and ontologies. The literature database MEDLINE, with more than 22 million bibliographic records, is already a comprehensive source of semi-structured biomedical information, especially owing to its rich metadata annotations based on the MeSH indexing vocabulary. These data are linked to other biomedical terminologies and ontologies via the UMLS Metathesaurus. This resource additionally provides statistical co-occurrences based on joint corpus annotations, a valuable but still underused information source. The proposed project BMFacts seeks to exploit the potential of co-occurrence data and biomedical terminologies in order to infer semantic relations from statistical associations of annotations in biomedical publications. The inferred facts represent context-dependent domain knowledge, which goes far beyond what is currently represented in biomedical ontologies or terminologies. In BMFacts, we aim to construct an RDF triplestore of such biomedical facts. It will use Semantic Web technologies and Linked Data to manage and exploit the resulting knowledge base, called the Biomedical Facts Repository (BMFR). BMFR uses SNOMED CT codes to identify concepts within inferred triples and proposes new types of binary associations, using and extending relations from the UMLS Semantic Network. We want to investigate the potential of BMFR for providing the knowledge needed in health-related decision support and clinical document retrieval.
Linked Data principles will be applied in order to extend the content of the BMFR with external datasets and to support translational medicine through the generated triples. The content of BMFR will furthermore be refined in three ways: (i) by comparing it with facts from the Linking Open Data (LOD) cloud, (ii) by using additional metadata and extracts from MEDLINE, and (iii) by matching free-text renderings of the predications against large medical reference corpora. After this cleansing process, the BMFR will be benchmarked in two application scenarios: (i) a question answering framework targeting laypersons' information needs regarding diabetes mellitus and related diseases, for which a gold standard exists; and (ii) a clinical query infrastructure on a corpus of anonymised clinical texts, for which a set of user queries and relevance judgements exists.
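As a rough illustration of how facts in such a triplestore might be represented and queried, the following minimal sketch stores triples in memory, renders them as N-Triples lines, and answers simple triple-pattern queries. The namespace `http://example.org/bmfr/`, the identifiers `X1` to `X3` (placeholders, not real SNOMED CT codes), and the helper names `Fact`, `to_ntriple`, and `match` are all assumptions made for this example; the actual BMFR URI scheme and storage technology are not specified here.

```python
from typing import List, NamedTuple, Optional

# Hypothetical namespace for illustration; not a published BMFR URI scheme.
BASE = "http://example.org/bmfr/"


class Fact(NamedTuple):
    """One binary association: subject concept, relation, object concept."""
    subject: str
    predicate: str
    obj: str


def to_ntriple(fact: Fact) -> str:
    """Render a fact as a single N-Triples line using the placeholder namespace."""
    return (
        f"<{BASE}concept/{fact.subject}> "
        f"<{BASE}relation/{fact.predicate}> "
        f"<{BASE}concept/{fact.obj}> ."
    )


def match(
    facts: List[Fact],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Fact]:
    """Return the facts matching a triple pattern; None acts as a wildcard."""
    return [
        f
        for f in facts
        if subject in (None, f.subject)
        and predicate in (None, f.predicate)
        and obj in (None, f.obj)
    ]


# Placeholder concept identifiers stand in for real terminology codes.
facts = [
    Fact("X1", "treats", "X2"),
    Fact("X3", "causes", "X2"),
]

for fact in match(facts, obj="X2"):
    print(to_ntriple(fact))
```

A production setting would more plausibly rely on an RDF library or a dedicated triplestore with SPARQL; the list-based lookup here only shows the shape of the data.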
The goal of BMFacts was to develop and assess methodologies to extract generally acceptable fact-like statements from the biomedical literature database MEDLINE. Such facts could be a valuable knowledge resource to support document retrieval, question answering, and decision support. In contrast to the content of biomedical terminologies and ontologies, which represent what is universally true (e.g. lung cancer is always located in the lung), BMFacts aimed at generating contingent knowledge such as symptom/disorder associations, drug indications, side effects, or etiological factors (e.g. lung cancer caused by smoking). MEDLINE, with more than 22 million bibliographic records, is a rich source for extracting biomedical information, particularly because each record is annotated with the MeSH (Medical Subject Headings) indexing vocabulary. The BMFacts methodology processes hundreds of millions of MeSH annotations provided by MEDLINE to obtain a list of co-occurring MeSH term pairs. A clustering method is then applied to induce suitable predicates that give meaning to the co-occurring MeSH term pairs. The set of candidate predicates, such as treats, diagnoses, and prevents, is taken from an external source, the UMLS Semantic Network, and the predicates are selected by lexical analysis of paper abstracts, which are also available in MEDLINE. The output is a repository of biomedical facts in the form of simple Subject-Predicate-Object triples. An example would be Insulin; treats; Type-1-Diabetes. The resulting repository of biomedical facts was evaluated against a gold standard from a previous project, in which physicians had manually created a set of plausible predications on diabetes mellitus. The evaluation results showed the strengths and weaknesses of the completely unsupervised learning methodology. Depending on the predicate examined, the precision of the fact repository varied greatly.
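The first step of the methodology described above, deriving co-occurring MeSH term pairs from per-record annotations, can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the function name `count_cooccurrences` and the three toy records are invented for the example, and the clustering and predicate-selection steps are not shown. The project itself processed hundreds of millions of annotations, which calls for distributed processing rather than a single `Counter`.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def count_cooccurrences(records: Iterable[List[str]]) -> Counter:
    """Count, for each unordered pair of MeSH terms, the records annotated with both."""
    pair_counts: Counter = Counter()
    for terms in records:
        # Deduplicate and sort so every pair has one canonical (a, b) ordering.
        unique_terms = sorted(set(terms))
        for pair in combinations(unique_terms, 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy records (illustrative only): each list holds the MeSH terms of one citation.
records = [
    ["Insulin", "Diabetes Mellitus, Type 1"],
    ["Insulin", "Diabetes Mellitus, Type 1", "Hypoglycemia"],
    ["Smoking", "Lung Neoplasms"],
]

counts = count_cooccurrences(records)
for (term_a, term_b), n in counts.most_common():
    print(f"{term_a} | {term_b}: {n}")
```

Sorting the terms before pairing keeps the count symmetric, so a pair is tallied under a single key regardless of the order in which the terms were annotated.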
An in-depth analysis showed that in many cases the co-occurrence of MeSH terms could not be reduced to a simple predication but only to a chain of predications, such as diabetes mellitus causes nephropathy, which causes renal insufficiency, which is treated by kidney transplant (whereas the found predication
Research Output
- 3 Citations
- 7 Publications
- 2015
  Title: Knowledge Extraction from MEDLINE by Combining Clustering with Natural Language Processing.
  Type: Journal Article
  Author: Miñarro-Giménez J
  Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
  Pages: 915-24
- 2015
  Title: Acquiring Plausible Predications from MEDLINE by Clustering MeSH Annotations.
  Type: Journal Article
  Author: Miñarro-Giménez J
  Journal: Studies in health technology and informatics
  Pages: 716-20
- 2015
  Title: Acquiring Plausible Predications from MEDLINE by Clustering MeSH Annotations
  DOI: 10.3233/978-1-61499-564-7-716
  Type: Book Chapter
  Author: Miñarro-Giménez Jose Antonio
  Publisher: IOS Press
- 2016
  Title: MapReduce in the Cloud: A Use Case Study for Efficient Co-Occurrence Processing of MEDLINE Annotations with MeSH.
  Type: Journal Article
  Author: Kreuzthaler M
  Journal: Studies in health technology and informatics
  Pages: 582-6
- 2016
  Title: Publishing Biomedical Predication Repository About MeSH Co-Occurrences in MEDLINE.
  Type: Journal Article
  Author: Miñarro-Giménez J
  Journal: Studies in health technology and informatics
  Pages: 765-9
- 2016
  Title: Publishing Biomedical Predication Repository About MeSH Co-Occurrences in MEDLINE
  DOI: 10.3233/978-1-61499-678-1-765
  Type: Book Chapter
  Author: Miñarro-Giménez Jose Antonio
  Publisher: IOS Press
- 2016
  Title: MapReduce in the Cloud: A Use Case Study for Efficient Co-Occurrence Processing of MEDLINE Annotations with MeSH
  DOI: 10.3233/978-1-61499-678-1-582
  Type: Book Chapter
  Author: Kreuzthaler Markus
  Publisher: IOS Press