School of Computing Seminar

Fridays 2:30-3:30 pm
McAdams 114

November 22, 2019

Justin Sybrandt
Clemson University SoC

Moliere: Automatic Biomedical Hypothesis Generation


Hypothesis generation is becoming a crucial time-saving technique that allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 27 million documents. At the heart of our approach lies a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI). These objects include, but are not limited to, scientific papers, keywords, genes, proteins, diseases, and diagnoses. Using this network, we quantify the potential of future connections between entities. To do so, we propose a range of analytic measures and validate them at large scale. Furthermore, we apply this predictive capacity to identify gene associations related to HIV-associated dementia. These predicted, never-before-studied relationships are verified in wet lab experiments.


Justin is in the fourth year of his PhD, in which he has studied text and graph mining. He is a fellow of the Clemson NRT program, and during his studies he has interned at Los Alamos National Lab, Google, and Facebook. Moliere, the biomedical hypothesis generation system, is his primary research focus, which has recently grown to include a re-release, PyMoliere, as well as current work in natural language generation. Justin has accepted a position at Google, starting in spring 2020.

Advisor: Dr. Ilya Safro
Moliere: An Automatic Biomedical Hypothesis Generation System (KDD'17);
Large-scale validation of hypothesis generation systems via candidate ranking (BigData'18);
Are Abstracts Enough for Hypothesis Generation? (BigData'18)


