Research Interests:
Research on common human disease has become a complex, interdisciplinary pursuit. Modern epidemiological challenges such as the characterization of complex systems, the management of ‘big data’, and the integration of data for systems biology epitomize this trend. The early stages of biomedical research typically focus on connecting predictive factors, whether genetic, epigenetic, or environmental, to increased or decreased susceptibility to common disease. This search for patterns of association is likely complicated by non-linear phenomena such as complex gene-gene interactions, gene-environment interactions, genetic heterogeneity, and phenocopy. My primary research interests focus on the development, evaluation, and application of novel computational, statistical, and visualization methods to facilitate classification and data mining in the complex, noisy domain of biomedical research.
My thesis research focused on adapting a learning classifier system (LCS) algorithm to the task of detecting, modeling, and characterizing epistatic and heterogeneous associations within single nucleotide polymorphism (SNP) association studies. The development and application of LCS algorithms has since become a particular area of specialization. My postdoctoral work built on this LCS groundwork, leading to the development of ExSTraCS, an Extended Supervised Tracking and Classifying System. This work epitomizes my interest in (1) developing strategies that limit the number of assumptions made about the data and instead allow the data to speak for itself when detecting complex or heterogeneous patterns, (2) allowing for the integration of data types by offering an algorithmic framework that functions for all combinations of discrete or continuous attributes and endpoints, and (3) promoting a user-friendly, interpretable environment for knowledge discovery. My work with LCS algorithms has also led me to pursue visual and statistical strategies with which to guide and facilitate knowledge discovery. My interests have also branched into the theory and practice of complex disease model and data simulation, which led to the development of the open-source GAMETES software package. In addition, my interest in tackling issues related to ‘big data’ has motivated me to explore and expand attribute filtering approaches (ReliefF, SURF, SURFStar, MultiSURF) for computational and algorithmic flexibility and efficiency. These algorithms offer critical preprocessing steps for feature selection, and for generating and applying statistical, objective, and unbiased expert knowledge to guide stochastic algorithm learning more efficiently.
In summary, my research interests lie at the intersection of genetics, genomics, biostatistics, epidemiology, machine learning, and artificial intelligence. I have adopted a quantitative biomedical research strategy that embraces, rather than ignores, the complexity of the relationship between predictive factors and disease endpoints.