Publications

Here you can find all of my publications, organized by Journal, Conference and Workshop, Books or Book Chapters, Non-Refereed, Thesis, and Presented Abstracts. Click the relevant sub-link of the Publications tab to see each set. Links to these publications, as well as BibTeX citations, are included where available. Below you will find 7 select publications that highlight some of my best and newest work.

You can navigate to my official NCBI Bibliography by clicking here.

And you can navigate to my Google Scholar page by clicking here.


Select Publications

Pareto inspired multi-objective rule fitness for noise-adaptive rule-based machine learning

  • Ryan J. Urbanowicz, Randal S. Olson, Jason H. Moore (2016)
  • Proceedings of the Parallel Problem Solving from Nature Conference (PPSN XV)
    BibTeX
    @inproceedings{urbanowicz2016pareto,
    title={Pareto Inspired Multi-objective Rule Fitness for Noise-Adaptive Rule-Based Machine Learning},
    author={Urbanowicz, Ryan J and Olson, Randal S and Moore, Jason H},
    booktitle={International Conference on Parallel Problem Solving from Nature},
    pages={514--524},
    year={2016},
    organization={Springer}
    }

Abstract: Learning classifier systems (LCSs) are rule-based evolutionary algorithms uniquely suited to classification and data mining in complex, multi-factorial, and heterogeneous problems. The fitness of individual LCS rules is commonly based on accuracy, but this metric alone is not ideal for assessing global rule ‘value’ in noisy problem domains and thus impedes effective knowledge extraction. Multi-objective fitness functions are promising but rely on prior knowledge of how to weigh objective importance (typically unavailable in real-world problems). The Pareto-front concept offers a multi-objective strategy that is agnostic to objective importance. We propose a Pareto-inspired multi-objective rule fitness (PIMORF) for LCS, and combine it with a complementary rule-compaction approach (SRC). We implemented these strategies in ExSTraCS, a successful supervised LCS, and evaluated performance over an array of complex simulated noisy and clean problems (i.e., genetic and multiplexer) that each concurrently model pure interaction effects and heterogeneity. While evaluation over multiple performance metrics yielded mixed results, this work represents an important first step towards efficiently learning complex problem spaces without the advantage of prior problem knowledge. Overall, the results suggest that PIMORF paired with SRC improved rule set interpretability, particularly with regard to heterogeneous patterns.
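As a rough illustration of the Pareto-front idea mentioned in the abstract (this is not the PIMORF implementation from the paper), the minimal sketch below finds the non-dominated entries among a set of rules scored on two objectives. The choice of accuracy and coverage as the two objectives, the function names, and the sample values are all assumptions made for illustration.

```python
from typing import List, Tuple

# (accuracy, coverage); both objectives are maximized in this sketch.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical rule scores, invented for illustration only.
rules = [(0.9, 10.0), (0.7, 40.0), (0.6, 30.0), (0.9, 5.0), (0.5, 60.0)]
front = pareto_front(rules)
```

No weighting between the two objectives is needed to decide which rules lie on the front, which is the sense in which the approach is agnostic to objective importance.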

 

ExSTraCS 2.0: addressing scalability with a rule specificity limit in a Michigan-style supervised learning classifier system for classification, prediction, and knowledge discovery

  • Ryan J. Urbanowicz and Jason H. Moore (2015)
  • Journal of Evolutionary Intelligence
    BibTeX
    @article{urbanowicz2015exstracs,
    title={{ExSTraCS} 2.0: description and evaluation of a scalable learning classifier system},
    author={Urbanowicz, Ryan J and Moore, Jason H},
    journal={Evolutionary intelligence},
    volume={8},
    number={2-3},
    pages={89--116},
    year={2015},
    publisher={Springer}
    }

Abstract: Algorithmic scalability is a major concern for any machine learning strategy in this age of ‘big data’. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered to be particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert knowledge guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6, 11, 20, 37, 70, and 135 multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.
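For readers unfamiliar with the multiplexer benchmarks named in the abstract, here is a minimal sketch of the standard multiplexer function; it is not code from ExSTraCS. The first k bits form an address that selects one of the following 2^k register bits as the output, so the problem sizes 6, 11, 20, 37, 70, and 135 correspond to k = 2 through 7. The function names are hypothetical, and reading the address most-significant-bit first is an assumed convention.

```python
from typing import Sequence


def multiplexer_size(address_bits: int) -> int:
    """Total number of input bits: k address bits plus 2**k register bits."""
    return address_bits + 2 ** address_bits


def multiplexer(bits: Sequence[int], address_bits: int) -> int:
    """Return the register bit selected by the leading address bits."""
    if len(bits) != multiplexer_size(address_bits):
        raise ValueError("input length does not match address_bits")
    # Read the address bits as a binary number, most significant bit first
    # (an assumed convention for this sketch).
    address = 0
    for b in bits[:address_bits]:
        address = (address << 1) | b
    return bits[address_bits + address]


# Problem sizes for k = 2..7 address bits.
sizes = [multiplexer_size(k) for k in range(2, 8)]

# 6-bit example: address bits (1, 0) select register index 2.
example = multiplexer([1, 0, 0, 0, 1, 0], address_bits=2)
```

Because the relevant register bit changes with the address, the output depends on attribute interactions that differ across subsets of instances, which is why these problems are used to model both epistasis and heterogeneity.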

 

The role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach

  • Ryan J. Urbanowicz, Angeline Andrew, Margaret Karagas, and Jason H. Moore (2013)
  • Journal of the American Medical Informatics Association (JAMIA)
    BibTeX
    @article{urbanowicz2013role,
    title={Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach},
    author={Urbanowicz, Ryan John and Andrew, Angeline S and Karagas, Margaret Rita and Moore, Jason H},
    journal={Journal of the American Medical Informatics Association},
    volume={20},
    number={4},
    pages={603--612},
    year={2013},
    publisher={The Oxford University Press}
    }

Background and Objective:  Detecting complex patterns of association between genetic or environmental risk factors and disease risk has become an important target for epidemiological research. In particular, strategies that provide multifactor interactions or heterogeneous patterns of association can offer new insights into association studies for which traditional analytic tools have had limited success.

Materials and Methods: To concurrently examine these phenomena, previous work has successfully considered the application of learning classifier systems (LCSs), a flexible class of evolutionary algorithms that distributes learned associations over a population of rules. Subsequent work dealt with the inherent problems of knowledge discovery and interpretation within these algorithms, allowing for the characterization of heterogeneous patterns of association. Whereas these previous advancements were evaluated using complex simulation studies, this study applied these collective works to a ‘real-world’ genetic epidemiology study of bladder cancer susceptibility.

Results and Discussion:  We replicated the identification of previously characterized factors that modify bladder cancer risk—namely, single nucleotide polymorphisms from a DNA repair gene, and smoking. Furthermore, we identified potentially heterogeneous groups of subjects characterized by distinct patterns of association. Cox proportional hazard models comparing clinical outcome variables between the cases of the two largest groups yielded a significant, meaningful difference in survival time in years (survivorship). A marginally significant difference in recurrence time was also noted. These results support the hypothesis that an LCS approach can offer greater insight into complex patterns of association.

Conclusions: This methodology appears to be well suited to the dissection of disease heterogeneity, a key component in the advancement of personalized medicine.

 

GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures

  • Ryan J. Urbanowicz, Jeff Kiralis, Nicholas A. Sinnott-Armstrong, Tamra Heberling, Jonathan M. Fisher, and Jason H. Moore (2012)
  • BioData Mining. BioMed Central Ltd.
    BibTeX
    @article{urbanowicz2012gametes,
    title={{GAMETES}: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures},
    author={Urbanowicz, Ryan J and Kiralis, Jeff and Sinnott-Armstrong, Nicholas A and Heberling, Tamra and Fisher, Jonathan M and Moore, Jason H},
    journal={BioData mining},
    volume={5},
    number={1},
    pages={1},
    year={2012},
    publisher={BioMed Central}
    }

Background: Geneticists who look beyond single locus disease associations require additional strategies for the detection of complex multi-locus effects. Epistasis, a multi-locus masking effect, presents a particular challenge, and has been the target of bioinformatic development. Thorough evaluation of new algorithms calls for simulation studies in which known disease models are sought. To date, the best methods for generating simulated multi-locus epistatic models rely on genetic algorithms. However, such methods are computationally expensive, difficult to adapt to multiple objectives, and unlikely to yield models with a precise form of epistasis which we refer to as pure and strict. Purely and strictly epistatic models constitute the worst-case in terms of detecting disease associations, since such associations may only be observed if all n-loci are included in the disease model. This makes them an attractive gold standard for simulation studies considering complex multi-locus effects.

Results: We introduce GAMETES, a user-friendly software package and algorithm which generates complex biallelic single nucleotide polymorphism (SNP) disease models for simulation studies. GAMETES rapidly and precisely generates random, pure, strict n-locus models with specified genetic constraints. These constraints include heritability, minor allele frequencies of the SNPs, and population prevalence. GAMETES also includes a simple dataset simulation strategy which may be utilized to rapidly generate an archive of simulated datasets for given genetic models. We highlight the utility and limitations of GAMETES with an example simulation study using MDR, an algorithm designed to detect epistasis.

Conclusions: GAMETES is a fast, flexible, and precise tool for generating complex n-locus models with random architectures. While GAMETES has a limited ability to generate models with higher heritabilities, it is proficient at generating the lower heritability models typically used in simulation studies evaluating new algorithms. In addition, the GAMETES modeling strategy may be flexibly combined with any dataset simulation strategy. Beyond dataset simulation, GAMETES could be employed to pursue theoretical characterization of genetic models and epistasis.

 

An analysis pipeline with statistical and visualization-guided knowledge discovery for Michigan-style learning classifier systems

  • Ryan J. Urbanowicz, Ambrose Granizo-Mackenzie, and Jason H. Moore (2012)
  • IEEE Computational Intelligence Magazine
    BibTeX
    @article{urbanowicz2012analysis,
    title={An analysis pipeline with statistical and visualization-guided knowledge discovery for {M}ichigan-style learning classifier systems},
    author={Urbanowicz, Ryan J and Granizo-Mackenzie, Ambrose and Moore, Jason H},
    journal={IEEE computational intelligence magazine},
    volume={7},
    number={4},
    pages={35--45},
    year={2012},
    publisher={IEEE}
    }

Abstract: Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However, their application to complex real-world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes call for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data.

Instance-linked attribute tracking and feedback for Michigan-style supervised learning classifier systems

  • Ryan J. Urbanowicz, Ambrose Granizo-Mackenzie, and Jason H. Moore (2012)
  • Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation (GECCO’12)
    BibTeX
    @inproceedings{urbanowicz2012instance,
    title={Instance-linked attribute tracking and feedback for {M}ichigan-style supervised learning classifier systems},
    author={Urbanowicz, Ryan and Granizo-Mackenzie, Ambrose and Moore, Jason},
    booktitle={Proceedings of the 14th annual conference on Genetic and evolutionary computation},
    pages={927--934},
    year={2012},
    organization={ACM}
    }

Abstract: The application of learning classifier systems (LCSs) to classification and data mining in genetic association studies has been the target of previous work. Recent efforts have focused on: (1) correctly discriminating between predictive and non-predictive attributes, and (2) detecting and characterizing epistasis (attribute interaction) and heterogeneity. While the solutions evolved by Michigan-style LCSs (M-LCSs) are conceptually well suited to address these phenomena, the explicit characterization of heterogeneity remains a particular challenge. In this study we introduce attribute tracking, a mechanism akin to memory, for supervised learning in M-LCSs. Given a finite training set, a vector of accuracy scores is maintained for each instance in the data. Post-training, we apply these scores to characterize patterns of association in the dataset. Additionally, we introduce attribute feedback to the mutation and crossover mechanisms, probabilistically directing rule generalization based on an instance’s tracking scores. We find that attribute tracking combined with clustering and visualization facilitates the characterization of epistasis and heterogeneity while uniquely linking individual instances in the dataset to etiologically heterogeneous subgroups. Moreover, these analyses demonstrate that attribute feedback significantly improves test accuracy, efficient generalization, run time, and the power to discriminate between predictive and non-predictive attributes in the presence of heterogeneity.

 

Learning classifier systems: a complete introduction, review, and roadmap

  • Ryan J. Urbanowicz and Jason H. Moore (2009)
  • Journal of Artificial Evolution and Applications. Hindawi Publishing Corporation
    BibTeX
    @article{urbanowicz2009learning,
    title={Learning classifier systems: a complete introduction, review, and roadmap},
    author={Urbanowicz, Ryan J and Moore, Jason H},
    journal={Journal of Artificial Evolution and Applications},
    volume={2009},
    pages={1},
    year={2009},
    publisher={Hindawi Publishing Corp.}
    }

Abstract: If complexity is your problem, learning classifier systems (LCSs) may offer a solution. These rule-based, multifaceted, machine learning algorithms originated and have evolved in the cradle of evolutionary biology and artificial intelligence. The LCS concept has inspired a multitude of implementations adapted to manage the different problem domains to which it has been applied (e.g., autonomous robotics, classification, knowledge discovery, and modeling). One field that is taking increasing notice of LCS is epidemiology, where there is a growing demand for powerful tools to facilitate etiological discovery. Unfortunately, implementation optimization is nontrivial, and a cohesive encapsulation of implementation alternatives seems to be lacking. This paper aims to provide an accessible foundation for researchers of different backgrounds interested in selecting or developing their own LCS. Included is a simple yet thorough introduction, a historical review, and a roadmap of algorithmic components, emphasizing differences in alternative LCS implementations.