Rule-Based Machine Learning

Summary

Learning Classifier Systems (LCSs) combine machine learning with evolutionary computing and other heuristics to produce an adaptive system that learns to solve a particular problem. LCSs are closely related to and typically assimilate the same components as the more widely utilized genetic algorithm (GA). The goal of LCS is not to identify a single best model or solution, but to create a cooperative set of rules or models which together solve the task. The solution evolved by an LCS is represented as a population of rules, or rule-sets, which are utilized collectively to make decisions/classifications. The schematic below organizes related fields of study into a tree which feeds into the LCS research community.

LCS field tree

LCSs are rule-based algorithms with a unique and flexible set of features that set them apart. Two major genre’s of LCS algorithms exist including Michigan-style and Pittsburgh-style systems. Michigan-style systems are the more traditional of the two LCS architectures and is the focus of my own interest and work. Michigan-style LCSs uniquely distribute learned patters over a competing, yet collaborative population of individually interpretable (IF:THEN) rules. This allows the algorithm to flexibly and effectively describe complex and diverse problem spaces found in behavior modeling, function approximation, classification, and data mining. Michigan-style LCSs also apply iterative rather than batch-wise learning, meaning that rules are evaluated and evolved one training instance at a time rather than being immediately evaluated on the training dataset as a whole. This makes them efficient, and naturally well-suited to learning different problem niches found in multi-class, latent-class, or heterogeneous problem domains.

The nature of Michigan-style LCSs impart them with a number of notable advantages: (1) they are model-free and thus do not make assumptions about the data (e.g. the association can be linear, epistatic, or heterogeneous; the number of predictive attributes; noisy or clean signal; balanced or imbalanced classes; how to handle missing data; or attributes being a mix of both discrete and continuous), (2) Individual rules are directly interpretable as IF:THEN expressions, (3) they are implicitly multi-objective, evolving rules toward maximal accuracy and generality (i.e. rule simplicity) to improve predictive performance, (4) on their own, they have themes in common with an ensemble learning system where individual rules or groups of rules propose different solutions within a collective rule population, (5) they are adaptive to changing dataset environments, such that parts of a solution can adapt without starting over from scratch, (6) by relying on evolutionary computation to explore the search space they offer a practical option when deterministic, exhaustive search is intractable, and (7) the algorithm is compartmentalized such that individual components can be updated or replaced to handle specialized problem domains. Key disadvantages of LCS include (1) the belief that they are somewhat more difficult to properly apply, (2) they lack a comparable theoretical understanding next to other, well-known machine learning strategies and are not guaranteed to converge on the optimal solution, (3) they are relatively computationally demanding and in certain problem domains can take longer to converge on a solution, and (4) most implementations to date have a relatively limited scalability.

Schematic of a generic supervised learning classifier system.

Rule-Based Machine Learning Content

Review Papers

Books

*Available at the end of this year, Will Browne and myself are co-authoring an introductory textbook on learning classifier systems. This will be the first truly introductory book on LCSs available. The book will be paired with free and accessible versions of an LCS algorithm coded in Python.

Summary

Rule-Based Machine Learning Content

Review Papers

Books

Links

Slideshows