An architecture for situated learning agents

Thesis posted on 2024-07-13, 05:42, authored by Matthew Mitchell
Situated learning agents are agents that operate in real-world environments. Ideally, such agents should be capable of assisting humans by performing complex tasks that involve drudgery or risk. They must therefore cope with noisy, non-deterministic environments with large state spaces, often requiring various forms of memory. This thesis addresses the problem of building situated learning agents. It draws on the lessons of related work in the area to identify three fundamental requirements that aid in making the complex choices and trade-offs arising when addressing this problem. Based on these three requirements, a number of existing learning techniques are selected for use in a new system suited to implementing situated learning agents, and within this system the selected techniques are augmented with a variety of important novel techniques.

The resulting system is a reinforcement learning system that dynamically develops a connectionist model of its environment while learning. This model consists of join groups and temporal groups. Join groups address the input-generalisation problem by constructing general rules using a default hierarchy, and temporal groups address the hidden-state problem by implementing a short-term memory mechanism. Groups represent one or more situations in the agent's environment and are connected to detector inputs and/or other groups by arcs, which are used to pass a variety of messages. Based on the situations they represent, groups contain nodes that store action-value estimates and maintain estimated transition probabilities to other situations. New groups are created incrementally during learning and are introduced by joining two existing groups selected by a localised probabilistic mechanism. Each new join group is given a small number of trials to determine its usefulness and is then retained only while its nodes demonstrate improved transition estimates over the nodes in the two groups it joins.

Among the distinguishing features of the proposed system is an ability to reduce the complexity of structures by representing logical NOT using only AND combinations, achieved through the organisation of nodes into groups together with a suppression mechanism. Compared with back-propagation neural networks, vertices in the proposed system store transition and value estimates relatively independently, allowing the system to learn from fewer training examples. This independence also avoids common problems with distributed representations, such as interference and catastrophic forgetting, at the expense of a larger internal representation for some problems. When dealing with problems containing hidden state, which require short-term memory to solve, the system does not continue to expend resources extending memory if doing so yields no useful improvement in achieving reinforcement. This, combined with a depth-first search approach to constructing memory, avoids the need to choose between a fixed-size history window and a priori restrictions on the amount of structure used for creating short-term memory. The proposed system does, however, require that a common set of paths is frequently traversed during training; experiments in spatial navigation environments demonstrate that this requirement can commonly be met without difficulty by manipulating the reward landscape.
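The abstract does not include implementation detail, but the description of groups, arcs, and nodes suggests the rough shape of the data structures involved. The following minimal Python sketch is illustrative only: all names (Group, Node, update_q, and so on) are hypothetical, and the generic temporal-difference update shown is a placeholder rather than the thesis's actual update rule.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node within a group: stores per-action value estimates and
    counts from which transition probabilities to other situations
    are estimated (shapes assumed, not taken from the thesis)."""
    q_values: dict = field(default_factory=dict)     # action -> value estimate
    transitions: dict = field(default_factory=dict)  # next situation -> count

    def update_q(self, action, reward, next_value, alpha=0.1, gamma=0.9):
        # A generic temporal-difference update, used here as a stand-in
        # for whatever update rule the thesis actually specifies.
        old = self.q_values.get(action, 0.0)
        self.q_values[action] = old + alpha * (reward + gamma * next_value - old)

    def observe_transition(self, next_situation):
        # Maintain counts from which transition estimates are derived.
        self.transitions[next_situation] = self.transitions.get(next_situation, 0) + 1

    def transition_prob(self, next_situation):
        total = sum(self.transitions.values())
        return self.transitions.get(next_situation, 0) / total if total else 0.0

@dataclass
class Group:
    """A group represents one or more situations and is connected to
    detector inputs and/or other groups by arcs."""
    arcs_in: list = field(default_factory=list)  # detector inputs or other groups
    nodes: list = field(default_factory=list)    # Node instances for its situations
```

In these terms, a join group would be a Group whose incoming arcs refer to the two groups it joins, with its nodes' transition estimates compared against those of its parents during its trial period.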
The system's representation and creation of new groups has strong parallels to both Holland's Learning Classifier Systems and Drescher's Schema Mechanism (Holland, 1975; Drescher, 1991). Consequently, the system is capable of discovering and representing a large number of rules efficiently using default hierarchies, and the rule set can grow continually, with rules being added as more experience is obtained or new problems are encountered. However, in place of the genetic algorithm used by Learning Classifier Systems for rule discovery, the proposed system makes random combinations, using a number of selective mechanisms to reduce the search space. Structures created by the system are incrementally combined to create more complex structures, in contrast to Drescher's Schema Mechanism, in which new rules require a large number of inputs to be evaluated together.

Experimental results are presented which clearly demonstrate that the system described in this thesis is capable of dealing with the difficulties that arise in real-world environments, particularly input-generalisation and hidden state. The experiments are based on well-known and commonly used problems from the literature, including concept learning and maze navigation tasks. The results demonstrate that the proposed system performs as well as or better than many of the compared approaches in terms of predictive accuracy and the number of training examples required.
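The propose-and-test discovery loop described above could be sketched as follows. Everything here is a hypothetical stand-in: the weighting function, the trial budget, and the error callables are assumptions, not the localised probabilistic mechanism the thesis actually defines.

```python
import random

def propose_join(groups, weight):
    """Pick two existing groups to join, weighted by some local score
    (a stand-in for the thesis's localised probabilistic mechanism).
    Note: random.choices samples with replacement, so a real
    implementation would also guard against joining a group to itself."""
    weights = [weight(g) for g in groups]
    a, b = random.choices(groups, weights=weights, k=2)
    return a, b

def retain_join(trial_error, parent_error, trials=20):
    """Give the candidate join group a small number of trials, keeping it
    only if its transition estimates improve on those of its parents.
    trial_error and parent_error are assumed callables that each run one
    trial and return a prediction-error measure."""
    candidate_err = sum(trial_error() for _ in range(trials)) / trials
    baseline_err = sum(parent_error() for _ in range(trials)) / trials
    return candidate_err < baseline_err
```

This mirrors the contrast the abstract draws with Learning Classifier Systems: candidate structures are generated cheaply by local combination and then filtered by empirical performance, rather than evolved by a genetic algorithm.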

History

Thesis type

  • Thesis (PhD)

Thesis note

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy, School of Computer Science and Software Engineering, Monash University.

Copyright statement

Copyright © 2004 Matthew Mitchell.

Language

eng
