Strategic learning agents in equilibrium-based markets for resource allocation

thesis
posted on 2024-07-12, 16:14 authored by Eduardo Rodrigues Gomes
The Commodity Market (CM) economic model offers a promising approach for resource allocation in large-scale distributed systems. A CM provides a marketplace where sellers and buyers trade resources based on a common price known by all the participants. A traditional approach for resource allocation with CM is to apply concepts from the general equilibrium theory of microeconomics, assuming price-taking participants that will not attempt to strategically influence the mechanism to improve their profits or welfare. Such a condition, apart from being criticized as unrealistic, is hardly satisfied in large-scale distributed systems, where there is little control over the behaviour of the agents and it is therefore impossible to guarantee that they will behave in an ordered manner. Understanding the impact of such strategic attempts and developing mechanisms that are robust in the presence of strategic participants are important aspects of the problem. Additionally, most mechanisms focus on the achievement of a Pareto-Optimal (PO) allocation, usually disregarding how fair and how desirable the solution is for both the system and the participants. Different PO outcomes can generate different gains for the involved parties. Therefore, being able to find a mutually desirable PO allocation is also an important aspect of the problem. This thesis addresses the above issues and proposes a framework to optimise the individual and social allocation efficiency in equilibrium-based commodity market mechanisms composed of strategic learning agents. It addresses the problem from the premise that participants can behave like the entities composing real economies and will engage in strategic behaviour in order to satisfy their preferences.
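The traditional price-taking setting described above can be illustrated with a minimal sketch of an iterative price adjustment loop for a single resource. This is not the mechanism from the thesis: the function names, the budget-based demand functions, the step size and the numbers are all assumptions chosen for illustration. The price rises when total demand exceeds supply and falls otherwise, until the market approximately clears.

```python
from typing import Callable, List

Demand = Callable[[float], float]


def budget_demand(budget: float) -> Demand:
    """Hypothetical price-taking agent: spends a fixed budget at any price."""
    return lambda price: budget / price


def iterative_price_adjustment(
    demands: List[Demand],
    supply: float,
    price: float = 1.0,
    step: float = 0.1,
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> float:
    """Adjust a common price until excess demand is (approximately) zero."""
    for _ in range(max_iter):
        # Excess demand: what the agents ask for minus what is available.
        excess = sum(demand(price) for demand in demands) - supply
        if abs(excess) < tol:
            break
        # Raise the price under excess demand, lower it under excess supply.
        # The floor keeps the price strictly positive.
        price = max(price + step * excess, 1e-9)
    return price


# Two illustrative agents with budgets 4 and 6 competing for 5 units.
demands = [budget_demand(4.0), budget_demand(6.0)]
equilibrium_price = iterative_price_adjustment(demands, supply=5.0)
allocation = [demand(equilibrium_price) for demand in demands]
```

With these assumed demand functions the clearing price is the total budget divided by the supply. A strategic agent, by contrast, would report a demand function different from its true one in order to shift that price in its favour.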
We propose the Iterative Price Adjustment with Reinforcement Learning (IPA with RL), a CM-based mechanism in which agents use utility functions to describe preferences over different resource attributes and develop strategic behaviour by learning demand functions adapted to the market through Reinforcement Learning. We investigate and compare the individual and social performance of the mechanism in the presence of two types of strategic learning agents: selfish, whose learning goal is to improve their individual utility; and altruistic, whose learning goal is to improve the social utility. The results show that the market composed exclusively of selfish learning agents can achieve social performance similar to that of the market composed exclusively of altruistic learning agents, both achieving near-optimal social welfare as measured by the Nash Product function. The results also show that the selfish agents are able to approximate the solution to the fairest PO allocation in situations where the altruistic agents fail. We further investigate these outcomes and present their theoretical analysis, first from the perspective of game theory, highlighting the properties of the results in terms of Nash Equilibrium and Pareto-Optimality, and then from the perspective of the dynamic process generated by the agents' learning algorithm.
The thesis advances the knowledge base in a number of areas of Market-based Resource Allocation, Multiagent Learning and Agent-based Computational Economics. First, it formalizes a new conceptual framework involving multiagent reinforcement learning in commodity-market resource allocation mechanisms. Second, it introduces strategic learning agents in market-based resource allocation mechanisms founded on general equilibrium theory. So far, these mechanisms have assumed the existence of rational price-taking participants. The approach proposed in the thesis, instead, is more realistic, as it explicitly addresses the existence of strategic participants that can try to exploit the mechanism. Third, it simultaneously optimises both the individual and social efficiency of the allocation. Existing approaches typically focus on the achievement of a PO allocation only, usually disregarding the individual utility and social welfare resulting from it. Fourth, it develops a theoretical model for the dynamics of Multiagent Q-learning with ε-greedy exploration. Despite the popularity of this algorithm, such a model has not been developed before. Finally, it develops a new technique to support the application and scalability of reinforcement learning in commodity-market resource allocation.
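As a rough illustration of the building blocks named in the abstract, the sketch below shows the Nash Product as a social welfare measure, ε-greedy action selection, and the standard tabular Q-learning update. The abstract does not specify how the thesis encodes states, actions or rewards, so the table layout, the function names and the default values of `alpha` and `gamma` are assumptions, not the author's implementation.

```python
import math
import random
from typing import List, Sequence


def nash_product(utilities: Sequence[float]) -> float:
    """Social welfare measured as the product of the agents' utilities."""
    return math.prod(utilities)


def epsilon_greedy(q_row: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(
    q: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    alpha: float = 0.1,  # learning rate (assumed value)
    gamma: float = 0.9,  # discount factor (assumed value)
) -> None:
    """Standard tabular Q-learning update, applied in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


# Illustrative use with a hypothetical two-state, two-action table.
q_table = [[0.0, 0.0], [1.0, 2.0]]
rng = random.Random(0)
greedy_action = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0, rng=rng)
q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
welfare = nash_product([2.0, 3.0])
```

Under this reading, a selfish agent would pass its own utility as `reward`, while an altruistic agent would pass a social measure such as `nash_product` over all agents' utilities. In a demand-learning setting, an action could index one of several candidate demand functions, but that mapping is likewise an assumption here.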

Thesis type

  • Thesis (PhD)

Thesis note

Submitted in fulfillment of the requirements of the degree of Doctor of Philosophy, Swinburne University of Technology, 2009.

Copyright statement

Copyright © 2009 Eduardo Rodrigues Gomes.

Supervisors

Ryszard Kowalczyk

Language

eng
