A new way to steer driver behaviour
Game theory, originating in economics, has become a cornerstone of control. Anyone following this blog will have noticed the concept popping up over and over again: it’s a useful way of considering how individual behaviour will drive global outcomes within multi-agent systems. In contrast with (gradient-based) optimisation, game theory deals with situations where each participant’s actions affect the others: the global outcome is not determined by individual choices, but by how they all interact. In multi-agent settings, one no longer speaks of an optimal outcome but an equilibrium outcome, where no individual has an incentive to unilaterally deviate from their choices.
But one weakness of multi-agent systems is that several equilibrium outcomes can exist, each giving rise to a different level of social welfare. The problem for a central authority aiming to optimise social welfare is that the behaviours of individuals depend on confidential information, making it hard to guide them toward a desired outcome.
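A toy illustration may help here. The sketch below uses a made-up two-player "stag hunt" (the payoff numbers are assumptions chosen for illustration, not taken from any study) and simply checks every pair of choices for the equilibrium property described above: neither player gains by deviating alone.

```python
from itertools import product

# Hypothetical payoffs: payoffs[(a, b)] = (payoff to player 1, payoff to player 2).
payoffs = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}
actions = ("stag", "hare")


def is_nash(a, b):
    """True if neither player can do better by unilaterally changing action."""
    best_for_1 = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in actions)
    best_for_2 = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in actions)
    return best_for_1 and best_for_2


equilibria = [(a, b) for a, b in product(actions, repeat=2) if is_nash(a, b)]
# Social welfare here is simply the sum of both players' payoffs.
welfare = {profile: sum(payoffs[profile]) for profile in equilibria}
print(welfare)
```

Both "everyone hunts stag" and "everyone hunts hare" pass the test, yet the first leaves the pair better off in total. Nothing in the game itself guarantees that the players end up at the better of the two.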
To explain why guiding individuals is so hard, we need to understand the economic concept of utility, and the way it operates at both an individual and a global level. Utility essentially just means the value an individual attaches to a particular event or situation: what are they getting out of it? Not everyone wants the same thing; not everyone gets the same value out of it. But we can assume that each individual acting within a system does have certain priorities. For each player in the economic game, the rational choices are those that maximise their own utility. However, the emergent global outcome of all these rational individual choices may not be socially desirable.
Commuters, for instance, will plan their route to work based on what is most convenient (and fastest) for them personally; their individual impact on the likelihood of traffic jams is not a key decision factor. Internet users go online, and upload or download files, according to their needs, not consideration of overall data traffic. But of course there is a social cost to such choices, and a responsible authority may want to find a way to minimise that burden (be it congestion, or overuse, or inequality) by applying measures such as tolls or subsidies to guide user choices. Financial incentives change the cost-benefit calculation, so that (ideally) the rational choices are now more likely to promote the common good, leading to a more socially desirable equilibrium outcome.
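The effect of a toll can be seen in the textbook two-road example usually attributed to Pigou. In this sketch (all numbers are illustrative assumptions), road A takes longer the more commuters use it, with travel time equal to the share of commuters on it, while road B always takes one time unit. Commuters keep switching to A until its travel time plus toll matches B.

```python
def equilibrium_share(toll):
    """Share of commuters on congestible road A once nobody wants to switch.

    Road A costs (share + toll); road B costs a fixed 1.0.
    At equilibrium the two costs are equal, or everyone is on one road.
    """
    return min(1.0, max(0.0, 1.0 - toll))


def average_travel_time(share_a):
    """Average time across all commuters (tolls are transfers, not time lost)."""
    return share_a * share_a + (1.0 - share_a) * 1.0


for toll in (0.0, 0.5):
    share = equilibrium_share(toll)
    print(toll, share, average_travel_time(share))
```

Without a toll, everyone crowds onto road A and the average trip takes 1.0. A toll of 0.5 splits the commuters evenly and brings the average down to 0.75, even though every commuter is still choosing selfishly.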
But how can the authority calculate the financial intervention needed to reduce social cost? In any business setting, we can assume that participants in the game may be reluctant to share financial information, which means that their cost structures, and therefore their utility functions, are unknown to other participants or authorities.
It’s what you do that counts
Anna Maddux, a PhD candidate at EPFL, is working on ways to apply effective interventions without cracking the “black box” of user utility. Instead of looking at user motivations (and factoring that into network conditions), she finds that user behaviour is enough to go on.
Working with Marko Maljkovic, Nikolas Geroliminis and Maryam Kamgarpour, she has been applying this insight to the problem of equitable service in ride-hailing markets. In any city, it is common for taxis to gravitate to the locations most likely to have a rich concentration of passengers: the airport, the centres of nightlife, and so on. But that leaves some areas drastically underserved, and some would-be travellers potentially stranded.
One way to address this (given the increasing electrification of taxi fleets) might be to make more remote areas more attractive by applying spatially differentiated pricing at charging stations. Fuel is already more expensive in high-traffic areas (such as at motorway service stations), so we can follow the same logic and explore charging prices as an incentive mechanism. But even with that in place: how could an authority hope to steer market participants toward a more equitable outcome (better distribution of the fleet) without knowing their utilities?
In talking with Marko about his project, Anna thought about the Stackelberg games she had encountered in her work on no-regret algorithms. (An algorithm is said to be “no-regret” if, over time, the sequence of decisions actually taken yields an average utility close to that of the best fixed decision in hindsight, with the decisions of other participants known. That is: even if the player had known more at the start, the choices they made following that knowledge would not have led to a significantly better outcome.)
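To make the no-regret idea concrete, here is a minimal sketch using Hedge (also known as exponential weights), one well-known no-regret algorithm. It is chosen purely for illustration and is not necessarily the algorithm used in Anna's work. The utilities are random numbers in [0, 1], the function name is hypothetical, and the learning rate is the standard textbook choice for a known number of rounds.

```python
import math
import random


def hedge_average_regret(utilities, learning_rate):
    """Average regret of Hedge against the best fixed action in hindsight.

    utilities[t][a] is the utility (in [0, 1]) that action a would have
    earned in round t, given what everyone else did in that round.
    """
    n_actions = len(utilities[0])
    cumulative = [0.0] * n_actions  # utility each action has earned so far
    expected_total = 0.0            # expected utility actually collected
    for round_utils in utilities:
        # Favour actions that have done well so far (shifted for stability).
        top = max(cumulative)
        weights = [math.exp(learning_rate * (c - top)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        expected_total += sum(p * u for p, u in zip(probs, round_utils))
        cumulative = [c + u for c, u in zip(cumulative, round_utils)]
    best_fixed = max(cumulative)
    return (best_fixed - expected_total) / len(utilities)


random.seed(0)
rounds = 2000
# Three hypothetical actions with different typical utilities.
utilities = [
    [random.random() * 0.5, random.random(), random.random() * 0.8]
    for _ in range(rounds)
]
eta = math.sqrt(8 * math.log(3) / rounds)
print(hedge_average_regret(utilities, eta))
```

The printed value is the average amount per round by which hindsight would have helped. For Hedge with this learning rate it is guaranteed to be at most the square root of ln(3) / (2 × rounds), so it shrinks toward zero as the game goes on.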
Stackelberg games provide a framework to model two-stage interactions: first the game leader makes a move (for instance, a city authority may adjust pricing at charging stations around the city), then the followers respond (taxi companies direct their fleets to different neighbourhoods). Over time, given the charging prices, driver behaviour is likely to reach a Nash equilibrium: each driver will optimise their route, finding the best response to the game leader’s signals and to how the other drivers are distributed. So after observing this behaviour, the authority has new information: each fleet operator’s costs are still unknown, but its response to pricing signals becomes an input for the next round of the game. Lower-level game inputs (that is, the cost functions of the fleets) no longer matter to the leader; all that matters is the equilibrium reached after each of the leader’s moves.
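The leader-follower loop can be sketched in a few lines. Everything below is an illustrative assumption rather than the researchers' actual method: a city with just two zones, a made-up payoff model hidden inside `observe_equilibrium_share`, and a simple feedback rule in which the authority raises the price gap between centre and outskirts whenever the centre is more crowded than its target.

```python
def observe_equilibrium_share(price_gap):
    """Black box from the authority's point of view.

    Returns the share of the fleet in the centre once drivers have settled.
    price_gap is the centre charging price minus the outskirts price.
    The numbers below are hidden driver-side parameters (hypothetical):
    a zone pays base - crowding * share - price, and drivers move until
    both zones pay the same.
    """
    base_centre, base_outskirts, crowding = 10.0, 6.0, 8.0
    share = (base_centre - base_outskirts - price_gap + crowding) / (2 * crowding)
    return min(1.0, max(0.0, share))


def steer_fleet(target_share, rounds=30, step=8.0):
    """Leader's loop: set prices, observe the equilibrium, adjust, repeat."""
    price_gap = 0.0
    for _ in range(rounds):
        share = observe_equilibrium_share(price_gap)  # followers respond
        price_gap += step * (share - target_share)    # leader's next move
    return price_gap, observe_equilibrium_share(price_gap)


price_gap, share = steer_fleet(target_share=0.5)
print(round(price_gap, 3), round(share, 3))
```

Note that `steer_fleet` never reads the hidden parameters. It only sees where the fleet ends up after each price change, which is the point of the approach: the observed equilibrium stands in for the cost functions the authority cannot see.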
Effectively, applying variable pricing means that the social cost of driver behaviour has been given a numerical value. That input determines the best outcome for each driver in the fleet; and the eventual equilibrium reached after all drivers have adjusted their routes becomes the new input for the game leader. It’s a clever way of sidestepping a lack of information through empirical testing, which could be useful in many contexts beyond mobility.