## Abstract

Dr Chayes’ talk described how, to a discrete mathematician, ‘all the world’s a graph, and all the people and domains merely vertices’. A graph is represented as a set of vertices V and a set of edges E, so that, for instance, in the World Wide Web, V is the set of pages and E the directed hyperlinks; in a social network, V is the people and E the set of relationships; and in the autonomous system Internet, V is the set of autonomous systems (such as AOL, Yahoo! and MSN) and E the set of connections. This means that mathematics can be used to study the Web (and other large graphs in the online world) in the following way: first, we can model online networks as large finite graphs; second, we can sample pieces of these graphs; third, we can understand and then control processes on these graphs; and fourth, we can develop algorithms for these graphs and apply them to improve the online experience.

## 1. Modelling networks as graphs

First, in modelling online networks as graphs, we need to take into account observed phenomena, such as (i) the small diameter (‘six degrees of separation’), (ii) the power law degree distribution, (iii) the ageing of vertices (older vertices tend to be more highly connected), and (iv) clustering and triadic closure (if two vertices are connected to a third, they are more likely to be connected to each other). Good models help with practical issues such as search; for instance, PageRank uses the Web structure to rank pages, while more recent work by Chayes and co-workers [1] has enabled the detection and avoidance of anomalies characteristic of Web spammers.

Several types of models have been created. Barabási & Albert described probabilistic growth models such as the preferential attachment models ([2]; this has been followed by rigorous mathematical study of the diameter and power law of such graphs [3]). These growth models account for the first three of the observed phenomena, but not clustering. Chayes and co-workers [4] constructed extended growth models with *directed preferential attachment*, analogous to growth with preferential attachment except that incoming and outgoing edges are treated differently from each other. This yields different in-degree and out-degree power laws, which can be fitted to real data. A fitness parameter [5,6] can also be added to such models and was given rigorous treatment in the study of Borgs *et al.* [7], which showed that, depending on the distribution of fitness, there were three phases to growth: a first-to-market-wins phase, a rich-get-richer phase and an innovation-pays-off phase.

A second type is competition models, in which a vertex ‘chooses’ another vertex to link to as a result of a competition between different factors. One group of such models creates competition-induced preferential attachment [8] by a competition between minimizing the network distance to a central node and minimizing the cost of peering agreements. This produces preferential attachment up to a certain scale, and beyond that the exponential decay characteristic of random graphs, results which fit real experimental data such as data on the structure of the Internet [9].

A third type of model, game-theoretic models, exploits agents’ tendencies to maximize their own utilities in response to moves by their neighbours until equilibrium is reached. In this area, Chayes and co-workers [10] have developed the affiliation network game (hitchhiker’s game) with events among group members in which the costs are borne by particular individuals, but the benefits accrue across the group. This yields a rich class of Nash equilibria with non-trivial structure and triadic closure.

## 2. Sampling from large finite graphs

How can we know how a sample of the Web (used, for example, in a search algorithm) is representative? Chayes and colleagues have developed a new theory of graph limits and testing, analysing graphs according to a number of properties. They defined a distance between two graphs to reflect the closeness of both local and global properties, and used this to describe a convergent sequence of graphs, and then showed that such sequences converge to the same limit with respect to all these properties. Chayes and co-workers [11–14] then used these notions of distance and graph limits to give a general theory for parameter testing.

## 3. Processes on graphs: example of controlling epidemics

As an example of research into processes on graphs, Chayes discussed the spread and containment of epidemics in large networks (e.g. a mutating virus or worm on the Web). The issue with an epidemic is how quickly it spreads or dies out depending on the infection rate. In a preferential attachment graph (with its hubs), virtually any infection will survive and spread for exponential time, which is bad news for the Web. If there is a positive rate of transmission to neighbours, however small, there is a positive probability of an epidemic [15,16].

How can we contain the epidemic? Put another way, how should an antidote be distributed most effectively to raise the epidemic threshold for preferential attachment graphs above 0? Using the mathematical methods of graph theory, it can be shown that distributing the antidote uniformly across the population and the more targeted method of contact tracing (i.e. distribute the antidote proportional to the number of infected neighbours) both require an amount of antidote which is superlinear in the number of members of the network, i.e. too much intervention is required to be practical. Chayes and her colleagues have shown that a better method is to distribute the antidote proportional to the number of neighbours, whether or not infected. In other words, target the hubs to reduce the spread. More formally, in graphs with a bounded average degree (including preferential attachment graphs), curing proportional to degree is sufficient to contain epidemics [17].

## 4. Algorithms for graphs: example of recommendation systems on trust networks

From a social network, one can build a trust network in which trust is represented—a directed graph with a directed edge from one vertex to another iff it trusts the other. With a recommendation system, the question is how to propagate recommendations (likes/dislikes) through the network. Chayes and co-workers [18] put forward a set of six (reasonable) axioms which an algorithm to govern the propagation of opinion through a network must obey, and proved an impossibility theorem—that there was no algorithm that satisfied all six simultaneously. However, they also constructed algorithms which proved that any five of the axioms could be satisfied simultaneously. Hence if the desired properties of the recommendation system could be stated, it could be possible to construct an algorithm that would implement them. Chayes raised the possibility that such algorithms could be used to monetize social networks.

Chayes also described novel work in trust/distrust networks which avoided the common naive fallacy of assuming that ‘the enemy of my enemy is my friend’ recommendations (i.e. if X distrusts Y and Y distrusts Z, X will trust Z’s opinion). Her approach is to axiomatize so that (i) distrusted agents’ opinions (positive and negative) are ignored, (ii) trusted opinions are propagated, and (iii) trust and distrust edges at the same node cancel each other out. It is (iii) here that makes the construction non-trivial. The result is a more nuanced recommendation system that avoids the naive fallacy [19].

## 5. Summary

Chayes argued that mathematics helps us to understand and analyse the structure, dynamics and incentives of online graphs, in order to model some of their characteristics, sample from them, understand and control processes on them, and develop algorithms to improve the online experience.

## Footnotes

One contribution of 15 to a Discussion Meeting Issue ‘Web science: a new frontier’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.