## Abstract

Distributed algorithms are an established tool for designing protocols for sensor networks. In this paper, we discuss the relation between distributed computing theory and sensor network applications. We also present a few basic and illustrative distributed algorithms.

## 1. Introduction

Many of today's computing and communications systems are *distributed*—systems that are composed of autonomous computational entities that communicate with each other, usually by passing messages. Distributed systems encompass a variety of applications. The most obvious distributed system is the Internet itself, with its various applications such as the domain name system, peer-to-peer file sharing or cloud computing. Distributed systems also exist on a much smaller scale, for example multi-core processors or field-programmable gate arrays. Indeed, if we expand our horizon beyond technical systems, the brain or even society^{1} are in some sense distributed systems, with neurons and human beings taking the roles of the autonomous entities, respectively. Despite their obvious differences, all these distributed systems have in common that many entities may be active in the system concurrently. These entities have certain degrees of freedom: they may have their own hardware, their own data or code, and sometimes their own independent task. Nevertheless, they share common resources and information and, in order to solve a problem that concerns all or a subset of the entities, coordination is necessary.

A sensor network is an ensemble of *nodes*, where each node by itself is a tiny computer that may gather input data from its sensors and is capable of communicating with other nodes by radio. As radio signal strength deteriorates quickly with distance, it is generally infeasible to have all nodes talk to each other (or to a base station) directly. Instead, nodes may have to use other nodes as intermediate relays in order to communicate. This situation can be modelled by a graph, where the node set is given by the sensor nodes (plus possibly the base station); an edge between two nodes indicates that they are capable of directly exchanging messages by radio communication. As such, a sensor network is an out-of-the-book distributed system [1].

Distributed algorithms are at the theoretical foundations of distributed systems—we study how the nodes of a distributed system should orchestrate their communication and computation in order to solve a given task efficiently. Moreover, we are interested in impossibility results and lower bounds, as they demonstrate what is not possible with a distributed algorithm. The area as a whole is known as *distributed computing*.

In a large distributed system, no node can have a *global* view of the entire system at any time. Instead, every node has only a *local* view of the system, and has to base its decisions on this local information. Many tasks can be solved completely locally;^{2} for instance, a node can figure out the lowest measured temperature in its neighbourhood by simply communicating with its neighbours. Many other tasks are inherently global, for instance, identifying the minimum temperature in the whole sensor network. To solve such global problems, some information must travel across the entire network graph. Among other things, researchers in distributed computing want to know which tasks are local and which are global. As we will see, many interesting network coordination tasks that are relevant in sensor networks are neither local nor global, but somewhere in between.

There are a number of reasons why distributed algorithms are highly appealing for the design of sensor network protocols. First of all, we want to avoid the situation in which memory-constrained sensor nodes have to store lots of information about far-away nodes. Thus, in fact, it is an advantage if nodes compute the local part of a problem's solution only. More importantly, restricting the distance from which information is collected implies that (i) algorithms are fast, as communication is typically the most time-consuming part, and (ii) changes in input or topology can be dealt with locally, i.e. merely some of the nodes need to recompute their output. In other words, in a large system, faults or reconfigurations interfere with a small fraction of the system only. This brings us to another impressive capability of distributed systems: if well crafted, they can function correctly from a global perspective even if some of the nodes have crashed, show erroneous behaviour or are even maliciously trying to make the system as a whole fail. Distributed algorithms are crucial in order to exploit this potential, as any part of an algorithm that runs on a single node will fail if the respective node crashes. Even worse, if the node does not simply crash, it may corrupt the entire network with wrong information. Last but not least, as solutions are derived from as little knowledge on the overall state of the network as possible, many distributed algorithms turn out to be simple and elegant. While elegance is not necessarily a design issue for sensor networks, simplicity is, in particular considering the limited computational power, memory and energy of sensor nodes.

Overall, we believe that distributed algorithms can offer a lot to designers of sensor networks. However, even though sensor networks *seem* to be out-of-the-book distributed systems in theory, in practice, quite a few difficulties arise between an abstract distributed algorithm and its implementation on a sensor node. In this paper, we will highlight some of the main differences between a real-world sensor network and an idealized distributed system. We will spotlight the pitfalls, success stories and open problems that researchers have encountered—or still do encounter—while trying to close these gaps between theory and practice. We will start our tour with a brief introduction to abstract distributed algorithms. Subsequently, we will gradually proceed to more realistic communication models. Along the way, we will briefly present some basic techniques that we believe will illuminate the way of thinking about distributed computing as well as the beauty of the resulting solutions.

At this point, we emphasize that this paper does not make any attempt to survey the state of the art in distributed computing. Also, we will bias our choice of the topics discussed towards the ones we are familiar with, i.e. we do not attempt to give a representative cross section of distributed computing.

## 2. Distributed algorithms

For many decades, the study of distributed network algorithms was treated like a pure research topic—maybe too pure for the real world of messy distributed systems. As such, the area fell asleep in the 1990s. The advent of sensor networks (and close relatives such as ad hoc or mesh networks) in the current millennium reignited interest in the area, as all of a sudden there were applications matching the local communication model well. To get a better understanding of this, let us first have a closer look at a standard model used in distributed algorithms.

As mentioned before, a distributed system is modelled as a graph, *G*=(*V*,*E*), whose set of nodes *V* (with *n*:=|*V* |) are the computing devices and whose set of edges *E* defines which nodes may communicate with each other directly. One typically focuses on the *locality* of a problem, that is, on the maximal (hop) distance from which nodes must receive information as a function of the number of nodes *n*. Theory in its most pure form thus commonly studies the *local, synchronous message passing model*. More concretely, computation proceeds in rounds, where, in each round, all nodes concurrently exchange messages with their neighbours and then perform local computations. In other words, any synchronous distributed algorithm consists of a simple loop: send messages, receive messages, compute, and repeat until termination.
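The communication–computation loop just described can be sketched as a small single-process simulation. The code below is a hedged illustration (all names, such as `run_synchronous` and `min_step`, are our own choices, not from the paper); the example demonstrates the purely local task from the introduction, computing the lowest 'temperature' in a node's neighbourhood.

```python
# A minimal sketch of the local, synchronous message-passing model,
# simulated in a single process.

def run_synchronous(neighbours, states, step, max_rounds=1000):
    """neighbours: node -> list of adjacent nodes
    states:     node -> initial local state
    step:       (node, state, inbox) -> (new state, message or None, done?)
    """
    active = set(neighbours)
    outbox = {v: None for v in neighbours}
    for rnd in range(1, max_rounds + 1):
        if not active:
            return states, rnd - 1            # number of executed rounds
        # phase 1: all nodes concurrently receive last round's messages
        inbox = {v: [outbox[w] for w in neighbours[v] if outbox[w] is not None]
                 for v in active}
        # phase 2: local computation; nodes may send and/or terminate
        for v in list(active):
            states[v], outbox[v], done = step(v, states[v], inbox[v])
            if done:
                active.discard(v)
    return states, max_rounds

# example: every node learns the lowest 'temperature' in its neighbourhood
graph = {1: [2], 2: [1, 3], 3: [2]}
temps = {1: 20, 2: 25, 3: 17}

def min_step(v, state, inbox):
    if inbox:                                  # round 2: values have arrived
        return min([state] + inbox), None, True
    return state, state, False                 # round 1: broadcast own value

result, rounds = run_synchronous(graph, dict(temps), min_step)
```

Each loop iteration corresponds to one synchronous round: all active nodes first receive the messages sent in the previous round and then compute, send and possibly terminate.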

When studying distributed algorithms, we care mostly about running time; the running time of a synchronous distributed algorithm is measured by the number of synchronous communication rounds, i.e. how often the communication–computation loop is executed until the nodes terminate.

### (a) Maximal independent set: a classic problem in distributed computing

For the sake of concreteness, let us study an example, the so-called maximal independent set (MIS) problem. As mentioned before, in a distributed system, all nodes are capable of acting simultaneously. However, even if tasks are local in the sense that it does not matter what distant nodes are doing, frequently one needs to avoid two neighbours taking action at the same time. For instance, if two neighbouring nodes try to transmit at the same time, it is likely that the two messages cannot be received due to wireless interference. On the other hand, we want to achieve a large amount of concurrency. In particular, for each node at least *some* neighbour should be able to perform operations. These requirements can be formalized as follows.

### Definition 2.1 (MIS)

Given a graph *G*=(*V*,*E*), an *independent set* is a subset *S*⊆*V* of the nodes such that no two adjacent nodes are in *S*, i.e. for any edge {*v*,*w*}∈*E*, we have that *v*∉*S* or *w*∉*S*. A *maximal independent set* (*MIS*) *S* is an independent set satisfying that *S*∪{*v*} is not independent for any *v*∈*V* ∖*S*. See figure 1 for an example.

The MIS is a basic *symmetry-breaking* problem that is trivial for a centralized algorithm. Starting with *S*:=∅, one just goes over the nodes one by one and adds any node without a neighbour in *S* to *S*. However, if nodes act concurrently, several of them can join *S* in the same round, which is much faster. The only thing we need to ensure is that no two adjacent nodes enter *S* in the same round. Assuming that nodes have unique identifiers, the trivial sequential MIS solution translates to a simple distributed algorithm. For brevity, denote in the following by 𝒩_{v}:={*w*∈*V* |{*v*,*w*}∈*E*} the neighbourhood of *v*∈*V*.

When executing algorithm 2, any still active node that has a locally maximal identifier will join the independent set in the next (even) round. Because two neighbours can never both have locally maximal identifiers, this can never cause a conflict. All neighbours of such nodes terminate in the subsequent round, making sure that we will also have no conflicts later on. Note that it is safe for these nodes to terminate, as they now have at least one neighbour in the independent set, i.e. they cannot be part of any independent set containing the nodes that have already joined. The running time of this algorithm is given by the longest descending chain of identifiers, i.e. a path in the graph along which the identifiers decrease; note that such a chain may be far longer than the diameter *D* of the graph, i.e. the maximal length of a shortest path between any pair of nodes.^{3}

Unfortunately, this chain might be long—a most simple network is the line (also known as linked list, or chain) network, which with sorted identifiers yields a descending chain through all *n* nodes! This is not exciting, as in a bad case, the algorithm is no better than a sequential solution—it seems that a simple locally definable problem such as finding an MIS should have a solution without a global causality chain.
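The greedy identifier-based algorithm described above can be simulated as follows (an illustrative sketch, not the paper's pseudo-code; one loop iteration corresponds to one exchange-and-decide phase of algorithm 2):

```python
def greedy_mis(neighbours):
    """Greedy distributed MIS: nodes with a locally maximal identifier join."""
    active = set(neighbours)      # nodes that have not yet decided
    in_mis = set()
    iterations = 0
    while active:
        iterations += 1
        # nodes whose identifier beats all active neighbours join the MIS;
        # two such nodes can never be adjacent, so no conflict arises
        joiners = {v for v in active
                   if all(v > w for w in neighbours[v] if w in active)}
        in_mis |= joiners
        # joiners and their neighbours terminate
        active -= joiners | {w for v in joiners for w in neighbours[v]}
    return in_mis, iterations

# the line network with sorted identifiers exhibits the long causality
# chain: the algorithm needs about n/2 iterations
line = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 6] for i in range(1, 7)}
mis, iterations = greedy_mis(line)
```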

### (b) Searching for a fast distributed maximal independent set algorithm

Defying this intuition, until now no deterministic algorithms are known that are *always* fast. Even if one does not care about the size of messages, the fastest known deterministic solution has running time 2^{O(√(log *n*))} [3]. 'Interesting' graphs tend to have a diameter that is much smaller. While some algorithms are efficient if the graph satisfies certain properties, ideally we would like to have a solution that works nicely for *any* network. For distributed algorithms, in many cases, this becomes difficult because, for any fixed course of action, there is a particular example where it is bad. And because nodes initially are aware of only a small part of the graph, it may take a lot of time until they learn about this bad structure. Oftentimes, this obstacle can, at least partly, be overcome by means of *randomization*. Randomized distributed algorithms follow the same rules as *deterministic* ones, with the single exception that nodes may also base their decisions on 'coin tosses'. Again, let us illustrate this concept by an example.

Algorithm 3 is a variant of a long-known family of algorithms [4–7]. Essentially, it is identical to algorithm 2, with the decisive difference that, in every iteration, the 'identifiers' are picked anew at random. If the random numbers have logarithmic size, the probability that two numbers are equal is negligible. It can be shown by a fairly simple argument that, in each iteration, the expected number of edges adjacent to terminating nodes is a constant fraction of the total number of remaining edges. Since this holds for *any* graph, in particular the subgraphs induced by the active nodes in each iteration, this implies an expected running time of *O*(log *n*). One can show that in fact the algorithm will terminate in *O*(log *n*) rounds also *with high probability (w.h.p.)*, that is, with probability at least 1−1/*n*^{c} for a constant *c*. This is a powerful result, as even for a moderate constant *c*, it is absurdly unlikely that the algorithm does not terminate within *O*(log *n*) rounds.^{4}
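The idea of algorithm 3, a Luby-style algorithm, can be sketched as follows. This is an illustrative simulation, with ties between equal random numbers broken by node identifier:

```python
import random

def randomized_mis(neighbours, rng):
    """Luby-style MIS: fresh random 'identifiers' in every iteration."""
    active = set(neighbours)
    in_mis = set()
    iterations = 0
    while active:
        iterations += 1
        # fresh random numbers; the (value, node) pair breaks unlikely ties
        r = {v: (rng.random(), v) for v in active}
        joiners = {v for v in active
                   if all(r[v] > r[w] for w in neighbours[v] if w in active)}
        in_mis |= joiners
        active -= joiners | {w for v in joiners for w in neighbours[v]}
    return in_mis, iterations

# on a long line, the number of iterations is typically logarithmic in n
n = 50
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
mis, iterations = randomized_mis(line, random.Random(1))
```

Whatever the coin tosses, the output is a correct MIS; only the number of iterations is random.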

### (c) Maximal independent set and colouring

Apart from the vague motivation given above, why should we be interested in an MIS algorithm in the first place? One possible answer to this question is that it is useful in computing different, perhaps more attractive, structures. For instance, an MIS is always a *dominating set* (i.e. in each node's neighbourhood there is at least some node from the set). Typically, one is interested in small dominating sets (as one could trivially take all nodes), but there are interesting graphs where an MIS is not much larger than a smallest set, a so-called *minimum dominating set (MDS)*. Another striking example is colouring.

### Definition 2.2 (Colouring)

Given a graph *G*=(*V*,*E*), a *k-colouring* of *G* is a function *c*: *V*→{1,…,*k*}, such that, for each colour *i*∈{1,…,*k*}, the set *c*^{−1}(*i*) is independent. In other words, any two neighbouring nodes *v* and *w* do not have the same colour, that is, *c*(*v*)≠*c*(*w*). Figure 1 gives an example of a 3-colouring.

Naturally, a goal is to minimize the number of colours: a *k*-colouring permits each node to act once every *k* rounds without interference from its neighbours. This is of high interest in sensor networks, as it makes it possible to set up a schedule where all nodes may transmit once in a while without causing conflicts. Regrettably, it is extremely difficult to determine the minimum number of colours required to colour a graph: it is NP-hard to obtain even an estimate that is more precise than a factor *n*^{1/7−ε}, for any constant *ε*>0 [8].

For this reason, it is common to settle for finding a colouring with *Δ*+1 or even *δ*(*v*)+1 colours, where *δ*(*v*) is the degree (the number of neighbours) of node *v*, and *Δ* is the maximum degree in the graph. Like computing an MIS, computing a (*Δ*+1)-colouring is trivial if approached sequentially, working node by node.

There is a close relation between colouring and MIS. From a colouring with *k* colours, one can derive an MIS in *k* rounds by incrementally adding nodes of the same colour to the independent set concurrently.
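This reduction can be sketched directly (an illustrative sketch with our own names); one synchronous round is spent per colour class:

```python
def mis_from_colouring(neighbours, colour, k):
    """Derive an MIS from a k-colouring in k rounds (one per colour class)."""
    in_mis = set()
    covered = set()              # nodes in the MIS or adjacent to it
    for i in range(1, k + 1):
        # all uncovered nodes of colour i join concurrently; nodes of the
        # same colour are never adjacent, so this causes no conflict
        joiners = {v for v in neighbours if colour[v] == i and v not in covered}
        in_mis |= joiners
        covered |= joiners | {w for v in joiners for w in neighbours[v]}
    return in_mis

# example: a 2-coloured path on four nodes
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mis = mis_from_colouring(path, {0: 1, 1: 2, 2: 1, 3: 2}, 2)
```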

In turn, we can emulate an MIS algorithm on a virtual graph where each node *v* with degree *δ*(*v*) emulates a cluster of *δ*(*v*)+1 virtual nodes called *v*_{0},*v*_{1},…,*v*_{δ(v)}. All nodes within each cluster are connected by virtual edges. In addition, *v*_{i} is connected to *w*_{i} if *v* and *w* are neighbours in the original graph. By definition of the MIS problem, each cluster *v* will have exactly one virtual node *v*_{i} in the MIS, and the index *i* of that node determines the colour, i.e. *c*(*v*):=*i*. Because *v*_{i} and *w*_{i} are adjacent whenever *v* and *w* are neighbours, no two neighbouring nodes can end up with the same colour.

Despite using elaborate and complex techniques, the currently best-known deterministic colouring algorithm [9] remains unsatisfactorily slow. In contrast, plugging algorithm 3 into the above transformation yields a simple randomized algorithm with running time *O*(log *n*) w.h.p. However, as a node *v* needs to emulate *δ*(*v*)+1 many nodes in the virtual graph, which each need to pick and compare random bit strings of size *O*(log *n*) in every iteration, messages would have to contain *O*(*Δ* log *n*) bits in the underlying graph in order to emulate the MIS algorithm at full speed. Luckily, we can fix this by simply letting each node choose a free (i.e. not yet occupied by a neighbour) colour from the range {1,…,2*δ*(*v*)} at random, in each round. Since the neighbours may have or claim only one colour each, there are always at least *δ*(*v*) free colours. Therefore, every active node can successfully claim a colour differing from all its neighbours' with probability at least 1/2. If successful, it can fix this colour as its output, inform its neighbours and terminate. We again get an algorithm that terminates in *O*(log *n*) time w.h.p.
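The free-colour trick can be sketched as follows (an illustrative simulation; the round limit is only a safeguard for the simulation, not part of the algorithm):

```python
import random

def random_free_colouring(neighbours, rng, max_rounds=10000):
    """Each active node repeatedly claims a random free colour from
    {1, ..., 2*deg(v)} and keeps it if no neighbour picked the same one."""
    colour = {}                  # finalized colours
    active = set(neighbours)
    for _ in range(max_rounds):
        if not active:
            break
        tentative = {}
        for v in active:
            taken = {colour[w] for w in neighbours[v] if w in colour}
            # at least deg(v) of the 2*deg(v) candidate colours are free
            free = [c for c in range(1, 2 * max(1, len(neighbours[v])) + 1)
                    if c not in taken]
            tentative[v] = rng.choice(free)
        for v in list(active):
            # success iff no neighbour chose the same colour this round
            if all(tentative.get(w) != tentative[v] for w in neighbours[v]):
                colour[v] = tentative[v]
                active.discard(v)
    return colour

# example: a 5-cycle, where every node has degree 2
cyc = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
colour = random_free_colouring(cyc, random.Random(7))
```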

### (d) Distributed computing models

Of course, distributed computing theory has more evolved algorithms to offer. In general, when devising distributed algorithms, one tries to achieve (or balance between) multiple objectives. A main objective is always to guarantee a small running time. Apart from this, we usually want messages not to be too large. A size restriction of *O*(log *n*) bits per message is the gold standard. Smaller messages often imply that algorithms become restricted, as one cannot encode a node identifier any more in a single message.^{5} In addition, computations should be 'reasonable'. This in particular excludes just collecting the entire topology and inputs at a single (or each) node and solving the whole problem, or dealing with NP-hard (sub)problems. Luckily, small messages imply that nodes do not obtain much information, quite often resulting in computations remaining simple without special consideration. In addition, often the number of messages sent by each node should be small, and memory consumption should be low. Finally, the nodes should need as little knowledge about the network as possible. Ideally, nodes do not know the total number of nodes *n*, the maximum degree *Δ*, the diameter *D* or the largest identifier in the network.

As can be seen from this (non-exhaustive) list, distributed computing studies quite a few of the properties that are relevant to sensor networks. However, there is an aspect of distributed computing theory that is of equal, if not higher, significance to practitioners. Apart from studying what is possible, distributed computing also deals with understanding what is *im*possible.

### (e) Lower bounds

For instance, is there a (randomized) MIS algorithm that is substantially better than algorithm 3? The answer is 'no'. For the MIS problem, and for many related problems, a lower bound of *Ω*(√(log *n*/log log *n*)) (or alternatively *Ω*(log *Δ*/log log *Δ*)) has been established [10]. In other words, there is *no* (randomized) distributed algorithm that can compute an MIS in *o*(√(log *n*/log log *n*)) time; that is, in order to figure out whether to join an MIS, each node must collect at least the information in its *Ω*(√(log *n*/log log *n*))-hop neighbourhood. This is certainly surprising, as the MIS problem can be defined strictly locally, and one would not expect such a 'butterfly effect'.

However, again the picture is incomplete. Even if no MIS algorithm can run in *o*(√(log *n*/log log *n*)) rounds in *general* graphs, this does not necessarily imply that the same is true for wireless networks. The MIS lower-bound construction relies on a peculiar family of bipartite graphs, and one might argue that clearly wireless networks do not exhibit arbitrary connectivity.

## 3. Wireless connectivity

As we discussed in the previous section, the MIS lower bound does not apply to the connectivity graphs that one may encounter in a real wireless network. In this section, we study what kind of graph families are reasonable, and whether they allow for faster distributed algorithms, breaking the lower bound for general graphs.

As we know, the strength of a radio transmission decreases with distance. Thus, it is a key characteristic of wireless (sensor) networks that, unless the network is small, it is unlikely that every node can communicate directly with every other node. On the other hand, it is much more likely that two nearby nodes can communicate with each other directly than it is for two far-away nodes. That is, the connectivity between nodes is fundamentally governed by some kind of *geometry*.

In the early days of algorithmic wireless network research, this geometry was modelled by *unit disc graphs* (*UDG*), e.g. [11,12]. Let *V* be a set of nodes, located in the two-dimensional Euclidean plane. In the UDG model, we assume that two nodes are adjacent if and only if their Euclidean distance is at most 1. If two nodes are farther away, they need to communicate through intermediate relay nodes. The UDG model is too idealistic; in reality, radio communication is never perfectly omnidirectional, and even small obstacles such as plants could change radio connectivity.

A natural generalization of the UDG is the *quasi-unit disc graph* (*QUDG*) [13,14]. Again, in a QUDG, the nodes *V* are located in the plane. All pairs of nodes with Euclidean distance at most *ρ*, for some given *ρ*∈(0,1], are adjacent. Pairs with a distance larger than 1 are never in each other's transmission range. Finally, pairs with a distance between *ρ* and 1 may or may not be neighbouring. Indeed, for this 'grey' zone of distances in (*ρ*,1], there are several possible model options. For instance, one could imagine an adversary choosing for each node pair whether they are in each other's transmission range, or one could apply a connectivity probability distribution depending on the distance. Measurement studies [15] suggest that, in an unobstructed environment, *ρ* can be modelled as a reasonable constant. In this case, the overhead to make a protocol work in the QUDG model is often bearable.

In contrast, if our sensor network is inside a building, obstructions usually drive the parameter *ρ* towards 0, as there might be nodes that are physically close but separated by a wall. And as soon as *ρ* is small enough, the QUDG model allows for arbitrary connectivity graphs. However, real-world connectivity graphs are not arbitrary. Although nodes that are close but on different sides of a wall may not be able to communicate, a node is typically highly connected to the nodes that are in the same room, and thus many neighbours of a node are adjacent. In other words, even if there are many obstacles, the number of mutually non-adjacent neighbours of a node is likely to be small. This observation is hard to capture with purely geometric models, and motivates more advanced connectivity models such as the *bounded-growth graph* (*BGG*) model [16]. The BGG model can be defined abstractly, without referring to Euclidean geometry. Again, we connect our set of nodes *V* by edges *E*. As defined before, a set of nodes *S*⊂*V* is called independent if no two nodes in *S* are neighbours, i.e. there is no edge in *E* connecting two nodes in *S*. Now, we define the ball *B*_{r}(*v*) with radius *r* around a node *v*∈*V* as the set of all nodes that can be reached from *v* by following at most *r* edges in *E*. And we ask the set of edges *E* to be such that the size of any independent set *S* in a ball *B*_{r}(*v*) is polynomially bounded in *r*, i.e. |*S*|∈*O*(*r*^{2}) or, more generally, |*S*|∈*O*(*r*^{c}) for some constant *c*. Even though the BGG definition is fairly general, many algorithms that run on a BGG are asymptotically as efficient as algorithms for (Q)UDGs. As such, the BGG has become a model of choice when we are interested in connectivity only [17].
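To make the definition concrete, the following sketch computes the ball *B*_{r}(*v*) by breadth-first search and greedily picks an independent set inside it (all names are our own; the grid graph in the example stands in for a bounded-growth graph, as independent sets in its balls grow roughly like *r*^{2}):

```python
from collections import deque

def ball(neighbours, v, r):
    """All nodes reachable from v via at most r edges (the ball B_r(v))."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for w in neighbours[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def greedy_independent(neighbours, nodes):
    """Greedily pick an independent set within the given node set."""
    chosen = set()
    for v in sorted(nodes):
        if all(w not in chosen for w in neighbours[v]):
            chosen.add(v)
    return chosen

# a 7x7 grid graph: |B_2(v)| at the centre is 13 (a Manhattan diamond)
grid = {(x, y): [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < 7 and 0 <= y + dy < 7]
        for x in range(7) for y in range(7)}
b = ball(grid, (3, 3), 2)
ind = greedy_independent(grid, b)
```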

### (a) Wireless connectivity simplifies the maximal independent set problem

So, how much can we improve the running time of MIS or colouring algorithms if we restrict ourselves to (Q)UDGs or BGGs? As it turns out, dramatically! There are algorithms that are super-exponentially better than the lower bound for general graphs. In particular, there is a deterministic algorithm [18] that can compute an MIS in *O*(log^{*} *n*) rounds.^{6} Using this MIS algorithm as a base, we can also compute many other interesting structures, such as asymptotically optimum colourings or (connected) dominating sets, in *O*(log^{*} *n*) time.

Unfortunately, the algorithm achieving this surprising result is rather intricate. However, we can present a basic building block [19] if we restrict ourselves to 3-colouring on a ring-topology network. In the following, we assume that our nodes are arranged in an oriented circle, i.e. each node knows its clockwise neighbour and its anticlockwise neighbour. The nodes need to pick a colour from {1,2,3} not conflicting with their neighbours. Starting from the initial identifiers (which are a valid colouring), in each step, they reduce the size of the bit string encoding their current colour exponentially. To this end, *v* compares its colour *c*(*v*) to *c*(*w*), the colour of its clockwise neighbour *w*. Starting from the least significant bit, *v* counts the number of bits until encountering the first difference. Its new colour is given by this—binary encoded—index, with the differing bit appended. Since *w* performs the same operation, *v* and *w* either (i) encode a different index or (ii) append differing bits to their strings. Thus, the resulting colours are not the same, as is the case for *v* and its other neighbour by the same reasoning (figure 2). Since in each step the index within the binary-encoded colour *c*(*v*) is itself binary encoded and extended by one bit, after *O*(log^{*} *n*) iterations, the number of remaining colours is constant. At this point, it is sufficient if each node with a locally maximal colour larger than 3 replaces it in each round by a free colour from the range {1,2,3}. See algorithm 4 for pseudo-code.
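One step of this colour reduction can be sketched as follows (an illustrative implementation of the step described above, in the style of Cole and Vishkin; it assumes that the current colours of neighbouring nodes are distinct, as guaranteed by the initial identifiers):

```python
def cv_step(own, succ):
    """One reduction step on colours encoded as integers (own != succ):
    find the least significant bit where the colours differ and return
    that index, binary encoded, with the differing bit appended."""
    i = 0
    while (own >> i) & 1 == (succ >> i) & 1:
        i += 1                        # index of the first differing bit
    return 2 * i + ((own >> i) & 1)   # index shifted left, plus the bit

def cv_round(ring):
    """Apply the step concurrently on an oriented ring of colours."""
    n = len(ring)
    return [cv_step(ring[v], ring[(v + 1) % n]) for v in range(n)]

# example: a valid colouring of a 4-ring shrinks to small colours,
# and neighbouring nodes still have different colours
new = cv_round([3, 9, 4, 7])
```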

It is interesting to note that both algorithm 4 and its descendant [18] are asymptotically optimal.^{7} It was shown that computing a 3-colouring or an MIS on a ring takes at least *Ω*(log^{*} *n*) time [4,20]. Note that a ring is also a (Q)UDG and a BGG. Moreover, it has also been shown that finding an asymptotically optimum (connected) dominating set in a UDG requires *Ω*(log^{*} *n*) rounds [21].

So, with these problems solved in an adequate family of connectivity graphs, are we equipped with all the tools we need to deal with real networks? Unfortunately, there is one more problem! So far, we have assumed that nodes can always transmit concurrently. In wireless networks, this is usually not possible, as concurrent transmissions will interfere with each other. Instead, we need to enhance our distributed algorithms by a media access control (MAC) mechanism in order to handle this interference. This is the topic of §4!

## 4. Wireless interference

Usually, sensor networks communicate by radio. An immediate consequence is that nodes are *not* able to send a different message to each neighbour at the same time. On the other hand, directed antennas are uncommon, i.e. in principle, all neighbours can receive a sent message. Therefore, we prefer distributed algorithms in which nodes transmit the same uniform message to all their neighbours.

A more subtle consequence of the use of radio communication is that the transmission medium is shared: transmissions are exposed to interference. Concretely, a node *u* may not be able to correctly receive a message from an adjacent node *v* because there is a concurrent transmission going on nearby. While for higher-layer protocols, it may be accurate enough to model interference by having random transmission failures, interference must be a first-class citizen for understanding the basic communication infrastructure.

In some sense, an interference model explains how concurrent transmissions block each other. A signal might for example interfere with itself owing to multi-path propagation (e.g. an electromagnetic wave of a direct path cancelling with the wave on a longer path reflecting at an object). Fully understanding these self-interference effects is beyond the scope of this paper, and maybe beyond the scope of research in distributed algorithms in general, for some years to come.

Interference because of two or more concurrent transmissions is captured by two classes of models, protocol and physical interference models [17,22]. In the *protocol* model, interference is defined using an interference graph *G*′=(*V*,*E*′) on the same node set as the connectivity graph *G*=(*V*,*E*). A receiver node *v* can receive a transmission from node *u* if and only if {*u*,*v*}∈*E* (connectivity) but there is no concurrent transmission by a node *w* with {*w*,*v*}∈*E*′ (interference). There have been many proposals as to how the edge set *E*′ should be defined. Generally, *E*′ includes all the edges of *E*, and more. For instance, if *G* is a UDG, *G*′ may be a UDG as well, but with a larger unit radius. Or *E*′ may include all edges between node pairs that have at most distance *k* in *G*, i.e. {*u*,*v*}∈*E*′ if *u*∈*B*_{k}(*v*). Or *G*′ may be an arbitrary (Q)UDG or BGG.

### (a) Physical interference

All these protocol model definitions have in common that interference ends abruptly, as if there was an invisible wall, and that interference does not sum up, i.e. if there are several concurrent transmissions just outside the interference range, they will not affect a receiver. *Physical* interference models strive to capture these effects. The relation between the nodes *V* is given by a distance matrix, derived from either actual distances in a Euclidean space, or a metric space, or an arbitrary abstract gain matrix (without direct physical representation). In other words, between any two nodes {*u*,*v*} we have a distance *d*(*u*,*v*). A transmission of node *u* with transmission power *P* will give a signal at receiver *v* with power *R*=*P*⋅*d*(*u*,*v*)^{−α}, where *α* is the so-called path-loss parameter, often a constant between 2 and 6. Node *v* can successfully decode the transmission of *u* if *R* is at least a factor *β* larger than the interference, defined as the sum of the signal strength produced by all concurrent transmissions. In addition, there might also be background noise. All this gives the so-called signal-to-interference-plus-noise ratio (SINR) model. Recently, studying physical interference in an algorithmic context has become a hot research topic (e.g. [23–30]). For first surveys, we refer to earlier studies [31–33].
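The SINR condition can be written down directly. The sketch below uses illustrative parameter values (*P*, *α*, *β* and the noise level are assumptions for the example, not taken from the paper):

```python
def sinr_ok(sender, receiver, transmitters, pos,
            P=1.0, alpha=3.0, beta=2.0, noise=1e-6):
    """Check whether 'receiver' can decode 'sender' in the SINR model:
    the received power must be at least beta times noise plus the summed
    power of all other concurrent transmissions."""
    def dist(a, b):
        (ax, ay), (bx, by) = pos[a], pos[b]
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
    signal = P * dist(sender, receiver) ** -alpha
    interference = sum(P * dist(w, receiver) ** -alpha
                       for w in transmitters if w != sender)
    return signal >= beta * (noise + interference)

# a far-away concurrent transmitter does not hurt ...
far = {'u': (0.0, 0.0), 'v': (1.0, 0.0), 'w': (10.0, 0.0)}
# ... but a nearby one does
near = {'u': (0.0, 0.0), 'v': (1.0, 0.0), 'w': (1.5, 0.0)}
```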

However, most distributed algorithms work in a protocol model. Apart from a few exceptions, media access is usually handled probabilistically. A simple idea is to let nodes transmit randomly, with a probability inversely proportional to the 'competition'. In particular, if each node *v* transmits with probability *p*:=1/*δ*_{2}(*v*), where *δ*_{2}(*v*) is the highest degree in the two-hop neighbourhood of node *v* in the interference graph *G*′, at least a constant fraction of the transmissions are successful in expectation [34]. A major problem now is how to figure out the competition *δ*_{2}(*v*), as we have a chicken-and-egg problem: in order to communicate successfully, we need to know an approximation of *δ*_{2}(*v*). There has been a flurry of work on how to approach the right density, by starting with some probability and eventually converging to the right one, depending on different models, e.g. whether nodes can or cannot tell if multiple nodes try to transmit concurrently [35–42]. A good benchmark for multi-hop networks is the so-called broadcast problem, where one source node needs to send a message to all the nodes in the network, as it combines the media access question cleverly with an important application [43–46]. For a recent media access theory survey, we refer to the study of Kowalski [47].
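The transmission rule can be sketched as a slot-by-slot simulation in a protocol model (illustrative; the success criterion used here, namely that some silent neighbour hears no other transmitter, is one simple modelling choice):

```python
import random

def delta2(neighbours, v):
    """Highest degree within the two-hop neighbourhood of v (incl. v)."""
    two_hop = {v}
    for u in neighbours[v]:
        two_hop.add(u)
        two_hop.update(neighbours[u])
    return max(len(neighbours[u]) for u in two_hop)

def simulate_slot(neighbours, rng):
    """One time slot: every node transmits with probability 1/delta2(v)."""
    transmitters = {v for v in neighbours
                    if rng.random() < 1.0 / delta2(neighbours, v)}
    successes = set()
    for u in transmitters:
        # u succeeds if some silent neighbour hears no other transmitter
        for v in neighbours[u]:
            if v not in transmitters and \
                    transmitters.isdisjoint(set(neighbours[v]) - {u}):
                successes.add(u)
                break
    return transmitters, successes

# run many slots on a triangle (delta2 = 2 everywhere, so p = 1/2)
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rng = random.Random(0)
slots = [simulate_slot(tri, rng) for _ in range(200)]
```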

## 5. Clock synchronization

In the previous section, we discussed the issue of avoiding collisions between the sensor nodes when using synchronous time slots. As each sensor node is equipped with an internal clock, it should be easy to divide time into slots, and make each node send (or listen, or sleep, respectively) in the appropriate slots according to the MAC layer protocol used.^{8}

However, as it turns out, this comes at a price. Synchronizing the clocks in a network is not trivial—in particular, in sensor networks, where nodes often try to save energy by sleeping.^{9} As nodes' internal clocks are not perfect, they run at slightly different, time-varying speeds. For instance, variations in temperature or supply voltage affect this *clock drift*. For standard clocks, the drift is of the order of parts per million, i.e. within a second, it will accumulate to a couple of microseconds. Wireless TDMA protocols account for this by introducing *guard times*. Whenever a node knows that it is about to receive a message from a neighbour, it powers up its radio a little bit earlier to make sure that it does not miss the message even though clocks are not perfectly synchronized. Also, if nodes are badly synchronized, messages of different slots might collide.
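A back-of-the-envelope guard-time calculation makes the orders of magnitude concrete; the drift rate and resynchronization interval below are illustrative assumptions.

```python
# Guard-time sizing sketch (all numbers are illustrative assumptions).
# With a drift of rho parts per million per clock, two clocks that were
# synchronized t seconds ago may have drifted apart by up to 2*rho*t,
# so a receiver should power up its radio at least that much early.

def guard_time(rho_ppm, resync_interval_s):
    """Worst-case mutual skew accumulated between resynchronizations."""
    rho = rho_ppm * 1e-6
    return 2 * rho * resync_interval_s

# Example: 10 ppm clocks, resynchronized every 30 s.
print(guard_time(10, 30))  # about 0.0006 s, i.e. a 600 microsecond guard time
```

The guard time grows linearly with the resynchronization interval, which is why sleepy, rarely communicating nodes need either generous guard times or better synchronization.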

### (a) Clock synchronization in theory …

The analysis of guard times is a *clock synchronization* problem. This kind of question has been studied extensively by the distributed computing community. For instance, it is well known that, in a network of diameter *D*, the worst-case skew between two nodes is *Ω*(*D*) [49]. This bound holds even in the absence of clock drift, as it is not possible to precisely measure the time it takes to *transmit* the current clock value to another node. The respective error might accumulate with distance, summing up to *Ω*(*D*) between two nodes at distance *D*.

However, as wireless interference is to some degree a geometric phenomenon, guard times need to account only for the clock skew between nearby nodes. Thus, it is sufficient if close-by nodes are well synchronized. Rephrasing this observation, the system can run as if it were synchronous if we can keep the clock skew between neighbours, the *local skew*, small. Fan & Lynch [50] formulated this so-called *gradient clock synchronization* problem and proved the surprising result that the local skew must be *Ω*(log *D*/log log *D*) in the worst case, no matter what algorithm is used. Thus, even though nodes may be perfectly aware of their neighbours' clocks being off, they may not be able to compensate for it consistently! Recently, it has been established that the smallest possible local skew is *Θ*(log_{1/*ρ*}*D*), where *ρ* bounds the relative clock drift [51]. Recalling that typically *ρ*<10^{−5}, this implies that, in any practical sensor network, the local skew can be bounded by a constant.
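A quick numeric check shows why a logarithm to base 1/*ρ* (the quantity governing the optimal local skew [51]) is effectively a constant for realistic drifts; the diameters below are arbitrary illustrative values.

```python
import math

# Sanity check: with relative clock drift rho around 1e-5, a logarithm
# to base 1/rho stays tiny even for astronomically large diameters D.

rho = 1e-5
for D in (10, 1000, 10**6, 10**9):
    print(D, math.log(D, 1 / rho))  # every printed value is below 2
```

Even a (physically absurd) network of diameter 10^{9} yields a value below 2, confirming the claim that the local skew is bounded by a small constant in practice.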

### (b) … and in practice

Sadly, again there is a pitfall. The optimal algorithm assumes that neighbours exchange clock values at intervals of the order of the uncertainty in message transmission time. By means of MAC layer time stamping, this uncertainty can be kept as small as 10^{−6} s. However, powering up the radio and transmitting a message can take up to about 0.1 s, i.e. nodes would have to exchange messages essentially all the time!

On the upside, lower bounds do not always have the final word. In practice, one can achieve much better synchronization than the bound of the earlier study [51] states! The issue here is that the lower bound holds *in the worst case*, which is unlikely to occur in practice. Instead of clock drift and message delays being arranged in the most adverse manner, typically the speed of clocks changes at a comparatively slow pace and the message transmission times follow a benign probability distribution. Assuming that clock rates change only slightly within *kB* time (for a parameter *k* and *B* being the average time between a node sending/receiving a message), this can be exploited in order to achieve a maximal clock skew of *O*(√*D*) w.h.p. [52].^{10} This bound is asymptotically tight for the more optimistic model, and the claimed upper bound could be verified in a real-world network.
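The intuition behind the benign-randomness argument can be illustrated with a toy simulation: if per-hop time-stamping errors are independent and zero-mean (an illustrative assumption, here Gaussian), the error accumulated along a path of *D* hops behaves like a random walk and grows only like √*D* rather than *D*.

```python
import math
import random

# Illustrative assumption: per-hop time-stamping errors are independent,
# zero-mean Gaussians. Summed along a path of D hops they form a random
# walk, so the typical end-to-end error scales like sqrt(D), not D.

def path_error(hops, jitter, rng):
    """Accumulated synchronization error along a path of `hops` hops."""
    return sum(rng.gauss(0.0, jitter) for _ in range(hops))

rng = random.Random(42)
for D in (100, 400, 1600):
    trials = [abs(path_error(D, 1.0, rng)) for _ in range(2000)]
    mean_abs = sum(trials) / len(trials)
    # mean_abs / sqrt(D) stays roughly constant (around 0.8 here)
    print(D, round(mean_abs / math.sqrt(D), 2))
```

Quadrupling the path length only doubles the typical error, in contrast to the worst case, where adversarially arranged delays add up linearly with distance.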

## 6. Conclusion

The study of the complexity of distributed algorithms leads to fascinating and deep research questions. In the context of wireless sensor networks, these research questions remain theoretically interesting, but they obtain an additional practical dimension. In this work, we have focused on characterizing the power of distributed algorithms vis-à-vis some fundamental network coordination tasks that are relevant in wireless sensor networks. We have shown that this power very much depends on the underlying network model.

## Acknowledgements

This work was partially supported by the Swiss National Science Foundation (SNSF). We would like to thank Thomas Moscibroda for his valuable input at an early stage of this manuscript. We are grateful to Johannes Schneider and Jukka Suomela for helpful comments and suggestions on how to improve the paper at a late stage.

## Footnotes

One contribution of 11 to a Theme Issue ‘Sensor network algorithms and applications’.

↵1 The relation between distributed systems and orthodox monolithic systems is sometimes compared with the relation between sociology and psychology.

↵2 We recommend a survey by Suomela [2] on strictly local problems, i.e. problems that can be solved by looking at the graph topology and inputs up to a constant number of hops.

↵3 Without loss of generality, we always assume *G* to be connected.

↵4 For example, if *n*=30 and *c*=10, it is three times more likely to hit the lottery jackpot *twice in a row* than to see the algorithm not terminating within the stated number of rounds.

↵5 Nevertheless, one can accomplish surprising things. For instance, with a clever trick, algorithm 3 can be implemented in such a way that messages contain merely a constant number of bits!

↵6 The function log^{∗}*n* describes the number of times one needs to apply the logarithm to *n* until the result is at most 1. This ‘inverse tower function’ grows extremely slowly; it is at most 5 for any *n* up to 2^{65536}, a value compared with which the estimated number of atoms in the universe looks tiny.

↵7 In contrast, despite all efforts, the (randomized as well as deterministic) asymptotic time complexities for (*Δ*+1)-colouring in general graphs still elude us. For colouring, the strongest known lower bound is just the one for rings, a mere *Ω*(log^{∗}*n*) rounds! Closing this super-exponential gap is among the most prominent open problems in distributed computing.

↵8 Indeed, such a time division multiple access (TDMA) protocol may not be needed. In the example of the Aloha protocol, it was shown that one can run slotted protocols also in systems where nodes do not have a common notion of time, at the price of a factor of 2. This technique can be generalized to other TDMA MAC protocols. In theory, a factor of 2 does not sound prohibitive; in practice, however, it matters!

↵9 Dozer [48], for instance, is a sensor network protocol stack that enables long-term functionality of an unattended sensor network by powering down nodes' radios whenever possible.

↵10 It is assumed that the considered time interval is bounded. Since the worst case always has a positive probability of occurring, it is impossible to beat the worst-case lower bound of *Ω*(*D*) for all times with positive probability.

This journal is © 2011 The Royal Society.