## Abstract

This paper surveys the use of geometric methods for wireless sensor networks. The close relationship of sensor nodes with their embedded physical space imposes a unique geometric character on such systems. The physical locations of the sensor nodes greatly impact on system design in all aspects, from low-level networking and organization to high-level information processing and applications. This paper reviews work in the past 10 years on topics such as network localization, geometric routing, information discovery, data-centric routing and topology discovery.

## 1. Introduction

Networked embedded sensors provide a unique opportunity for real-time, large-scale, high-resolution environmental monitoring. Such systems are becoming ubiquitous across many activities important to our economy and life, including: manufacturing and industrial sensing; traffic and power grid management; wildlife, agriculture and environmental monitoring; hospital operations and patient observation; and all the way through to target tracking, battlefield awareness and other military applications.

The close relationship of sensor nodes with their embedded physical space imposes a unique geometric character on such systems. The physical locations of the sensor nodes greatly impact on system design in all aspects, from low-level networking and organization to high-level information processing and applications. From a networking point of view, node placement clearly influences network connectivity and sensing coverage, which subsequently affects basic network organization, such as clustering and localization, as well as naming and routing in the network. From an application point of view, sensor readings exhibit spatial correlations that can be exploited for data compression, approximation and validation. This review surveys algorithms that exploit geometric structures for sensor network operation and design. It is worth noting, however, that classical geometric techniques assume precise geometric information (e.g. node locations) [1], which is often lacking in sensor deployments, or operate in a heavily centralized fashion. Their adaptation to the sensors in the sensor network context raises many novel challenges. In particular, this review covers the following set of research problems.

—

*Network localization.*As wireless communications in sensor networks are short-range, two nodes can be connected by a link only when they are in close proximity. The problem of network localization is to discover the sensor node positions from the graph connectivity, possibly using additional distance or angle measurements.—

*Geometric routing.*In resource-constrained and highly volatile networks, traditional routing table approaches cannot be applied. Geographical routing and its many variations, using node locations, provide alternative scalable solutions. A number of geometric ideas have been developed for routing using only local information and greedy decisions, even when true node geolocations are unavailable. We also cover recent research work for load-balanced routing and routing using landmarks.—

*Information discovery and data-centric routing.*Emerging applications of ubiquitous sensor deployments in spaces where people live and work envision a smart environment where situations are continual assessed and responses generated in a real-time manner. The sensors serve two purposes: discovering/detecting the events of interest; and forming a supporting infrastructure for distributed resources/users to be informed about and act upon the detected events. In this setting, sensors that detect interesting data are named as*information producers*. Users in the same space who search for such sensor data are called*information consumers*. Unfortunately, neither information producers nor consumers are directly aware of each other, unless some information brokerage mechanism brings them together. We survey various information processing, discovery and brokerage schemes, for different application requirements (query latency, storage, structures in the data, user query patterns), that best serve users' requests in a resource-constrained network. We also cover distributed data aggregation schemes.—

*Network topology discovery.*When a sensor network grows large in scale, it is unrealistic to assume that nodes are placed uniformly in a regularly shaped region. Terrain variations and obstacles often forbid sensor placement; random sensor deployment is also likely to lead to deployment ‘holes’ or ‘voids’. The problem of network topology discovery is detecting and representing non-trivial sensor network topology, and then using this knowledge to improve routing and information discovery/delivery in the network.

We note that there are a number of topics that also heavily involve geometric techniques, such as topology control [2,3] or geometry-based communication models, e.g. signal-to-interference-and-noise ratio [4,5]. These are not covered in this survey owing to space constraints. Interested readers are referred to the cited books or survey papers.

## 2. Network localization

Sensor location information is indispensable for both sensor data integrity and important network management issues such as coverage and data delivery. Traditional approaches to obtain location information mainly rely on global positioning systems (GPS) [6]. But GPS is not appropriate for large-scale sensor localization owing to its high cost, large form factor and environmental constraints. GPS requires direct line of sight to satellites and thus does not work indoors, underground and underwater, and often has poor accuracy in large metropolitan areas (owing to the ‘urban canyon’ problem). There has been a lot of study on localization algorithms that derive the locations of sensor nodes from local measurements such as distance and angle estimations between neighbours [7–17]. The distance between two communicating nodes can be estimated by received signal strength indicator or time-of-arrival techniques. Angles between adjacent edges can be measured by antenna arrays, multiple ultrasound receivers [18] or directional antennas.

Generally speaking, localization algorithms can be classified as anchor-based and anchor-free methods. Anchor-based methods assume that a (sometimes large) number of anchor nodes know their positions already [7–12,17]. In anchor-free methods, no node knows its absolute coordinates; the output is the relative positioning of the sensors, subject to a global translation and rotation.

### (a) Basic trilateration and triangulation

In the basic planar trilateration scheme, a node can determine its location with distance measurements to three anchor nodes that are not collinear [7–9]. For localization of a sensor network, we can repeat trilateration iteratively, called *iterative trilateration*. Nodes that obtain their locations become new anchor nodes and help in localizing others. Similar methods can also be performed by angle measurements [10,11]. For these incremental methods, two issues need to be addressed. The first is to deal with cascading error accumulation in large networks. One can adopt optimization techniques such as mass–spring relaxation to smooth out the error distribution, or use robust statistics to handle outliers in input measurements [19]. The other issue is to handle an insufficient number of initial anchor nodes. If the number of anchors is too small or the anchors are not well distributed, then some nodes may not be able to find three neighbouring anchor nodes to locate themselves. In this case, one can use distance estimations to anchor nodes via multi-hop paths [9] or adopt collaborative multi-lateration by solving a larger optimization problem [7,8].

When the sensor nodes are not uniformly distributed and the network contains big holes, using hop counts as estimates of the Euclidean distances may introduce large errors, as bending around hole boundaries may artificially inflate the Euclidean distance. In order to address this problem, paths that touch hole boundaries can be detected and then either eliminated from network localization [20] or amended to get a better distance approximation [21].

### (b) Graph rigidity and network localizability

Existing anchor-free algorithms take either the connectivity graph [12] or the distances between neighbouring sensor nodes [13–16] as input. One major challenge using this approach is localization ambiguity—when there are multiple, different localization solutions that satisfy all the distance constraints but are far from each other. Indeed, with range information, local optimization such as mass–spring relaxation techniques may get stuck at local minima [15,22].

The problem of whether a graph with given edge length constraints admits a unique embedding in the plane is studied in rigidity theory [23]. A graph is *rigid* in the plane if one cannot continuously deform the shape of the graph without altering the lengths of the edges. A graph is *globally rigid* if it admits a unique embedding in the plane, subject to global rotations and translations. The theory of graph rigidity in two dimensions is relatively well understood. There is a combinatorial condition, the Laman condition, to characterize graphs that are generically rigid. There is also an efficient algorithm, the pebble game [24], to test whether a graph is generically rigid in time *O*(*nm*), where *n* is the number of nodes and *m* is the number of edges. Similarly, both a combinatorial characterization of globally rigid graphs and polynomial algorithms for testing such graphs are known [25,26]. It is however not trivial to apply these rigidity results in the development of efficient localization algorithms. Given a graph with edge lengths specified, finding a valid graph realization in for a fixed dimension *d* is an NP-complete problem [27–29].

The pioneering work of using rigidity theory in network localization [13–15,30–33] focuses on identifying special graphs that do admit efficient localization algorithms. The first idea is to use trilateration graphs [15,30–32]—intuitively, the graphs derived by iterative trilateration methods. A trilateration graph is defined recursively. It is either a triangle or a trilateration graph with a trilateration extension, defined as adding an additional vertex with three edges to existing vertices. If the network contains a trilateration graph, then one can exhaustively search for the ‘seed’ triangle in the graph and greedily find the trilateration extensions. Then an incremental algorithm, i.e. iterative trilateration, can be adopted to find the realization of the network. The second idea is to examine *uniquely d-localizable graphs*. A graph with known edge lengths is called uniquely

*d*-localizable if there is a unique realization of the graph in and there is no non-trivial realization in with

*k*>

*d*. In other words, any realization in with

*k*>

*d*stays in an affine subspace of dimension

*d*in . For example, a generic simplex of

*d*+1 vertices is uniquely

*d*-localizable. For uniquely

*d*-localizable graphs, So & Ye [13] and Biswas & Ye [14] have shown that semi-definite programming (SDP) can be used to find the realization. However, it is not known whether there is a combinatorial characterization of graphs that are

*d*-localizable.

### (c) Tractability of network localization

From a theoretical point of view, one can formulate the localization problem as embedding a unit-disc graph (UDG) in the plane, if we assume that the communication graph follows the UDG model. In a UDG, two nodes are connected by an edge if and only if they are within distance 1. With purely the connectivity information, determining whether a combinatorial graph is a UDG is NP-complete, and thus finding such an embedding in the plane is also hard [34]. In fact, even a relaxed version of the problem is still hard. Kuhn *et al.* [35] proved that finding an embedding such that non-neighbouring pairs are at least 1 away and neighbouring pairs are within is NP-hard. Distance measurements of neighbouring nodes or angle measurements of neighbouring edges alone do not help either [27,28,36]. Not much is known on approximation algorithms for UDG embedding. So far the only positive result is an algorithm with an upper bound on the ratio of the longest distance between neighbouring pairs to the shortest distance between non-neighbouring pairs [35].

### (d) Global optimization methods

A number of localization methods use global optimization tools. These algorithms in general have the best accuracy but the drawback is that they require the global network topology and thus are centralized methods, besides being computationally expensive.

In multi-dimensional scaling (MDS) [12,37], the input is assumed to be a distance matrix on *all* pairs of nodes. By shifting to the centre, the distance matrix is reorganized to derive the Gram matrix *G*=*XX*^{T}, where *X* is the coordinate vector on all sensor nodes. Since the Gram matrix *G* is symmetric and positive semi-definite, we can perform eigen-decomposition and derive *G*=*VAV*^{T}, where *A* is the diagonal matrix of all eigenvalues and *V* is the matrix of the corresponding eigenvectors. Now we can obtain the node coordinates *X* as *VA*^{1/2}. If we require a two- or three-dimensional embedding, then we can take only the largest two or three eigenvalues. In practice, when it is difficult to get distance measurements for all pairs, it is common practice to use the minimum hop count to approximate the Euclidean distance [12]. In a network with uniform sensor density and a regular shape, MDS typically works very well. But its performance may deteriorate significantly when the network shape becomes complex [38].

Biswas & Ye [14] used SDP for network localization. The input is a graph with possible anchor nodes and distance measurements on the edges. The basic idea is to introduce the Gram matrix variables *Y* . Ideally, *Y* =*XX*^{T}, where *X* is the coordinate vector on all sensor nodes. This constraint is relaxed such that *Y* −*XX*^{T} should be a positive semi-definite matrix. Together with other distance constraints, it can be shown that the system is a semi-definite program and can be solved using standard interior point methods. Similarly, with angle measurements, one can formulate a linear program that solves for the node locations [36].

### (e) Cut, embed and paste

A family of practical algorithms uses a bottom-up approach by first localizing small components and assembling them together. Moore *et al.* [15] proposed the use of robust quadrilaterals as the atomic components. A quadrilateral is a complete graph on four nodes and the smallest globally rigid graph. When the distance measurements are noisy, the skinny quadrilaterals, when three of the four nodes are almost collinear, can possibly have flip ambiguities. Thus only a set of robust quadrilaterals, i.e. those that are fat, are used. The global layout is obtained by gluing locally identified robust quadrilaterals. Similarly, with ideas from rigidity theory to improve iterative multi-lateration on sparse networks, Goldenberg *et al.* [32] proposed to record, propagate and verify multiple possible locations of sensors to discover the true network layout. The idea of decomposing the network into small pieces, embedding them and then gluing them together can also be applied in a multi-resolution manner [39].

Lederer *et al.* [38] considered network localization using only connectivity information when the network shape is complex. The algorithm first selects landmarks on network boundaries with sufficient density, then constructs the landmark Voronoi diagram (in which all nodes closest to the same landmark are grouped in the same Voronoi cell^{1}) and its dual combinatorial Delaunay complex on these landmarks (in which an edge is constructed between two landmarks if they share common nodes). The key insight is that the combinatorial Delaunay complex is provably *globally rigid* and has a *unique* realization in the plane, provided that the landmarks are sampled according to the local geometric complexity of the sensor field. Thus an embedding of the landmarks by simply gluing the Delaunay triangles properly recovers the faithful network layout. With the landmarks nicely localized, the rest of the nodes can easily localize themselves using trilateration to nearby landmark nodes. This leads to a practical and accurate localization algorithm for large networks using only network connectivity.

## 3. Geometric routing

### (a) Geographical routing

In geographical routing, the physical locations of the sensor nodes are used to guide the path that a packet takes in the network. Geographical routing was originally proposed by Bose *et al.* [40], and independently by Karp & Kung [41].

Geographical routing has two modes: the greedy mode and the recovery mode.

#### (i) Greedy mode

In this mode, the node currently holding the packet ‘advances’ it towards the destination, based only on the location of itself, its immediate neighbours and the destination. The advance rule may be defined in many ways. Examples are: closest to destination [40–43], most forward within radius [42], nearest forward progress, nearest closer [44], geographical distance routing [45] and compass routing [46]. The most popular way of defining the advance is by decreasing the Euclidean distance to the destination. Greedy routing often suffices to deliver a packet in a dense network, but may fail in a sparse network, where the packet may reach a *local minimum* whose neighbours are all further away from the destination than itself.

#### (ii) Recovery mode

The recovery mode defines how to forward the packet at a local minimum. Some examples of the recovery methods are simple flooding [45], terminode routing [47], breadth-first search or depth-first search [48], the face algorithm [40] and perimeter routing [41].

Here we use a specific routing protocol, the greedy perimeter stateless routing (GPSR) protocol [41], to explain the two modes in detail. A nice summary of geographical routing schemes with a focus on some subtle issues is presented in Frey & Stojmenovic [49]. In GPSR, the greedy mode uses minimum distance to destination as the rule for making progress. To get out of a local minimum, the protocol maintains a planar and connected subgraph, e.g. the *Gabriel graph* or the *relative neighbourhood graph*, and applies routing along the faces of the subgraph that intersect the imaginary line segment between the source and the destination. Greedy routing, combined with perimeter routing, guarantees the delivery of a packet to the destination if there is indeed a way of doing so. In addition, the construction of the planar graphs, as well as the routing decisions, is localized, requiring only information from neighbouring nodes. This makes geographical routing very attractive for large-scale networks, as the state information maintained in each node and in the packet is nearly minimum.

#### (iii) Theoretical bounds on path stretch

The path stretch measures how much longer the routing path is compared with the shortest path. A lower bound proved by Kuhn *et al*. [50] states that any deterministic or randomized algorithm using only local information cannot find a path of length shorter than *Ω*(*k*^{2}) in the worst case, where *k* is the length of the shortest path. On the other hand, it is shown by Gao *et al*. [51] that greedy routing, in the case of successful delivery, uses *O*(*k*^{2}) hops. The GOAFR+ (greedy other adaptive face routing) family protocols [50,52], or, alternatively, using restricted flooding with doubling radii, both guarantee a worst-case path length of *O*(*k*^{2}).

#### (iv) Geographical routing in three dimensions

Durocher *et al.* [53] showed a negative result that it is impossible to find a constant state routing algorithm in three-dimensional unit-ball networks. Thus one cannot hope to directly carry the ideas to routing in three-dimensional networks.

#### (v) Geographical routing in practice

The recovery scheme with face routing or perimeter routing works nicely in theory but encounters a few problems in practice. The correct construction of a planar subgraph assumes accurate location information, which is hard to obtain, and that the communication graph is modelled by a UDG, which does not hold in practice. Experimental evaluations of the communication model on sensor nodes reveal various spatial and temporal radio irregularities [54,55]. In practice, the planar subgraph extracted may become disconnected or still contain crossing edges, as shown in earlier studies [56,57]. This subsequently causes the delivery rate to drop to about 68 per cent on a real testbed [57]. Later, a number of repair mechanisms have been proposed to remove crossing links by probing [57–59]. These mechanisms improve the delivery rate at the cost of higher processing overhead.

Another problem of face routing, especially in a sensor field with big holes, is that the routes are likely to ‘hug’ the hole boundaries. Thus the nodes on the hole boundaries deliver higher traffic than the average. The load imbalance may cause the boundary nodes to run out of battery earlier than the others, enlarging the size of the holes and disconnecting the network.

These two issues motivated researchers to go beyond geographical routing and design virtual coordinates that still retain the good properties of geographical routing but are more robust to radio models, topological changes and node failures.

### (b) Greedy routing with virtual coordinates

#### (i) NoGeo: using rubber-band representations

NoGeo [60] is one of the earliest proposals for efficient geometric routing with virtual coordinates. The virtual coordinates are defined by the rubber-band representation of a graph [61]. In short, suppose that some nodes (preferably on the network perimeter) are given fixed locations; the rubber-band representation is obtained by each node (that is not fixed) running an iterative algorithm of putting itself at the centre of mass of the neighbours' current locations (also known as Laplacian smoothing). This algorithm is known to reduce the total energy of the system, if all the edges are considered as rubber bands whose rest length is zero, and converge to the *rubber-band representation*. NoGeo has no guarantee on delivery by greedy routing on the virtual coordinates but works well in simulations. To understand that, one can see that any node (other than those with fixed locations) cannot be a reflex vertex, i.e. it cannot have all neighbours on the same half-plane. As a reflex vertex is bad for greedy routing, this observation intuitively explains the good performance of routing on rubber-band representations in practice.

#### (ii) Greedy embeddings

Motivated by NoGeo, researchers asked whether one can find an embedding of a given graph in the plane such that greedy routing always works [62]. Such an embedding is called a *greedy embedding* or a *greedy drawing*. It is known that not every graph admits a greedy embedding, such as a star with seven leaves [62]. Some graphs are known to have a greedy embedding—for example, graphs that contain a Hamiltonian path, complete graphs, 4-connected planar graphs (as they are Hamiltonian [63]), power diagrams [64] and Delaunay triangulations [65,66]. But it still remains open to fully characterize the class of graphs that admit a greedy embedding.

Papadimitriou & Ratajczak [62] showed that any planar 3-connected graph (a graph is 3-connected if it remains connected after the removal of any two nodes) has an embedding in three dimensions such that, with a special distance function, greedy routing always works. This is due to a famous graph theory result that a 3-connected planar graph is actually the edge graph of a three-dimensional convex polytope [67]. They also conjectured that a planar 3-connected graph has a greedy embedding in the plane. Dhandapani [68] discovered that, for any planar triangulation, there *exists* a greedy embedding in the plane using Schnyder's realizer concept. The conjecture was proved to be true in 2008 by Leighton & Moitra [69], and independently by Angelini *et al.* [70].

A tricky and important issue, as pointed out by Eppstein & Goodrich [71], is that the greedy embedding as proposed earlier (e.g. in the study of Leighton & Moitra [69]) may require -bit size coordinates. This makes such greedy drawing methods impracticable, as the space requirement is the same as for the standard routing table approach. Ideally, we would like *succinct greedy drawing* in the plane, in which the virtual coordinates are represented by bits. Angelina and co-workers [72] proved that this is not possible. They show that there are infinitely many trees whose greedy embeddings need exponential size grids. Recently, Goodrich & Strash [73] used the embedding presented by Leighton & Moitra [69] and developed succinct -bit coordinates (not Euclidean though) that support greedy routing using a different metric function. In a similar way, He & Zhang [74] showed that the classical Schnyder drawing admits greedy routing with succinct coordinates under a special metric function.

Sarkar *et al.* [75] considered a triangulated domain with holes and produced an embedding such that all the holes are mapped to circular discs, using conformal mappings and discrete Ricci flows. Thus greedy routing never gets stuck at hole boundaries. One other good property is that the mapping can be computed with a distributed gossip-style algorithm.

Most of the embeddings are in low-dimensional (two- or three-dimensional) Euclidean spaces. Flury *et al.* [76] considered embedding a graph in an -dimensional space and the routing stretch is bounded by 1+*ε* for any *ε*.

#### (iii) Embedding in hyperbolic spaces

Kleinberg [77] showed that greedy routing in hyperbolic space is easy, as any *tree* has a greedy embedding in the hyperbolic plane. Unfortunately, the hyperbolic embedding in the study of Kleinberg [77] does not provide succinct coordinates either. Maymounkov [78] provides a greedy drawing method for three-dimensional hyperbolic space using vertices that can be represented with bits. Eppstein & Goodrich [71] proposed a method to embed a tree in a ‘balanced’ manner and therefore showed that any *n*-vertex connected graph can be embedded in two-dimensional hyperbolic space with succinct coordinates that support greedy routing.

### (c) Load-balanced routing

In large-scale sensor networks, it is critical to balance workload on different sensors, to prevent some nodes from running out of battery prematurely. For load-balanced routing, one formulation of the problem is to select routes such that the maximum load is minimized. Unfortunately, solving this problem is NP-hard (modelled as unsplittable flow problem) even in very simple networks (such as a grid). Approximation algorithms [79,80] have been developed for the problem. But they all require global coordination, and are thus inappropriate for low-resource sensor networks. In this section, we focus on the load-balancing issue of *greedy routing* solutions.

#### (i) Routing in a narrow band

Gao & Zhang [81] proposed a greedy routing scheme that achieves a routing stretch factor of 4 (compared with shortest-path routing) and a load-balancing ratio of 3 (compared with the optimal load-balanced routing), for wireless nodes distributed in a narrow strip of width , when the communication graph follows the UDG model. Even for this restricted scenario, it is still NP-hard to compute the most balanced routes.

#### (ii) Routing in a disc

In a simple network with sensors uniformly spread in a disc, shortest-path routing (or geographical greedy routing) will incur higher traffic load on the nodes near the centre. Popa *et al.* [82] proposed the curve-ball method to use stereographic projection to map the network to a hemisphere. Routing is performed by greedy routing using the spherical distance instead of the Euclidean distance. Intuitively, the paths now move away from the centre. The same idea is used by Li & Wang [83] to resolve routing congestion at the network centre.

In a recent work [84], the load-balancing property is studied using the projection of a 3-connected planar graph to a convex three-dimensional polyhedron as described in the Koebe–Andreev–Thurston embedding [67]. When the network is dense and with a proper embedding, the polyhedron is approximately a sphere. Thus greedy routing on the polyhedron also avoids the problem of causing a dense centre. Different from the curve-ball method, greedy routing on the convex polytope guarantees delivery [62].

#### (iii) Routing in a square

Mei & Stefa [85] studied load balancing for a square-shaped sensor network. The network is quadrupled by successive reflections in the *X* and *Y* axes and the network is now four times as big and has the topology of a torus. Each node is mapped to an image in each of the four copies. For routing, each of the four copies of the destination is chosen with equal probability. Routing is executed using greedy geographical routing on the torus, which is equal to reflecting back from the boundary in the original network. This method has good load-balancing properties and the centre is not overloaded, but at the cost of increasing the average traffic load, as the routing path may now be much longer.

#### (iv) Routing in the covering space

For a general network with possibly big holes, none of the previous methods work very well. In particular, most greedy routing methods, using geographical locations or other virtual coordinates, tend to follow the hole boundaries and heavily overload these nodes. Sarkar *et al.* [86] proposed to use the notion of covering space to deal with the problem. This builds on top of the virtual embedding [75] in which all boundaries are circular. Now we have a network with *k* circular holes. For each interior hole, one can take a Möbius transformation that essentially ‘reflects’ the network inwards with respect to the hole. This partially fills up the hole, except that there are *k* smaller circular holes. Now we can continue such transformations with respect to the remaining holes so that all the holes are eventually filled up, with infinitely many transformed copies of the original sensor field. In practice, we stop after a number of rounds when the holes left are sufficiently small. For routing in the covering space, a message can ‘enter’ the hole to route in another copy of the network, effectively reflecting on the hole boundary. The nodes on the boundary are now treated in the same way as the other nodes with respect to routing and the problem of them being overloaded is substantially alleviated.

### (d) Landmark-based routing

A family of routing algorithms use landmark schemes [87–90]. The idea of using landmarks for generating naming and routing schemes in large networks was initially proposed two decades ago, such as the landmark hierarchy by Tsuchiya [91]. The landmark-based routing schemes described shortly are all two-level hierarchy.

In the sensor network setting, a small subset of the nodes are selected as *landmarks*. The landmarks flood the network and every node records its hop count distances to these landmarks. The landmark distances are then used to generate virtual coordinates for the nodes. Routing is typically guided in a greedy way by a potential function on the landmark-based distances. Landmark-based schemes are favoured for their simplicity and independence of the dimensionality of the network.

#### (i) Gradient landmark-based routing

Gradient landmark-based routing [87] is concerned with routing in a sensor network with holes or with an irregular shape. A number of landmarks are selected, and the network is partitioned into Voronoi tiles such that nodes inside the same tile are closest to the same landmark. Two Voronoi tiles are adjacent if there are two neighbouring nodes in different Voronoi tiles. The tile adjacency graph, denoted by the *combinatorial Delaunay graph*, is abstracted and known to all the nodes, for global routing across tiles. The landmarks are selected to capture the global topology of the sensor field. Thus the number of landmarks is likely to be dependent on the topological complexity, such as the number of holes, typically much fewer than the number of sensors [92].

Each node *p* is given virtual coordinates, defined by the distance to its closest landmark *u* and the distance to *u*'s neighbouring landmarks in the Voronoi diagram. To route to the destination node, the source node will first consult with the combinatorial Delaunay graph to find a sequence of tiles to visit. The actual route will be composed of *inter-tile routing* (routing to an adjacent tile) and *intra-tile routing* (routing within a tile), both using greedy, local routing schemes based on landmark distances.

The same combinatorial Delaunay structure is used by Funke & Milosavljević [93,94] to account for inaccurate location information and small network holes and irregularities.

#### (ii) Beacon vector routing

Beacon vector routing (BVR) [88] uses a potential function that depends on the distances to the *k* landmarks closest to the destination, where *k* is a system parameter. Out of the *k* landmarks, the ones that are closer to the destination than to the source impose a ‘pulling’ force while the rest impose a ‘pushing’ force. The potential function is a combination of the two. BVR has received significant attention and has been selected in a number of cases as a comparison benchmark [90,95].

#### (iii) Gradient landmark descent routing

None of the previous methods guarantee routing stretch. Nguyen *et al.* [90] proposed a method with bounded stretch in the continuous domain. The landmarks are selected to be an *r*-sampling: any node is within hop count *r* from a landmark and no two landmarks are within distance *r* of each other. An *r*-sampling has better uniformity than random sampling and can also improve the performance of other methods such as BVR. Routing is repeated by greedy forwarding towards the landmark that *maximizes* the ratio of the distances to source and destination.

#### (iv) Small state, small stretch routing

Mao *et al.* [95] adopted the idea of compact routing schemes (proposed by Thorup & Zwick [96,97]). When the destination is far away, it is a reasonable approach to route towards the landmark closest to the destination. Only when a message gets close to the destination, when all landmarks are equally far away, do we need a separate scheme to deliver the message to the destination. In small state, small stretch (S4) routing, a local routing table is constructed for the last phase. The routing scheme guarantees worst-case stretch of 3 and routing table size of roughly for a uniform regular network of *n* nodes.

To conclude, we remark that a few other protocols, whose ideas are similar to one of these, were not covered in detail here [89,98]. The landmarks in these schemes are mainly used as points of reference, and do not serve the functionalities of gateway nodes (which attract traffic). The schemes described here differ in how routing to a distant or a nearby destination is designed and integrated.

### (e) Information discovery and data-centric routing

In data-centric routing, a node poses a query for data of certain types, and a routing algorithm must route the query to retrieve the relevant data in the sensor network. Data-centric routing is fundamentally different from traditional routing paradigms, as the problem also involves the discovery of the destination, the nodes that have the required data. A dual version is also possible, in which data are routed, looking for relevant queries. Both versions can be formulated as the problem of *information brokerage*, which matches and carries information between *information producers* (also known as sources), the nodes that perform data acquisition and event detection, and *information consumers* (also known as sinks), the nodes that search for or need this information.

This problem in sensor networks was initiated in the directed diffusion work of Intanagonwiwat *et al*. [99]. The sensor nodes locally process their data and organize them by attribute-value pairs. A query (called an interest) is disseminated to the entire network and matched data will be routed back, possibly along multiple paths. Reinforcement learning is used to prune the bad paths. A similar routing paradigm is also adopted in TinyDB [100] to support Structured Query Language (SQL) style aggregation queries on information distributed in the network. These two approaches aim at infrequent queries for streaming data type so that the cost of flooding can be justified and amortized over the following long-term data delivery. In the following, we describe a number of schemes that exploit the geometric properties of the network for more efficient information discovery and brokerage.

#### (i) Geographical hash tables

A parallel approach to data-centric routing is to adopt data-centric storage, which is aimed at large-scale networks with many simultaneously detected events that are not necessarily desirable for all users [101,102]. The idea is similar to distributed hash tables on the Internet [103–106]. A producer leaves data on rendezvous nodes for consumers to retrieve. Thus data across space and time can be aggregated at rendezvous nodes. In geographical hash tables (GHTs) [102], data are hashed by their data type to geographical locations. The nodes close to the hashed location serve as rendezvous nodes. Data and query delivery to the rendezvous nodes is implemented by geographical routing such as GPSR [41].

GHTs have greatly reduced the communication cost and energy consumption by avoiding network-wide flooding for information discovery. Their simplicity is also very appealing. However, the basic GHT scheme does not have *distance sensitivity*. In particular, if the producer is actually close to the data producer (although neither has knowledge of the other), we would like the data consumer to discover the data producer quickly. This is an attractive property in many applications, as information will be most useful, thus queried more frequently, in the spatiotemporal locale where it was collected.

In the study of Funke *et al*. [107], a discrete centre hierarchy [108] is built for distance-sensitive retrieval. The hierarchy is composed of clusters with a parent cluster, including all children clusters. A data item is hashed and stored at certain other nodes, called the *data servers*, chosen from nearby clusters and fewer servers in far-away but large clusters. The retrieval scheme starts to search in the local neighbourhood until it discovers a nearby data server. The total cost is proportional to the distance to the producer.

#### (ii) Double rulings

As data delivery from data source to a rendezvous node is implemented by multi-hop routing, it is natural to leave information hints along the trail that the data travel on, at no extra communication cost. A basic *double-rulings* scheme works as follows: data or data pointers are stored at nodes that follow a *replication curve*, while a data request travels along a *retrieval curve*. The crucial property is that any retrieval curve intersects the replication curve for the desired data. Thus successful retrieval can be guaranteed.

For an easy case, assume that the network is a two-dimensional grid. The information storage curves follow the horizontal lines. The information retrieval curves follow the vertical lines. We name this scheme *rectilinear double rulings* [109–111]. The spherical double-rulings scheme [112] has both GHTs and rectilinear double rulings as special cases. It considers the sensor field as the stereographic projection of a sphere, and the producer curve is a circle through the producer and the hashed location, which—in a stereographic projection—maps to the great circle of a sphere. Both double-rulings schemes are distance-sensitive—if the producer and consumer are distance *d* apart, the producer can find the data with cost *O*(*d*). The spherical double-rulings scheme also has many good properties as it allows natural data aggregates, supports range queries and is robust to node failures.

Double-rulings schemes can also be defined in the virtual coordinates space [113] or in a probabilistic manner (as in rumour routing [114]).

### (f) Data collection and aggregation

Many applications of sensor networks require that data be collected from all nodes of the network or a subset thereof. Most common approaches for data collection in a network are tree-based. In this subsection, we introduce a few geometric ideas that take a flat, non-tree approach and use sweeps or sensor-initiated probes for data collection and aggregation.

#### (i) Sweeps over sensor networks

The network sweep, proposed by Skraba *et al*. [115], is implemented by a narrow band of active nodes that ‘moves’ over the network by issuing invitations to new nodes to join the band, and dropping nodes that have already been processed and serve no other essential purpose.

Using simple local diffusion algorithms, one can pre-compute a certain *potential* at each node, essentially obtained by solving a simple partial differential equation, a Laplace equation with Dirichlet boundary conditions, over the network. The algorithm, in its simplest form, is just the familiar Gauss–Seidel iteration (each node sets its value to the average values of its neighbours). The construction guarantees that no local extrema where the sweep could get stuck are present in the field. In general, such potentials and their gradients are rather resilient to link connectivity changes in the network because their values integrate information from large areas of the network. They also smooth out local variations in node density. They are used by the sweep protocol to provide global guidance so that the wavefront propagates in an orderly way without self-collision while allowing the sweep control to remain completely local.

Such diffusion-based gradients have also been used for information routing and discovery [116]. In particular, the linear space algebraic structure of these gradients allows certain elegant optimizations on load balancing, Boolean query routing and more sophisticated aggregations.

#### (ii) Sparse aggregations

Most of the extant work assumes that the goal of aggregation is to accumulate data from *all* the network nodes, while in certain situations only a relatively small subset of the nodes have useful data to report, or want to participate in an aggregation. Clearly, for rare events, frequent polling is inefficient, as the query needs to reach all nodes, although most of them will not participate. A purely event-driven approach, where each node with a detection independently transmits a record of the event to a base station, may also not be optimal—consider a group of nearby source nodes each sending a message to the base station.

In Gao *et al*. [117], the problem is named *sparse aggregation* and the nodes with data are named *hot nodes*, which are unaware of each other. Key to the solution is a probe protocol that can be initiated independently by a node and involves emitting a few packets out of the node that follow certain paths in the network while leaving a trail behind, until the probes are terminated. These paths are chosen as the double-rulings paths (say horizontal and vertical rays) such that they support distance-sensitive discovery. As the probes encounter each other, the data from the nodes are aggregated. The two nodes can be considered as competing in a tournament and the winner of the tournament is decided based on the data carried or left behind by the probe packets. The losing node then has its probes recalled/terminated, while the probes of the winner continue to propagate. Eventually the champion probe, together with the final aggregated data, propagates to the network boundary, where a high-speed network is assumed to be available. The probes also implicitly build an aggregation tree on the hot nodes, in a bottom-up fashion. In the end, it can be shown that the weight of the aggregation tree, as well as the total message cost, is , where *T* is the weight of the minimum spanning tree of the hot nodes and *n* is the network size.

### (g) Range queries

For a typical range query, we are given a query region plus possibly a range of the sensor data, and then for all the sensors in the query region ask whether any sensor data are within the data range. This is a problem that has been studied a lot in computational geometry. Various distributed schemes have been proposed. In the case of a scalar field, one solution is to partition the information about large geographical regions into subsets according to smaller ranges of the field value, and store these subsets in different nodes. This is the approach taken in the Distributed Index for Features in Sensor networks (DIFS) system [118]. In the Distributed Index for Multi-dimensional data (DIM) system [119], a locality-preserving hash function is used to map portions of a multi-dimensional attribute space to sensors so that all data needed to answer a range searching query can be located conveniently. In the fractional cascading approach [120], information is stored so that more detailed information is available about data obtained in the spatio-temporal locality of the sensor where the query is injected—but without sacrificing the ability to query distant regions or times as well.

All of these schemes are designed to support range queries for static sensor data and essentially use a quadtree-type hierarchical space decomposition. For mobile data, constant updates to a fixed space partitioning make these schemes too costly—small movement of a target may lead to updates up to a high level of the quadtree and possibly updates on *all* sensors, if the mobile target happens to cross a high-level boundary.

For range queries of mobile targets, a naive solution is to do a geocast to the query region, flood the nodes in the query range and collect the target counts. Only the sensors that detect the target keep information about the target. This has an update cost proportional to the target movement distance, but a query cost proportional to the area of the range. Sarkar & Gao [121] recently proposed to store target movements implicitly using the differential 1-form such that the range queries have cost proportional to the perimeter of range *R*. Consider a planar graph embedded in the plane, and one target that lies within a face *f*_{0} and has a weight of *w*. The differential 1-form is represented by a function *ξ* on directed edges. That is, the value for *ξ*(*ab*) is the negation of the value for *ξ*(*ba*). The differential 1-form maintains the property that, for the face *f*_{0}, the summation of all the values of the edges on its boundary, in clockwise order, is *w*. It is easy to see that, for a query range that contains multiple faces, and if one sums up the edge weights in clockwise order for all the faces in the range, the edges that are completely within the range will be counted twice in opposite directions—so their weights cancel out. Thus a range query can simply ignore the edges that are in the interior of *R* and take the clockwise integration along the range boundary only. The differential 1-form provides great flexibility that allows low maintenance cost under both network dynamics and target movements. Network dynamics such as link addition and removal, or node insertion and removal, can be handled in constant time as well.

## 4. Network topology discovery

Large-scale sensor networks are unlikely to be uniform and regular owing to obstacles and terrain variations. A sensor network is regarded as a discrete sampling of an underlying geometric environment or space. Thus, the holes of the sensor field can be intrinsic features indicating important structures of the underlying environment such as inaccessible terrain or obstacles (buildings, lakes, etc.), as may be indicated on a building floor plan, a map of a transportation network, etc. Holes may also map to events that are being monitored by the sensor network. If we consider sensors with readings above a threshold to be ‘invalid’, then the hole boundaries are basically isocontours of the landscape of the attribute of interest. Holes are also important indicators of the general health of a sensor network, indicating, for example, traffic hot spots or sensors with low battery levels.

In order to discover the nodes on the ‘hole boundaries’ of the sensor field, various approaches with different assumptions have been investigated, including geometric methods that use the node locations [122], statistical methods [123,124], graph methods [125–127] and topological methods [128–131]. In this section, we describe some representative ideas to extract and represent the network hole structure.

### (a) Geometric and statistical methods

Geometric methods for boundary detection use geographical location information. The first paper on this topic, by Fang *et al.* [122], assumed that the nodes know their geographical locations and that the communication graph follows the UDG assumption. The definition of holes in Fang *et al*. [122] is intimately associated with geographical forwarding such that a packet can only get stuck at a node on hole boundaries. Fang *et al.* also proposed a simple algorithm that greedily sweeps along hole boundaries and eventually discovers boundary cycles.

Statistical methods for boundary detection make assumptions about the probability distribution of sensor deployment. One can detect boundary nodes as those with relatively low degree [123] or with relatively low ‘restricted stress centrality’ (the number of shortest paths going through) [124], in a network with uniform random sensor distributions. The major limitation of these two algorithms is the requirement on sensor distribution and typically very high node density.

### (b) Graph methods

Funke and Klein [126,127] developed a simple algorithm using only network connectivity information. The basic idea is to construct isocontours based on hop count from a root node and identify where the contours are broken. Kröller *et al.* [125] presented an algorithm searching for combinatorial structures called flowers and augmented cycles, in a quasi-UDG. A flower would ensure that the node in the centre of the flower is not on the boundary. Saukh *et al.* [132] considered the embedding together with the graph structure and defined a node to be on the boundary if there is a quasi-UDG embedding in which the node stays on the *geometric* boundary. Then they considered subgraph *patterns* that ensure a certain subset of the nodes to be always in the interior of any quasi-UDG embedding. Only the nodes that are certified to be in the interior will not be on the boundary. The patterns can be considered as generalizations of the flower structure in Kröller *et al.* [125].

### (c) Topological methods

Ghrist and co-workers [128–130] took an algebraic topology approach to study the sensor coverage problem: check whether the domain is entirely covered by the union of the sensing ranges of all the sensors, assuming each node has a disc sensing range. They abstracted the coverage problem by forming the Čech complex *C*(*r*), in which the *k*-simplices correspond to non-empty intersections of the sensing ranges of *k*+1 distinct sensors, with *r* as the radius of the sensing range. Now, by the nerve theorem, when the sensing ranges and all their non-empty finite intersections are contractible, the union of the sensing ranges has the homotopy type of the covered domain. Thus the detection of sensing holes is directly solved by checking whether the domain boundary cycle is null-homologous in the Čech complex. However, the computation of a Čech complex requires the location of the sensor nodes and expensive computations for the common intersection of discs. Instead, they showed that one can use two Vietoris–Rips complexes to sandwich the Čech complex. A Rips complex contains a *k*-simplex if the *k*+1 vertices are pairwise within distance *r*. In particular, the following inclusion relationship holds: , for . Therefore, the homology of the Rips complex gives sufficient but not necessary conditions for the coverage holes in the Čech complex—if there is a non-trivial cycle in , then there must be a cycle in *C*(*r*); if there are no non-trivial cycles in then *R*(*r*) provides full coverage.

In a different approach [128], the structure of the homotopy classes of the shortest paths from a single node is used to detect holes. Holes in a sensor field create irregularities in hop count distances. This phenomenon gives rise in the continuous setting to the *cut locus* [19], defined as the collection of points with distinct shortest paths of different homotopy type—paths that cannot be ‘continuously deformed’ into one another but get around the hole(s) in different ways and terminate by touching each other, trapping the hole(s) between them. The nodes in a cut can be easily identified, as they have the property that their common ancestor in the shortest-path tree is fairly far away, at the other side of the hole. One nice property of the cut detection is that the operations can be performed independently and locally at each pair of adjacent nodes, allowing natural parallelism.

## Footnotes

One contribution of 11 to a Theme Issue ‘Sensor network algorithms and applications’.

↵1 One node may have multiple closest landmarks and thus may stay in multiple Voronoi cells.

- This journal is © 2011 The Royal Society