## Abstract

The notion of probabilistic computation dates back at least to Turing, who also wrestled with the practical problems of how to implement probabilistic algorithms on machines with, at best, very limited access to randomness. A more recent line of research, known as derandomization, studies the extent to which randomness is superfluous. A recurring theme in the literature on derandomization is that probabilistic algorithms can be simulated quickly by *deterministic* algorithms, if one can obtain *impressive* (i.e. superpolynomial, or even nearly exponential) circuit size lower bounds for certain problems. In contrast to what is needed for derandomization, existing lower bounds seem rather pathetic. Here, we present two instances where ‘pathetic’ lower bounds of the form *n*^{1+ϵ} would suffice to derandomize interesting classes of probabilistic algorithms. We show the following:

— If the word problem over *S*_{5} requires constant-depth threshold circuits of size *n*^{1+ϵ} for some *ϵ*>0, then any language accepted by uniform polynomial-size probabilistic threshold circuits can be solved in subexponential time (and, more strongly, can be accepted by a uniform family of deterministic constant-depth threshold circuits of subexponential size).

— If there are no constant-depth arithmetic circuits of size *n*^{1+ϵ} for the problem of multiplying a sequence of *n* 3×3 matrices, then, for every constant *d*, black-box identity testing for depth-*d* arithmetic circuits with bounded individual degree can be performed in subexponential time (and even by a uniform family of deterministic constant-depth AC^{0} circuits of subexponential size).

## 1. Introduction

Alan Turing was interested in probabilistic computation, on both practical and theoretical levels. In 1950, he explicitly proposed the notion of extending deterministic computing devices by providing access to a ‘random element’ [1]. Note that this was roughly contemporaneous with the development of Monte Carlo methods [2]. Turing also participated in the development of the Mark I computer project at Manchester University, and according to some sources [3] he pushed for the inclusion of a random noise generator at the machine instruction level (although this feature was not successful). Thus, from almost the very beginning of the mathematical study of computation, there has been interest in probabilistic computation as well as an appreciation of the obstacles that lie in the way of practical implementations of probabilistic algorithms.

One promising avenue for dealing with the scarcity of truly random bits is to show that, in many cases, there is no reason to use randomness at all. Hardness-based derandomization is one of the success stories of the past quarter century. The main thread of this line of research dates back to the work of Shamir [4], Yao [5] and Blum & Micali [6], and involves showing that, given a suitably hard function *f*, one can construct pseudorandom generators and hitting-set generators. Much of the progress on this front over the years has involved showing how to weaken the hardness assumption on *f* and still obtain useful derandomizations [7–22]. In rare instances, it has been possible to obtain *unconditional* derandomizations using this framework; Nisan [23], Nisan & Wigderson [24] and Viola [25] showed that uniform families of probabilistic AC^{0} circuits can be simulated by uniform deterministic AC^{0} circuits of quasipolynomial size. More often, the derandomizations that have been obtained are conditional, and rely on the existence of functions *f* that are hard on average. For certain large complexity classes (notably including #P, PSPACE and exponential time), various types of random self-reducibility and hardness amplification have been employed to show that such hard-on-average functions *f* exist in the class if and only if there is some problem in the class that requires large Boolean circuits [7,9].

A more recent thread in the derandomization literature has studied the implications of *arithmetic* circuit lower bounds for derandomization. Kabanets and Impagliazzo [26] showed that, if the permanent requires large *arithmetic circuits*, then the probabilistic algorithm to test if two arithmetic *formulae* (or, more generally, two arithmetic circuits of polynomial degree) are equivalent can be simulated by a quick deterministic algorithm. Subsequently, Dvir *et al.* [27] built on the techniques of Kabanets and Impagliazzo to show that, if one could present a multilinear polynomial (such as the permanent) that requires depth-*d* arithmetic formulae of size 2^{nϵ}, then the probabilistic algorithm to test if two arithmetic circuits of depth *d*−5 are equivalent (where, in addition, the variables in these circuits have bounded individual degree) can be derandomized to obtain a deterministic algorithm for the problem.

In contrast to the very strong lower bounds that are needed for both of these threads of derandomization, existing lower bounds seem rather pathetic: linear size lower bounds for general circuits [28], nearly cubic lower bounds for formula size [29], nearly quadratic lower bounds for branching programs [30] and *n*^{1+c^{−d}} for depth-*d* threshold circuits [31].

In this paper, we combine these two threads of derandomization with the recent insight that, in some cases, extremely modest-sounding (or even ‘pathetic’) lower bounds can be amplified to obtain superpolynomial bounds [32]. In order to carry out this combination, we need to identify and exploit some special properties of certain functions in and near NC^{1}.

— The word problem over *S*_{5} is one of the standard complete problems for NC^{1} [33]. Many of the most familiar complete problems for NC^{1} have very efficient *strong downward self-reductions* [32]. We show that the word problem over *S*_{5}, in addition, is *randomly self-reducible*. (This was observed previously by Goldwasser *et al.* [34].) This enables us to transform a ‘pathetic’ *worst-case* size lower bound of *n*^{1+ϵ} on constant-depth threshold circuits into a superpolynomial size *average-case* lower bound for this class of circuits. In turn, by making some adjustments to the Nisan–Wigderson generator, this average-case hard function can be used to give uniform subexponential derandomizations of probabilistic TC^{0} circuits.

— Iterated multiplication of *n* 3×3 matrices is a multilinear polynomial that is complete for arithmetic NC^{1} [35]. In the Boolean setting, this function is strongly downward self-reducible via self-reductions computable in TC^{0} [32]. Here we show that there is a corresponding *arithmetic* self-reduction; this enables us to amplify a lower bound of size *n*^{1+ϵ} for constant-depth arithmetic circuits into a superpolynomial lower bound for constant-depth arithmetic circuits. Then, by building on the approach of Dvir *et al.* [27], we are able to obtain subexponential derandomizations of the identity testing problem for a class of constant-depth arithmetic circuits.

The rest of the paper is organized as follows. In §2, we give some preliminary definitions and notation. In §3, we show how to convert a modest worst-case hardness assumption to a strong average-case hardness separation of NC^{1} from TC^{0}. We also present slightly weaker worst-case-to-average-case reductions for *L* and for the classes GapL and GapNC^{1}. In §4, we build on §3*a*, to give a uniform derandomization of probabilistic TC^{0} circuits. Finally, in §5 we prove our derandomization of a special case of polynomial identity testing under a modest hardness assumption.

## 2. Preliminaries

This paper will mainly discuss NC^{1} and its subclass TC^{0}. The languages in NC^{1} are accepted by families of circuits of depth *O*(log *n*) that are built with fan-in two AND and OR gates, and NOT gates of fan-in one. For any function *s*(*n*), TC^{0}(*s*(*n*)) consists of languages that are decided by constant-depth circuit families of size at most *s*(*n*) which contain only unbounded fan-in MAJORITY gates as well as unary NOT gates; TC^{0}=⋃_{k}TC^{0}(*n*^{k}) and TC^{0}(SUBEXP)=⋂_{ϵ>0}TC^{0}(2^{n^{ϵ}}). The definitions of AC^{0}(*s*(*n*)), AC^{0} and AC^{0}(SUBEXP) are similar, although MAJORITY gates are not allowed, and unbounded fan-in AND and OR gates are used instead.

We allow circuits to accept inputs not only from the Boolean alphabet {0,1} but also from any finite alphabet *Σ*. This is done by having *σ*-input gates for each *σ*∈*Σ*; a *σ*-input gate *g* connected to input symbol *x*_{i} evaluates to 1 if *x*_{i}=*σ*, and evaluates to 0 otherwise.

As is usual in derandomization arguments based on the hardness of some function *f*, for *f* to be considered ‘hard’ we require not merely that *f* have no small circuits, but that *f* require large circuits at *every* relevant input length. This motivates the following definition.

### Definition 2.1

*Let A be a language, and let D*_{A} *be the set* {*n*:*A*∩*Σ*^{n}≠∅}. *We say that A*∈io-TC^{0}(*s*(*n*)) *if there is an infinite set I*⊆*D*_{A} *and a language B*∈TC^{0}(*s*(*n*)) *such that, for all n*∈*I*, *A*_{n}=*B*_{n} (*where, for a language C*, *we let C*_{n} *denote the set of all strings of length n in C*). *Similarly, we define* io-TC^{0} *to be* ⋃_{k}io-TC^{0}(*n*^{k}).

Thus *A* requires large threshold circuits on *all* relevant input lengths if *A*∉io-TC^{0}. (A peculiarity of this definition is that if *A* is a *finite* set, or *A*_{n} is empty for infinitely many *n*, then *A*∉io-TC^{0}. This differs significantly from most notions of ‘io’ circuit complexity that have been considered, but it allows us to consider ‘complex’ sets *A* that are empty on infinitely many input lengths; the alternative would be to consider artificial variants of the ‘complex’ sets that we construct, having strings of every length.)

Probabilistic circuits take an input divided into two pieces, the actual input and the random inputs. We say that an input *x* is accepted by such a circuit *C* with probability *p* if, with respect to the uniform distribution *U*_{D} over the domain *D* from which the random inputs are drawn, Pr_{r∼U_{D}}[*C*(*x*,*r*)=1]≥*p*. We say that input *x* is rejected by *C* with probability *p* if Pr_{r∼U_{D}}[*C*(*x*,*r*)=0]≥*p*. When defining probabilistic complexity classes (such as probabilistic NC^{1}), we restrict the random inputs to come from {0,1}*, and we say that a family of probabilistic circuits *C*_{n} accepts *A* if, for all *x*∈*A*_{n}, *C*_{n} accepts *x* with probability at least 2/3, and for all other *x*, *C*_{n} rejects *x* with probability at least 2/3.

The standard uniformity condition for small complexity classes is called DLOGTIME-uniformity. In order to provide its proper definition, we need to mention the *direct connection language* associated with a circuit family.

### Definition 2.2

*Let* {*C*_{n}} *be a circuit family. The direct connection language L*_{DC} *of* {*C*_{n}} *is the set of all tuples having the form either* 〈*n*,*p*,*q*,*b*〉 *or* 〈*n*,*p*,*d*〉, *where*:

— *if q*=*ϵ*, *then b is the type of gate p in C*_{n};

— *if q is the binary encoding of k*, *then b is the kth input to p in C*_{n}; *and*

— *in a tuple of the form* 〈*n*,*p*,*d*〉, *the gate p has fan-in d in C*_{n}.

A circuit family {*C*_{n}} is DLOGTIME-uniform if there is a deterministic Turing machine that accepts *L*_{DC} in linear time. For any circuit complexity class C, uC is its uniform counterpart, consisting of languages that are accepted by DLOGTIME-uniform circuit families. For more background on circuit complexity, we refer the reader to the textbook by Vollmer [36]. The term ‘uniform derandomization’ in the title of this paper refers to the fact that we are presenting uniform circuit families that compute derandomized algorithms; this should not be confused with doing derandomization based on uniform hardness assumptions.

The classes NC^{1}, GapL and GapNC^{1} all have complete problems under AC^{0}-Turing reducibility (see [36] for definitions of these terms). All references to ‘completeness’ refer to this notion of reducibility.

A particularly important complete language for NC^{1} is the word problem WP for *S*_{5}, where *S*_{5} is the symmetric group over five distinct elements [33]. The input to the word problem is a sequence of permutations from *S*_{5} and it is accepted if and only if the product of the sequence evaluates to the identity permutation. The corresponding *search* problem FWP is required to output the exact result of the iterated multiplication. A closely related *balanced* language is BWP, which stands for the balanced word problem.
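The word problem and its search version admit a direct (if entirely sequential) implementation, which is useful for sanity-checking the reductions that follow. Below is a minimal Python sketch; the helper names `mul`, `inv`, `fwp` and `word_problem` are ours, not from the paper.

```python
from itertools import permutations
from functools import reduce

# The 120 elements of S_5; a permutation p is a tuple with p[i] the image of i.
S5 = list(permutations(range(5)))
IDENTITY = tuple(range(5))

def mul(a, b):
    """Compose two permutations: apply a first, then b (left-to-right product)."""
    return tuple(b[a[i]] for i in range(5))

def inv(p):
    """Inverse permutation: inv(p)[p[i]] = i."""
    return tuple(p.index(i) for i in range(5))

def fwp(ws):
    """FWP, the search version: output the iterated product of the sequence."""
    return reduce(mul, ws, IDENTITY)

def word_problem(ws):
    """WP: accept iff the product of the sequence evaluates to the identity."""
    return fwp(ws) == IDENTITY
```

For example, `word_problem([p, inv(p)])` holds for every `p` in `S5`, since a permutation followed by its inverse multiplies to the identity.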

### Definition 2.3

*The input to* BWP *is a pair* 〈*w*_{1}*w*_{2}…*w*_{n},*S*〉, *where for all i*∈[1,…,*n*], *w*_{i}∈*S*_{5}, *S*⊆*S*_{5} *and* |*S*|=60. *The pair* 〈*w*_{1}*w*_{2}…*w*_{n},*S*〉 *is in* BWP *if and only if* ∏_{i=1}^{n}*w*_{i}∈*S*.

It is easy to verify that BWP is complete for NC^{1} as well.

In the following sections, let FWP_{n} be the subproblem of FWP where the domain is restricted to inputs of length *n*, and let BWP_{n} be the analogous restriction of BWP. Note that BWP_{n} accepts exactly half of the instances in (*S*_{5})^{n}×{*S*⊆*S*_{5}:|*S*|=60}: since |*S*_{5}|=120 and |*S*|=60, a fixed product lies in exactly half of the candidate sets *S*.
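The balancedness claim above is a one-line counting fact: a fixed element of a 120-element set lies in C(119,59) of its C(120,60) sixty-element subsets, and 2·C(119,59)=C(120,60). A quick check:

```python
from math import comb

# A fixed permutation lies in C(119,59) of the C(120,60) sixty-element
# subsets of S_5, i.e. in exactly half of them.
fraction = comb(119, 59) / comb(120, 60)
print(fraction)  # 0.5
```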

The following simplified version of Chernoff's bound turns out to be useful in our application.

### Lemma 2.1 (Chernoff's bound)

*Let X*_{1},…,*X*_{m} *be independent and identically distributed 0–1 random variables with E*[*X*_{i}]=*p*. *Let* *X*=∑_{i=1}^{m}*X*_{i}. *Then for any* 0<*δ*≤1, Pr[|*X*−*pm*|≥*δpm*]≤2e^{−δ^{2}pm/3}.
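As a quick illustration of the multiplicative tail bound 2e^{−δ²pm/3} stated above, here is a small seeded simulation; the parameter choices are arbitrary.

```python
import random
from math import exp

def chernoff_demo(m=2000, p=0.5, delta=0.2, trials=1000, seed=1):
    """Empirical frequency of a delta-relative deviation of X from pm,
    compared against the Chernoff bound 2*exp(-delta^2 * p * m / 3)."""
    rng = random.Random(seed)
    deviations = 0
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(m))  # X = sum of m coin flips
        if abs(x - p * m) >= delta * p * m:
            deviations += 1
    return deviations / trials, 2 * exp(-delta ** 2 * p * m / 3)

freq, bound = chernoff_demo()
```

With these parameters the bound is about 3×10^{−6}, and the observed frequency of large deviations is (unsurprisingly) zero.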

## 3. The existence of an average-case hard language

### (a) *Worst-case-to-average-case reduction for* NC^{1}

In this subsection, we use random self-reducibility to show that, if NC^{1}≠TC^{0}, then there are problems in NC^{1} that are hard on average for TC^{0}. First we recall the definition of hardness on average for decision problems.

### Definition 3.1

*Let U*_{D} *denote the uniform distribution over all inputs in a finite domain D*. *For any Boolean function* f:D→{0,1}, *f is* (1−*ϵ*)-*hard for a set of circuits S if, for every C*∈*S*, *we have that* Pr_{x∼U_{D}}[*C*(*x*)=*f*(*x*)]<1−*ϵ*.

We will sometimes abuse notation by identifying a set with its characteristic function. For languages to be considered hard on average, we consider only those input lengths where the language contains some strings.

### Definition 3.2

*Let Σ be an alphabet. Consider a language* L⊆Σ*, *where L*_{n}=*L*∩*Σ*^{n}, *and let D*_{L}={*n*:*L*_{n}≠∅}. *We say that L is* (1−*ϵ*)-*hard for a class of circuit families* 𝒞 *if D*_{L} *is an infinite set and, for any circuit family* {*C*_{n}} *in* 𝒞, *there exists m*_{0} *such that, for all m*∈*D*_{L} *with m*≥*m*_{0}, Pr_{x∼U_{Σ^{m}}}[*C*_{m}(*x*)=*L*_{m}(*x*)]<1−*ϵ*.

The following theorem shows that if FWP∉io-TC^{0}, then BWP is hard on average for TC^{0}.

### Theorem 3.1

*There exist constants c,δ>0 and 0<ϵ<1 such that, for any constant d>0, if* FWP_{n} *is not computable by* TC^{0}*(δn(s(n)+cn)) circuits of depth at most d+c, then* BWP_{n} *is (1−ϵ)-hard for* TC^{0} *circuits of size s(n) and depth d.*

### Proof.

Let *N* be the number of 60-element subsets of *S*_{5} (so *N*=C(120,60), a constant) and let *ϵ*=1/(4*N*^{2}). We prove the contrapositive. Assume that there is a circuit *C* of size *s*(*n*) and depth *d* such that Pr_{x}[*C*(*x*)=BWP_{n}(*x*)]≥1−*ϵ*. We first present a probabilistic algorithm for FWP_{n}.

Let the input instance for FWP_{n} be *w*_{1}*w*_{2}…*w*_{n}. Generate a sequence of *n*+1 random permutations *u*_{0},*u*_{1},…,*u*_{n} in *S*_{5} and a random set *S*⊆*S*_{5} of size 60. Let *ϕ* be the sequence *ϕ*_{1}*ϕ*_{2}…*ϕ*_{n}, where *ϕ*_{i}=*u*_{i−1}^{−1}*w*_{i}*u*_{i}. Note that *ϕ* is a completely random sequence in (*S*_{5})^{n}.

Let us say that *ϕ* is a ‘good’ sequence if, for all *S*′⊆*S*_{5} with |*S*′|=60, *C*(〈*ϕ*,*S*′〉)=BWP_{n}(〈*ϕ*,*S*′〉).

If we have a ‘good’ sequence *ϕ* (meaning that for *every* set *S*′ of size 60, *C* gives the ‘correct’ answer BWP_{n}(*ϕ*,*S*′) on input (*ϕ*,*S*′)), then we can easily find the unique value *r*=∏_{i=1}^{n}*ϕ*_{i}, where *ϕ*_{i}=*u*_{i−1}^{−1}*w*_{i}*u*_{i}, as follows:

— If *C*(*ϕ*,*S*)=1, then it must be the case that *r*∈*S*. Pick any element *r*′∈*S*_{5}\*S* and observe that *r* is the only element such that *C*(*ϕ*,(*S*\{*r*})∪{*r*′})=0.

— If *C*(*ϕ*,*S*)=0, then it must be the case that *r*∉*S*. Pick any element *r*′∈*S* and observe that *r* is the only element such that *C*(*ϕ*,(*S*\{*r*′})∪{*r*})=1.

Thus the correct value *r* can be found by trying all candidates *r*. Hence, if *ϕ* is good, we have *r*=∏_{i=1}^{n}*ϕ*_{i}=*u*_{0}^{−1}(∏_{i=1}^{n}*w*_{i})*u*_{n}. Produce as output the value *u*_{0}*ru*_{n}^{−1}=∏_{i=1}^{n}*w*_{i}.

Since Pr_{x}[*C*(*x*)=BWP_{n}(*x*)]≥1−*ϵ*, a standard averaging argument shows that at least three-quarters of the sequences in (*S*_{5})^{n} are good. Thus, with probability at least 3/4, the probabilistic algorithm computes FWP_{n} correctly. The algorithm can be computed by a threshold circuit of depth *d*+*O*(1) since the subroutines related to *C* can be invoked in parallel and, moreover, the preparation of *ϕ* and the aggregation of results of subroutines can be done by constant-depth threshold circuits. Its size is at most 121*s*(*n*)+*O*(*n*) since there are 121 calls to *C*. Next, we put 10^{4}*n* independent copies together in parallel and output the majority vote. Let *X*_{i} be the indicator random variable for the event that the outcome of the *i*th copy is ∏_{j=1}^{n}*w*_{j}. By lemma 2.1, on every input the new circuit computes FWP_{n} with probability at least 1−(120^{−n}/2). Thus there is a random sequence that can be hardwired into the circuit, with the property that the resulting circuit gives the correct output on *every* input (and, in fact, at least half of the random sequences have this property). This yields a deterministic TC^{0} circuit computing FWP_{n} exactly which is of depth at most *d*+*c* and of size not more than (121×10^{4})*n*(*s*(*n*)+*cn*) for some universal constant *c*. Choosing *δ*≥(121×10^{4}) completes the proof. ■
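The decoding procedure in this proof can be exercised directly. The sketch below follows the proof's query pattern, but replaces the approximating circuit *C* by an exact BWP oracle (so every run succeeds); `mul`, `inv` and `bwp_oracle` are our own stand-ins.

```python
import random
from itertools import permutations
from functools import reduce

S5 = list(permutations(range(5)))
ID = tuple(range(5))

def mul(a, b):                       # apply a, then b (left-to-right product)
    return tuple(b[a[i]] for i in range(5))

def inv(p):
    return tuple(p.index(i) for i in range(5))

def bwp_oracle(phi, S):
    """Stand-in for the circuit C: an exact BWP oracle."""
    return reduce(mul, phi, ID) in S

def fwp_via_bwp(ws, oracle=bwp_oracle, rng=random):
    """Recover the iterated product of ws using only BWP-style queries,
    following the random self-reduction in the proof of theorem 3.1."""
    n = len(ws)
    u = [rng.choice(S5) for _ in range(n + 1)]
    # phi_i = u_{i-1}^{-1} w_i u_i: a uniformly random sequence whose
    # product telescopes to r = u_0^{-1} (w_1 ... w_n) u_n.
    phi = [mul(mul(inv(u[i]), ws[i]), u[i + 1]) for i in range(n)]
    S = set(rng.sample(S5, 60))
    if oracle(phi, S):               # r is in S: removing r flips the answer
        r_prime = rng.choice([q for q in S5 if q not in S])
        r = next(q for q in S if not oracle(phi, (S - {q}) | {r_prime}))
    else:                            # r is outside S: inserting r flips it
        r_prime = next(iter(S))
        r = next(q for q in S5 if q not in S
                 and oracle(phi, (S - {r_prime}) | {q}))
    return mul(mul(u[0], r), inv(u[n]))   # u_0 r u_n^{-1} = w_1 ... w_n
```

Because the oracle here is exact, the output equals the true product regardless of the internal randomness.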

### Definition 3.3 ([32], definition 5)

*Let f*:{0,1}*→{0,1}* *be a function. Let* *m*,*s*:ℕ→ℕ *be functions such that m*(*n*)<*n for all n*, *and let d*≥1 *be an integer. We say that f*_{n} *is downward self-reducible to f*_{m(n)} *by a pure reduction of depth d and size s*(*n*) *if there is a circuit family* {*C*_{n}}_{n≥1} *such that, for each n*, *C*_{n} *computes f*_{n}, *is of depth at most d and size at most s*(*n*), *and consists of fan-in two AND and OR gates, unary NOT gates and oracle gates that compute function f on inputs of size at most m*(*n*). *If f is downward self-reducible to f*_{nϵ} *for some* 1>*ϵ*>0, *we will say that f is strongly downward self-reducible*.

The problem FWP is strongly downward self-reducible ([32], proposition 3.6).

### Theorem 3.2 ([32])

*If there is a γ>0 such that* FWP∉io-TC^{0}*(n*^{1+γ}*), then* FWP∉io-TC^{0}*.*

### Proof.

We briefly sketch the proof. By [32], proposition 7, FWP_{n} is strongly downward self-reducible to FWP_{nϵ} by a DLOGTIME-uniform pure reduction of depth *O*(1/*ϵ*) and size *O*(*n*). Hence, as observed in [32], corollary 4.3, for any given *γ*>0, an io-TC^{0} circuit of polynomial size for FWP can be combined with the self-reduction for FWP (for a suitably chosen *ϵ*) to obtain an io-TC^{0} circuit of size *n*^{1+γ}. ■

Theorem 3.2 is not stated in terms of io-TC^{0} in [32], but the proof there shows that if there are infinitely many input lengths *n* where FWP has circuits of size *n*^{k}, then there are infinitely many input lengths *m* where FWP has circuits of size *m*^{1+γ}. The strong downward self-reducibility property allows small circuits for inputs of size *m* to be constructed by efficiently using circuits for size *n*<*m* as subcomponents.

Since FWP is equivalent to WP via linear size reductions on the same input length, the following corollary is an easy consequence of theorem 3.2.

### Corollary 3.1

*If there is a γ*>0 *such that* WP∉io-TC^{0}(*n*^{1+γ}), *then* FWP∉io-TC^{0}.

Combining corollary 3.1 with theorem 3.1 yields the average-case hardness of BWP from nearly linear size worst-case lower bounds for WP against TC^{0} circuit families.

### Corollary 3.2

*There exists a constant ϵ*>0 *such that, if there exists γ*>0 *such that* WP∉io-TC^{0}(*n*^{1+γ}), *then for any k and d there exists n*_{0}>0 *such that, when n*≥*n*_{0}, BWP_{n} *is* (1−*ϵ*)-*hard for any* TC^{0} *circuit of size n*^{k} *and depth d*.

Define the following Boolean function WPM_{n}:(*S*_{5})^{n}×(*S*_{5})^{60}→{0,1}, where WPM_{n} stands for the word problem over a multi-set.

### Definition 3.4

*The input to* WPM_{n} *is a pair* 〈*w*_{1}*w*_{2}…*w*_{n},*v*_{1}*v*_{2}…*v*_{60}〉, *where for all i*∈[1,…,*n*], *w*_{i}∈*S*_{5} *and for all j*∈[1,…,60], *v*_{j}∈*S*_{5}. *Then* 〈*w*_{1}*w*_{2}…*w*_{n},*v*_{1}*v*_{2}…*v*_{60}〉∈WPM *if and only if there exists* j∈[1,…,60] *such that* ∏_{i=1}^{n}*w*_{i}=*v*_{j}.

BWP_{n} is the restriction of WPM_{n} to the case where all the *v*_{j} are distinct (a set being encoded as a sorted duplicate-free list). Hence, WPM inherits the average-case hardness of BWP, since any circuit that computes WPM_{n} on a sufficiently large fraction of inputs also approximates BWP well. Formally, we can state the following lemma.

### Lemma 3.1

*There is an absolute constant* 0<*c*<1 *such that for every ϵ*>0, *if* BWP_{n} *is* (1−*ϵ*)-*hard for* TC^{0} *circuits of size n*^{k} *and depth d*, *then* WPM_{n} *is* (1−*cϵ*)-*hard for* TC^{0} *circuits of size n*^{k} *and depth d*.

### Proof.

Let *c*=C(120,60)/120^{60}. Note that *c* is the probability that a sequence of 60 permutations contains no duplicates and is in sorted order. Suppose there is a circuit *C* with the property that Pr_{x}[*C*(*x*)≠WPM_{n}(*x*)]<*cϵ*. Then the conditional probability that *C*(*x*)≠WPM(*x*), given that the last 60 items in *x* give a list in sorted order with no duplicates, is at most *ϵ*. This yields a circuit having the same size, solving BWP with error at most *ϵ* with respect to the uniform distribution over its domain, contrary to our assumption. ■

### Corollary 3.3

*There exists a constant ϵ*>0 *such that if there exists γ*>0 *such that* WP∉io-TC^{0}(*n*^{1+γ}), *then for any k and d there exists n*_{0}>0 *such that, when n*≥*n*_{0}, WPM_{n} *is* (1−*ϵ*)-*hard for* TC^{0} *circuits of size n*^{k} *and depth* *d*.

Yao's XOR lemma [5] is a powerful tool to boost average-case hardness. We utilize a specialized version of the XOR lemma for our purpose. Several proofs of this useful result have been published. For instance, see the text by Arora and Barak [37] for a proof that is based on Impagliazzo's hardcore lemma [38]. For our application here, we need a version of the XOR lemma that is slightly different from the statement given by Arora and Barak. In the statement of the lemma as given by them, *g* is a function of the form *g*:{0,1}^{n}→{0,1}. However, their proof works for any Boolean function *g* defined over any finite alphabet, because both the hardcore lemma and its application in the proof of the XOR lemma are insensitive to the encoding of the alphabet. Hence, we state the XOR lemma in terms of functions over an alphabet set *Σ*. The proof presented in [37] yields the following version of the XOR lemma:

For any Boolean function *g* over some domain *Σ*^{n}, define *g*^{⊕m}:(*Σ*^{n})^{m}→{0,1} by *g*^{⊕m}(*x*_{1},*x*_{2},…,*x*_{m})=*g*(*x*_{1})⊕*g*(*x*_{2})⊕⋯⊕*g*(*x*_{m}), where ⊕ represents the parity function.
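In code, the amplified function is simply an XOR-fold over independent blocks; a trivial but clarifying restatement of the definition (the names below are ours):

```python
from functools import reduce
from operator import xor

def xor_amplify(g, blocks):
    """g^{XOR m}: evaluate g on each of the m independent blocks
    and XOR the answers together."""
    return reduce(xor, (g(x) for x in blocks))

# example g: the parity of a block of bits
parity = lambda bits: reduce(xor, bits)
```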

### Lemma 3.2 ([5])

*Let* k≥1 *be an integer and let θ*>2(1−*ϵ*)^{k}. *There is a constant c*>1 *that depends only on* |*Σ*| *such that if g is* (1−*ϵ*)-*hard for* TC^{0} *circuits of size s and depth d*, *then g*^{⊕k} *is* (1/2+*θ*)-*hard for* TC^{0} *circuits of size θ*^{2}*s*/*cn and depth d*−1.

Let *Σ*=*S*_{5}. The following corollary is an immediate consequence of corollary 3.3 and lemma 3.2.

### Corollary 3.4

*If there is a γ*>0 *such that* WP∉io-TC^{0}(*n*^{1+γ}), *then for any k*,*k*′ *and d there exists n*_{0}>0 *such that, when n*≥*n*_{0}, (WPM_{n})^{⊕n} *is* (1/2+1/*n*^{k′})-*hard for* TC^{0} *circuits of size n*^{k} *and depth d*.

Let WP^{⊗}={〈*x*_{1},*x*_{2},…,*x*_{n}〉:(WPM_{n})^{⊕n}(*x*_{1},…,*x*_{n})=1}. Note that it is a language in uNC^{1} and, moreover, it is decidable in linear time.

### Theorem 3.3

*If there is a γ>0 such that WP∉*io-TC^{0}*(n*^{1+γ}*), then for any integer k*>0, WP^{⊗} *is* (1/2+1/*n*^{k})*-hard for* TC^{0}*.*

### (b) *Worst-case-to-average-case reduction for* L

Here we show a worst-case-to-average-case connection similar to that of the previous subsection, but for the class *L*, which contains NC^{1}. Just as the word problem WP is complete for NC^{1}, the word problem PWP for *S*_{n} is complete for *L* [39].

### Definition 3.5

*The language* PWP *consists of all inputs* 〈*w*_{1},*w*_{2},…,*w*_{n}〉, *where each w*_{i} *encodes a permutation in S*_{n} *and* ∏_{i=1}^{n}*w*_{i} *is the identity permutation*.

We will use a few different encodings of permutations. Encoding 1 represents a permutation simply as an ordered list of *n* distinct numbers between 1 and *n*; the interpretation is that, if the *k*th element in the list is *j*, then *k* maps to *j* in the permutation. Encoding 2 is less economical and represents a permutation *σ* on [*n*] as a list of *n* ordered pairs (*i*,*σ*(*i*)), where *i* ranges from 1 to *n*; the interpretation is that *i* maps to *σ*(*i*). Here the order of the list does not matter: all rearrangements of the list of pairs represent the same permutation in *S*_{n}. The fact that each pair is ordered is of course critical.

Using the fact that sorting is in TC^{0} [36], we can convert from Encoding 1 to Encoding 2 or vice versa in TC^{0}. The conversion from Encoding 1 to Encoding 2 is trivial—simply prefix each number in the ordered list by its index in the list. To convert from Encoding 2 to Encoding 1, sort using the first element of the ordered pair as the key, and retain only the second element in the sorted list.
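The two conversions just described are easy to state in code (the sorting step is what the TC^{0} circuitry implements); a minimal sketch with 1-based values:

```python
def enc1_to_enc2(lst):
    """Encoding 1 -> Encoding 2: prefix each image by its (1-based) index."""
    return [(i + 1, v) for i, v in enumerate(lst)]

def enc2_to_enc1(pairs):
    """Encoding 2 -> Encoding 1: sort on the first coordinate of each pair,
    then keep only the second coordinates."""
    return [v for _, v in sorted(pairs)]
```

Since Encoding 2 is order-insensitive, `enc2_to_enc1` recovers the same permutation from any rearrangement of the pairs.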

For technical reasons, we will use a third, even more verbose, encoding: Encoding 3. In Encoding 3, a permutation *σ* is represented as an ordered list of *n* integers each of which is *n* bits long. The permutation represented by this list is the identity permutation if there are two elements of the list that are equal, and is otherwise the permutation *σ* where *σ*(*i*) is the rank of the *i*th element in the list, i.e. its index in the sorted order. Note that a permutation in Encoding 1 can be trivially converted to Encoding 3 by prefixing each element by zeros. To convert from Encoding 3 to Encoding 1 in TC^{0}, first check that there are no ‘collisions’ in the list, i.e. a pair of identical elements. If there is a collision, output the identity permutation—this can be done in AC^{0}. If there are no collisions, transform the ordered list to an ordered list of ordered pairs formed by pairing each element of the original list with its index in the list. Sort according to the elements of the original list, but retain only the corresponding order on the indices. If the list survives the collision check, this yields a permutation in Encoding 1.
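The Encoding 3 decoding step described above (collision check, then rank computation via sorting) can be sketched as follows; the function name is ours.

```python
def enc3_to_enc1(nums):
    """Encoding 3 -> Encoding 1: a collision denotes the identity; otherwise
    sigma(i) is the rank of the i-th number in sorted order (1-based)."""
    n = len(nums)
    if len(set(nums)) < n:                 # collision check (an AC^0 step)
        return list(range(1, n + 1))       # identity permutation
    order = sorted(range(n), key=lambda i: nums[i])  # order[r-1]: index of rank r
    sigma = [0] * n
    for rank, idx in enumerate(order, start=1):
        sigma[idx] = rank
    return sigma
```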

Using the fact that the composition of two TC^{0} functions is in TC^{0}, we get that we can convert from Encoding 2 to Encoding 3 and vice versa in TC^{0}.

By default, we will consider Encoding 3 to be in effect. If this is not the case, we will explicitly say so.

For the purpose of studying a worst-case-to-average-case connection for *L*, we need a balanced version of the language PWP.

### Definition 3.6

*The language* BPWP *is defined as follows*:

BPWP={〈*w*_{1},*w*_{2},…,*w*_{n},*i*〉 : *the ith bit of the Encoding-1 representation of* ∏_{j=1}^{n}*w*_{j} *is* 1}.

We assume a natural Boolean encoding of the inputs, where the only relevant inputs are of size (*n*+1)*n*^{2}, with *n* blocks of *n*^{2} bits each representing *w*_{1}…*w*_{n} according to Encoding 3 and the last block representing *i*. We assume without loss of generality that *n* is a power of 2; BPWP remains complete for *L* with this restriction.

### Lemma 3.3

*There is a family* {*C*_{n}} *of randomized* TC^{0} *circuits of polynomial size such that, for each n*, *the output of C*_{n} *is O*(*n*^{2}/2^{n})-*close in statistical distance to the uniform distribution over* (*σ*,*σ*^{−1}), *where σ is uniformly chosen in S*_{n}. *Moreover, when considered purely as bit strings, the first and second outputs of C*_{n} *are O*(*n*^{2}/2^{n})-*close to the uniform distribution*.

### Proof.

The circuits *C*_{n} are defined as follows. First, *n* numbers *x*_{1},*x*_{2},…,*x*_{n}, with each *x*_{i}, 1≤*i*≤*n*, being *n* bits long, are generated at random. As per Encoding 3, this *n*-tuple of numbers represents a permutation *σ*. The identity permutation is generated with probability at most 1/*n*!+*n*^{2}/2^{n}, since the probability of a collision is at most *n*^{2}/2^{n}. Every other permutation is generated with equal probability, which is at least (1−*n*^{2}/2^{n})1/*n*!. A simple computation of the statistical distance yields that the corresponding distribution on permutations is *O*(*n*^{2}/2^{n})-close to the uniform distribution on permutations *σ* over [*n*].

It now remains to be shown how to generate *σ*^{−1}. Sort the *x*-list *x*_{1},*x*_{2},…,*x*_{n}—this can be done in TC^{0}. Then convert *σ* from Encoding 3 to Encoding 2 in TC^{0}. Then we include circuitry that reverses the order of each ordered pair in the list, to yield the representation of *σ*^{−1} according to Encoding 2. Then implement the TC^{0} conversion from Encoding 2 to Encoding 1, and finally use the elements of the resulting list as ranks to select elements from the sorted *x*-list. We thus derive a representation of *σ*^{−1} according to Encoding 3, which is itself a permutation of the representation of *σ* according to Encoding 3. The last part of the lemma follows using this fact and the argument in the previous paragraph on the relative unlikelihood of the identity permutation being represented. ■

Lemma 3.3 gives us the ability to generate a random permutation and its inverse efficiently. This can be used to implement a random self-reduction in TC^{0} and hence derive a worst-case-to-average-case hardness amplification in *L* against TC^{0}.

### Theorem 3.4

*If L⊈*TC^{0}*, then there is a language in L that is (1−1/n*^{2}*)-hard for* TC^{0}*.*

### Proof.

The language for which we show a random self-reduction is BPWP. Assume that BPWP is not (1−1/*n*^{2})-hard for the complexity class TC^{0}. We show how to solve BPWP in TC^{0} based on this assumption. Since BPWP is complete for *L*, this implies that *L*⊆TC^{0}.

Let 〈*w*_{1},*w*_{2},…,*w*_{n},*i*〉 be an input instance for BPWP, where each *w*_{j} represents a permutation in *S*_{n} according to Encoding 3. We generate randomized queries to BPWP such that, for each query, the query with the last coordinate omitted is *O*(*n*^{2}/2^{n})-close to the uniform distribution over binary strings. The queries are generated in TC^{0} as follows. Using lemma 3.3, generate *n* random permutations *σ*_{1},*σ*_{2},…,*σ*_{n} and their inverses. We do not know how to do this exactly, but it suffices to do it approximately as guaranteed by lemma 3.3. Form the permutations *s*_{1},*s*_{2},…,*s*_{n}, where *s*_{j}=*σ*_{j−1}^{−1}*w*_{j}*σ*_{j} for each *j*, 1<*j*≤*n*, and *s*_{1}=*w*_{1}*σ*_{1}. To form these permutations, convert to Encoding 1 and use the fact that two permutations can be multiplied in TC^{0} when represented in Encoding 1. When converting back to Encoding 3, for each *j*, 1≤*j*≤*n*, sort the list of numbers representing *σ*_{j} and then use the representation of the permutation in Encoding 1 as ranks to select from the sorted list. Thus for each *j*, the resulting permutation is exponentially close to a random permutation of the list of numbers representing *σ*_{j}. Since the *σ*_{j} are all independent, we have that *s*_{1},…,*s*_{n} are all independent and exponentially close to the uniform distribution as bit strings. Now form the queries 〈*s*_{1},*s*_{2},…,*s*_{n},*k*〉 for each relevant bit position *k*.

Since BPWP is not (1−1/*n*^{2})-hard for TC^{0}, the assumption on the distribution of queries implies that the TC^{0} approximators for BPWP return the correct answers for all queries with probability at least 3/4, for large enough *n*. Using the correct answers for all queries, we can reconstruct the product *s*_{1}*s*_{2}…*s*_{n} in Encoding 1. Also, we know that *w*_{1}*w*_{2}…*w*_{n}=(*s*_{1}*s*_{2}…*s*_{n})*σ*_{n}^{−1}. Thus we can reconstruct the product *w*_{1}*w*_{2}…*w*_{n} in Encoding 1 with another multiplication in TC^{0} and then obtain its *i*th bit correctly with high probability. Finally, by a standard amplification step followed by Adleman's trick [40], this probabilistic circuit can be converted to a non-uniform one. ■
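The query randomization at the heart of this proof (multiply in fresh random permutations so that each *s*_{j} is uniform, while the product changes only by *σ*_{n}) can be sketched abstractly. Here permutations are 0-based tuples, and `mul`, `inv` and `randomize_word` are our helpers, not the TC^{0} circuitry.

```python
import random
from functools import reduce

def mul(a, b):                  # apply a, then b (left-to-right product)
    return tuple(b[a[i]] for i in range(len(a)))

def inv(p):
    return tuple(p.index(i) for i in range(len(p)))

def prod(seq):
    return reduce(mul, seq)

def randomize_word(ws, rng):
    """s_1 = w_1*sigma_1 and s_j = sigma_{j-1}^{-1}*w_j*sigma_j for j > 1:
    each s_j is uniform and independent, yet the product telescopes, so
    prod(ws) = prod(ss) * sigma_n^{-1}."""
    n, m = len(ws), len(ws[0])
    sig = [tuple(rng.sample(range(m), m)) for _ in range(n)]
    ss = [mul(ws[0], sig[0])]
    for j in range(1, n):
        ss.append(mul(mul(inv(sig[j - 1]), ws[j]), sig[j]))
    return ss, sig[-1]
```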

### (c) *Worst-case-to-average-case reduction for* GapL *and* GapNC^{1}

We first consider GapL. Let *Determinant* denote the problem of computing the integer determinant. This is a complete problem for GapL (see e.g. [41]). We show that if *Determinant* cannot be computed by TC^{0} circuits, then *Determinant* is somewhat hard on average for TC^{0} circuits. As TC^{0} circuits take Boolean input, we will encode each integer entry of an *n*×*n* integer matrix in binary. In order to keep the overall size of this Boolean input bounded, we will make the simplifying assumption that each entry of an *n*×*n* integer matrix instance of *Determinant* is at most *n* bits long. It is not hard to see that this version of *Determinant* is also complete for GapL. Since the proof of the next theorem is similar to the standard argument for proving random self-reducibility of *Permanent* [42], we omit some low-level details.

### Theorem 3.5

*Let* ℳ_{n} *denote the set of all n×n matrices where each integer entry has size at most n bits.*^{1} *If there is a* TC^{0} *circuit computing Determinant for at least a 1−1/n*^{5} *fraction of inputs from* ℳ_{n}*, then there is a* TC^{0} *circuit that computes Determinant for all inputs from* ℳ_{n}*.*

### Proof.

Let *C*′ denote the TC^{0} circuit that computes the integer determinant for a 1−1/*n*^{5} fraction of inputs from ℳ_{n} (the *n*×*n* matrices with *n*-bit entries). Our goal is to construct a TC^{0} circuit that computes the integer determinant for *every* input matrix *M*∈ℳ_{n}. For input *M*, we will describe a *non-adaptive* reduction from the problem of computing det(*M*) to computing det(*M*_{i}) for a sequence of random matrices *M*_{i}, 1≤*i*≤*r*, where each *M*_{i} is nearly uniformly distributed in ℳ_{n}. To this end, pick a random matrix *A*∈ℳ_{n}. This requires *n*^{3}+*n*^{2} independent unbiased coin flips to pick the *n*^{2} random *n*-bit entries of *A* along with their signs. Now, consider the polynomial *D*(*x*)=det(*M*+*Ax*). This is a degree-*n* polynomial over ℤ in the indeterminate *x*. Let *S*={1,2,…,*n*+1} be distinct interpolating points and consider det(*M*+*Ai*) for each *i*∈*S*. The matrix *M*+*Ai* is random. Unfortunately, it is not uniformly distributed in ℳ_{n} (indeed, even its support is not contained in ℳ_{n}). Therefore, we cannot directly use the circuit *C*′ to compute det(*M*+*Ai*) for all *i*∈*S* and interpolate the value of det(*M*)=*D*(0). We shall get around this difficulty with Chinese remaindering.

By the Hadamard bound, |det(*M*)|≤2^{n^2}×*n*!<4^{n^2} for all *M*∈ℳ_{n}. We can pick *n*^{2} distinct *O*(log *n*)-bit primes *p*_{1},*p*_{2},…,*p*_{n^2} whose product exceeds 4^{n^2}. We note that det(*M*) can be reconstructed from the residues det(*M*) mod *p*_{i}, 1≤*i*≤*n*^{2}, by Chinese remaindering and, moreover, this reconstruction can be done in DLOGTIME-uniform TC^{0} [43]. Hence, it suffices to describe a TC^{0} circuit family for computing det(*M*) mod *p* for each *M*∈ℳ_{n}, where *p* is an *O*(log *n*)-bit prime.
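The Chinese remaindering step can be illustrated directly. In the proof this reconstruction is performed by DLOGTIME-uniform TC^{0} circuits [43]; the sketch below is an ordinary sequential implementation, with a signed variant matching the two's-complement range that a determinant can occupy.

```python
def crt(residues, primes):
    """Reconstruct x mod (p_1*...*p_k) from the residues x mod p_i."""
    P = 1
    for p in primes:
        P *= p
    x = 0
    for r, p in zip(residues, primes):
        q = P // p
        # pow(q, -1, p) is the inverse of q modulo p (Python 3.8+)
        x = (x + r * q * pow(q, -1, p)) % P
    return x

def crt_signed(residues, primes):
    """Recover a signed value (e.g. a determinant) by shifting the CRT result
    into the symmetric range (-P/2, P/2]; valid when P exceeds 2|x|."""
    P = 1
    for p in primes:
        P *= p
    x = crt(residues, primes)
    return x if x <= P // 2 else x - P
```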

For a matrix *A*∈ℳ_{n} picked uniformly at random, consider det(*M*+*Ax*) mod *p*. This is a degree-*n* polynomial in *x* modulo *p*. We will compute det(*M*) mod *p* by interpolation. Let *S*={1,2,…,*n*+1} be the distinct interpolating points; in order to ensure that this yields more than *n* points in the finite field 𝔽_{p_{i}} for each of the primes *p*_{i}, we will pick *p*_{i}>*n*+1 for all *i*. For any fixed *s*∈*S*, we note that the matrix *M*+*As* (mod *p*) is nearly uniformly distributed over *n*×*n* matrices with entries in {0,1,…,*p*−1}. To see this, consider a randomly picked integer entry *A*_{ij} of the matrix *A*, where *A*_{ij} is at most *n* bits long. Then for each *α*∈{0,1,…,*p*−1} it is easily seen that Pr[*A*_{ij}≡*α* (mod *p*)]=1/*p*+*O*(1/2^{n}).
Hence, an easy calculation shows that, for any specific matrix *B* with entries in {0,1,…,*p*−1}, Pr[*M*+*As*≡*B* (mod *p*)]=(1±2^{−Ω(n)})*p*^{−n^2}.
It follows that the statistical distance of the distribution of *M*+*As* (mod *p*) from the uniform distribution over matrices with entries in {0,1,…,*p*−1} is bounded by 2^{−O(n)}.

However, as explained above, note that we cannot directly use the circuit *C*′ to compute det(*M*+*As*), since the entries of *M*+*As* can be more than *n* bits long. Neither can we directly use *C*′ to compute det((*M*+*As*) mod *p*), because the matrix *M*+*As* (mod *p*) has integer entries in the range {0,1,…,*p*−1} and these matrices are only a (*p*/2^{n})^{n^2} fraction of the matrices in ℳ_{n}. It is possible that the output of *C*′ is incorrect on all these matrices. We now describe the solution. Consider the onto mapping *f* from ℳ_{n} to the set of *n*×*n* matrices with entries in {0,1,…,*p*−1}
defined by *f*(*M*)=*M* (mod *p*). Now, consider the probability distribution on ℳ_{n} defined by first picking a uniformly distributed random matrix *M*′ with entries in {0,1,…,*p*−1} and then picking a uniformly distributed random preimage matrix *M*∈*f*^{−1}(*M*′). A calculation similar to the one above shows that this distribution is exponentially close to uniform. Now, we briefly sketch how to obtain a (nearly) random preimage *M* of *M*′. Let 2^{n−1}−1=*q*_{p}*p*+ℓ_{p}, where *q*_{p} and ℓ_{p} are the quotient and remainder on dividing 2^{n−1}−1 by the prime *p*. Similarly, let −2^{n−1}=−*q*′_{p}*p*+ℓ′_{p}. For each entry *z*=*M*′_{ij} of the matrix *M*′ we uniformly pick a random integer *r*_{ij} in the range −*s*′_{p,z}≤*r*_{ij}≤*s*_{p,z} (where *s*′_{p,z}∈{*q*′_{p},*q*′_{p}−1} and *s*_{p,z}∈{*q*_{p},*q*_{p}−1}, chosen so as to guarantee that *z*+*r*_{ij}*p* is an *n*-bit integer), and set *M*_{ij}=*z*+*r*_{ij}*p*.
Clearly, the matrix *M* thus defined is nearly uniformly distributed in *f*^{−1}(*M*′) and a TC^{0} circuit can nearly randomly sample from *f*^{−1}(*M*′).

Now, for the random matrix (*M*+*As*) mod *p*, consider its random preimage *M*_{s}∈*f*^{−1}((*M*+*As*) mod *p*), for *s*∈{1,2,…,*n*+1}. By the above argument, it follows that each *M*_{s} is statistically close to the uniform distribution on ℳ_{n}. Hence, for each *s*∈*S*,
Pr[*C*′(*M*_{s})=det(*M*_{s})]≥1−1/*n*^{5}−1/2^{O(n)},
where the term 1/2^{O(n)} is subtracted in the above bound as it bounds the statistical distance of the *M*_{s} distribution from the uniform.

Hence with probability 1−1/*n*^{3} the circuit *C*′ correctly computes det(*M*_{s}) for all *s*∈*S*. Now, applying the fact that polynomial interpolation is TC^{0} computable [43], a TC^{0} circuit can recover det(*M*) mod *p*, given det(*M*_{s}) mod *p* for all *s*∈*S*.

Putting it together, for each prime *p*_{i} we have a randomized TC^{0} circuit that computes det(*M*) mod *p*_{i} with probability 1−1/*n*^{3}. Finally, applying Chinese remaindering, which is TC^{0} computable [43], we obtain a randomized TC^{0} circuit that computes det(*M*) with probability 1−1/*n*. As before, the random bits can be fixed after amplifying the success probability using Adleman's trick [40]. ■
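The skeleton of the reduction, specialized to a single prime *p*, can be sketched as follows. For illustration, the role of the approximate circuit *C*′ is played by an honest determinant oracle and the preimage-sampling step is skipped; the point is the non-adaptive structure: det is queried only on randomized matrices *M*+*sA*, and det(*M*+*xA*) is interpolated at *x*=0.

```python
import itertools, random

def det_mod(M, p):
    """Determinant mod p via the Leibniz expansion (fine for tiny n)."""
    n = len(M)
    total = 0
    for perm in itertools.permutations(range(n)):
        sign = 1
        for i in range(n):            # count inversions to get the sign
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        term = sign
        for i in range(n):
            term = term * M[i][perm[i]] % p
        total = (total + term) % p
    return total

def det_by_reduction(M, p, oracle):
    """Recover det(M) mod p by querying `oracle` only on the randomized
    matrices M + s*A, s = 1..n+1, then interpolating det(M + xA) at x = 0.
    Requires p > n + 1 so the interpolation points stay distinct mod p."""
    n = len(M)
    A = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
    pts = list(range(1, n + 2))
    vals = [oracle([[(M[i][j] + s * A[i][j]) % p for j in range(n)]
                    for i in range(n)], p) for s in pts]
    result = 0                        # Lagrange interpolation at x = 0 over F_p
    for k, s in enumerate(pts):
        num, den = 1, 1
        for t in pts:
            if t != s:
                num = num * (-t) % p
                den = den * (s - t) % p
        result = (result + vals[k] * num * pow(den, -1, p)) % p
    return result
```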

We now briefly discuss a similar worst-case-to-average-case reduction for GapNC^{1}. The problem of computing the (1,1)*th* entry of the product of *n* 3×3 integer matrices is GapNC^{1}-complete [44]. We show that, if this problem cannot be computed by TC^{0} circuits, then it is somewhat hard on average for TC^{0} circuits. As before, since we consider TC^{0} circuits that take Boolean inputs, we restrict to inputs (*M*_{1},*M*_{2},…,*M*_{n}) in which each *M*_{i} is a 3×3 matrix with integer entries that are at most *n* bits long. This restricted problem is also easily seen to be GapNC^{1}-complete. In order to show the worst-case-to-average-case reduction, we pick a uniformly random instance (*A*_{1},*A*_{2},…,*A*_{n}) and consider the instance (*M*_{1}+*A*_{1}*x*,*M*_{2}+*A*_{2}*x*,…,*M*_{n}+*A*_{n}*x*) for an indeterminate *x*. Note that the (1,1)*th* entry of the matrix product is a degree-*n* polynomial in *x*. Now, exactly along the same lines as the proof of theorem 3.5, we can show the following.

### Theorem 3.6

*Let* ℐ_{n} *denote the set of all iterated matrix multiplication instances M*_{1}*,M*_{2}*,…,M*_{n}*, consisting of 3×3 integer matrices M*_{i} *whose entries are at most n bits long. If there is a* TC^{0} *circuit computing the (1,1)th entry of the product M*_{1}*M*_{2}*⋯M*_{n} *for at least a 1−1/n*^{5} *fraction of the inputs in* ℐ_{n}*, then there is a* TC^{0} *circuit that computes it for all inputs in* ℐ_{n}*.*

## 4. Uniform derandomization

The Nisan–Wigderson generator is the canonical method for proving the existence of pseudorandom generators based on hard functions. It relies on the following definition of combinatorial designs.

### Definition 4.1 (Combinatorial designs)

*Fix a universe of size u*. *An* (*m*,*l*)-*design of size n on* [*u*] *is a list of subsets S*_{1},*S*_{2},…,*S*_{n} *satisfying*:

— *for all i*∈[1,…,*n*], |*S*_{i}|=*m*;

— *for all i*≠*j*∈[1,…,*n*], |*S*_{i}∩*S*_{j}|≤*l*.

Nisan and Wigderson [24] invented a general approach for constructing combinatorial designs for various ranges of parameters. The proof given by Nisan and Wigderson gives designs where *l*=*O*(log *n*), and most applications have used that value of *l*. For our application, *l* can be considerably smaller and, furthermore, we need the *S*_{i} to be very efficiently computable. For completeness, we present the details here. (Other variants of the Nisan–Wigderson construction have been developed for different settings; we refer the reader to one such construction by Viola [25], as well as to a survey of related work [25], remark 5.3.)

### Lemma 4.1 ([45])

*For l*>0, *the polynomial x*^{2·3^{l}}+*x*^{3^{l}}+1 *is irreducible over* 𝔽_{2}.

### Lemma 4.2 ([24])

*For any integer n*, *any α such that* 0<*α*<1, *b*=⌈*α*^{−1}⌉ *and m*=⌈*n*^{α}⌉, *there is an* (*m*,*b*)-*design with u*=*O*(*m*^{6}). *Furthermore, each S*_{i} *can be computed within O*(*bm*^{2}) *time*.

### Proof.

Fix *q*=2^{2·3^{l}} for some *l* such that *m*≤*q*≤*m*^{3}. Let the universe for the combinatorial design construction be 𝔽_{q}×𝔽_{q}, so that *u*=*q*^{2}=*O*(*m*^{6}). Let *p*_{1},*p*_{2},…,*p*_{n} be the lexicographically first *n* univariate polynomials of degree at most *b* over 𝔽_{q}, and let *S*_{i}={(*x*,*p*_{i}(*x*)) : *x*∈𝔽_{q}} be the graph of the polynomial *p*_{i}. Since *q*^{b}≥(*n*^{α})^{b}≥*n*, there are at least *n* such distinct polynomials *p*_{i} and hence such sets *S*_{i}. No two distinct polynomials of degree at most *b* share more than *b* points, which implies the second condition of definition 4.1. The first condition holds because we can simply drop elements from any *S*_{i} without increasing the size of the intersections.

The arithmetic operations in 𝔽_{q} are performed in time polylogarithmic in *q* because of the explicitness of the irreducible polynomial given by lemma 4.1. It is evident that, for any *i*∈[*n*], we can enumerate all elements of *S*_{i} in time *O*(*bm*^{2}). ■
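A toy version of this construction is easy to write down. For simplicity, the sketch below works over a prime field 𝔽_{q} rather than the field GF(2^{2·3^{l}}) used in the proof; the intersection bound only needs that two distinct degree-*b* polynomials agree on at most *b* points.

```python
from itertools import product

def nw_design(n, q, b):
    """Sets S_1..S_n over the universe F_q x F_q: S_i is the graph of the i-th
    degree-<=b polynomial over F_q (q prime here, unlike the text's GF(2^{2*3^l})).
    Pairwise intersections have size at most b."""
    assert q ** (b + 1) >= n, "not enough polynomials"
    sets = []
    for coeffs in product(range(q), repeat=b + 1):  # lexicographic coefficient order
        if len(sets) == n:
            break
        # graph of p(x) = sum_k coeffs[k] * x^k, as points (x, p(x))
        sets.append(frozenset((x, sum(c * pow(x, k, q) for k, c in enumerate(coeffs)) % q)
                              for x in range(q)))
    return sets
```

Distinct coefficient vectors give distinct polynomial functions whenever *b*<*q*, so the graphs intersect in at most *b* points.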

### Lemma 4.3

*For any constant α*>0 *and for any large enough integer n*, *if g is sufficiently hard on average for* TC^{0} *circuits of size n*^{2} *and depth d*+2 *(no such circuit computes g on a* 1/2+*Ω*(1/*n*) *fraction of inputs), then any probabilistic* TC^{0} *circuit C of size n and depth d can be simulated by another probabilistic* TC^{0} *circuit of size O*(*n*^{1+α}) *and depth d*+1 *which is given oracle access to g*_{⌈n^α⌉} *and uses at most O*(*n*^{6α}) *many random bits*.

### Proof.

This is a direct consequence of lemma 4.2; we adapt the traditional Nisan–Wigderson argument to the setting of TC^{0} circuits. Let *n* and *α* be given, with 0<*α*<1. Let *S*_{1},…,*S*_{n} be the (*m*,*b*)-design from lemma 4.2, where *m*=⌈*n*^{α}⌉, *b*=⌈*α*^{−1}⌉ and each *S*_{i}⊂[*u*], with *u*=*O*(*m*^{6}). We are given *g*:*Σ*^{m}→*Σ*; define *h*^{g}:*Σ*^{u}→*Σ*^{n} by *h*^{g}(*x*)= *g*(*x*|_{S1})*g*(*x*|_{S2})…*g*(*x*|_{Sn}), where *x*|_{Si} is the subsequence of *x* restricted to the coordinates specified by *S*_{i}.

The new circuit samples randomness uniformly from *Σ*^{u} and feeds *C* with pseudorandom bits generated by *h*^{g} instead of purely random bits. It only has one more extra layer of oracle gates and its size is bounded by *O*(*n*+*n*⋅*n*^{α})=*O*(*n*^{1+α}). What is left is to prove the following claim. ■

### Claim 4.1

For any constant *ϵ*>0, |Pr_{z}[*C*(*h*^{g}(*z*))=1]−Pr_{r}[*C*(*r*)=1]|≤*ϵ*.

### Proof.

Suppose there exists a constant *ϵ*>0 such that |Pr_{z}[*C*(*h*^{g}(*z*))=1]−Pr_{r}[*C*(*r*)=1]|>*ϵ*. We will seek a contradiction to the hardness of *g* via a hybrid argument.

Sample *z* uniformly from *Σ*^{u} and *r* uniformly from {0,1}^{n}. Create a sequence of *n*+1 distributions *H*_{i} on {0,1}^{n} where:

— *H*_{0}=*r*;

— *H*_{n}=*h*^{g}(*z*); and

— for all 1≤*i*≤*n*−1, *H*_{i}=*h*^{g}(*z*)_{1}*h*^{g}(*z*)_{2}…*h*^{g}(*z*)_{i}*r*_{i+1}…*r*_{n}.

By our assumption, |Pr[*C*(*H*_{n})=1]−Pr[*C*(*H*_{0})=1]|>*ϵ*. Therefore, there exists *i*∈[*n*] such that |Pr[*C*(*H*_{i})=1]−Pr[*C*(*H*_{i−1})=1]|>*ϵ*/*n*.

Assume Pr[*C*(*H*_{i})=1]−Pr[*C*(*H*_{i−1})=1]>*ϵ*/*n*; otherwise add a NOT gate at the top of *C*, and treat the new circuit as *C* instead.

Consider the following probabilistic TC^{0} circuit *C*′ for the function *g*. On input *x*, the circuit *C*′ samples *z* uniformly from *Σ*^{u} and *r* uniformly from {0,1}^{n} and replaces the substring *z*|_{Si} of *z* (i.e. the substring whose coordinates are indexed by *S*_{i}) with the input string *x*. Then the circuit *C*′ samples a random bit *b*∈{0,1}. If *C*(*h*^{g}(*z*)_{1}…*h*^{g}(*z*)_{i−1}*br*_{i+1}…*r*_{n})=1, *C*′ outputs *b*, otherwise it outputs 1−*b*. For an input string *x*∈*Σ*^{m} picked uniformly at random we now lower bound the probability that *C*′ computes the function *g*.

Let *y* denote the random string *h*^{g}(*z*)_{1}…*h*^{g}(*z*)_{i−1}*br*_{i+1}…*r*_{n}, which is distributed according to *H*_{i−1}. Further, let *q*_{1}=Pr[*C*(*y*)=1 | *b*=*g*(*x*)] and *q*_{2}=Pr[*C*(*y*)=1 | *b*≠*g*(*x*)]. In the following expressions all probabilities are over uniformly picked strings *x*∈*Σ*^{m}, *z*∈*Σ*^{u} and *r*∈{0,1}^{n}. Thus
Pr[*C*′(*x*)=*g*(*x*)]=(1/2)*q*_{1}+(1/2)(1−*q*_{2})=1/2+(*q*_{1}−*q*_{2})/2.
Observe that
Pr[*C*(*H*_{i})=1]=*q*_{1} and Pr[*C*(*H*_{i−1})=1]=(*q*_{1}+*q*_{2})/2.

Substituting above, we get
Pr[*C*′(*x*)=*g*(*x*)]=1/2+Pr[*C*(*H*_{i})=1]−Pr[*C*(*H*_{i−1})=1]>1/2+*ϵ*/*n*.

By an averaging argument we can fix *z*, *r* and *b* and hardwire them into *C*′ to get a new circuit *C*′′ such that Pr_{x}[*C*′′(*x*)=*g*(*x*)]>1/2+*ϵ*/*n*.

Note that in this case, for all 1≤*k*≤*i*−1, *h*^{g}(*z*)_{k} is a function of *x*|_{Sk∩Si}. Since for all *k*≠*i*, |*S*_{i}∩*S*_{k}|≤*b*, we only need a TC^{0} circuit of size at most 2^{O(b)} and of depth at most 2 to compute each *h*^{g}(*z*)_{k}. In conclusion, we obtain a TC^{0} circuit *C*′′′ of size at most (2^{O(b)}+1)*n* and of depth at most *d*+2 such that Pr_{x}[*C*′′′(*x*)=*g*(*x*)]>1/2+*ϵ*/*n* when *n* is large enough, which contradicts the hardness of *g*. ■
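The generator *h*^{g} at the heart of lemma 4.3 is simple to state in code. The sketch below uses a hand-made design on a universe of nine points and a parity stand-in for the hard function *g*; it only illustrates how the design sets turn *u* truly random bits into *n* output bits.

```python
def nw_generator(g, sets, z):
    """h^g(z): the i-th pseudorandom bit is g evaluated on z restricted to S_i.
    `sets` must form a combinatorial design (small pairwise intersections)."""
    return tuple(g(tuple(z[c] for c in S)) for S in sets)

# A tiny design on the universe [9]: the rows of a 3x3 grid plus three
# transversals; pairwise intersections have size at most 1.
TINY_DESIGN = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 4, 8), (1, 5, 6), (2, 3, 7)]
```

With 9 seed bits and the parity function as `g`, this stretches the seed to 6 output bits; the hybrid argument shows a distinguisher for such outputs yields a predictor for `g`.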

The simulation in lemma 4.3 is quite uniform; thus, plugging in appropriate segments of WP^{⊗} as our candidates for the hard function *g*, we derive our first main result.

### Theorem 4.1

*If* WP *is not infinitely often computed by* TC^{0}*(n*^{1+γ}*) circuit families for some constant γ>0, then any language accepted by polynomial size probabilistic uniform* TC^{0} *circuit family is in* uTC^{0}(SUBEXP).

### Proof.

Fix any small constant *δ*>0. Let *L* be a language accepted by some probabilistic uniform TC^{0} circuit family of size at most *n*^{k} and of depth at most *d* for some constants *k*,*d*.

Choose *m* such that *n*^{δ/12}≤*m*≤*n*^{δ/6}, and let *α* be such that *m*=*n*^{α}. By theorem 3.3, when *m* is large enough, WP^{⊗} on inputs of length *m* is sufficiently hard on average for TC^{0} circuits of size *n*^{2k} and depth *d*+*c*, where *c* is any constant. Hence, as a consequence of lemma 4.3, we obtain a probabilistic oracle TC^{0} circuit for *L*_{n} of depth *d*+1. Since the computation only needs *O*(*m*^{6}) random bits, it can be turned into a deterministic oracle TC^{0} circuit of depth *d*+2 and of size at most *O*(*n*^{2k})⋅2^{O(m^6)}≤2^{O(n^δ)} (when *n* is large enough), where we evaluate the previous circuit on every possible random string and add an extra MAJORITY gate at the top. The oracle gates all have fan-in *m*≤*n*^{δ/6}, and thus can be replaced by disjunctive normal form (DNF) circuits of size 2^{O(n^δ)}, yielding a deterministic TC^{0} circuit of size 2^{O(n^δ)} and depth *d*+3.
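The final step (evaluating the circuit on every seed and taking MAJORITY) is ordinary brute-force derandomization. A toy sketch, with a contrived `noisy_parity` procedure that errs on exactly one seed:

```python
from itertools import product

def derandomize_by_majority(randomized_alg, x, seed_len):
    """Evaluate the algorithm on every seed and output the majority answer.
    Exponential in seed_len, which is why few-random-bit simulations matter."""
    votes = sum(randomized_alg(x, seed) for seed in product((0, 1), repeat=seed_len))
    return 1 if 2 * votes > 2 ** seed_len else 0

def noisy_parity(x, seed):
    """Toy probabilistic test: answers 'is sum(x) even?' correctly on every
    seed except the all-ones one."""
    answer = 1 if sum(x) % 2 == 0 else 0
    return 1 - answer if all(seed) else answer
```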

We need to show that this construction is uniform, so that the direct connection language can be recognized in time *O*(*n*^{δ}). The analysis consists of the following three parts.

— The connectivity between the top gate and the output gate of individual copies is obviously computable in time *m*^{6}≤*n*^{δ}.

— The connectivity inside individual copies is DLOGTIME-uniform, hence *n*^{δ}-uniform.

— By lemma 4.2, each *S*_{i} is computable in time *O*(*bm*^{2}), which is *O*(*m*^{2}) since *b* is a constant depending only on *δ*. Moreover, note that WP^{⊗} is a linear-time decidable language. Therefore, the DNF expression corresponding to each oracle gate can be computed within time *O*(*m*^{2})≤*n*^{δ}.

In conclusion, the above construction produces a uniform TC^{0} circuit of size 2^{O(nδ)}. Since *δ* is arbitrarily chosen, our statement holds. ■

## 5. Consequences of pathetic arithmetic circuit lower bounds

In this section we show that a pathetic lower bound assumption for *arithmetic circuits* yields a uniform derandomization of a special case of polynomial identity testing (introduced and studied by Dvir *et al.* [27]).

The explicit polynomial that we consider is {IMM_{n}}_{n>0}, where IMM_{n} is the (1,1)*th* entry of the product of *n* 3×3 matrices whose entries are all distinct indeterminates. Note that IMM_{n} is a degree-*n* multilinear polynomial in 9*n* indeterminates, and IMM_{n} can be considered as a polynomial over any field 𝔽.

Arithmetic circuits computing a polynomial in the ring 𝔽[*x*_{1},*x*_{2},…,*x*_{n}] are directed acyclic graphs with the indegree-zero nodes (the input nodes) labelled by either a variable *x*_{i} or a scalar constant. Each internal node is either a + gate or a × gate, and the circuit *computes* the polynomial that is naturally computed at its output gate. The circuit is a *formula* if the fan-out of each gate is 1.

Before going further, we pause to clarify a point of possible confusion. There is another way that an arithmetic circuit *C* can be said to compute a given polynomial *f*(*x*_{1},*x*_{2},…,*x*_{n}) over a field 𝔽; even if *C* does not compute *f* in the sense described in the preceding paragraph, it can still be the case that for all scalars *a*_{1},…,*a*_{n}∈𝔽 we have *f*(*a*_{1},…,*a*_{n})=*C*(*a*_{1},…,*a*_{n}). In this case, we say that *C* *functionally* computes *f* over 𝔽. If the field size is larger than the syntactic degree of the circuit *C* and the degree of *f*, then the two notions coincide. Assuming that *f* is not *functionally* computed by a class of circuits is a *stronger* assumption than assuming that *f* is not computed by a class of circuits (in the usual sense). In our work in this paper, we use the weaker intractability assumption.

An *oracle* arithmetic circuit is one that has *oracle* gates: for a given sequence of polynomials *A*={*A*_{n}} as oracle, an oracle gate of fan-in *n* in the circuit evaluates the *n*-variate polynomial *A*_{n} on the values carried by its *n* input wires. An oracle arithmetic circuit is called *pure* (following [32]) if all non-oracle gates are of bounded fan-in. (Note that this use of the term ‘pure’ is not related to the ‘pure’ arithmetic circuits defined by Nisan and Wigderson [46].)

The class of polynomials computed by polynomial size arithmetic formulae is known as arithmetic NC^{1}. By [35] the polynomial IMM_{n} is complete for this class. Whether IMM_{n} has polynomial size *constant-depth* arithmetic circuits is a long-standing open problem in the area of arithmetic circuits [46]. In this context, the known lower bound result is that IMM_{n} requires exponential size multilinear depth-3 circuits [46].

Very little is known about lower bounds for general constant-depth arithmetic circuits, compared to what is known about constant-depth Boolean circuits. Exponential lower bounds for depth-3 arithmetic circuits over finite fields were shown in [47,48]. On the other hand, for depth-3 arithmetic circuits over fields of characteristic zero, only quadratic lower bounds are known [49]. However, it is shown in [50] that the determinant and the permanent require exponential size *multilinear* constant-depth arithmetic circuits. More details of the current status of arithmetic circuit lower bounds can be found in Raz's paper [51], §1.3.

### Definition 5.1

*We say that a sequence of polynomials* {*p*_{n}}_{n>0} *in* 𝔽[*x*_{1},…,*x*_{n}] *is* (*s*(*n*),*m*(*n*),*d*)-*downward self-reducible if there is a pure oracle arithmetic circuit C*_{n} *of depth O*(*d*) *and with O*(*s*(*n*)) *many oracle gates that computes the polynomial p*_{n} *using oracle gates only for p*_{m′}*, for m*′≤*m*(*n*).

Analogous to [32], proposition 7, we can easily observe the following. It is a direct divide-and-conquer argument using the iterated product structure.

### Lemma 5.1

*For each* 1>*ϵ*>0 *the polynomial sequence* {IMM_{n}} *is* (*n*^{1−ϵ},*n*^{ϵ},1/*ϵ*)-*downward self-reducible*.
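The divide-and-conquer behind lemma 5.1 can be sketched concretely. The toy code below recovers every entry of a block product from a (1,1)-entry oracle by padding the block with matrices that have a single 1 entry (so oracle instances have length *m*+2 rather than at most *m*; the proof's bookkeeping differs slightly), and then recurses on the shorter instance.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def imm_oracle(mats):
    """Stand-in for the IMM oracle: (1,1) entry of the product of the matrices."""
    P = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
    for M in mats:
        P = matmul(P, M)
    return P[0][0]

def unit(i, j):
    """3x3 matrix with a single 1 in position (i, j) (0-based indices)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(3)] for r in range(3)]

def imm_downward(mats, m, oracle=imm_oracle):
    """Compute the (1,1) entry of a long product using only oracle calls on
    short instances: entry (a,b) of a block product equals the (1,1) entry of
    unit(0,a) . block . unit(b,0)."""
    assert m >= 2
    if len(mats) <= m:
        return oracle(mats)
    blocks = [mats[i:i + m] for i in range(0, len(mats), m)]
    products = [[[oracle([unit(0, a)] + blk + [unit(b, 0)]) for b in range(3)]
                 for a in range(3)] for blk in blocks]
    return imm_downward(products, m, oracle)
```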

An easy argument, analogous to the proof sketch given for theorem 3.2, shows that lemma 5.1 allows for the amplification of weak lower bounds for {IMM_{n}} against arithmetic circuits of constant depth.

### Theorem 5.1

*Suppose there is a constant δ>0 such that, for all d and every n, the polynomial sequence* {IMM_{n}*} requires depth-d arithmetic circuits of size at least n*^{1+δ}*. Then, for any constant depth d the sequence* {IMM_{n}*} is not computable by depth-d arithmetic circuits of size n*^{k} *for any constant k>0.*

Our goal is to apply theorem 5.1 to derandomize a special case of polynomial identity testing (first studied in [27]). To this end we restate a result of Dvir *et al.* [27].

### Theorem 5.2 (Theorem 4 in [27])

*Let n,s,r,m,t,d be integers such that s≥n. Let* *be a field that has at least 2mt elements. Let* *be a non-zero polynomial with* *and* *such that P has an arithmetic circuit of size s and depth d over* *. Let* *be a polynomial with* *such that P(x,f(x))≡0. Then f(x) can be computed by a circuit of size s*′=poly(*s,m*^{r}*) and depth d′=d+O(1) over* *.*

Let the underlying field 𝔽 be large enough (of size more than *nrm*, for instance). The following lemma is a variant of lemma 4.1 in [27]. For completeness, we provide its proof here.

### Lemma 5.2 (Variant of lemma 4.1 in [27])

*Let n*,*r*,*s be integers and let f*∈𝔽[*x*_{1},…,*x*_{n}] *be a non-zero polynomial with individual degrees at most r that is computed by an arithmetic circuit of size s*≥*n and depth d*. *Let m*=⌈*n*^{α}⌉*, where α*>0 *is an arbitrary constant. Let S*_{1},*S*_{2},…,*S*_{n} *be the sets of the* (*m*,*b*)-*design constructed in lemma 4.2, where b*=⌈1/*α*⌉. *Let p*∈𝔽[*z*_{1},…,*z*_{m}] *be a multilinear polynomial with the property that*

*f*(*p*(*y*|_{S1}),*p*(*y*|_{S2}),…,*p*(*y*|_{Sn}))≡0.     (5.1)

*Then there exist absolute constants a and k such that p*(*z*) *is computable by an arithmetic circuit over* 𝔽 *with size bounded by O*((*sm*^{r})^{a}) *and having depth d*+*k*.

### Proof.

Consider the following set of hybrid polynomials, for 0≤*i*≤*n*:
*F*_{i}=*f*(*p*(*y*|_{S1}),…,*p*(*y*|_{Si}),*x*_{i+1},…,*x*_{n}).

The assumption implies that *F*_{0}≢0 while *F*_{n}≡0. Hence, there exists 0≤*i*<*n* such that *F*_{i}≢0 and *F*_{i+1}≡0. Note that *F*_{i} is a non-zero polynomial in the variables {*x*_{j}|*i*+1≤*j*≤*n*} and the variables {*y*_{j}|*j*∈*S*_{1}∪*S*_{2}∪⋯∪*S*_{i}}.

We recall the well-known Schwartz–Zippel lemma.

### Lemma 5.3 (Schwartz–Zippel [52,53])

*Let* 𝔽 *be a field and let P*∈𝔽[*x*_{1},…,*x*_{n}] *be a non-zero polynomial with total degree at most r*. *Then for any finite subset S*⊆𝔽 *we have*

Pr_{a_{1},…,a_{n}∈S}[*P*(*a*_{1},…,*a*_{n})=0]≤*r*/|*S*|.     (5.2)

Since the total degree of *F*_{i} is at most *nrm*, if we assume that 𝔽 has size more than *nrm*, lemma 5.3 ensures that we can assign values from the field to the variables {*x*_{j}|*i*+2≤*j*≤*n*} and the variables {*y*_{j}|*j*∉*S*_{i+1}} so that *F*_{i} remains a non-zero polynomial in the remaining variables. More precisely, fixing these variables to scalar values yields a non-zero polynomial *G* in the variables *x*_{i+1} and *y*|_{Si+1} with the property that
*G*(*p*(*y*|_{Si+1}),*y*|_{Si+1})≡0,
where, in forming *G*, each *p*(*y*|_{Sj}) with *j*≤*i* collapses to *q*_{j}(*y*|_{Sj∩Si+1}), the polynomial obtained from *p*(*y*|_{Sj}) after fixing the variables in *S*_{j}\*S*_{i+1}.

Rename the variables {*y*_{j}|*j*∈*S*_{i+1}} as {*z*_{j}|1≤*j*≤*m*} and replace *x*_{i+1} by *w*. We obtain a polynomial *g*(*w*,*z*_{1},…,*z*_{m})≢0 with the property that *g*(*p*(*z*),*z*_{1},…,*z*_{m})≡0.

In order to apply theorem 5.2, the only thing that remains is to calculate the circuit complexity of *g*. For all *j*≠*i*+1, |*S*_{j}∩*S*_{i+1}|≤*b*, which is a constant. Note that, for each *j*≤*i*, the polynomial *q*_{j}(*y*|_{Sj∩Si+1}) depends only on a constant number (bounded by *b*) of variables and is of constant degree, since *p* is multilinear. Hence each polynomial *q*_{j}(*y*|_{Sj∩Si+1}) is a sum of at most 2^{b} many multilinear monomials and can be computed by a size-2^{b} arithmetic circuit of depth 2. Therefore, under the assumption that *f* has a circuit of size *s* and depth *d*, *g* is computable by a circuit of size *s*+*O*(*n*) and depth *d*+2, which is a composition of the aforementioned circuits. It is important to note that the degree of *g* in the variable *w* is at most *r*.

Now we can use theorem 5.2 to obtain that *p*(*z*) has a circuit of depth *d*+*k* and size at most (*sm*^{r})^{a}, for some constant *a*. This concludes the proof. ■

At this point we describe our deterministic black-box identity testing algorithm for constant-depth arithmetic circuits of polynomial size and bounded individual degree. Let *n*,*m*,*u*,*α* be the parameters as in lemma 4.2. Given such a circuit *C* over variables {*x*_{i}|*i*∈[*n*]} of size *s*=*n*^{t}, depth *d* and individual degree *r*, we simply replace each *x*_{i} with IMM(*y*|_{Si}), where *y* is a new set of variables {*y*_{j}|*j*∈[*u*]}. Let *Ĉ*[*y*_{1},…,*y*_{u}] denote the polynomial computed by the new circuit.

Note that the total degree of the polynomial *Ĉ* computed by the new circuit is bounded by *u*^{c}, where *c* is a constant depending on the combinatorial design and *r*. Let *R*⊆𝔽 be any set of *u*^{c}+1 distinct points. Then by lemma 5.3 the polynomial computed by the new circuit is identically zero if and only if *Ĉ*(*a*_{1},*a*_{2},…,*a*_{u})=0 for all (*a*_{1},*a*_{2},…,*a*_{u})∈*R*^{u}.

This gives us the claimed algorithm. Its running time is bounded by *O*((*u*^{c}+1)^{u})=*O*(2^{n^{7α}}). Since *α* can be chosen to be arbitrarily small, we have shown that this identity testing problem is in deterministic subexponential time. The correctness of the algorithm follows from the next lemma.
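The core evaluation step, namely that a nonzero low-degree polynomial cannot vanish on a large enough grid, can be sketched as a brute-force black-box test. This is the trivial-substitution version, evaluating the black box directly on *R*^{n}; the algorithm in the text instead evaluates the composed circuit *Ĉ* on *R*^{u}.

```python
from itertools import product

def blackbox_is_zero(poly, n, total_degree, field_p):
    """Deterministic black-box zero test: by lemma 5.3 (Schwartz-Zippel),
    a nonzero polynomial of total degree D cannot vanish on all of R^n
    when |R| > D. Exponential in n, matching the subexponential-time
    algorithm's brute-force grid evaluation."""
    R = range(min(total_degree + 1, field_p))
    assert len(R) > total_degree, "field too small"
    return all(poly(pt) % field_p == 0 for pt in product(R, repeat=n))
```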

### Lemma 5.4

*If, for every constant d*′>0, *the polynomial sequence* {IMM_{n}} *is not computable by depth*-*d*′ *arithmetic circuits of size n*^{ℓ} *for any* ℓ>0, *then C*[*x*_{1},…,*x*_{n}]≡0 *if and only if the polynomial computed by the transformed circuit is identically zero.*

### Proof.

The ‘only-if’ part is easy to see. Let us focus on the ‘if’ part. Suppose it fails, which means that the transformed circuit computes the identically zero polynomial but *C*[*x*_{1},…,*x*_{n}]≢0. Then let *C*[*x*_{1},…,*x*_{n}] play the role of *f*[*x*_{1},…,*x*_{n}] in lemma 5.2 and let IMM[*z*_{1},…,*z*_{m}] take the place of *p*[*z*_{1},…,*z*_{m}]. Therefore, IMM[*z*_{1},…,*z*_{m}] is computable by a circuit of depth *d*+*k* and size at most (*n*^{t}*m*^{r})^{a}=*m*^{O(1)}, where *k* is the constant in lemma 5.2 and *n*^{t} is the size of *C*. This contradicts the hardness assumption about {IMM_{n}}. ■

Putting it together, we get the following result.

### Theorem 5.3

*If there exists δ>0 such that for any constant e*>0, IMM *requires depth-e arithmetic circuits of size at least n*^{1+δ}*, then the black-box identity testing problem for constant-depth arithmetic circuits of polynomial size and bounded individual degree is in deterministic subexponential time.*

Next, we note that the above upper bound can be sharpened considerably. The algorithm simply takes the OR over subexponentially many evaluations of an arithmetic circuit; if any of the evaluations does not evaluate to zero, then we know that the expressions are not equivalent; otherwise they are. Note that evaluating an arithmetic circuit can be accomplished in logspace. (When evaluating a circuit over , this is shown in [43], corollary 6.8; the argument for other fields is similar, using standard results about the complexity of field arithmetic.) Note also that every language computable in logspace has AC^{0} circuits of subexponential size. (This appears to have been observed first by Gutfreund and Viola [54]; see also [55] for a proof.) This yields the following uniform derandomization result.

### Theorem 5.4

*If there are no constant-depth arithmetic circuits of size n*^{1+ϵ} *for the polynomial sequence* {IMM_{n}*}, then for every constant d, black-box identity testing for depth-d arithmetic circuits with bounded individual degree can be performed by a uniform family of constant-depth* AC^{0} *circuits of subexponential size.*

We call attention to an interesting difference between theorems 4.1 and 5.4. In theorem 5.4, in order to solve the identity testing problem with uniform AC^{0} circuits of size 2^{nϵ} for smaller and smaller *ϵ*, the depth of the AC^{0} circuits increases as *ϵ* decreases. In contrast, in order to obtain a deterministic threshold circuit of size 2^{nϵ} to simulate a given probabilistic TC^{0} algorithm, the argument that we present in the proof of theorem 4.1 gives a circuit whose depth is not affected by the choice of *ϵ*. We do not know if a similar improvement of theorem 5.4 is possible, but we observe here that the depth need not depend on *ϵ* if we use threshold circuits for the identity test.

### Theorem 5.5

*If there are no constant-depth arithmetic circuits of size n*^{1+ϵ} *for the polynomial sequence* {IMM_{n}}*, then there is a constant c such that, for every constant d and every γ*>0*, black-box identity testing for depth-d arithmetic circuits with bounded individual degree can be performed by a uniform family of depth-*(*d*+*c*) *threshold circuits of size* 2^{n^{γ}}*.*

### Proof.

We provide only a sketch. Choose *α*<*γ*/14, where *α* is the constant from the discussion in the paragraph before lemma 5.4. Thus, our identity testing algorithm will evaluate a depth-*d* arithmetic circuit *C*(*x*_{1},…,*x*_{n}) at fewer than 2^{n^{γ/2}} points (*v*_{1},…,*v*_{n}), where each *v*_{i} is obtained by computing an instance of IMM_{n^α} consisting of *n*^{α} 3×3 matrices, whose entries without loss of generality have representations of length at most *n*^{α}. Thus these instances of IMM have DNF representations of size 2^{O(n^{2α})}. These DNF representations are uniform, since the direct connection language can be evaluated by computing, for a given input assignment to IMM_{n^α}, the product of the matrices represented by that assignment, which takes time polynomial in *n*^{α}. Evaluating the circuit *C* on such points can be done in uniform TC^{0} [43,56]. ■

## Acknowledgements

We thank Luke Friedman for many helpful discussions, and we thank Lance Fortnow for some useful suggestions. We are grateful to the referees for their helpful comments and corrections. E.A. was supported in part by NSF grants CCF-0830133, CCF-0832787 and CCF-1064785. Part of this work was performed when E.A. was a visiting scholar at the University of Cape Town. F.W. was supported in part by NSF grants CCF-0830133, CCF-0832787 and CCF-1064785.

## Footnotes

One contribution of 18 to a Theme Issue ‘The foundations of computation, physics and mentality: the Turing legacy’.

1 It is necessary to be precise about what it means for an integer entry to have *n* bits. We use two's-complement notation; thus the entries come from the set {−2^{n−1},…,2^{n−1}−1}.

- This journal is © 2012 The Royal Society