
comput. complexity 3 (1993), 19-30

PARALLEL POINTER MACHINES

STEPHEN A. COOK AND PATRICK W. DYMOND

1016-3328/93/010019-12 $1.50+0.20 © 1993 Birkhäuser Verlag, Basel

Abstract. The parallel pointer machine is a synchronous collection of finite state transducers, each transducer receiving its inputs via pointers to the other transducers. Each transducer may change its input pointers dynamically by "pointer jumping". These machines provide a simple example of a parallel model with a time-varying processor interconnection structure, and are sufficiently powerful to simulate deterministic space S(n) within time O(S(n)).

Subject classifications. 68Q10.

Existing theoretical models of synchronous parallel computers can be divided into classes based on the power of the communication capabilities provided. One basic class consists of models in which the communications connections between processors are bounded and fixed throughout the computation. Models such as conglomerates [13], uniform circuits and aggregates [9] fall into this first class. A second class contains parallel models in which processors may communicate in one step with any of an unbounded number of other processors (e.g., via a shared global memory) at any step of the computation. SIMDAGs [13] and various other kinds of parallel random access machines (PRAMs) [12], [27], [10] fall into this second class, because individual processors in these PRAM models have the ability to read from and write to global memory locations. Using indirect addressing, any desired communication pattern can be realized and varied as the computation proceeds. Goldschlager [13], Borodin [2], Chandra and Stockmeyer [4] and others observed close relationships between time or depth on parallel machines and space on sequential machines. For a given model X from the first group there are typically inclusions of the form

DSPACE(T) ⊆ X-TIME(T²) ⊆ DSPACE(T²)    (1)

whereas for models in the second group the corresponding inclusions are

DSPACE(T) ⊆ X-TIME(T) ⊆ DSPACE(T²).    (2)


Throughout this paper we assume T(n) : ℕ → ℕ is Ω(log n). We use the notation X-TIME(T) to denote the class of sets over the alphabet {0, 1} recognized by machines of type X within time O(T(n)), and DSPACE(T) to denote the class of sets recognized by deterministic Turing machines in space T(n) (see [19]). Similarly, UDEPTH(T) denotes the class of sets recognized by bounded fan-in uniform circuit families using and, or and not gates in depth O(T(n)). See [25] for a discussion of uniform circuits. The two types of inclusions described above indicate an apparent difference in power between the two classes of parallel machines. Time on machines of the second class is able to simulate sequential space without loss, whereas machines in the first class use a quadratic increase in time to simulate space. Another distinguishing feature of the models in the first class is that each processing element is finite state, whereas in the second class, the individual processors are random access machines with an unbounded local memory and wordsize. The models satisfying (2) use powerful instructions to obtain their linear time simulation of deterministic space, instructions which would seem to require a very complex physical realization. For example, in SIMDAGs, the "store" instruction requires that any of an unbounded number of processors be able to store into any of an unbounded number of global memory locations in a single step. Because of this huge implicit fan-in, Goldschlager introduced a variant of the SIMDAG model, in which the time charged for each instruction is proportional to the time which would be needed to simulate its execution on a (fixed structure) conglomerate. The resulting model satisfies (1) rather than (2).

Cook [5] proposed investigation of an intermediate model, which would provide communication access to only a bounded number of processors at any one time, but which would allow varying the processors to which connection is provided as computation proceeds. We define a specific example of this idea in "parallel pointer machines" (PPMs). Some of our results about PPMs appeared in preliminary form in [6] and [8], where they were called "hardware modification machines" (HMMs), in analogy with the sequential storage modification machines described earlier by Schönhage [26]. (Similar sequential pointer machines have been considered by Kolmogorov and Uspenskii [21], as "linking automata" by Knuth [20], and as "reference machines" by Tarjan [28].) Many shared memory parallel algorithms work mainly by storing and updating pointers to global memory. The parallel pointer machine is well-suited to expressing this aspect of parallel algorithms. The PPM model is intended to examine the complexity classes defined by collections of finite state synchronous machines, each of which can rearrange its communication links by a bounded amount in one step.


We show that PPMs are surprisingly powerful, falling more closely into the second class of models defined above by satisfying inclusions (2). They may be in some sense the simplest class of parallel machines which do so. Any technology allowing the realization of any of the above-mentioned models satisfying inclusion (2) would also allow the construction of PPMs. This follows from the observation that there is a straightforward linear time and processor simulation of PPMs by PRAMs, where the PRAM allocates one of its processors and a block of global memory cells for each of the PPM's processing units. Each processor in the PRAM simulates the moves of its unit, using the blocks in global memory to provide access to the processing unit's state and pointer information. Barzdin and Kalnin'sh [1] have previously considered "parallel action automata" from the standpoint of devising a universal machine of this type, although they do not consider complexity classes related to time and space. Hong [17] describes polynomial simulations between our PPM model and other parallel machine models, in which both time and hardware resources are simultaneously related; this provides further evidence for the "extended parallel computation thesis" in [6]. Hong also considers nondeterministic variants of these machines. Nondeterministic PPMs are also considered in [7]. Lam and Ruzzo [22] have established a very close simultaneous correspondence between time-hardware resources on PPMs and the same resources on a restricted PRAM model. Their PRAM is restricted in two ways: first, global memory satisfies a concurrent-read owner-write or CROW protocol [10] in which every cell is written only by its designated owner; second, the arithmetic capabilities of the PRAM are limited to addition of one and doubling. In [14] Goodrich and Kosaraju used the term "parallel pointer machine" for a PRAM model in which values in global memory are of two types, pointer and arithmetic, and in which numeric operations are performed only on arithmetic values, while only pointer values may be used for indexing. Our PPM differs by eliminating arithmetic values and operations. We use the unqualified term PPM to represent our variant of the model in this paper. Actual machines with capabilities for varying processor interconnection patterns have been proposed and built; see for example the descriptions in [11] and [16].

The essential features of PPMs are:

1. Each piece of equipment, called a processing unit or unit, is a finite state transducer, accepting a k-tuple of input symbols and computing an output symbol based on these symbols and its current state. All active units have the same transition function. Each transducer has an ordered set of k input lines, called links or pointers, which can be thought of as taps on the outputs of other units, or as pointers to those units. (We are allowing unbounded fan-out, but fan-in is the constant k.) The input symbol at a given step along a link l for a unit Ui, which has its link l pointing to a unit Uj, is the output symbol of Uj at the end of the previous step.


2. The units operate synchronously, and each unit has the ability to adjust the connections of its links in a restricted way. In particular, a unit Ui may in one step redirect one of its links to point to a unit Uj at distance no more than two from it. (That is, Ui must either already have a pointer to Uj or have a pointer to some Ur such that one of Ur's links points to Uj.)

3. Initially a single unit U0 is the only one active. In addition, a fixed input tree structure described below provides U0 with access to the PPM's input. U0 starts in state q0 and its input links are all pointing to itself except for one pointing to the root of the input tree. As the computation proceeds, each active unit can activate other units by specifying that one of its links should be redirected to a new unit, and specifying the initial state and locations for the links of the new unit. The links must be set to units at distance ≤ 1 from the creator. The PPM accepts its input if U0 ever enters its "accept" state.

4. Access to the input word of the PPM is provided by means of a tree structure composed of special inactive units, each with three links which we refer to using labels {L, R, U}. These units are arranged in a complete binary tree of depth ⌈log n⌉. Active units access the bits of the input by following pointers down this tree from the root. The input word w = w1 w2 ... wn, wi ∈ {0, 1}, is stored in the n successive units comprising the leftmost leaves of the tree by having each of these units output either zero or one. (The remaining leaf units each output the blank symbol "B".) Each leaf unit points to its left and right neighbors (using links L and R) and to its parent in the tree (link U). A non-leaf unit in the tree has link U pointing to its parent and links L and R pointing to its left and right child, respectively. These non-leaf units output "L" or "R" depending on whether they are a left child or a right child. The root of the tree outputs "B" and has its link U pointing to U0, which in turn has its first link pointing to the root. The input-tree units never make any transitions.

The input convention described above has been chosen to allow "random access" to the bits of the input, so that sublinear time bounds can be studied.
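
To make the input-tree convention concrete, here is a minimal Python sketch. The tree layout follows the convention just described, but the names (Unit, build_input_tree, read_bit) and the representation of units as Python objects are ours, not the paper's; real PPM units are finite state transducers.

```python
import math

class Unit:
    """A passive input-tree unit: a fixed output symbol plus links L, R, U.
    (A convenience record for this sketch only.)"""
    def __init__(self, output):
        self.output = output
        self.L = self.R = self.U = None

def build_input_tree(w):
    """Build the input tree: a complete binary tree of depth ceil(log n).
    The leftmost n leaves output the bits of w, remaining leaves output
    "B", non-leaf units output "L" or "R" according to which child they
    are, and the root outputs "B"."""
    n = len(w)
    depth = max(1, math.ceil(math.log2(n)))
    leaves = [Unit(w[i] if i < n else "B") for i in range(2 ** depth)]
    for i, leaf in enumerate(leaves):          # leaf links L/R point to neighbors
        leaf.L = leaves[i - 1] if i > 0 else None
        leaf.R = leaves[i + 1] if i + 1 < len(leaves) else None
    level = leaves
    while len(level) > 1:
        parents = []
        for j in range(0, len(level), 2):
            left, right = level[j], level[j + 1]
            parent = Unit("B")                 # only the root keeps "B"
            left.U = right.U = parent
            parent.L, parent.R = left, right
            parents.append(parent)
        if len(parents) > 1:                   # non-root units announce which
            for j, p in enumerate(parents):    # child they are
                p.output = "L" if j % 2 == 0 else "R"
        level = parents
    return level[0], depth

def read_bit(root, i, depth):
    """Reach leaf i in `depth` link-following steps by walking the binary
    expansion of i from its most significant bit, as an active unit would."""
    unit = root
    for b in format(i, "0{}b".format(depth)):
        unit = unit.L if b == "0" else unit.R
    return unit.output

root, depth = build_input_tree("10110")
assert read_bit(root, 0, depth) == "1" and read_bit(root, 4, depth) == "0"
```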


Given a tap on the root, any active unit can find the first bit of the input by moving a pointer down the "L" path from the root using ⌈log n⌉ transitions. Other input bits can be found quickly as well, provided the description of the path is known or can be generated. Because the input tree units themselves are not active and make no transitions, they are not counted when we consider hardware on PPMs. It is of course clear that linear hardware is always required to represent inputs, but it is interesting to consider how much additional active hardware is required for particular computations, in the same way that we consider sublinear space bounds for deterministic Turing machines. The convention that the input units make no transitions corresponds to the Tm convention of read-only input. This input convention has no significant effect in cases where at least linear time and linear hardware are used, since the input tree could be created within these resource bounds from any reasonable input convention.

A PPM accepts an input w in time t if, in a computation from the initial state with input w, U0 enters the accepting state before it has made more than t transitions. Similarly, a PPM accepts w using hardware h if, in a computation from the initial state with input w, U0 enters the accepting state with no more than h units active (not counting the input tree units). We say a PPM Z accepts a set A ⊆ {0,1}* if for all w, Z accepts w if and only if w ∈ A. A PPM Z accepts a set A in time T(n) (or in hardware H(n)) if Z accepts A and for every w ∈ A with |w| = n, Z accepts w in time T(n) (respectively, in hardware H(n)). The complexity classes defined by hardware and time bounds for PPMs are PPM-TIME(T) and PPM-HARDWARE(H). We have chosen our definition of acceptance within a time or space bound so that nonaccepting computations need not be bounded. If we instead required working within the resource bounds on all inputs, accepted or not, then we would also have to add constructibility assumptions to the hypotheses of our theorems. Defined in the way chosen here, PPMs accept exactly the recursively enumerable sets.

THEOREM 1. PPM-HARDWARE(H(n)) = DSPACE(H(n) · (log n + log H(n))).

PROOF. A deterministic Tm M simulating a PPM Z with h active units keeps a list of h records, one for each (non-input-tree) unit of Z. Each record consists of the unit's current state and output symbol and, for each of the unit's links, an integer of O(log n + log h) bits indicating the unit at which the link is pointing. (A negative integer can be used to represent that the link is pointing to one of the O(n) input tree units, and a positive integer i between 1 and h indicates the link is pointing to the active unit represented by the i-th record in the list.) M can obtain the output symbols of input tree units by consulting its input tape, and has the states and output symbols of all other units. Thus it can make a new list, representing the state of Z after another parallel step has taken place. The space required for this simulation is O(h(log h + log n)). No assumption of constructibility of the function H(n) is required for this simulation, since instead of knowing H(n), M can just add new records as Z adds new units.


Conversely, a PPM Z simulating an H(n)(log H(n) + log n) space Tm M does so using only h = O(H(n)) units by encoding extra information using the units' pointers. For simplicity of exposition, we will suppose that M has, besides its input tape, a single (binary) worktape of length h log h, where h is a power of 2. We will consider this worktape to be divided into h blocks of length log h. The simulating PPM Z contains a tree of O(h) units with h leaves, organized so that (1) by starting at any of the leaves and following the pointers of the units up to the root a unique sequence of log h bits can be recovered; and (2) given a sequence of log h bits, the corresponding leaf unit can be found by following a path of pointers down from the root. The tree is organized in the same way as the PPM input tree described at the beginning of this section. Each leaf l codes a sequence over {L, R} of length log h describing the path from l to the root, and a unit pointing to l can recover the sequence by following the U pointers of the tree units from l up to the root. Now Z can represent the h log h bits in the worktape of M by forming a doubly linked list of h units, where the i-th unit represents a block of log h bits by having a link point to the appropriate leaf. Note that the input tree can also be used to store block information, so that actually a worktape of length h(log h + log n) can be represented. No assumption of constructibility of H(n) is needed, since successively larger values for h can be tried until one works. □
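
The record list in the first half of this proof is easy to render in code. The Python sketch below (the Record layout and the interface of the shared transition function delta are our assumptions, not the paper's) computes one synchronous step of Z entirely from the previous list, just as M does:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Record:
    """One record per active unit, as in the proof: current state, current
    output symbol, and for each link the index of the unit it points to.
    List indices stand in for the O(log n + log h)-bit integers; encoding
    input-tree units as negative indices is omitted in this sketch."""
    state: str
    output: str
    links: Tuple[int, ...]

def step(records: List[Record],
         delta: Callable[[str, Tuple[str, ...], Tuple[int, ...]],
                         Tuple[str, str, Tuple[int, ...]]]) -> List[Record]:
    """One synchronous parallel step of Z. `delta` maps (state, symbols
    seen along the links, old link targets) to (new state, new output,
    new link targets); every new record is computed from the *old* list,
    mirroring the synchronous semantics. Newly activated units would be
    appended as extra records, as the proof observes."""
    new_list = []
    for rec in records:
        seen = tuple(records[j].output for j in rec.links)  # outputs of last step
        state, output, links = delta(rec.state, seen, rec.links)
        new_list.append(Record(state, output, links))
    return new_list
```

Since each record takes O(log n + log h) bits, a list of h records fits in the O(h(log h + log n)) space claimed above.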

COROLLARY 2. If H(n) = Ω(n^ε) for some ε > 0, then PPM-HARDWARE(H) = DSPACE(H log H).

Theorem 1 for PPMs stands in close analogy to results in [9] for uniform aggregates by exhibiting a precise (up to a constant factor) correspondence between hardware and space complexity classes. (Aggregates are similar to bounded fan-in boolean circuits, but allow feedback in a synchronous fashion: the outputs of all gates at a given time step determine the inputs for all gates at the next time step.) For uniform aggregates and constructible resource bounds, there is a linear correspondence between aggregate hardware and Tm space. In the case of PPMs, however, less hardware is needed to simulate a given space bound.


We can conclude by the deterministic Turing machine space hierarchy theorem [15] that hardware on these machines with variable intercommunication structure is strictly more powerful than uniform aggregate hardware, in which the intercommunication paths are fixed.

Turning our attention from hardware to PPM time, we now show that the inclusions (2) corresponding to the more powerful machine models apply to PPMs. Since deterministic space is powerful enough to linearly simulate uniform circuit depth [2] and aggregate time [9], it follows from the theorem below that time on PPMs is at least as powerful as time on the other parallel models in the first group.

THEOREM 3. DSPACE(T) ⊆ PPM-TIME(T) ⊆ DSPACE(T²).

PROOF. The first inclusion will follow from Lemma 4. The second inclusion will be shown by means of Lemma 5 below, which establishes that a PPM working in time T can be simulated by an alternating Turing machine [3], [24], [25] in time O(T²). The inclusion follows by the result in [3] giving a linear simulation of ATM time by deterministic space.

LEMMA 4. Suppose a directed graph G with out-degree one (self-loops are allowed) and N nodes is represented in a PPM Π by a collection of N units such that the unit representing node i has its input line pointing to unit j if and only if (i, j) is an edge of G. Also suppose that node N's arc points to itself and that the corresponding unit is outputting a special symbol s, marking the sink of the graph. Then for any t, Π can determine if there is a path from 1 to N of length ≤ 2^t in t + O(1) steps.

PROOF. The argument uses the now well-known technique of parallel pointer jumping suggested by Wyllie's proof [29] that DSPACE(T) ⊆ PRAM-TIME(T). Each node originally is receiving an input from its immediate (1-step) successor. In a single transition of Π each of the units can adjust its input line to point to its (unique) 2-step successor. After t transitions, each of the units will be pointing to its 2^t-step successor. A path from node 1 to node N of length ≤ 2^t exists if and only if the unit representing node 1 is receiving s after t transitions. □
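
A minimal Python rendering of the pointer-jumping argument (the 0-based indices and the function name are ours): each round replaces every pointer by the pointer's pointer, all at once, so after t rounds node 0's pointer has advanced to its 2^t-step successor.

```python
def reaches_sink(succ, t):
    """Lemma 4 as pointer jumping: succ[i] is node i's unique successor
    and the sink (which outputs the marker s) points to itself. All
    pointers jump simultaneously, reading only the old array."""
    sink = len(succ) - 1
    assert succ[sink] == sink           # node N's arc points to itself
    for _ in range(t):
        succ = [succ[succ[i]] for i in range(len(succ))]
    return succ[0] == sink              # node 1 "receives s" iff this holds

# A path 0 -> 1 -> ... -> 7 of length 7: three rounds suffice (7 <= 2^3),
# while two rounds only reach the 4-step successor.
assert reaches_sink([1, 2, 3, 4, 5, 6, 7, 7], 3)
assert not reaches_sink([1, 2, 3, 4, 5, 6, 7, 7], 2)
```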


The application of the lemma to the proof of the first inclusion of the theorem is as follows: the N nodes of the graph will represent configurations (including input head positions) of M, the machine to be simulated. An arc (i, j) means the immediate successor configuration of i is j, as determined by M's transition function. Node 1 will represent the starting configuration, and node N the unique accepting configuration. A path of length ≤ 2^t from node 1 to node N then exists if and only if M accepts its input within 2^t steps; since a space T(n) machine that accepts does so within 2^O(T(n)) steps, taking t = O(T(n)) in Lemma 4 lets the PPM detect acceptance in O(T(n)) steps.