A low latency bi-directional serial-parallel multiplier architecture

July 13, 2017 | Author: Ahmed Bouridane | Category: Nearest Neighbour, Low Latency

Description

ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland

A LOW LATENCY BI-DIRECTIONAL SERIAL-PARALLEL MULTIPLIER ARCHITECTURE

A. Bouridane, M. Nibouche, O. Nibouche, D. Crookes and B. Albesher
School of Computer Science, The Queen's University of Belfast, Belfast BT7 1

ABSTRACT

A new bi-directional bit serial-parallel multiplication architecture is presented. The proposed structure is regular and modular, and requires nearest neighbour communication links only, which makes it more efficient for VLSI implementation. Furthermore, a judicious deployment of latches in the circuit ensures that the multiplier operates on two coefficients of the multiplicand at the same time, thus speeding up the process. A comparison of the new multiplier structure with previous ones shows the superiority of the new architecture.

1. INTRODUCTION

Several structures for bit serial-parallel and iterative pipelined multipliers have been reported in the literature [1-4]. In many conventional serial-parallel multipliers, the serial input signal is distributed to all arithmetic cells through a global interconnection line. Such a global distribution structure is unsuitable for VLSI implementation because it increases the cost and lowers the clock frequency. Therefore, the globally distributed line should be eliminated if possible [4]. Furthermore, the throughput rate of these architectures can be increased by pipelining them down to the bit level. However, pipelining increases both the latency and the number of latches. In other words, the price paid for a higher throughput rate is greater area and more cycles. In addition, some of these architectures [4] are based on a gated full-adder cell (consisting of an AND gate, a full adder, and some latches), while others are based on two combined gated full-adder cells [1-2,5].

Recently, the authors described an efficient Most Significant Bit First (MSBF) low-latency multiplier architecture based on a hybrid number system representation [7]. In this paper, a new cell architecture for high-performance bit-level multiplication is presented. The basic tenet of the architecture is to arrange the bits of one of the multiplication operands into pairs. The multiplication process is then carried out in a bi-directional serial-parallel fashion using a new 2-bit adder cell. Unlike the previous work, where the operands are fed to the multiplier using an MSBF approach, the proposed architecture has been devised to operate using the more natural Least Significant Bit First (LSBF) scheme.

The throughput rate of the new architecture is the same as that of the existing architectures [1-2], which is limited by the propagation delay of one AND gate, a full adder, and a latch, whereas the number of latches and the latency are decreased. Moreover, the structure based on the proposed new cell is regular and modular, and requires nearest neighbour communication links only, which makes it more efficient for VLSI implementation and thus reduces design errors, time, and cost.

2. THE ALGORITHM IN DETAIL

The multiplication of two N-bit numbers $A = (a_{N-1} a_{N-2} \ldots a_1 a_0)$ and $B = (b_{N-1} b_{N-2} \ldots b_1 b_0)$, with a product $P = (p_{2N-1} p_{2N-2} \ldots p_1 p_0)$, can be written using the following expressions:

$$P = A \cdot B \tag{1}$$

$$A = \sum_{j=0}^{N-1} a_j 2^j \tag{2}$$

$$B = \sum_{i=0}^{N-1} b_i 2^i \tag{3}$$

$$P = \sum_{i=0}^{N-1} b_i 2^i \sum_{j=0}^{N-1} a_j 2^j \tag{4}$$

By grouping the bits of the coefficient A in pairs, the equation for the product P can be rewritten as follows:

$$P = \sum_{i=0}^{N-1} b_i 2^i \sum_{j=0}^{N/2-1} \left(a_{2j} + 2a_{2j+1}\right) 2^{2j} \tag{5}$$

0-7803-5482-6/99/$10.00 ©2000 IEEE

3. DESCRIPTION OF THE NEW MULTIPLIER

Equation (5) can easily be mapped to a serial-parallel architecture. The multiplicand A is loaded in parallel at the initial multiplication step and is multiplied by each bit $b_i$ of the multiplier B, which is a serial input. The novelty of the new multiplier lies in its architecture, as shown in Figure 2. As can be seen from this figure, the Basic Cell (BC) is built from a novel 2-bit adder, two AND gates, and some latches.
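The pair grouping can be checked numerically. The following is a minimal Python sketch, not part of the original paper: it assumes N is even and that the pairs are formed as $(a_{2j}, a_{2j+1})$, and the function names `bits_lsb_first` and `product_by_pairs` are hypothetical. It evaluates the double sum with the bits of A taken two at a time and compares the result with the ordinary product.

```python
def bits_lsb_first(value, n):
    """Return the n low-order bits of value, least significant bit first."""
    return [(value >> k) & 1 for k in range(n)]


def product_by_pairs(a, b, n):
    """Evaluate the product with the bits of A grouped in pairs.

    Assumption for illustration: n is even and pair j holds
    a[2j] (weight 1) and a[2j+1] (weight 2), scaled by 2**(2j).
    """
    if n % 2 != 0:
        raise ValueError("n must be even to group the bits in pairs")
    a_bits = bits_lsb_first(a, n)
    b_bits = bits_lsb_first(b, n)
    total = 0
    for i in range(n):              # one serial multiplier bit b_i per step
        for j in range(n // 2):     # one pair of multiplicand bits per cell
            pair = a_bits[2 * j] + 2 * a_bits[2 * j + 1]
            total += b_bits[i] * pair * 2 ** (i + 2 * j)
    return total


if __name__ == "__main__":
    # Exhaustive check for 4-bit operands against ordinary multiplication.
    ok = all(product_by_pairs(a, b, 4) == a * b
             for a in range(16) for b in range(16))
    print("paired form matches A*B for all 4-bit operands:", ok)
```

Because each pair contributes at most 3 to a partial sum, a cell that handles a pair needs an adder with two weighted outputs beyond the sum bit, which is the role of the 2-bit adder described in this section.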

The proposed multiplier is shown in Figure 3. The multiplicand bits are loaded in parallel. The multiplier bits are fed serially from one end of the multiplier, while the partial product bits move in the opposite direction. As shown in Figure 2, each cell of the multiplier multiplies two multiplicand bits, $a_{j-1}$ and $a_j$, by the multiplier bit $b_i$ and adds the results to a sum bit shifted in from the next cell to the left and to the two carry bits stored in the same cell from the previous cycle, according to the equation [7]:

$$S_o + 2C_{1o} + 4C_{2o} = b_i a_{j-1} + 2 b_i a_j + S_i + C_{1i} + 2C_{2i}$$

Once this is done, the multiplier bit $b_i$ and the sum bit $S_i$ are shifted to the next left and right cells, respectively, and the process is repeated. The above equation can easily be mapped to a serial-parallel multiplier architecture using the gated new 2-bit adder [7] shown in Figure 3.

The 2-bit adder structure presented in [7] is used to add five independent 1-bit numbers. Three of them, A, B and C, have the same weight (i.e., the same bit position). The other two, D and E, each have twice the weight of the first three. As can be seen from Figure 1, the new adder consists of 20 gates with a delay of 6 gates, which is equivalent to the speed of one full adder (6 gates) and the area of two full adders (20 gates). Therefore, the clock cycle time of the new serial-parallel multiplier architecture is the delay of one AND gate, a full adder and a latch, with no initial delay.
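The arithmetic function of the 2-bit adder can be illustrated with a short behavioural model. The Python sketch below is an assumption-laden illustration rather than the 20-gate circuit of Figure 1: it composes two ordinary full adders in series, which gives the same input-output function but not the 6-gate delay claimed for the new adder. The names `full_adder`, `two_bit_adder` and `check_two_bit_adder` are hypothetical.

```python
from itertools import product


def full_adder(x, y, z):
    """Conventional 1-bit full adder: returns (sum, carry)."""
    s = x ^ y ^ z
    carry = (x & y) | (x & z) | (y & z)
    return s, carry


def two_bit_adder(a, b, c, d, e):
    """Functional reference for the 2-bit adder (not the gate-level design).

    a, b, c have weight 1; d, e have weight 2.
    Returns (s, c1, c2) such that s + 2*c1 + 4*c2 == a + b + c + 2*(d + e).
    """
    s, k = full_adder(a, b, c)      # k is an internal carry of weight 2
    c1, c2 = full_adder(k, d, e)    # c1 has weight 2, c2 has weight 4
    return s, c1, c2


def check_two_bit_adder():
    """Exhaustively verify the weighted-sum identity over all 32 inputs."""
    for a, b, c, d, e in product((0, 1), repeat=5):
        s, c1, c2 = two_bit_adder(a, b, c, d, e)
        if s + 2 * c1 + 4 * c2 != a + b + c + 2 * (d + e):
            return False
    return True


if __name__ == "__main__":
    print("identity holds for all inputs:", check_two_bit_adder())
```

The largest weighted input sum is 1 + 1 + 1 + 2 + 2 = 7, so three output bits with weights 1, 2 and 4 are always sufficient. In the cell equation, one plausible reading is that the three weight-1 inputs are $b_i a_{j-1}$, the incoming sum bit and the first stored carry, while the two weight-2 inputs are $b_i a_j$ and the second stored carry.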

Figure 1 The 2-bit Adder

4. EVALUATION AND COMPARISON

The proposed multiplier architecture has been fully simulated functionally using VHDL tools.

Figure 2 The new Basic Cell (BC)

A comparison of the proposed serial-parallel systolic structure of Figure 3 with three similar existing serial-parallel multiplier structures, using the figures of merit described in [6,7], is given in Tables 1, 2 and 3 for the time, area, and area-time requirements, respectively. $A_{NEW}$ and $T_{NEW}$ denote the area and the propagation time of the new 2-bit adder, respectively, while G denotes the unit gate area of a NAND or a NOR gate. Note that $A_{NEW}$ equals the area of two conventional full adders, while $T_{NEW}$ equals the delay of one conventional full adder [7].


It should be noted that the conventional CSAS multiplier structure [6] requires N full adders, N AND gates, N latches for multiplicand data storage, and 2N latches for synchronising the multiplication operation.

The total area of the proposed systolic structure is compared with that of the related structures in Table 2. The total area is reduced by 12%, 12%, and 21% relative to the structures of reference [1], reference [2], and CSAS, respectively.

As can be seen from Table 1, the proposed systolic structure reduces the initial delay to zero, the same as the initial delay of CSAS, which broadcasts the serial data to all cells. Furthermore, the total number of cycles required to produce a full 2N-bit product is unchanged (i.e., 2N cycles).
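The cycle count and the zero initial delay can be illustrated with a cycle-level behavioural model. The Python sketch below is one reading of the data flow described in Section 3, not the authors' VHDL model: it assumes N is even, N/2 cells each holding one pair of multiplicand bits, multiplier bits entering at the least significant cell and moving one cell to the left per cycle, and sum bits moving one cell to the right per cycle. The names `two_bit_add` and `serial_parallel_multiply` are hypothetical.

```python
def two_bit_add(a, b, c, d, e):
    """Behavioural 2-bit adder: a, b, c have weight 1; d, e have weight 2."""
    total = a + b + c + 2 * (d + e)
    return total & 1, (total >> 1) & 1, (total >> 2) & 1


def serial_parallel_multiply(a, b, n):
    """Cycle-level sketch of the bi-directional multiplier (assumed data flow).

    Returns the product assembled from the 2n bits that leave the rightmost
    cell, least significant bit first, one bit per cycle.
    """
    if n % 2 != 0 or not (0 <= a < 2 ** n and 0 <= b < 2 ** n):
        raise ValueError("n must be even and both operands must fit in n bits")
    cells = n // 2
    a_lo = [(a >> (2 * k)) & 1 for k in range(cells)]      # weight-1 bit of pair k
    a_hi = [(a >> (2 * k + 1)) & 1 for k in range(cells)]  # weight-2 bit of pair k
    b_reg = [0] * cells   # multiplier bit currently held by each cell
    s_reg = [0] * cells   # sum bit latched at each cell output
    c1 = [0] * cells      # stored carry, weight 1 in the next cycle
    c2 = [0] * cells      # stored carry, weight 2 in the next cycle
    out_bits = []
    for t in range(2 * n):
        b_in = (b >> t) & 1 if t < n else 0   # zeros are fed after the last bit
        b_reg = [b_in] + b_reg[:-1]           # multiplier bits move left
        s_in = s_reg[1:] + [0]                # sum bits move right
        for k in range(cells):
            s_reg[k], c1[k], c2[k] = two_bit_add(
                b_reg[k] & a_lo[k], s_in[k], c1[k],
                b_reg[k] & a_hi[k], c2[k])
        out_bits.append(s_reg[0])             # product bit p_t leaves in cycle t
    return sum(bit << t for t, bit in enumerate(out_bits))


if __name__ == "__main__":
    print(serial_parallel_multiply(13, 11, 4))  # 13 * 11 = 143
```

In this model the first product bit appears in the same cycle in which the first multiplier bit is applied, and the loop runs for exactly 2N cycles. Each cell exchanges data only with its immediate neighbours, so no broadcast line is needed.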

Figure 3 New systolic serial-parallel multiplier

Table 1 The Total Delay Comparison With Related Multipliers

Table 2 The Total Area Comparison With Related Serial-Parallel Multipliers (column headings: Structure, Area/Cell, Total Area)

Table 3 The Total Area-Time Comparison With Related Multipliers

As can be observed from Table 3, the proposed systolic serial-parallel architecture has the lowest area-time complexity among the related architectures. The complexity is reduced by 17%, 15%, and 21% when the proposed architecture is compared with those described in reference [1], reference [2] and CSAS [6], respectively. This table shows that the proposed architecture has the best performance of the structures compared.

5. SUMMARY AND CONCLUSION

In this paper, a novel architecture for high-throughput, low-latency bit serial-parallel multipliers is presented. The new structure is built from a new cell, which in turn is based on a new 2-bit adder consisting of 20 gates with a delay of 6 gates. This is equivalent to the speed of one full adder (6 gates) and the area of two full adders (20 gates). The proposed multiplier requires 2N cycles to obtain a complete product, with no initial delay (i.e., latency = 0). Moreover, the proposed architecture reduces the number of latches, the area, and the area-time complexity by up to 33%, 21%, and 21%, respectively, when compared to the existing structures. The structure based on the new cell is regular and modular, and requires nearest neighbour communication links only, which makes it more efficient for VLSI implementation.

The new cell used for the design of the new serial-parallel multiplier can be extended to implement an iterative multiplier array. Furthermore, the algorithm can easily be extended to operate on two's complement numbers using, for example, the Baugh and Wooley algorithm. Work is currently underway to develop digit-serial multiplication structures (with digit sizes of 4, 8, 16, etc.) using the Baugh and Wooley algorithm.

6. REFERENCES

1. PEKMESTZI, K. Z., and CARAISCOS, C. G.: "A class of systolic serial-parallel multipliers", Int. J. Electronics, 1994, Vol. 76, No. 3, pp. 463-468.
2. AIT-BOUDAOUD, D., IBRAHIM, M. K., and HAYES-GILL, B. R.: "Novel cell architecture for bit level systolic arrays multiplication", IEE Proc. E, 1991, 138, pp. 21-26.
3. CARAISCOS, C. G., and PEKMESTZI, K. Z.: "Low-Latency Bit-Parallel Systolic VLSI Implementation of FIR Digital Filter", IEEE Trans. Circuits Syst., 1996, Vol. 43, No. 7, pp. 529-534.
4. MOH, S.-M., and YOON, S.-H.: "Serial-Parallel Multiplier For Two's Complement Numbers", Electronics Letters, 1995, Vol. 31, No. 9, pp. 703-704.
5. AIT-BOUDAOUD, D., IBRAHIM, M. K., and HAYES-GILL, B. R.: "Novel Pipelined Serial-Parallel Multiplier", Electronics Letters, 1990, Vol. 26, No. 9, pp. 582-583.
6. DANIELSSON, P.: "Serial-Parallel Convolver", IEEE Transactions on Computers, 1984, Vol. C-33, No. 7, pp. 652-667.
7. AL-BESHER, B., BOURIDANE, A., ASHUR, A. S., and CROOKES, D.: "A Hybrid Low-Latency Serial-Parallel Multiplier Architecture", Electronics Letters, 1997, Vol. 34, No. 2, pp. 141-143.
