Faster Energy Efficient Column Compression Multiplication

July 7, 2017 | Author: Harish Kittur | Category: Energy efficient, Hardware architecture

Description

A Design Technique for Faster Dadda Multiplier

B. Ramkumar, V. Sreedeep and Harish M Kittur, Member, IEEE

Abstract- In this work, faster column compression multiplication is achieved by combining two design techniques: partitioning the partial products into two parts for independent parallel column compression, and accelerating the final addition with a hybrid adder proposed in this work. Based on the proposed techniques, 8, 16, 32 and 64-bit Dadda multipliers are developed and compared with the regular Dadda multiplier. The performance of the proposed multiplier is analyzed by evaluating the delay, area and power in a 180 nm process technology, at the interconnect and layout level, using industry standard design and layout tools. The result analysis shows that the 64-bit regular Dadda multiplier is as much as 41.1% slower than the proposed multiplier and requires only 1.4% and 3.7% less area and power respectively. Also, the power-delay product of the proposed design is significantly lower than that of the regular Dadda multiplier.

Index Terms- Column compression, Dadda multiplier, Faster, Hybrid final adder.

This work was carried out at the Integrated Circuit Design Laboratories, VIT University, Vellore, India. B. Ramkumar, V. Sreedeep and Harish M Kittur are with the School of Electronics Engineering, VIT University, Vellore.

I. INTRODUCTION

High speed multiplication is a primary requirement of high performance digital systems. Column compression multipliers have become popular for high speed computation because of their higher speeds [1-2]. The first column compression multiplier was introduced by Wallace in 1964 [3]. He reduced the N rows of partial products by grouping them into sets of three rows and sets of two rows, using (3,2) counters and (2,2) counters respectively. In 1965, Dadda altered Wallace's approach by starting with the exact placement of the (3,2) counters and (2,2) counters in the maximum critical path delay of the multiplier [4]. Since the 2000s, closer re-examination of the Wallace and Dadda multipliers has shown that the Dadda multiplier is slightly faster than the Wallace multiplier and requires less hardware [5-6]. Since the Dadda multiplier is the faster of the two, we implement the proposed techniques on it and compare the improved performance with the regular Dadda multiplier.

Column compression multipliers have total delays proportional to the logarithm of the operand word length, unlike array multipliers, whose delays are proportional to the word length [7-8]. The total delay of the multiplier can be split into three parts: the Partial Product Generation (PPG), the Partial Product Summation Tree (PPST), and the Final Adder [9]. Of these, the dominant components of the multiplier delay are the PPST and the final adder; the relative delay due to the PPG is small. Therefore a significant improvement in the speed of the multiplier can be achieved by reducing the delay in the PPST and in the final adder stage. In this work the delay introduced by the PPST is reduced by using two independent structures in the partial products, and the proposed hybrid final adder computes the final products much faster.

This paper is structured as follows: Sections II and III describe the design of the parallel structures for the PPST and the design of the hybrid final adder structure respectively. Section IV reports the ASIC implementation details and the simulation results. Finally, Section V summarizes the analysis. Throughout the paper, it is assumed that the multiplier and the multiplicand have the same number of bits.

II. DESIGN OF PARALLEL STRUCTURES

The multiplication process begins with the generation of all partial products in parallel using an array of AND gates. The next major steps in the design process are the partitioning of the partial products and their reduction. Each of these steps is elaborated in the following subsections.

A. Partitioning the partial products

We consider two n-bit operands $a_{n-1}a_{n-2}\ldots a_2a_1a_0$ and $b_{n-1}b_{n-2}\ldots b_2b_1b_0$ for an n by n Baugh-Wooley multiplier. The partial products of two n-bit numbers are $a_ib_j$, where i and j range over 0, 1, ..., n-1. The partial products form a matrix of n rows and 2n-1 columns, as shown in Fig. 1(a). To each partial product we assign a number as shown in Fig. 1(a); e.g., $a_0b_0$ is given the index 0, $a_1b_0$ the index 1, and so on. For convenience we rearrange the partial products as shown in Fig. 1(b). The longest column, in the middle of the partial products, contributes the maximum delay in the PPST. Therefore in this work we split the PPST into two parts as shown in Fig. 1(c), in which part0 and part1 each consist of n columns. We then sum each column of the two parts in parallel. The summation procedure adopted in this work is described in the next subsection.

B. The Dadda based reduction

Next, the partial products of each part are reduced to two rows using (3,2) and (2,2) counters, following the regular Dadda reduction algorithm, as shown in Fig. 2 and Fig. 3. A grouping of 3 bits indicates a (3,2) counter and a grouping of 2 bits a (2,2) counter; the different colors distinguish the columns, and s and c denote partial sum and partial carry respectively. For example, bit positions 6 and 13 in part0 are added using a (2,2) counter to generate the sum s0 and the carry c0. The carry c0 is passed to the next column, where it is added to the sum s1 of a (3,2) counter adding 7, 14 and 21. The carry c1 of this (3,2) counter is added to the next column. The final two rows of each part are summed using a Carry Look-ahead Adder (CLA) to form the partial final products, with a column height of one bit, as indicated at the bottom of Fig. 2 and Fig. 3.

Fig. 1. Partitioning the partial products: (a) Partial product array diagram for 8 by 8 multiplier, (b) An alternative representation, (c) Partitioned structure of multiplier showing part0 and part1.

Fig. 2. Reduction of the partial products of part0 based on the Dadda approach.

The two parallel structures for Fig. 2 and Fig. 3 based on the Dadda approach are shown in Fig. 4, where HA, FA, p0, p1 and p denote Half Adder ((2,2) counter), Full Adder ((3,2) counter), partial final product from part0, partial final product from part1 and final product respectively. The numerals on the HA and FA indicate the positions of the partial products. The outputs of part0 and part1 are computed independently in parallel, and those values are added using a high speed hybrid final adder to get the final product.

However, before carrying out the final addition with the proposed hybrid adder, we first carry out the final addition with the CLA for both the unpartitioned Dadda multiplier and the partitioned Dadda multiplier. This enables us to evaluate the effect of partitioning the PPST into two parts. The simulation results are listed in Table I and Table II. Comparing Table I and Table II gives the percentage improvement in delay, area and power of the partitioned multipliers with respect to the regular Dadda multiplier. For the 8-bit multiplier, there is no improvement in speed, area or power. But as the word size increases, the improvement in the speed, area and power of the partitioned multipliers increases. There is a maximum of 10.5% improvement in delay for the 64-bit multiplier, with only a slight increase in area and power of 1% and 1.8% respectively.

Having demonstrated the reduction in the delay of the Dadda multipliers due to the partitioning of the partial products, we now proceed to further enhance the speed of the proposed multiplier. Further improvement in performance can be achieved by replacing the CLA with the proposed hybrid final adder structure, which is elaborated in the next section.
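The partitioning described in this section can be illustrated with a small behavioural sketch in Python. It is not the authors' hardware description: it assumes unsigned operands and a plain AND-gate partial product array, it takes part0 to be columns 0 to n-1 and part1 to be the remaining columns, and the function names are hypothetical. It only shows that the two parts can be summed independently and that their sums add up to the full product.

```python
def partial_product_columns(a: int, b: int, n: int) -> list[list[int]]:
    """Return the 2n-1 columns of the AND-array partial products a_i & b_j."""
    cols: list[list[int]] = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            # Partial product a_i * b_j has weight 2**(i + j), so it sits in column i + j.
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def partitioned_sums(a: int, b: int, n: int) -> tuple[int, int]:
    """Sum part0 (columns 0..n-1) and part1 (columns n..2n-2) independently.

    Assumption of this sketch: the split point is column n, so that the
    low n bits of the product come from part0 alone.
    """
    cols = partial_product_columns(a, b, n)
    part0 = sum(sum(col) << k for k, col in enumerate(cols) if k < n)
    part1 = sum(sum(col) << k for k, col in enumerate(cols) if k >= n)
    return part0, part1


if __name__ == "__main__":
    p0, p1 = partitioned_sums(0xB7, 0x5D, 8)
    # part0 needs n + 3 = 11 bits for n = 8, matching the p0[10:0] labels in the figures.
    print(f"part0 = {p0:#06x}, part1 = {p1:#06x}, product = {p0 + p1:#06x}")
```

In this model the only interaction between the two parts is the final addition of the few part0 bits above position n-1 to part1, which is the job of the final adder.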

Fig. 3. Reduction of the partial products of part1 based on the Dadda reduction tree.

Fig. 4. The Dadda based implementation: (a) Implementation of part0, (b) Implementation of part1.

III. THE HYBRID FINAL ADDER DESIGN

In previous works, the hybrid final adder designs used to achieve faster performance in parallel multipliers were made up of a CLA (Carry Look-ahead Adder) and a CSLA (Carry Select Adder) [9-11]. But due to its structure, the CSLA occupies more chip area than other adders. Thus, to achieve optimal performance, the hybrid adder proposed in this work uses MBEC (Multiplexers with Binary to Excess-1 Converters) and a Ripple Carry Adder (RCA) for fast summation of the signals originating from the PPST, which arrive at uneven times. The MBEC adder provides faster performance than the Carry Save Adder (CSA) and the Carry Look-ahead Adder (CLA) [12]. It also consumes less area and power than the Carry Select Adder (CSLA) [13].

A. Hybrid Adder for 8 by 8 Multiplier

Once each part of the partial products has been reduced to a column height of one bit, we get the final partial products as follows:

p0[10] p0[9] p0[8] p[7] p[6] p[5] p[4] p[3] p[2] p[1] p[0]
p1[15] p1[14] p1[13] p1[12] p1[11] p1[10] p1[9] p1[8]

The bits p0[10:8] are the carry bits of part0 that extend beyond its n columns, and p1[15] is the carry bit of part1. The bits p[7:0] of part0 are directly assigned as the final product bits. To find the remaining p[15:8], we use the RCA and the MBEC shown in Fig. 5.

TABLE I
REGULAR DADDA MULTIPLIER WITH CLA

Multiplier (N by N)   Area (µm²)   Delay (ns)   Power (µW)
8 by 8                8,428        3.40         6.32
16 by 16              29,169       4.71         33.09
32 by 32              105,237      5.92         210.50
64 by 64              397,146      7.54         925.92

TABLE II
PARTITIONED DADDA MULTIPLIER WITH CLA

Multiplier (N by N)   Area (µm²)   Delay (ns)   Power (µW)
8 by 8                8,957        3.51         6.85
16 by 16              30,241       4.61         35.22
32 by 32              107,362      5.47         218.76
64 by 64              386,629      6.94         952.59

The bits p0[10:8] and p1[10:8] are added using a 3-bit RCA, which produces p[10:8]. To obtain the remaining p[15:11], p1[15:11] is applied to the input of a 5-bit MBEC, which provides two candidate results: p1[15:11] itself for a Cin of '0', and the 5-bit BEC output for a Cin of '1'. Depending on the Cout of the RCA (c[10]), the mux selects the final p[15:11] without having to ripple the carry through p1[15:11]. The 8-bit multiplier uses a single 5-bit MBEC in the final adder. Multipliers with larger word sizes, however, require multiple MBECs, each of which takes its selection input from the carry output of the preceding MBEC. Therefore, to generate the carry output from the MBEC, an additional block is developed, called MBECWC (MBEC With Carry). The detailed structures of the 5-bit BEC without carry (BEC) and with carry (BECWC) are shown in Fig. 6(a) and Fig. 6(b). The BEC takes n inputs and generates n outputs; the BECWC takes n inputs and generates n+1 outputs, the extra output being the carry that serves as the selection input of the next-stage mux in the final adder designs of the 16, 32 and 64-bit multipliers. The function table of the BEC and BECWC is shown in Table III.

Fig. 5. Hybrid final adder of 8 by 8 multiplier

TABLE III
FUNCTION TABLE OF 5-BIT BEC & BECWC

Input b[4:0]   BEC without carry x[4:0]   BEC with carry cy   BEC with carry x[4:0]
00000          00001                      0                   00001
00001          00010                      0                   00010
00010          00011                      0                   00011
00011          00100                      0                   00100
00100          00101                      0                   00101
...            ...                        ...                 ...
11011          11100                      0                   11100
11100          11101                      0                   11101
11101          11110                      0                   11110
11110          11111                      0                   11111
11111          00000                      1                   00000
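The behaviour of the BEC, the BECWC and the 8 by 8 hybrid final adder can be sketched in a few lines of Python. This is a functional model under stated assumptions, not the gate-level structure of Fig. 5 or Fig. 6: the function names are hypothetical, the 3-bit adder is modelled as a plain integer addition, and p1 is passed as a 16-bit value whose low 8 bits are zero.

```python
def bec(b: int, n: int) -> int:
    """n-bit Binary to Excess-1 Converter without carry: (b + 1) mod 2**n."""
    return (b + 1) & ((1 << n) - 1)


def becwc(b: int, n: int) -> tuple[int, int]:
    """n-bit BEC with carry: returns (cy, x); cy is 1 only when b is all ones."""
    total = b + 1
    return total >> n, total & ((1 << n) - 1)


def hybrid_final_add_8x8(p0: int, p1: int) -> int:
    """Combine p0[10:0] and p1[15:8] into the 16-bit product.

    Assumptions of this sketch: p0 < 2**11, and p1 is a multiple of 2**8
    below 2**16, as produced by summing part0 and part1 of an 8 by 8 array.
    """
    low = p0 & 0xFF                                # p[7:0] come straight from part0
    s = ((p0 >> 8) & 0x7) + ((p1 >> 8) & 0x7)      # 3-bit addition of p0[10:8] and p1[10:8]
    mid, c10 = s & 0x7, s >> 3                     # p[10:8] and carry-out c[10]
    upper = (p1 >> 11) & 0x1F                      # p1[15:11]
    # 10:5 mux: pass p1[15:11] unchanged when c[10] = 0, else take the BEC output.
    high = bec(upper, 5) if c10 else upper
    return (high << 11) | (mid << 8) | low
```

The point of the structure is visible in the last step: both candidates for p[15:11] are available before c[10] settles, so the carry only has to drive a mux select rather than ripple through five more adder stages.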

B. Variable Block Hybrid Adder

Adders built from variable size blocks are faster than adders built from fixed size blocks [14]. Thus, to further improve the speed of addition, we break the ripple of gates in the MBEC into multiple groups of size $2^n$, where $n \geq 2$. Based on this approach, the final adder designs for the 16, 32 and 64-bit multipliers are shown in Fig. 7. In the MBECWC, the mux receives n bits of data passed through unchanged for selection input '0', and n+1 bits of data from the BECWC output for selection input '1'. To equalize the sizes of the two mux inputs, a single '0' bit is appended as the MSB (Most Significant Bit) of the n-bit input. For example, in Fig. 7(a), the 10:5 mux of the MBECWC gets two inputs: the 4 bits (n bits) of p1[23:20] for selection input '0', and 5 bits (n+1 bits) from the 4-bit BECWC for selection input '1'. The n-bit input is therefore extended to {0, p1[23:20]}.

Fig. 6. The 5-bit Binary to Excess-1 Code Converter: (a) BEC (without carry), (b) BECWC (with carry).
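The chaining of variable size MBECWC blocks can be modelled functionally as follows. This is a sketch under stated assumptions rather than the authors' design: the block widths are read off the labels that survive from Fig. 7, the names `variable_block_final_add` and `FIG7_BLOCKS` are hypothetical, and every block is modelled with a carry output even though the last block in each figure is a plain BEC (its carry is simply discarded here).

```python
def becwc(b: int, n: int) -> tuple[int, int]:
    """n-bit BEC with carry: returns (cy, x) with cy:x equal to b + 1."""
    total = b + 1
    return total >> n, total & ((1 << n) - 1)


def variable_block_final_add(p0: int, p1: int, n: int,
                             rca_bits: int, blocks: tuple[int, ...]) -> int:
    """Add the part0 sum p0 and the part1 sum p1 of an n by n multiplier.

    p0 is assumed to fit in n + rca_bits bits and p1 to be a multiple of 2**n.
    """
    result = p0 & ((1 << n) - 1)               # low n bits come straight from part0
    pos = n
    mask = (1 << rca_bits) - 1
    s = ((p0 >> pos) & mask) + ((p1 >> pos) & mask)   # short ripple-carry addition
    result |= (s & mask) << pos
    carry = s >> rca_bits
    pos += rca_bits
    for width in blocks:
        chunk = (p1 >> pos) & ((1 << width) - 1)
        cy, incremented = becwc(chunk, width)
        if carry:
            # Mux select '1': take the BECWC output, whose MSB is the next carry.
            chunk, carry = incremented, cy
        else:
            # Mux select '0': take {0, chunk}; the appended '0' MSB is the next carry.
            carry = 0
        result |= chunk << pos
        pos += width
    return result


# Word size -> (RCA width, MBEC block widths), as labelled in Fig. 7(a)-(c).
FIG7_BLOCKS = {
    16: (4, (4, 8)),
    32: (5, (4, 8, 15)),
    64: (6, (4, 8, 16, 30)),
}
```

In each configuration the widths add up to the 2n bits of the product, for example 16 + 4 + 4 + 8 = 32 for the 16-bit multiplier.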

To analyze the effect of the proposed hybrid adder independently, the partitioned multiplier with the CLA final adder is compared with the partitioned multiplier with the proposed hybrid adder. The simulation results are listed in Table IV and Table V. Comparing Table IV and Table V gives the percentage improvement in delay, area and power of the proposed multiplier (partitioned multiplier with hybrid final adder) with respect to the partitioned multiplier with CLA final adder. The results show that the improvement in delay increases with the word size of the multiplier. The speeds of the 8, 16, 32 and 64-bit multipliers are improved by 14.9%, 21.1%, 25.2% and 27.7% respectively. The area and power overhead is slight for all word sizes.

Fig. 7. Variable block hybrid final adder: (a) For 16-bit multiplier, (b) For 32-bit multiplier, (c) For 64-bit multiplier.

TABLE IV
PARTITIONED DADDA MULTIPLIER WITH CLA

Multiplier (N by N)   Area (µm²)   Delay (ns)   Power (µW)
8 by 8                8,957        3.51         6.85
16 by 16              30,241       4.61         35.22
32 by 32              107,362      5.47         218.76
64 by 64              386,629      6.94         952.59

TABLE V
PARTITIONED DADDA MULTIPLIER WITH HYBRID ADDER

Multiplier (N by N)   Area (µm²)   Delay (ns)   Power (µW)
8 by 8                9,144        3.38         7.07
16 by 16              30,577       4.13         35.99
32 by 32              107,491      4.71         221.01
64 by 64              381,776      5.51         966.45

IV. ASIC IMPLEMENTATION AND SIMULATION RESULTS

The ASIC implementation of the proposed design follows the Cadence design flow. The design has been developed in Verilog HDL and synthesized in Encounter RTL Compiler using the typical libraries of TSMC 180 nm technology. Cadence SoC Encounter is used for placement and routing (P&R) [15]. Parasitic extraction is performed using the Encounter Native RC extraction tool. The extracted parasitic RC (in SPEF format) is back-annotated to the Common Timing Engine in the Encounter platform for static timing analysis. For each word size of the multiplier, a VCD (Value Change Dump) file is generated for the possible input conditions and imported into Cadence Encounter Power Analysis to perform the power simulations. The same design flow is followed for both designs in this work.
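The relative figures and the power-delay product can be recomputed directly from the tabulated results. The short Python sketch below does this for Table I and Table V. The values are copied as printed, the units are assumed to be µm², ns and µW, the sign convention (regular minus proposed, relative to regular) is an assumption, and all names are hypothetical. It reports only what the tabulated numbers imply.

```python
# Word size -> (area, delay in ns, power), copied as printed in Table I and Table V.
REGULAR_CLA = {
    8: (8428, 3.40, 6.32),
    16: (29169, 4.71, 33.09),
    32: (105237, 5.92, 210.50),
    64: (397146, 7.54, 925.92),
}
PROPOSED_HYBRID = {
    8: (9144, 3.38, 7.07),
    16: (30577, 4.13, 35.99),
    32: (107491, 4.71, 221.01),
    64: (381776, 5.51, 966.45),
}


def percent_change(regular: float, proposed: float) -> float:
    """Positive when the proposed design is lower (better) than the regular one."""
    return 100.0 * (regular - proposed) / regular


def power_delay_product(delay_ns: float, power: float) -> float:
    """Power-delay product in (power unit) * ns."""
    return delay_ns * power


def compare(n: int) -> dict[str, float]:
    """Relative area, delay and power, plus both power-delay products, for word size n."""
    area_r, delay_r, power_r = REGULAR_CLA[n]
    area_p, delay_p, power_p = PROPOSED_HYBRID[n]
    return {
        "area_pct": percent_change(area_r, area_p),
        "delay_pct": percent_change(delay_r, delay_p),
        "power_pct": percent_change(power_r, power_p),
        "pdp_regular": power_delay_product(delay_r, power_r),
        "pdp_proposed": power_delay_product(delay_p, power_p),
    }


if __name__ == "__main__":
    for n in sorted(REGULAR_CLA):
        row = compare(n)
        print(n, {key: round(value, 2) for key, value in row.items()})
```

Running the script prints one row per word size, which makes it easy to check any quoted percentage against the tables.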

V. RESULT SUMMARY

Comparing Table I (regular Dadda multiplier with CLA) with Table V (partitioned multiplier with hybrid adder) summarizes the performance of the proposed multiplier in percentage terms, as listed in Table VI. The area of the regular Dadda multiplier is only slightly smaller than that of the proposed multiplier, by 7.7% down to 1.4% across the 8, 16, 32 and 64-bit word sizes. The area overhead of the proposed multiplier decreases continuously with increasing word size and is only 1.4% for the 64-bit multiplier. The power consumption of the regular Dadda multiplier is 5.2% less than that of the proposed multiplier for the 8-bit word size. With increasing word size, the difference in power between the proposed and the regular Dadda multiplier decreases; the 64-bit Dadda multiplier requires only 3.7% less power than the proposed multiplier. The delay values indicate that the proposed multiplier is always faster than the regular Dadda multiplier, and that the percentage reduction in delay increases with word size. The speed enhancement is most significant for the 64-bit case, where the regular Dadda multiplier requires 41.1% more time than the proposed multiplier.

TABLE VI
PERFORMANCE OF THE REGULAR WITH REFERENCE TO THE PROPOSED DADDA MULTIPLIER

Multiplier (N by N)   Area %   Delay %   Power %
8 by 8                -8.5     +0.5      -11.8
16 by 16              -4.8     +12.21    -8.76
32 by 32              -2.1     +20.40    -4.99
64 by 64              3.8      +26.91    -2.21

VI. CONCLUSION

We have achieved faster multiplication by combining two design techniques: partitioning the partial products into two parts for independent parallel column compression, and fast final addition using a hybrid final adder structure. The result analysis shows that the power and area overheads are not significant, while the speed and power-delay product improvements over the regular Dadda multiplier are significant. The proposed multiplier design technique can be implemented with any type of parallel multiplier to achieve faster performance.

REFERENCES

[1] B. Parhami, "Computer Arithmetic", Oxford University Press, 2000.
[2] E. E. Swartzlander, Jr. and G. Goto, "Computer arithmetic," The Computer Engineering Handbook, V. G. Oklobdzija, ed., Boca Raton, FL: CRC Press, 2002.
[3] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, Vol. EC-13, pp. 14-17, 1964.
[4] L. Dadda, "Some Schemes for Parallel Multipliers," Alta Frequenza, Vol. 34, pp. 349-356, August 1965.
[5] K. C. Bickerstaff, E. E. Swartzlander and M. J. Schulte, "Analysis of column compression multipliers," Proceedings of the 15th IEEE Symposium on Computer Arithmetic, 2001.
[6] W. J. Townsend, E. E. Swartzlander and J. A. Abraham, "A comparison of Dadda and Wallace multiplier delays," Advanced Signal Processing Algorithms, Architectures and Implementations XIII, Proceedings of the SPIE, vol. 5205, pp. 552-560, 2003.
[7] P. R. Cappello and K. Steiglitz, "A VLSI layout for a pipe-lined Dadda multiplier," ACM Transactions on Computer Systems, pp. 157-174, 1983.
[8] K. C. Bickerstaff, "Optimization of Column Compression Multipliers," Doctoral Dissertation, Dept. of Electrical and Computer Engineering, University of Texas at Austin, Austin, Texas, 2007.
[9] V. G. Oklobdzija and D. Villeger, "Improving Multiplier Design by Using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, no. 2, June 1995.
[10] P. F. Stelling, "Design strategies for optimal hybrid final adders in parallel multiplier," Journal of VLSI Signal Processing, vol. 14, pp. 321-331, 1996.
[11] S. Das and S. P. Khatri, "Generation of the Optimal Bit-Width Topology of the Fast Hybrid Adder in a Parallel Multiplier," International Conference on Integrated Circuit Design and Technology (ICICDT), May 2007.
[12] B. Ramkumar, H. M. Kittur and P. M. Kannan, "ASIC Implementation of Modified Faster Carry Save Adder," European Journal of Scientific Research, Vol. 42, Issue 1, 2010.
[13] B. Ramkumar and H. M. Kittur, "Low Power and Area Efficient Carry Select Adder," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, Issue 2, pp. 371-375, Feb 2012.
[14] J. M. Rabaey, Digital Integrated Circuits - A Design Perspective. Prentice Hall Press, 2001.
[15] Encounter User Guide, February 2006.

B. Ramkumar received the B.E. degree in Electronics and Communication Engineering from the Madurai Kamaraj University, Madurai, in 2004, and the M.E. degree in VLSI design from the Anna University, Chennai, in 2006. Currently, he is pursuing a Ph.D. at the VIT University, Vellore.

V. Sreedeep received the B.Tech. degree in Electrical and Electronics Engineering from the Jawaharlal Nehru Technological University, Anantapur, in 2004. Currently, he is pursuing an M.Tech. at the VIT University, Vellore.

Harish M Kittur received the B.Sc. degree in Physics, Mathematics and Electronics from the Karnataka University, Dharwad, in 1994, the M.Sc. degree in Physics from the Indian Institute of Technology, Mumbai, in 1996, the M.Tech. degree in Solid State Technology from the Indian Institute of Technology, Madras, in 1999, and the Ph.D. degree in Physics from the RWTH Aachen in 2004. He is a member of IEEE and IETE.
