OPERAS in a DSP CAD environment

June 14, 2017 | Author: Kallol Bagchi | Category: System Design, Low Power, System modeling, Object Oriented, Power Dissipation, Hardware architecture


Gerard K. Yeh, Kallol Bagchi, James B. Burr and Allen M. Peterson
Space, Telecommunications, and Radioscience Laboratory
Department of Electrical Engineering
Stanford University

Abstract

The design of low-power digital signal processors requires a CAD environment where the designer can concurrently optimize the processing algorithm and hardware architecture over area, time and energy. The tool called OPERAS is presented as part of an overall signal processing system design flow to support system modeling and simulation. OPERAS uses object-oriented principles and the efficient C++ language to model and simulate the system behavior, including power dissipation. Design examples are included which show salient aspects of the system.

1 Introduction

Low-power digital signal processing (DSP) systems are in demand for the new generation of remote sensing instruments and portable communication devices. High-performance, low-power system design requires experimentation with various algorithm and architecture combinations. In addition to algorithm and architecture, new technologies such as MCM packaging also improve the power and performance of a system. Because power is proportional to the square of the supply voltage, low-voltage circuit techniques are popular [25]. However, reduced supply voltage increases signal latency in MOS circuits, and system architects often use techniques such as pipelining or parallelism to regain the lost performance. Furthermore, as the supply approaches the MOSFET threshold voltage (Vt), the performance decreases exponentially. The reduced Vt technology increases circuit area and requires certain logic circuit styles due to increased leakage current [3]. With these and other interacting factors and tradeoffs, the low-power DSP system designer can benefit from a concurrent design approach.

DSP-based system designs often start with a block-diagram, data-flow style of algorithmic description, and the implementation is derived by designers or synthesis tools. During this process, the system design undergoes a number of transformations to meet the overall design goals and specifications. In addition to verifying the designs through simulation, exploration of design tradeoffs is desired. To validate each design candidate, fast simulation is needed to extensively simulate the algorithm and architecture using real signal data. More importantly, realistic comparisons of different designs need good modeling of aspects from packaging to system partitioning and from architecture to circuit and technology.

Because power dissipation is ubiquitous throughout a system in the forms of computation, communication and storage, the system design tradeoffs for low power necessarily involve area, latency and energy (power normalized by circuit activity). Issues such as system reliability, control latency, and hardware and software partitioning are also important in the overall design tradeoffs for low power.

The present work is motivated by low-power image signal processing applications where the design is optimized for both performance and energy. Another goal of the present work is the modeling of interactions between system performance and energy and the CMOS technology parameters: supply voltage and Vt. Furthermore, the tool also accommodates systems with mixed supply voltages and systems with scalable supply voltage.

The organization of the paper is as follows. Section 2 describes the CAD tools and a proposed design process flow for DSP system design. The object-oriented (OO) modeling and simulation flow of OPERAS (Object-Oriented Power Estimating Restructurable Architecture Simulator) is described in section 3. In section 4, two system design examples are included.

2 The DSP CAD tool environment

The design process flow of DSP VLSI systems ideally starts from an algorithmic description and leads to the final layout using the appropriate CAD technologies. Designers usually start with ideal design attributes, refine them and include them in a design scheme. CAD tools are used or built to automate the scheme whenever possible. References [15], [18], [7], [19] and [8] discuss recent CAD tools and techniques. When using a CAD environment to take a design from algorithms to chips, one needs to work out some problems [17]. Memory and speed limitations in large and complicated designs, lack of verifiability, absence of source code, etc., continue to create problems. An easy user interface and an integrated environment with a number of tools are essential components of a CAD tool set. To reduce the design effort and time, a combined environment with existing and in-house tools can be created. However, incompatibility among many existing packages often creates problems in such efforts.

2.1 CAD for VLSI DSP systems design

Synthesis and simulation systems exist at the logic and circuit levels as well as at the high-level synthesis level [23], [15]. Tools such as Mars [1], Oct [7], SPICE, VHDL, Verilog [4], Synopsys and Magic [20] are used extensively in modern VLSI design environments. References [23], [15] and [7] have details on some of these often-used tools. Ideally, logic and layout CAD systems are integrated with the high-level DSP CAD systems to provide the designer a near-complete system design environment. The Cathedral systems [14] are examples of DSP-based silicon compilers. The Parsifal-I/II systems [5] have been used to design a number of DSP ASICs, and the FACE system is OO-based with a graphical user interface. Ptolemy [13] caters to those specific DSP solutions that can be expressed efficiently in synchronous data-flow graph notation, whereas Hyper [22] helps the designer to synthesize and explore the various architectural solutions. GRAPE [11] is a set of tools based on signal flow graph (SFG) synthesis that derive ASIC or multiprocessor solutions. HiFi/Nelsis [6] is a system with an OO user interface that can be used to derive regular array structures. It can be noted that an integrated CAD system for low-power DSP system design is still not available, although the Hyper system does provide some guidance in that direction.

A DSP system design flow is proposed here, and the overall flow of the design scheme is shown in Figure 1. Tools for algorithm transformation, scheduling, architectural mapping and synthesis are used at the higher level, and the design is fed into the OPERAS simulator. As shown, OPERAS is a system modeling and simulation tool that pulls detailed hardware design knowledge up to the system modeling level. The knowledge database includes functional interface, performance, power and area information for a particular module or macro-cell implementation. The circuit and physical CAD tools for detailed VLSI design are used once a particular system solution is selected. Natural exchange of data and easy interfaces at all levels are desired goals for creating such a design framework. Verifying simulation and synthesis results from various levels poses a number of difficulties, such as representative workload generation and creating a consistent and uniform framework for comparing results from simulators at different levels.

Several commercial and academic logic and behavioral simulators are available, but they model hardware at a finer granularity and run at a relatively slow speed for DSP system-level simulation and modeling [4], [24]. In addition, energy behavioral modeling is often lacking. For performance, flexibility and quick prototyping reasons, DSP designers often write system behavioral models in a general-purpose language such as C. The purposes of this level of modeling include algorithm and architecture verification, test vector generation and performance estimation. OPERAS extends this level of modeling with object-oriented methodology, energy behavior modeling and a coarse-grain, module-level, event-driven simulation mechanism.

[Figure 1 is a block diagram that is not reproduced here. Stages shown: Domain DSP/DIP Algorithm; Algorithm Design (sequential, systolic, shared-memory multiprocessor and distributed-memory multiprocessor algorithms); Architecture Design and Synthesis (DSP processor; multi-DSP, systolic or multiprocessor); Select the Best Algorithm/Arch Combination; HDL/RTL Description; Circuit/Physical Design; Circuit Layout; Design Feedback and Iteration. Tools shown: algorithm transformation tools (algorithm to flow graph such as SDF, e.g. Ptolemy); architecture mapping tools (DSP compiler, scheduler, etc.); algorithm/architecture combination evaluation and selection tools (OPERAS system simulator, with a design knowledge database of function, performance, energy/power and area); VHDL, Spice, synthesis and Magic. SDF: Synchronous Dataflow Graph.]

Figure 1: Design Flow of DSP System and CAD Tools

3 A description of OPERAS

The overall simulation environment of OPERAS is based on the standard C++ language. The user describes the system design by defining a group of modules using the OPERAS input system design language (OSDL). The input description is processed by the OSDL preprocessor to automatically generate the C++ code, while user-embedded C++ code is copied to the output without change. The OSDL syntax allows for ease of OO modeling and simulation without the more complex C++ syntax, although standard C++ can still be used. The modeling and simulation code is embedded within the OPERAS library, and the development time of OPERAS was shortened by the use of the LEDA class library [16].

OPERAS leverages the efficient C++ compiler and linker to generate a fast and efficient simulator, allowing for simulation of a large number of virtual machine cycles. The standard C++ software development environment facilitates portability and interfacing with other CAD tools and signal acquisition systems. Also, the use of C++ brings the software and hardware modeling and development of a system into the same environment. The encapsulation of models within the module makes reuse, module modification and debugging easier. Another advantage of OO-based modeling is that code reuse, model extension and version modeling can be implemented using the inheritance mechanism. OPERAS uses the C++ class inheritance mechanism to allow the user to specialize and extend existing designs. Existing modules can be stored in a module library to be reused or extended.

The user can direct the simulation by providing the test vectors or initial contents to the memory modules. The information is stored on the host machine's file system. Similarly, input test vectors can be used to test the design by defining a test signal generator module. The simulator generates a text file that describes the output waveform, and the waveform can be displayed by a standard waveform display program. Output traces of selected nodes within a design can be generated to monitor for correctness and for other analysis and simulation. For example, statistical and circuit-based energy estimation techniques can use the signal traces as input. Currently, OPERAS displays waveforms using the IRSIM analyzer [24].

3.1 System modeling

In OPERAS-based modeling, the whole system is constructed in a concise and modular manner using two fundamental objects: Module and Net. The detailed behavior of a module is specified using C/C++ code, and the C/C++ code is encapsulated within the module. Nets are used to interconnect the modules, and each net object can be defined as a single wire or a bus of wires. The Subnet object is used to connect a module to one or multiple wires within a bus. The whole design is described in a hierarchical, tree-like manner, allowing one to formulate and model a complex system as a single module at the root level containing instantiations of other modules [26]. Each module has associated interface and functional specifications and performance and energy estimation functions. In particular, the functional, delay and energy functions are described using the member functions (known as methods in OO terminology) of the module. Furthermore, each module may also contain instantiations of other modules, called cells or stdcells. A cell is an instantiation of a user-specified module definition, and a stdcell is an instantiation of a module from the module libraries. Each cell in turn can consist of other cells, and the net connections are specified through the interconnections during instantiation. Additional modeling information can be added by the user through the use of regular C++ code.
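The Module and Net objects described above might be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the OPERAS library: the class layout, `add_cell`, `subtree_energy` and `Inverter8` are hypothetical, and only the method names `execute()`, `delay()` and `update_energy()` come from the paper.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a net: a single wire (width 1) or a bus.
struct Net {
    std::string name;
    int width;
    std::uint32_t value;
    Net(std::string n, int w) : name(std::move(n)), width(w), value(0) {
        assert(w >= 1 && w <= 32);
    }
};

// Hypothetical module base class carrying the standard methods.
class Module {
public:
    explicit Module(std::string name) : name_(std::move(name)) {}
    virtual ~Module() = default;

    virtual void execute() {}                       // functional behavior
    virtual double delay() const { return 0.0; }    // latency, in gate delays
    virtual double update_energy() { return 0.0; }  // energy of the last operation

    // A module may contain instantiated modules (cells), forming a tree.
    Module& add_cell(std::unique_ptr<Module> cell) {
        cells_.push_back(std::move(cell));
        return *cells_.back();
    }

    // Energy of this module plus everything instantiated below it.
    double subtree_energy() {
        double total = update_energy();
        for (auto& cell : cells_) total += cell->subtree_energy();
        return total;
    }

    const std::string& name() const { return name_; }

private:
    std::string name_;
    std::vector<std::unique_ptr<Module>> cells_;
};

// Specialization through inheritance: an inverter on an 8-bit bus.
class Inverter8 : public Module {
public:
    Inverter8(std::string name, Net& in, Net& out)
        : Module(std::move(name)), in_(in), out_(out) {}
    void execute() override { out_.value = ~in_.value & 0xFFu; }
    double delay() const override { return 1.0; }
    double update_energy() override { return 0.5; }  // placeholder energy units
private:
    Net& in_;
    Net& out_;
};
```

Keeping the behavior, delay and energy functions as virtual methods is what lets a simulator treat a gate-level cell and a whole processor uniformly, which is the property the paper relies on.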

3.2 Design representation

Although designers are usually comfortable and familiar with hierarchical modular design, real hardware functions like a directed graph, with each node representing a hardware module and each edge representing a signal between the modules. The graph representation can model the complex interactions between hardware modules in a clean and efficient way. The internal directed graph representation is generated from the hierarchical description. After this design database transformation, the hierarchical user view is preserved through the naming of modules and nets.

A module can logically represent a wide range of objects with varying degrees of complexity. For example, the system model can be an array of signal processors, with each module representing one processor. On the other hand, a module may represent a gate-level model of an adder circuit. A module object just needs to have the standard methods for the simulation to work [26]. Furthermore, the module structure allows for the study of gate-level and macrocell-level structures with the rest of the system represented by a high-level behavioral description. This flexibility allows for interesting and efficient exploration of different system structures.
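One way to picture the transformation from the hierarchical description to the flat directed graph is a depth-first walk that records each leaf module under its full path name, so the user's hierarchical view survives in the names. The sketch below is an assumption about how such a flattening could look; `HierModule`, `FlatGraph`, `flatten` and `add_signal` are hypothetical names, and the `/` path separator is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hierarchical description: a module instance and its cells.
struct HierModule {
    std::string name;
    std::vector<HierModule> cells;  // empty means a leaf (behavioral) module
};

// Flat directed graph: nodes are leaf modules, edges are signals between them.
struct FlatGraph {
    std::vector<std::string> nodes;                          // full path names
    std::vector<std::pair<std::string, std::string>> edges;  // driver -> receiver
};

// Depth-first flattening; the accumulated prefix preserves the hierarchy.
void flatten(const HierModule& m, const std::string& prefix, FlatGraph& g) {
    assert(!m.name.empty());
    const std::string path = prefix.empty() ? m.name : prefix + "/" + m.name;
    if (m.cells.empty()) {
        g.nodes.push_back(path);
        return;
    }
    for (const HierModule& cell : m.cells) flatten(cell, path, g);
}

// Adds a signal edge only when both endpoints are known leaf modules.
bool add_signal(FlatGraph& g, const std::string& from, const std::string& to) {
    const bool known =
        std::find(g.nodes.begin(), g.nodes.end(), from) != g.nodes.end() &&
        std::find(g.nodes.begin(), g.nodes.end(), to) != g.nodes.end();
    if (known) g.edges.emplace_back(from, to);
    return known;
}
```

With this scheme, a 4-bit adder instantiated inside an 8-bit adder inside an ALU would appear in the flat graph under a name such as `alu8/adder8/addr4_msb`.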

3.3 Simulation and energy modeling

The simulation is conducted over the modules and nets in an event-driven manner, which allows for modeling of hardware concurrency. When an event arrives, the module models the functionality, delay and energy by calling the corresponding methods. Each event is associated with a net. An activated event will cause the net object to propagate the new input value to the modules, via the intervening Subnet objects, if any. After all the events have been activated and propagated to the proper modules, the affected modules' methods are executed to produce the new output and to update energy estimates. The module schedules a new event at the net corresponding to the output if a new and different value is generated.

The concurrency and signal delays through modules are important for energy modeling of a system because signal hazards also consume energy. The delay of each signal is scaled by the supply voltage. The concurrency is coarsely modeled at the module boundary using nets because detailed modeling is traded for simulation performance. The concurrency modeling at the module level is sufficient for architectural-level simulation and validation, but a better signal glitching model is needed in the energy estimation method.

The energy behavior of the modules can be characterized by the energy per operation of a module, scaled by the supply voltage. This parameter can be derived either through detailed circuit analysis [21] or circuit simulation [2], [24]. Furthermore, statistical power estimation [10] or behavioral modeling may be used by gathering the signal statistics at the nets. For reduced or zero Vt technology, the DC leakage energy can be integrated by keeping track of the previous invocation of the energy estimator method.
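The event mechanism described above can be illustrated with a small time-ordered queue. This is a simplified sketch, not the OPERAS kernel: it evaluates a module as soon as its input net changes rather than in a separate pass, it omits Subnet objects, and the names `Event`, `Gate`, `Simulator`, `delay_scale` and `energy_scale` are hypothetical. The two scale factors stand in for the supply-voltage scaling of delay and energy, whose exact form the paper does not give.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical event: a new value arriving on a net at a given time.
struct Event {
    double time;
    int net;
    std::uint32_t value;
    bool operator>(const Event& other) const { return time > other.time; }
};

// Hypothetical single-input module with nominal delay and energy per operation.
struct Gate {
    int in;
    int out;
    double delay;
    double energy_per_op;
    std::function<std::uint32_t(std::uint32_t)> fn;
};

struct Simulator {
    std::vector<std::uint32_t> nets;
    std::vector<Gate> gates;
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> queue;
    double delay_scale = 1.0;   // assumed supply-dependent delay multiplier
    double energy_scale = 1.0;  // assumed supply-dependent energy multiplier
    double energy = 0.0;
    double now = 0.0;

    void schedule(double t, int net, std::uint32_t v) {
        assert(net >= 0 && static_cast<std::size_t>(net) < nets.size());
        queue.push(Event{t, net, v});
    }

    void run() {
        while (!queue.empty()) {
            const Event e = queue.top();
            queue.pop();
            now = e.time;
            if (nets[e.net] == e.value) continue;  // value unchanged: no activity
            nets[e.net] = e.value;                 // the net propagates the value
            for (const Gate& g : gates) {
                if (g.in != e.net) continue;
                energy += g.energy_per_op * energy_scale;
                const std::uint32_t out = g.fn(e.value);
                // Schedule an output event only for a new and different value.
                if (out != nets[g.out]) schedule(now + g.delay * delay_scale, g.out, out);
            }
        }
    }
};
```

Because energy is charged per module evaluation, a glitch that reaches a module input is counted even if the final output is unchanged, which is the reason the text gives for modeling signal delays at all.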

4 Examples

The modeling and simulation features of OPERAS are illustrated in this section. First, the basic design method is illustrated with a simple ALU circuit suitable for image processing. The second example illustrates the modeling of a modern DSP architecture, and shows various estimator outputs useful for conducting experiments to arrive at a low-power system design.

4.1 An 8-bit ALU

The design example is presented in Figure 2. The module description of an 8-bit ALU design is shown with the corresponding schematic in Figure 2a. As shown in the description, the ALU8 design is derived from a behavioral version called ALU8b through the OO inheritance mechanism. The interface section declares input and output nets, and the net section defines new net objects. The cell section instantiates other modules as new cell or stdcell objects and defines the interconnections with the nets. The necessary definitions of stdcell libraries are included in the hinclude statement. The logic, latency, energy and area estimation functions are defined by execute(), delay(), update_energy() and findA() respectively. In conducting the simulation, the simulator looks for these standard methods within the module to obtain the energy, latency and area information. The user can extend the modeling and simulation by adding her own subroutines within the standard methods. Furthermore, to model different system partitioning and packaging schemes, update_energy() can interrogate the connected nets to obtain interconnect information such as fanout and capacitance.

    module ALU8 {
      desc { An 8-bit ALU unit }
      derivedfrom {ALU8b}

      //* Interface Section *****
      output {Answer}
      output {CarryOut}
      output {Overflow}
      input {A}
      input {B}
      input {CarryIn}
      input {Opcode1}   //select input mux
      input {Opcode2}   //select input mux
      input {Clk}       //clock output latch

      //* Net Section ***********
      net {Addin1[8]}   //Specified with bus width
      net {Addin2[8]}
      net {Bbar[8]}
      net {Sum[8]}

      //* Cell Section **********
      cell MUX_8 {muxA (Addin1,Answer,A,Opcode1)}
      cell MUX_8 {muxB (Addin2,B,Bbar,Opcode2)}
      cell ADDER_8 {adder8 (Sum,CarryOut,
                            Addin1,Addin2,CarryIn)}
      stdcell LATCH {latch (Answer,Sum,Clk)}
      stdcell NOT {inv (Bbar,B)}
      hinclude {,}

      //* Performance *********
      delay Overflow {
        //* Legal C++ Code that describes delay */
      }

      //* Functionality *********
      execute (output Overflow,input Sum,
               input Addin1,input Addin2) {
        //* Overflow Detection Logic */
        Overflow = ( !((Addin1>>7)&0x1 ^ (Addin2>>7)&0x1)
                   & ((Addin1>>7)&0x1 ^ (Sum>>7)&0x1)) ? 1 : 0;
      }

      //* Energy Estimation *****
      update_energy {
        /* energy estimation function */
        . . . . .
      }
    }

[Figure a: Schematic of the 8-bit ALU (ALU8). Schematic not reproduced; it shows the inverter inv, the multiplexers muxA and muxB, the ADDER8 block and the output LATCH, connected by the nets A, B, Bbar, Addin1, Addin2, Sum, CarryIn, CarryOut, Opcode1, Opcode2, Clk and Answer.]

[Figure b: Simulator Internal Representation of the ALU8 Datapath Circuit. Graph not reproduced; it shows module objects, net objects (N) and subnet objects (S), with ADDER8 built from an MSB and an LSB 4-bit adder (ADDR4) linked by CarryLSB. The legend distinguishes Module Object, Net Object, Subnet Object, Event Driven Path and Event Schedule Path.]

[Figure c: OPERAS waveform display. Waveform not reproduced; the OPERAS ANALYZER display, dated Sun Jan 9 21:14:27 1994, plots Clk, Answer, CarryOut, Overflow, A, B, CarryIn, Opcode1 and Opcode2 against time in gate delays, from 0 to 84.]

Figure 2: 8-bit Fixed Point ALU Simulation with OPERAS

Figure 2b illustrates the internal representation of the 8-bit ALU example presented in Figure 2a. The actual graph-like representation of Figure 2b is generated and stored within the design database. In this figure, the 8-bit adder is assumed to be constructed out of two 4-bit adders. In a hierarchical view, the 4-bit adders would be encapsulated within the 8-bit adder module as cells (instantiated modules). Consequently, the design details of the 8-bit adder are hidden in the hierarchical representation, and in Figure 2b, the two 4-bit adders are shown within the box labeled ADDER8. The ALU8 input signals such as A, B and CarryIn can come from other modules or from the test vector generator module defined by the user. The figure shows that the 4-bit adder inputs need to be taken from two separate nets by first splitting the nets into the most significant subnet (MSB) and the least significant subnet (LSB). The answers from the MSB adder and from the LSB adder are then combined at the output. In both the splitting and merging cases, the subnet object is used to handle those abstractions and to propagate the events.

Figure 2c shows the waveforms from the 8-bit ALU simulation, and each signal corresponds to a net object of Figure 2b. The Answer waveform has been annotated to show the ALU operations and results, and CarryOut and Overflow also show the corresponding status.
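The behavior of the ADDER8 block and the overflow expression can be restated in plain C++. The sketch below is illustrative only: `AddResult`, `add4`, `add8` and `overflow8` are hypothetical names, the nibble masks play the role of the MSB and LSB subnets, and the overflow test is the same sign comparison as in the execute() method of the listing.

```cpp
#include <cassert>
#include <cstdint>

struct AddResult {
    std::uint8_t sum;
    bool carry_out;
};

// 4-bit adder slice: operands and result are confined to the low nibble.
AddResult add4(std::uint8_t a, std::uint8_t b, bool carry_in) {
    assert(a < 16 && b < 16);
    const unsigned total = a + b + (carry_in ? 1u : 0u);
    return AddResult{static_cast<std::uint8_t>(total & 0xFu), (total >> 4) != 0};
}

// 8-bit adder: split each operand into LSB and MSB nibbles, ripple the
// carry from the LSB slice into the MSB slice, then merge the two sums.
AddResult add8(std::uint8_t a, std::uint8_t b, bool carry_in) {
    const AddResult lsb = add4(a & 0xF, b & 0xF, carry_in);
    const AddResult msb = add4(a >> 4, b >> 4, lsb.carry_out);
    return AddResult{static_cast<std::uint8_t>((msb.sum << 4) | lsb.sum), msb.carry_out};
}

// Two's-complement overflow: both operands share a sign bit that the
// sum does not share.
bool overflow8(std::uint8_t addin1, std::uint8_t addin2, std::uint8_t sum) {
    const unsigned s1 = (addin1 >> 7) & 1u;
    const unsigned s2 = (addin2 >> 7) & 1u;
    const unsigned ss = (sum >> 7) & 1u;
    return !(s1 ^ s2) && (s1 ^ ss);
}
```

For instance, 0x55 + 0x55 gives 0xAA with no carry but with overflow, since two positive operands produce a negative sum, while 0x55 + 0xAA + 1 gives 0x00 with a carry and no overflow.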

4.2 TMS320 architectural modeling

A 256-tap FIR filter is implemented on the Texas Instruments TMS320C25 DSP processor because its single-cycle multiply-accumulate unit matches the FIR computation. The architectural model and pipeline timing are derived from the information presented in [12] and [9]. The TMS320C25 datapath block diagram [9] and the corresponding OSDL code are shown in Figure 3. (The datapath block diagram is attached at the end of the paper.) The OSDL code models each block as a separate module based on real circuits. For the FIR filter implementation, the principal components, the Multiplier and the ALU, operate in a pipeline to achieve the single-cycle multiply-accumulate.

    module TMS320_DATAPATH {
      desc { TMS320C25 Datapath }

      //* Interface Section *****
      input {Psi1}          //Input Clocks
      input {Psi2}
      input {Psi3}
      input {Psi4}
      //Program Bus
      input {PGDataBus_w}
      //Data Bus
      inout {DataBus_w}
      input {sx}            //sign extend?
      //load TR Reg
      input {loadTR_s1e}
      //ALU commands
      input {aluSub_s3e}
      input {aluXor_s3e}
      input {aluOr_s3e}
      input {aluNeg_s3e}
      input {aluAnd_s3e}
      ........

      //* Net Section **********
      //TR output
      net {TrOut_v3e[16]}
      net {MulMemOp_v3e[16]}   //Selected Mem Op
      //Multiplier input 2
      net {MulIn2_v3e[16]}
      //Multiplier output
      net {MulOut_v2m[32]}
      ........

      //* Cell Section **********
      stdcell LATCH {TR (TrOut_v3e,DataBus_w,
                         loadTR_s1e)}
      stdcell MUX2 {muxA (MulMemOp_v3e,
                          PGDataBus_w,DataBus_w,mulBusSel_s3e)}
      stdcell MUX2 {muxB (MulIn2_v3e,
                          MulMemOp_v3e,TrOut_v3e,sqr_s3e)}
      cell MUL16 {MUL (MulOut_v2m,TrOut_v3e,
                       MulIn2_v3e)}
      stdcell LATCH {PR (PrOut_v3e,MulOut_v2m,
                         loadPR_s2f)}
      cell SFR6_32 {mushr (SFROut_v3e,PrOut_v3e)}
      cell SFL14_32 {mushl (SFLOut_v3e,
                            PrOut_v3e,mulShAmt_s3e)}
      stdcell MUX3 {muxD (MulShOut_v3e,
                          SFROut_v3e,
                          PrOut_v3e,SFLOut_v3e,
                          mulOutSel_s3e)}
      //************************
      cell SHIFTL16_16 {inshift (ShiftOut_v3e,
                                 DataBus_w,InShAmt_s3e,sx)}
      stdcell MUX2 {muxE (ALUIn1_v3e,ShiftOut_v3e,
                          MulShOut_v3e,aluInSel_s3e)}
      cell ALU32 {muxF (ALUOut_s4e,ALUIn1_v3e,
                        ACC_v4e,aluAdd_s3e,aluSub_s3e,aluXor_s3e,
                        aluOr_s3e,aluNeg_s3e,aluAnd_s3e)}
      stdcell LATCH {ACC (ACC_v4e,ALUOut_s4e,
                          loadAcc_s4e)}
      cell SHIFTL8_32 {accsh1 (ACCShHOut_v3e,
                               ACC_v4e,accHShAmt_s3e)}
      cell SHIFTL8_8 {accsh2 (ACCShLOut_v3e,
                              ACC_v4e[15:0],accLShAmt_s3e)}
      stdcell TRIDRV {mulh (DataBus_w,
                            MulShOut_v3e[31:16],mulDrvH_s3e)}
      stdcell TRIDRV {mull (DataBus_w,
                            MulShOut_v3e[15:0],mulDrvL_s3e)}
      stdcell TRIDRV {acch (DataBus_w,
                            ACCShHOut_v3e,accDrvH_s3e)}
      stdcell TRIDRV {accl (DataBus_w,
                            ACCShLOut_v3e,accDrvL_s3e)}
      hinclude {(STDC/latch.h),(STDC/gate.h)}
    }

Figure 3: TI TMS320C25 Datapath Simulation with OPERAS

The results of the TMS320C25 datapath model and simulation, assuming a CMOS implementation, are shown in Table 1. From Table 1, it can be seen that the area is substantially reduced when the technology is changed from 1.8 to 0.8 microns. Power is drastically reduced when the datapath is implemented in a 0.8 µm, 1.1 V technology, which should be the design choice for low-power systems. In this case, however, the cycle time increases, although not by the same magnitude.

As shown in Table 1, OPERAS provides the system designer with information on the effects of different implementation technologies based on the simulation of a large number of virtual machine cycles. A designer can experiment with different datapath configurations to optimize for different objectives. For example, for a dedicated or adaptive FIR filter, a solution can be arrived at by simplifying the TMS320 datapath. Filtering of real data using different word sizes can also be tried to trade off signal-to-noise ratio for energy and area. For a programmable FIR filter, the instruction set decoder, described in C++ and used to control the datapath and address generation unit, can be drastically reduced. The changes to reflect a reduced instruction set require minimal programming effort. A better memory and data bus structure would reduce the large communication and storage energy due to the large capacitances of long buses and large memories.
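The Table 1 entries can be combined into an energy per cycle (power times cycle time) and compared against the square-law dependence on supply voltage mentioned in the introduction. The sketch below is a back-of-the-envelope check, not part of OPERAS; `Row`, `energy_per_cycle_pj`, `predicted_energy_pj` and `relative_error` are hypothetical names, and treating the 0.8 µm, 5 V row as the reference is an arbitrary choice.

```cpp
#include <cassert>
#include <cmath>

// One 0.8 um row of Table 1.
struct Row {
    double vdd_volts;
    double power_mw;
    double cycle_ns;
};

// Energy per cycle in picojoules: 1 mW * 1 ns = 1 pJ.
double energy_per_cycle_pj(const Row& r) { return r.power_mw * r.cycle_ns; }

// Energy predicted from a reference row by pure supply-voltage-squared scaling.
double predicted_energy_pj(const Row& ref, double vdd_volts) {
    assert(ref.vdd_volts > 0.0);
    const double ratio = vdd_volts / ref.vdd_volts;
    return energy_per_cycle_pj(ref) * ratio * ratio;
}

// Relative deviation between a table row and the square-law prediction.
double relative_error(const Row& ref, const Row& row) {
    const double predicted = predicted_energy_pj(ref, row.vdd_volts);
    return std::fabs(energy_per_cycle_pj(row) - predicted) / predicted;
}
```

With the reference row at 5 V (133 mW, 29.6 ns, about 3937 pJ per cycle), the 3.3 V row (about 1706 pJ) and the 1.1 V row (about 192 pJ) both land within 1% of the square-law prediction. The power saving at 1.1 V therefore comes from a roughly 20-fold drop in energy per cycle combined with the longer cycle time.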

Technology and Voltage    Area (mm^2)    Power (mW)    Cycle Time (ns)
1.8 µm, 5 V               5.688          88.8          100
0.8 µm, 5 V               0.937          133           29.6
0.8 µm, 3.3 V             0.937          38.0          44.9
0.8 µm, 1.1 V             0.937          1.42          135

Table 1: Performance of the TMS320C25 Datapath

5 Conclusion

This paper presented a CAD synthesis and simulation environment for low-power signal processing system design. This scheme supports design from the algorithm level to the circuit level, and the focus is on the system architectural simulation aspect of the tool called OPERAS. The simulator is currently running under UNIX using the GNU g++ compiler. The design is represented as a set of modules interconnected via nets. The function, performance and power estimators are encapsulated within the modules. The coarse-grain, event-driven simulation is conducted over this simple object-oriented representation of nets and modules. By specifying the energy estimation function on a per-module basis and modeling the delay of signals, one can achieve good simulation performance and power estimates. OPERAS, when used with other higher-level and lower-level CAD tools, can provide a near-complete design path from algorithms to system and chip design. Designers can rapidly simulate and evaluate various system alternatives using behavioral models derived from actual circuits. Ensuring correct and consistent results from various levels of simulation is important, and the validation procedure is currently being investigated in some detail. Image processing systems are also presently being modeled for experimentation and comparison.

Acknowledgements

The first author is supported by a NASA Global Change Fellowship (NASA Grant NGT30115). The authors acknowledge Drew Wingard and Masataka Matsui for their help in adapting the IRSIM analyzer. The OPERAS preprocessor is derived from the Ptolemy preprocessor.

References

[1] P. Agrawal. Synergism of VLSI Architecture and Algorithms: The MARS VLSI System. In Proceedings Advanced Research in VLSI, pages 53-59. MIT Press, 1990.
[2] R. Burch, et al. A Monte Carlo Approach for Power Estimation. In IEEE Transactions on VLSI, pages 63-71, March 1993.
[3] J. Burr and J. Shott. A 200mV Self-Testing Encoder/Decoder using Stanford Ultra-Low-Power CMOS. In IEEE International Solid-State Circuits Conference, pages 84-85, February 1994.
[4] Cadence Design, Inc. Verilog Reference Manual, 1991.
[5] A. E. Casavant and M. A. D'Abreu, et al. A Synthesis Environment for Designing DSP Systems. In IEEE Design & Test of Computers, pages 35-44, April 1989.
[6] A. de Lange, et al. HiFi: An OO System for the Structural Synthesis of Signal Processing Algorithms and the VLSI Compilation of the Signal Flow Graphs. In European Conf. on Circuit Theory and Design, pages 629-33, Sept. 1989.
[7] S. W. Director, editor. Proceedings IEEE, Special Issue on VLSI CAD. IEEE Press, February 1990.
[8] D. D. Gajski, et al. Towards intelligent silicon compilation. In Design Systems for VLSI Circuits. Logic Synthesis and Silicon Compilation, pages 365-83, July 1986.
[9] Texas Instruments. Second-Generation TMS320 User's Guide. Texas Instruments, 1987.
[10] P. Landman and J. Rabaey. Power Estimation for High Level Synthesis. In Proceedings of EDAC-EUROASIC '93, pages 361-366, February 1993.
[11] R. Lauwereins, et al. GRAPE: a CASE tool for digital signal parallel processing. In IEEE ASSP Magazine, pages 32-43, April 1990.
[12] E. A. Lee. Programmable DSP Architectures: Part II. In IEEE ASSP Magazine, pages 4-14, 1989.
[13] E. A. Lee. A design lab for statistical signal processing. In International Conference on Acoustics, Speech, and Signal Processing, pages 81-4, March 1992.
[14] H. De Man, et al. Architecture-driven synthesis techniques for VLSI implementation of DSP algorithms. In Proceedings of the IEEE, pages 319-35, February 1990.
[15] G. De Micheli. Extending CAD Tools and Techniques. In IEEE Computer Magazine, pages 85-87, January 1993.
[16] Stefan Naher. LEDA Manual. Max Planck Institute, 1992.
[17] A. R. Newton. Has CAD for VLSI reached a dead end? In VLSI 91. IFIP TC10/WG 10.5 International Conference, pages 187-92, August 1991.
[18] A. R. Newton. VLSI-based system design challenges in the early 1990s. In Applied Computer Science and Software, Turning Theory into Practice, pages 148-58, October 1991.
[19] A. F. Nielsen. A Regular VLSI Array Design Scheme for a Class of DSP Algorithms. PhD thesis, Aalborg University, Denmark, 1992.
[20] J. Ousterhout, et al. Magic: A VLSI Layout System. In ACM/IEEE Design Automation Conf., pages 152-59, 1984.
[21] S. Powell and P. Chau. Power dissipation of VLSI array processing systems. In Journal of VLSI Signal Processing, pages 199-212, May 1992.
[22] J. M. Rabaey, P. Hoang, C. Chu, and M. Potkonjak. Fast Prototyping of Datapath-Intensive Architectures. In IEEE Design & Test of Computers, pages 40-51, June 1991.
[23] W. Rosenstiel and H. Kramer. Scheduling and Assignment in High level Synthesis of DSP Algorithms. In R. Camposano and W. Wolf, editors, High-level VLSI Synthesis, pages 355-82, 1991.
[24] Arturo Salz and Mark A. Horowitz. IRSIM: An Incremental MOS Switch-Level Simulator. In ACM/IEEE Design Automation Conference, pages 173-178, 1989.
[25] Eric A. Vittoz. Micropower techniques. In Y. Tsividis and P. Antognetti, editors, Design of MOS VLSI Circuits for Telecommunications. Prentice-Hall, 1985.
[26] B. P. Zeigler. Object-Oriented Simulation with Hierarchical, Modular Models. Academic Press, 1990.
