LEON-2: General Purpose Processor for a Wireless Engine

June 16, 2017 | Author: Jiri Gaisler | Category: Case Study, Power Consumption, Low Power, High Performance, System on a Chip



Z. Stamenković, C. Wolf, G. Schoof and J. Gaisler*
IHP GmbH, Frankfurt (Oder), Germany
*Gaisler Research, Göteborg, Sweden

1-4244-0185-2/06/$20.00 © 2006 IEEE

Abstract - This paper presents a case study on the implementation of the LEON-2 processor system on a chip. The LEON-2 core is used as the general purpose processor of a high-performance, low-power wireless engine. The implemented processor system has been verified and has become a reusable module of our modular library. The measured speed and power consumption of the implemented system-on-chip show that LEON-2 is a good processor candidate for the target application.

I. INTRODUCTION

To manage complexity and reduce time-to-market, system design is shifting from composing designs out of low-level building blocks to reusing very complex ones. Advanced System-on-Chip (SoC) designs are usually a mix of externally sourced Intellectual Property (IP) blocks, in-house standard functions and application-specific blocks. We focus on wireless communication SoCs, which demand both energy-saving and real-time processing solutions so that designs can be adapted flexibly to customer demands. Our final goal is to design and implement a wireless engine [1]: a multiprocessor on a chip comprising a general purpose processor, custom processors, memory, standard input/output components, a digital baseband and an analog front end (Figure 1). Several projects contribute to this task.

Figure 1: Illustration of the wireless engine approach (blocks shown: Protocol Engine, Application Engine, Power Management, Test Engine, and two each of DLC, Baseband and RF)

The Wireless Broadband Network (WBN) project focuses on highly integrated broadband wireless modems compliant with the IEEE 802.11a [2] and HiperLAN2 [3] standards. The WBN team has developed hardware/software co-designs that realize wireless modems with the full throughput of 54 Mb/s, including the MAC layer [4]. The Wireless Internet (WI) project is developing a new terminal-oriented TCP/IP for wireless systems, with the focus on raising energy efficiency through vertical optimization [5]. The Mobile Business Engine (MBE) project seeks an application-specific processor for the wireless engine that performs encryption highly efficiently, to increase wireless privacy and security [6]. The Test project defines a new Design-for-Testability (DFT) approach and techniques for testing multiprocessors on a chip. To support these projects, we are developing a library of reusable ASIC modules (the Modular Processor Library [7]) that is particularly suitable for small, low-power devices, as required for a wireless engine. Powerful embedded processors play a crucial role in such a library. In the initial phase, we implemented a high-performance, low-power system-on-chip based on the LEON-2 processor system [8] as the general purpose processor of our wireless engine, targeting a maximum frequency of 80 MHz and a power consumption of 500 mW. The paper is organized as follows. Section II presents the system architecture and configuration issues. Section III covers the integration of the system components and the verification methodology. Section IV concludes the paper.

II. SYSTEM ARCHITECTURE AND CONFIGURATION

The LEON-2 processor system [8] is highly configurable, allowing the user to customize it for a particular application (selecting different cache sizes, multiplier performance, clock generation, etc.) or target technology. It is available as an open core in the form of a VHDL model describing the SPARC V8 processor core, the system bus and the peripheral components [9]. New modules can easily be added via the on-chip system bus [10]. A graphical configuration tool based on UNIX kernel scripts is used to configure the system. We modified the configuration environment to include IHP's 0.25 µm CMOS CDR3 library [11] as a target technology library.

The architecture of the configured system is presented in Figure 2. The system is based on the LEON-2 core, which is connected to the system peripherals through the AMBA bus. The core integrates both instruction and data cache memories (ICACHE and DCACHE) and the corresponding cache controllers. It also includes an interface to the AMBA Advanced High-performance Bus (AMBA AHB) and its controller. A memory controller attached to the AHB provides an interface to an external flash memory and static RAMs. The slower AMBA Advanced Peripheral Bus (AMBA APB) is attached to the AHB via a bridge. Two UARTs, a timer unit, an I/O port and an interrupt controller are connected to the APB.

Figure 2: System architecture (blocks shown: CPU with I-Cache and D-Cache, DSU, DCL, AHB Controller, Memory Controller with external SRAM and Flash, AHB/APB Bridge, and on the APB: Irq Ctrl, IO Port, UARTs and Timers)

A. Integer Unit
The LEON-2 integer unit implements the full SPARC V8 standard, including all multiply and divide instructions. It is based on a 5-stage instruction pipeline with separate instruction and data cache interfaces. The number of register windows is configurable within the limits of the SPARC standard (2 to 32). We chose an inferred register file (built from flip-flops) with 8 register windows.

B. Cache Sub-system
Separate instruction and data caches are provided, each configurable from 1 to 64 KB, with 16 to 32 bytes per line. Subblocking is implemented with one valid bit per 32-bit word. The instruction cache uses streaming during line refill to minimize refill latency. The data cache uses a write-through policy and implements a double-word write buffer. Both caches can be configured as direct-mapped or as multi-set caches with an associativity of 2 to 4, implementing either the least-recently-used or the random replacement policy (a 2-way associative cache implements the least-recently-replaced algorithm). We implemented a configuration consisting of an 8 KB instruction cache and an 8 KB data cache, each with 16 bytes per line. Each cache consists of a tag array and a data array. Since the associativity is one (direct-mapped), the tag array is 23 bits wide (19 address-tag bits plus 4 valid bits, one per word of a 16-byte line) and the data array is 32 bits wide in both caches. In each cache, two embedded SRAM blocks implement these arrays: the tag array uses a 2 KB block (organized as 512 x 32) and the data array an 8 KB block (organized as 2048 x 32).

C. Memory Controller
The external memory bus is controlled by a programmable memory controller that acts as a slave on the AHB. Its operation is programmed through memory configuration registers accessed via the APB. The memory bus provides a direct interface to PROM, memory-mapped I/O devices and asynchronous static RAM (SRAM). Chip-select decoding is performed for two PROM banks, one I/O bank and five SRAM banks, so the memory controller has eight chip-select signals in total.

D. Hardware Debug Support Units
The LEON-2 processor system includes hardware support for debugging software on the target hardware, provided by two modules: a debug support unit (DSU) and a debug communication link (DCL). The DSU controls the processor debug mode: it can put the processor into debug mode, allowing read/write access to all processor registers and cache memories. The DSU also contains a trace buffer that stores executed instructions or data transfers on the AMBA AHB; for simplicity and to save area, we did not include this buffer in the implemented configuration. The DSU is attached to the AHB as a slave occupying a 2 MB address space, through which any AHB master can access the processor registers. The DSU control registers can be accessed at any time, whereas the processor registers and caches can only be accessed once the processor has entered debug mode. In debug mode, the processor pipeline is held and the processor is controlled by the DSU. The debug communication link consists of a dedicated UART connected to the AHB as a master. It implements a simple read/write protocol over standard asynchronous UART communication to transmit access parameters and data: a link command consists of a control byte, followed by a 32-bit address and optional write data. Through the link, a read or write transfer can be generated to any address on the AHB.
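To make the link framing of Section II.D concrete, the sketch below builds the byte sequences a host-side tool might send over the debug UART. Only the overall layout (control byte, 32-bit address, optional write data) comes from the text. The control-byte encoding is an assumption borrowed from Gaisler's GRLIB AHBUART debug link as we understand it (bits 7:6 = `11` for a write and `10` for a read, bits 5:0 = number of 32-bit words minus one), and multi-byte fields are sent most-significant byte first, matching SPARC's big-endian byte order. The function names and the example address are hypothetical; check the LEON-2 manual [8] before relying on this bit layout.

```python
import struct
from typing import Sequence

MAX_WORDS = 64  # assumption: a 6-bit length field encodes 1..64 words


def control_byte(write: bool, nwords: int) -> int:
    """Assumed control byte: bit 7 set, bit 6 = write flag, bits 5:0 = nwords - 1."""
    if not 1 <= nwords <= MAX_WORDS:
        raise ValueError("a link command transfers 1 to 64 words")
    return 0x80 | (0x40 if write else 0x00) | (nwords - 1)


def encode_read(address: int, nwords: int = 1) -> bytes:
    """Command asking the link to read nwords 32-bit words starting at address."""
    return bytes([control_byte(False, nwords)]) + struct.pack(">I", address)


def encode_write(address: int, words: Sequence[int]) -> bytes:
    """Command writing the given 32-bit words starting at address (big-endian)."""
    header = bytes([control_byte(True, len(words))]) + struct.pack(">I", address)
    return header + b"".join(struct.pack(">I", w) for w in words)


if __name__ == "__main__":
    # Write one word to a hypothetical RAM address, then read it back.
    print(encode_write(0x40000000, [0xDEADBEEF]).hex())  # c040000000deadbeef
    print(encode_read(0x40000000).hex())                 # 8040000000
```

A read command thus costs five bytes on the wire regardless of length, while a write adds four bytes per word, which is why a simple UART link is adequate for register and memory access during debugging.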

E. AMBA On-chip Buses
Two on-chip buses are provided: the AMBA AHB and the AMBA APB. The APB is used to access peripherals and on-chip registers, while the AHB is used for high-speed data transfers. The full AHB/APB standard is implemented. The processor is connected to the AHB through the instruction and data cache controllers, and access conflicts between the two cache controllers are resolved locally. The processor performs burst transfers to fetch instruction cache lines and to read or write data for double load/store instructions. Byte, half-word and word load/store instructions perform single (non-sequential) accesses.
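The transfer rules of Section II.E can be summarized as a small model of the AHB beats that one load or store produces. The HTRANS and HSIZE codes are those of the AMBA 2.0 AHB specification [10]; the mapping (byte, half-word and word accesses as single NONSEQ transfers, double load/store as a two-word burst) follows the text. The function names and the tuple representation are our own, and the sketch deliberately ignores wait states, HBURST and bus locking.

```python
from enum import IntEnum
from typing import List, Tuple


class HTrans(IntEnum):
    """HTRANS[1:0] transfer types (AMBA 2.0 AHB)."""
    IDLE = 0b00
    BUSY = 0b01
    NONSEQ = 0b10
    SEQ = 0b11


class HSize(IntEnum):
    """HSIZE[2:0] sizes needed by 32-bit SPARC loads and stores."""
    BYTE = 0b000
    HALFWORD = 0b001
    WORD = 0b010


Beat = Tuple[HTrans, HSize, int]  # (transfer type, size, address)

_SINGLE_SIZES = {
    "byte": (HSize.BYTE, 1),
    "halfword": (HSize.HALFWORD, 2),
    "word": (HSize.WORD, 4),
}


def word_burst(address: int, nwords: int) -> List[Beat]:
    """Incrementing burst of 32-bit words: NONSEQ first beat, SEQ for the rest."""
    return [(HTrans.NONSEQ if i == 0 else HTrans.SEQ, HSize.WORD, address + 4 * i)
            for i in range(nwords)]


def load_store_beats(address: int, kind: str) -> List[Beat]:
    """AHB beats for one load/store of the given kind (hypothetical model)."""
    if kind in _SINGLE_SIZES:
        size, nbytes = _SINGLE_SIZES[kind]
        if address % nbytes:
            raise ValueError("SPARC requires naturally aligned accesses")
        return [(HTrans.NONSEQ, size, address)]  # single, non-sequential access
    if kind == "double":
        if address % 8:
            raise ValueError("LDD/STD require doubleword alignment")
        return word_burst(address, 2)  # two-word burst, not re-arbitrated
    raise ValueError(f"unknown access kind: {kind}")
```

For example, `load_store_beats(0x40000008, "double")` yields a NONSEQ word beat at 0x40000008 followed by a SEQ word beat at 0x4000000C, while a byte store at 0x40000003 is a single NONSEQ beat.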

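The cache figures in Section II.B, and the length of the instruction-cache line fetches mentioned in Section II.E, follow from simple address arithmetic. The sketch below derives the geometry of one 8 KB direct-mapped cache with 16-byte lines and lists the words a line refill would fetch. The class and method names are hypothetical, and the rule that a refill streams from the missed word to the end of its line is our assumption about the streaming refill with per-word valid bits, not something this paper states.

```python
from dataclasses import dataclass
from typing import List, Tuple

ADDR_BITS = 32
WORD_BYTES = 4


@dataclass(frozen=True)
class DirectMappedCache:
    size_bytes: int = 8 * 1024  # 8 KB, as implemented for each cache
    line_bytes: int = 16        # 16 bytes per line

    @property
    def lines(self) -> int:
        return self.size_bytes // self.line_bytes

    @property
    def offset_bits(self) -> int:
        return (self.line_bytes - 1).bit_length()  # log2 for powers of two

    @property
    def index_bits(self) -> int:
        return (self.lines - 1).bit_length()

    @property
    def tag_bits(self) -> int:
        return ADDR_BITS - self.index_bits - self.offset_bits

    @property
    def valid_bits(self) -> int:
        return self.line_bytes // WORD_BYTES  # subblocking: one valid bit per word

    @property
    def tag_entry_bits(self) -> int:
        return self.tag_bits + self.valid_bits

    @property
    def data_words(self) -> int:
        return self.size_bytes // WORD_BYTES

    def split(self, address: int) -> Tuple[int, int, int]:
        """Return (tag, line index, byte offset) of a 32-bit address."""
        offset = address & (self.line_bytes - 1)
        index = (address >> self.offset_bits) & (self.lines - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def refill_words(self, miss_address: int) -> List[int]:
        """Word addresses fetched on a miss, ASSUMING the refill streams
        from the missed word to the end of its line."""
        first = miss_address & ~(WORD_BYTES - 1)
        line_end = (miss_address & ~(self.line_bytes - 1)) + self.line_bytes
        return list(range(first, line_end, WORD_BYTES))


if __name__ == "__main__":
    cache = DirectMappedCache()
    print(cache.lines, cache.tag_bits, cache.valid_bits, cache.tag_entry_bits, cache.data_words)
    # 512 19 4 23 2048
```

The 512 tag entries fit the 512 x 32 tag SRAM with 9 bits to spare per entry, and the 2048 data words match the 2048 x 32 data SRAM.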

Locked transfers are only performed for the LDST (atomic load-store) and SWAP instructions. Double load/store transfers are nevertheless also guaranteed to be atomic, since the arbiter does not re-arbitrate the bus during burst transfers. The AHB is designed for high-performance, high-clock-frequency system modules and acts as the high-performance system backbone bus. It supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral functions. LEON-2 uses the AMBA 2.0 AHB to connect the processor cache controllers to the memory controller and other high-speed units. In our configuration, two masters are attached to the AHB (the processor and the UART of the debug communication link), and three slaves are provided (the memory controller, the debug support unit and the AHB/APB bridge). The AHB/APB bridge is the only master on the APB, and all communication between AHB masters and APB slaves passes through it. The APB is optimized for minimal power consumption and reduced interface complexity to support peripheral functions. It is configured to connect five slaves: the interrupt controller, the timer unit, the two UARTs and the parallel I/O port.

F. Interrupt Controller
The interrupt controller prioritizes and propagates interrupt requests from internal or external devices to the integer unit. In total, 15 interrupts are handled, divided into two priority levels.

G. Timer Unit
The timer unit implements two 24-bit timers, one 24-bit watchdog and one shared 10-bit prescaler. We do not use the watchdog.

H. UARTs
Two identical UARTs are used for serial communication. The UARTs support data frames with 8 data bits, one optional parity bit and one stop bit. To generate the bit rate, each UART has a programmable 12-bit clock divider. Hardware flow control is supported through the RTSN/CTSN handshake signals.

I. Parallel I/O Port
A partially bit-wise programmable 32-bit I/O port is provided on the chip. The port is split into two parts: the lower 16 bits are accessible via the PIO[15:0] signals, while the upper 16 bits use DATA[15:0] and can only be used when all areas (ROM, RAM and I/O) of the memory bus are in 8- or 16-bit mode. We use the lower 16 bits of the I/O port, each of which can be individually programmed as an input or an output.

J. Configuration Register
Since the LEON-2 processor system is synthesized from an extensively configurable VHDL model, a read-only configuration register indicates which options were enabled during synthesis: for each option present, the corresponding register bit is hardwired to '1'.

K. Power-down Register
The processor is powered down by writing an arbitrary value to the power-down register. Power-down mode is entered on the next load or store instruction; to enter it immediately, the store to the power-down register should be followed by a 'dummy' load. During power-down mode, the integer unit is effectively halted. Power-down mode is terminated (and the integer unit re-enabled) when an unmasked interrupt with a higher level than the current processor interrupt level becomes pending. All other functions and peripherals operate normally during power-down mode.

III. SYSTEM IMPLEMENTATION AND VERIFICATION

For system implementation and verification, we used the original simulation and synthesis scripts [9] with the necessary modifications. First, we modified them to incorporate Verilog simulation models of the custom SRAMs into the original VHDL processor model.

A. Synthesis
The system is fully synthesizable with most synthesis tools. After the configured processor system, including the SRAM models, had been verified, we modified the synthesis scripts to map the design onto the target library. The design, with directly instantiated SRAM blocks and pads, was synthesized for a target frequency of 80 MHz using Synopsys Design Compiler [12]. An SDF (Standard Delay Format) file of the synthesized gate-level netlist was also generated.

B. Verification
A generic testbench is provided from which several testbench configurations can be generated: FUNC, which performs a quick check of most on-chip functions; MEM, which tests all on-chip memory with the patterns 0x55 and 0xAA; and FULL, which combines the memory and functional tests and is suitable for generating test vectors for manufacturing test [9]. Numerous simulations with these testbenches were carried out after synthesis to verify the correct functionality of the gate-level netlist. All simulations, both with and without the corresponding SDF file, were run with the ModelSim simulator [13]. The same simulations (using the original testbenches and our own assembler program) are used to verify the netlist of the generated layout.

C. Layout
After the functionality of the synthesized netlist had been verified, we created a floorplan using Cadence First Encounter [14]. In the floorplanning phase, the memory blocks were placed as hard macros. The layout was generated using a standard sequence of back-end steps: power planning, placement, clock tree generation, routing and geometry verification. The processor system is fabricated in IHP's 0.25 µm CMOS technology. The chip photo is shown in Figure 3, and the geometrical and electrical features of the chip are summarized in Table 1. The data show the high performance and low power consumption of the implemented system-on-chip.

Figure 3: Chip photo

TABLE 1
CHIP FEATURES
Area: 27.2 mm²
Number of transistors: ~1,500,000
Number of ports: 128 signal + 16 power
Maximum frequency: 85 MHz
Power consumption: … @ 60 MHz

D. Testability
The design is highly testable: in addition to functional testing of the complete system-on-chip, the SRAM blocks are tested by integrated BIST and the rest of the logic through a chain of scannable flip-flops (a scan chain). Each SRAM block includes BIST logic and, consequently, four additional ports: an enable signal, a reset signal, a 'fail' signal (asserted when a fault is detected) and a 'done' signal (asserted when the test has finished). A Verilog BIST testbench was prepared for simulation purposes. For the inserted scan chain (more than 11,000 scannable flip-flops), we generated more than 1,300 manufacturing test vectors with the Synopsys TetraMAX Automatic Test Pattern Generator, in the form of a WGL file. A Verilog DPV testbench was also prepared for serial simulation of all scan data. All tests (FULL test, BIST and scan test) are executed on the Agilent 93000 chip tester.

IV. CONCLUSION

This paper has presented our experience in implementing the LEON-2 processor system configured as the general purpose processor of IHP's wireless engine. We have demonstrated that the performance and features of this processor system, fabricated in IHP's 0.25 µm CMOS technology, meet the requirements of the target application. The implemented processor system has been verified and has become a reusable module of our modular library.

REFERENCES

1. IHP – Innovations for High Performance microelectronics, http://www.ihp-ffo.de/wireless/WLEindx.htm
2. http://grouper.ieee.org/groups/802/11
3. http://www.hiperlan2.com
4. E. Grass, K. Tittelbach-Helmrich, U. Jagdhold, A. Troya, G. Lippert, O. Krüger, J. Lehmann, K. Maharatna, K.F. Dombrowski, N. Fiebig, R. Kraemer, and P. Mähönen, "On the single-chip implementation of a Hiperlan/2 and IEEE 802.11a capable modem," IEEE Personal Communications Magazine, vol. 8, pp. 48-57, 2001.
5. M. Methfessel, K.F. Dombrowski, P. Langendörfer, H. Frankenfeldt, I. Babanskaja, I. Matthaei, and R. Kraemer, "Vertical optimization of data transmission for mobile wireless terminals," IEEE Wireless Communications, vol. 9, pp. 36-43, 2002.
6. P. Langendörfer, "Integration moderner Hand Implementierungstechniken in Codegeneratoren," University of Erlangen, 2001.
7. Z. Stamenković, G. Panić, U. Jagdhold, H. Frankenfeldt, K. Tittelbach-Helmrich, G. Schoof, and R. Kraemer, "Modular Processor: A Flexible Library of ASIC Modules," Proc. The IASTED International Conference on Applied Simulation and Modelling, pp. 428-432, 2004.
8. LEON-2 Processor User's Manual, http://www.gaisler.com/
9. LEON-2 VHDL Model, http://www.gaisler.com/products/leon2
10. AMBA On-Chip Bus Standard, ARM Inc., http://www.arm.com/armtech/AMBA
11. IHP – Innovations for High Performance microelectronics, http://www.ihp-ffo.de/ihpoffer/OFFindx.htm
12. Synopsys Inc., http://www.synopsys.com
13. Model Technology, http://www.model.com
14. Cadence Design Systems, http://www.cadence.com
