A Self-Reconfigurable Multimedia Player on FPGA

June 2, 2017 | Author: J. Bermudez Martinez | Category: Field-Programmable Gate Arrays, IT Security

Javier Castillo, Pablo Huerta, Cesar Pedraza, José Ignacio Martínez
Universidad Rey Juan Carlos, Móstoles, Spain
{javier.castillo, pablo.huerta, ca.pedraza, joseignacio.martinez}@urjc.es

Abstract

With the rise of multimedia devices, many different audio and video formats have appeared in recent years. Sales of portable multimedia players such as Apple's iPod have also grown enormously. These devices are usually based on general-purpose microprocessors. When a new audio or video format appears, they need a software upgrade or, in the worst case, cannot support the new format at all. This work presents a novel implementation in which an FPGA-based multimedia player uses the self-reconfiguration capabilities of modern FPGA families to support new multimedia formats. When the player needs to play a song in an unsupported format, it securely downloads the required hardware from the Internet and reprograms the FPGA with the new codec.

1. Introduction

Recent years have seen an enormous growth in online multimedia stores and portable players, and in the number and variety of multimedia formats, both proprietary and open. Today, a consumer who buys a portable multimedia player should be aware that the device is tied to certain formats. Files in other formats must be converted to those formats before they can be played [1]. FPGA capabilities have also grown in recent years, making FPGAs a viable alternative for implementing complete System-on-Chip (SoC) solutions. FPGAs have grown not only in size but also in features. One of the most powerful new features is partial self-reconfiguration, in which part of the FPGA is reprogrammed while the rest of the device keeps working. This ability has opened many research areas [2] [3] and applications [4]. This paper proposes a novel implementation of a self-reconfigurable multimedia player on FPGA. The FPGA implements an open source Soft-Core Processor (SCP) and runs uClinux as its operating system. Every time a multimedia file (video or audio) is going to be played, the system searches the local codec database for the appropriate hardware codec. If the codec does not exist locally, the system searches the network and downloads it from a remote database into its local codec database. It then reconfigures itself to implement the new codec. This approach supports new multimedia formats that would otherwise require firmware or hardware updates. It also helps when the computational requirements exceed the computational power of the player's processor. Section 2 describes the system operation. Section 3 presents the system architecture. Section 4 details the system implementation and its main components. Section 5 shows the experimental results. Finally, Section 6 presents the conclusions and future work.

1-4244-0690-0/06/$20.00 ©2006 IEEE.

2. Multimedia player operation

The multimedia player operation is presented in figure 1. When the FPGA powers up, it is programmed with the initial bitstream, which implements the SCP and all the peripherals needed for correct operation. The uClinux operating system then starts and presents all the multimedia files previously downloaded into the player memory. When the user wants to play a file, the system looks in the local codec database for the appropriate codec for that type of file. If the codec is already there and implemented in the reconfigurable area, the system plays the multimedia file. If the codec is in the local codec database but is not implemented, the SCP reconfigures the reconfigurable area of the FPGA with the codec. The most complex situation arises when the user wants to play a new multimedia file that requires a codec the player does not support. In this situation, the system connects to a remote database and requests the codec. If the codec is available, it is securely downloaded and implemented in the FPGA. In this way, the player can play a new multimedia format without a firmware upgrade. It can even play a file that it could not have played without the additional hardware.
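The decision flow of figure 1 can be sketched as a small host-side model. This is a minimal illustration under stated assumptions, not the authors' firmware. The class, method and field names are hypothetical, and the download and reconfiguration steps are placeholders. The initial WAV bridge is treated as an entry in the local codec database so that the player can switch back to it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Player:
    """Toy model of the codec lookup in figure 1; every name here is hypothetical."""

    remote: dict[str, bytes] = field(default_factory=dict)    # stands in for the remote IP database
    local_db: dict[str, bytes] = field(default_factory=dict)  # local codec database: format -> partial bitstream
    programmed: str | None = None                              # codec currently in the reconfigurable area
    log: list[str] = field(default_factory=list)

    def secure_download(self, fmt: str) -> bytes | None:
        # Placeholder for the authenticated, ciphered download; None means "not available".
        return self.remote.get(fmt)

    def reconfigure(self, fmt: str) -> None:
        # Placeholder for writing the partial bitstream through the ICAP.
        self.log.append(f"reconfigure:{fmt}")
        self.programmed = fmt

    def play(self, fmt: str) -> bool:
        if self.programmed != fmt:                 # Is the codec already programmed in the FPGA?
            if fmt not in self.local_db:           # Is it at least in the local codec database?
                bitstream = self.secure_download(fmt)
                if bitstream is None:              # Not available anywhere: cannot play.
                    self.log.append(f"unsupported:{fmt}")
                    return False
                self.local_db[fmt] = bitstream     # Keep it locally for later use.
            self.reconfigure(fmt)
        self.log.append(f"play:{fmt}")
        return True
```

Consider `Player(remote={"mp3": b"\x01"}, local_db={"wav": b"\x00"}, programmed="wav")`. A WAV file plays at once. The first MP3 file triggers a download and a reconfiguration, and later MP3 files need neither. A format absent from the remote database is reported as unsupported.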

Figure 1. Multimedia player operation (flowchart steps: FPGA configuration; uClinux startup; user wants to play a file; "Is the codec already programmed in the FPGA?"; "Is the multimedia codec available?"; securely download the codec partial bitstream from the Internet; self-reconfigure the FPGA with the codec; play the file)

3. System Architecture

The system architecture, shown in figure 2, has been implemented in a Virtex-II device. The system SCP is an OpenRISC 1200 [5] microprocessor, freely available at [6]. It runs uClinux, a Linux distribution targeted at processors without a memory management unit (MMU). The OpenRISC processor has an MMU and there is a Linux kernel port for it. uClinux was chosen anyway because its low memory requirements suit the limited 4 MB of SRAM of the target platform. The system is divided into three parts called slots. The right-side slot, called the SCP area, contains the Soft-Core Processor. The SCP controls the operation of the whole system, including the reconfiguration of the other two slots. The second slot, called the Security area, is reprogrammed with the cryptographic algorithms needed to securely download the multimedia files and the partial bitstreams required to play them. The left-side slot, called the Codec area, is reserved for the required multimedia codec. The Security slot and the Codec slot can be combined whenever their individual areas are too small to implement the cryptographic or codec algorithms.

Figure 2. Global system architecture (block diagram: Codec Area, Security Area and Fixed Area joined by a bus macro; blocks shown are the OR1200, the IP Database, the multimedia codec, AES + MD5, the communication controller and the ICAP controller)

The OpenRISC processor is connected to the network through an Ethernet link. Connecting the platform to the Internet required improvements to the uClinux OS. A uClinux driver for the ICAP controller also had to be developed. This driver gives uClinux applications easy access to the self-reconfiguration capabilities of the Virtex-II FPGA.

4. Architecture Implementation

The system architecture was implemented on an RC203 Expert board from Celoxica. This board features a Xilinx Virtex-II FPGA with 3,000,000 equivalent gates, and its pin arrangement makes it well suited to partial reconfiguration. The implementation began by deciding which multimedia codecs would be supported. For this implementation, only audio codecs were selected. The system starts with no codecs and can only play WAV (Waveform Audio Format) files. The user can play these files or download new files in MP3 (MPEG-1 Audio Layer 3) format. The first time an MP3 file is played, the system connects to the remote IP database and securely downloads an MP3 decoder partial bitstream. It then reprograms the FPGA with that bitstream and plays the song. The pin arrangement of the prototyping board and the location of the ICAP port restrict where the slots can be placed in the FPGA floorplan. The ICAP port, the Ethernet chip and the RS232 pins are on the right side of the FPGA, so the SCP must be implemented in that area. The board has two 4 MB SRAM memory banks. The bank on the right side is used as the SCP main memory, and the bank on the left side is used by the MP3 codec. The RC203 board also has an AC97-compatible audio controller for playing music, connected to the left side of the FPGA. Given these restrictions, the slot for cryptographic applications must be placed between the Codec area and the SCP area. This slot is used when a partial bitstream is downloaded securely from the Internet. The bus macro provided by Xilinx can only connect two adjacent modules, so a new bus macro based on TBUFs was developed to connect the three slots. This bus macro connects the Wishbone bus signals used by the SCP to the peripherals in the slots, which are Wishbone compatible.

4.1. SCP Area

As mentioned in the previous section, the SCP area is fixed on the right side of the FPGA floorplan. It is made up of the following elements:
- the OpenRISC 1200 Soft-Core Processor;
- an RS232 controller;
- an SMC91111 LAN chip interface;
- a ZBT RAM memory controller;
- a boot ROM;
- a module that maps the ICAP into the OR1200 memory space.

The OpenRISC 1200 is a soft-core microprocessor freely distributed under the LGPL license from the OpenCores website. The OR1200 is a 32-bit scalar RISC processor with a Harvard architecture and a 5-stage integer pipeline. One of its main characteristics is its configurability. A configuration file can add or remove more than ten optional units, such as data and instruction caches, a memory management unit (MMU) and a power management unit. Many operating systems have been ported to the OpenRISC architecture: eCos, uClinux, Linux, RTEMS and microC/OS-II. In this implementation, uClinux was selected as the software platform. The uClinux distribution for the OpenRISC 1200 processor includes drivers for the peripherals available from the OpenCores website, such as the 16550-compatible UART. The uClinux distribution did not support the other two devices, the SMC91111 LAN chip and the ICAP controller, so drivers had to be developed for both. The SMC91111 LAN driver was adapted from the code included in the uClinux distribution for the MicroBlaze processor.
The ICAP driver is a simple character driver that gives access to the internal configuration port through standard Linux system calls such as open() or write(). The RC203 Expert prototyping board has no onboard flash, so the uClinux image has to be downloaded every time the system boots. This can be done in two ways: through a hierarchy of boot-loaders or through the JTAG port.

The first way uses a block RAM configured as a ROM in the design. This block RAM contains a small boot-loader that downloads a program into RAM through the RS232 port. The uClinux image is around 1 MB, and downloading it through an RS232 port would be too slow. Instead, the first boot-loader downloads a more sophisticated second boot-loader, which fetches the uClinux image from a TFTP server. This second boot-loader is the official OpenRISC release, extended by the authors to support the SMC91111 LAN chip.

The second way simply downloads a program to the prototyping board through the OR1200 JTAG port. This port is connected to the host computer through a GDB server called jp2. GDB features can then be used to download and boot the uClinux image.
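Because the ICAP driver is a character device driven through open() and write(), loading a partial bitstream from user space amounts to copying a file into that device. The sketch below illustrates this under stated assumptions. Python is used only for brevity; `os.open`, `os.write` and `os.close` are thin wrappers over the same POSIX calls a C program on the target would make. The device path `/dev/icap`, the chunk size and the helper name are hypothetical, since the text gives none of them.

```python
import os

ICAP_DEVICE = "/dev/icap"  # hypothetical device node created by the ICAP driver
CHUNK_SIZE = 4096          # arbitrary write granularity (an assumption)


def load_partial_bitstream(bitstream_path: str, device_path: str = ICAP_DEVICE,
                           chunk_size: int = CHUNK_SIZE) -> int:
    """Copy a partial bitstream file into the ICAP character device.

    Returns the number of bytes written to the device.
    """
    written = 0
    fd = os.open(device_path, os.O_WRONLY)
    try:
        with open(bitstream_path, "rb") as src:
            while chunk := src.read(chunk_size):
                view = memoryview(chunk)
                while view:                 # write() may accept fewer bytes than offered
                    n = os.write(fd, view)
                    view = view[n:]
                    written += n
    finally:
        os.close(fd)
    return written
```

The inner loop retries with the unwritten remainder, because a driver is free to consume only part of each buffer.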

4.2. Security Area

When the system needs to download a multimedia codec from the Internet to support a new format, the download must be secure. The partial bitstream is valuable information that must be kept secret from competitors and malicious agents. In previous work [7], the authors proposed methods to securely download a bitstream from the Internet and use the self-reconfiguration capabilities of some FPGA families to reprogram the device. The multimedia player architecture implements these ideas to ensure the confidentiality and integrity of the downloaded bitstream. It reserves one slot for the cryptographic algorithms needed to secure the communication. The protocol runs as follows:

- When the system connects to the remote IP database, it authenticates itself to the database using the RSA asymmetric key algorithm.
- Once the system is authenticated, the database server authenticates itself to the system in the same way.
- Once mutually authenticated, the system and the server agree on the symmetric key algorithm to be used during the transmission.
- A session key is generated with a random number generator and sent to the other side using the RSA public key algorithm.
- The system self-reconfigures the security slot with the agreed symmetric key algorithm and hash algorithm. In this implementation, the system can use DES, AES with a 128-bit key, or AES with a 192-bit key. The hash algorithm is MD5.
- Once the slot is configured, the partial bitstream is sent to the system, ciphered with the session key.
- When the data is received, the system recalculates the hash and compares it with the hash generated on the server side to ensure the integrity of the partial bitstream.
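The negotiation and integrity check described above can be sketched with the Python standard library. RSA, DES and AES are not part of that library, so the sketch leaves out the mutual authentication, the RSA key transport and the deciphering. The function names are hypothetical. So is the negotiation rule of taking the client's first preference that the server also offers; the text only says that both sides agree on an algorithm.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Key sizes in bytes for the symmetric algorithms listed in the text.
# DES keys are 64 bits long, of which 56 bits are effective.
KEY_BYTES = {"AES-192": 24, "AES-128": 16, "DES": 8}


def negotiate(client_prefs: list[str], server_offers: list[str]) -> str:
    """Return the first client-preferred algorithm that the server also supports.

    The preference rule is an assumption; the text only says both sides agree on one.
    """
    for algo in client_prefs:
        if algo in server_offers:
            return algo
    raise ValueError("no common symmetric algorithm")


def new_session_key(algo: str) -> bytes:
    """Draw a random session key of the right size (its RSA transport is not shown)."""
    return secrets.token_bytes(KEY_BYTES[algo])


def bitstream_is_intact(deciphered: bytes, server_md5: bytes) -> bool:
    """Recompute MD5 over the deciphered bitstream and compare it with the server's digest."""
    return hmac.compare_digest(hashlib.md5(deciphered).digest(), server_md5)
```

For example, suppose a player prefers `["AES-192", "AES-128", "DES"]` and the server offers only `["DES", "AES-128"]`. The two would settle on AES-128 and draw a 16-byte session key. `hmac.compare_digest` compares the digests in constant time.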

4.3. Codec Area

The left side of the FPGA is reserved for the multimedia codecs, but it can be used for other purposes if needed. In this implementation, the codec area is made up of the modules shown in figure 3. The codec area is connected to the SCP through a custom bus macro, described later. This bus macro connects the signals of the OR1200 Wishbone bus to the elements in the Codec area, mapping them into the processor memory space. With the initial bitstream, the configured audio codec is just a bridge between the RAM interface, which reads the audio samples, and the AC97 interface. The AC97 interface creates the frames that are applied to the CS4202 AC97 audio codec. To play a WAV file, the OR1200 loads the samples into the ZBT RAM of the Codec area through the Wishbone interface. It then orders the system to start playing the song samples.

Figure 3. Audio Codec area (block diagram: ZBT RAM (4 MB), ZBT RAM interface, audio codec, Wishbone interface to the Wishbone bus, AC97 interface, AC97 audio codec)

To play an MP3 file, the SCP follows the operation flow described in section 2 and implements the MP3 codec through the ICAP port using the uClinux ICAP driver. It then loads the MP3 file into the RAM, as in the WAV case. The MP3 data is decompressed and the resulting samples are applied to the AC97 audio codec.

4.4. MP3 Hardware Decoder

The MP3 [8] hardware decoder reads an MP3 file and sends the samples through an I2S interface. The block diagram of the MP3 decoder is shown in figure 4.

Figure 4. MP3 decoder architecture (block diagram: synchronizer with serial inputs SI_DATA, SI_VALID and SI_REQ; main memory; bit reservoir memory; scale factors; Huffman decoder; requantizer; reorder; alias reduction; IMDCT; filter bank; I2S output with SO_DATA, SO_CLK and SO_SEL; controller with START and DONE)

The decoder begins with a synchronizer that reads the frames from the serial input and stores them in a memory. From this memory, the data flows through the modules that make up the decoder, under the control of a state machine. The stages are:

- The Huffman decoder decodes the scale factors and the 576 frequency lines.
- The requantizer uses the scale factors to reconstruct the original frequency lines.
- The reorder module sorts the frequency lines first by subband and then by frequency.
- The alias reduction module reduces the aliasing introduced by the non-ideal filter.
- The Inverse Modified Discrete Cosine Transform (IMDCT) and the filter bank produce time samples from the frequency lines.
- The I2S interface sends out the output samples.

4.5. Custom Bus Macro

The system architecture is divided into three slots. One holds the SCP and some peripherals. The other two, Codec and Security, are connected to the SCP through the Wishbone bus. This bus must cross the slot boundaries through a bus macro, so that the connections stay in the same place whichever modules are implemented in the slots. The bus macro provided by Xilinx can only connect two adjacent modules. To solve this problem, a custom bus macro was developed to connect the SCP area with the two reconfigurable slots, as shown in figure 5.

Figure 5. Custom bus macro architecture (shows the Codec, Security and Fixed areas with bus macro inputs L(0) to L(3), C(0) to C(3) and R(0) to R(3))

The architecture is based on TBUFs, as in Xilinx's bus macro. Some works [4] propose bus macros based on LUTs instead of TBUFs. TBUFs were chosen here because they do not consume logic resources. In addition, the tri-state buffers can be shared, so the data bus does not need to be replicated for each slot. There are two basic ways to develop this bus macro. The first uses the undocumented XDL [9] language. The second uses VHDL [10] to instantiate the TBUF elements, inputs and outputs, and then routes all the connections manually with the FPGA Editor tool. The bus macro for this application followed the second approach and was designed to be generic so that it can be reused in other applications. It has four inputs in each slot, to take advantage of the structure of the tri-state buffers in the Virtex-II floorplan. The Wishbone bus is made up of 42 signals: 32 data bits, 6 address bits and 4 control signals. Routing this set of signals across the slots requires 11 bus macros, instantiated in the top-level file of the system.
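The bus macro count above follows from simple arithmetic. TBUF lines can be shared because only one slot drives a given line at a time. Both points are sketched below. The helper names are hypothetical, and the tri-state function is a behavioral abstraction, not a model of the Virtex-II TBUF primitive itself.

```python
from __future__ import annotations

import math

# Wishbone signal groups that cross the slot boundaries, as listed in the text.
WISHBONE_SIGNALS = {"data": 32, "address": 6, "control": 4}

# Each custom bus macro has four inputs per slot (from the text).
LINES_PER_MACRO = 4


def bus_macros_needed(signals: dict[str, int], lines_per_macro: int = LINES_PER_MACRO) -> int:
    """Number of bus macros required to carry every signal across the slots."""
    return math.ceil(sum(signals.values()) / lines_per_macro)


def resolve_shared_line(drivers: dict[str, tuple[bool, int]]) -> int | None:
    """Value seen on one tri-state line shared by several slots.

    `drivers` maps a slot name to (output enable, driven bit). With no driver
    enabled the line floats (None); more than one enabled driver is contention.
    """
    active = [bit for enabled, bit in drivers.values() if enabled]
    if len(active) > 1:
        raise ValueError("bus contention: more than one TBUF enabled")
    return active[0] if active else None
```

With 42 signals and four lines per macro, 42 / 4 = 10.5 rounds up to the 11 bus macros reported in the text. The last macro is only half used.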

4.6. IP Database

The IP Database is a Java program running on the server side that implements the protocol described above. It receives the requests from the remote reconfigurable system, carries out the authentication process, ciphers the data with the session key and sends the bitstream to the remote reconfigurable system.
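On the server side, much of the protocol amounts to enforcing the order of the steps in section 4.2. The sketch below models only that ordering, and the cryptographic work is omitted. The class names are hypothetical. Python is used for illustration, although the authors' server is written in Java.

```python
from __future__ import annotations

from enum import Enum, auto


class Step(Enum):
    """Protocol steps from section 4.2, in the order the server expects them."""
    CLIENT_AUTH = auto()     # player authenticates itself with RSA
    SERVER_AUTH = auto()     # server authenticates itself the same way
    NEGOTIATE = auto()       # agree on DES, AES-128 or AES-192, plus MD5
    SESSION_KEY = auto()     # random session key exchanged under RSA
    SEND_BITSTREAM = auto()  # partial bitstream ciphered with the session key, plus its MD5


ORDER = list(Step)  # Enum iteration follows definition order


class Session:
    """Per-client bookkeeping that rejects out-of-order protocol steps (hypothetical design)."""

    def __init__(self) -> None:
        self.completed: list[Step] = []

    @property
    def done(self) -> bool:
        return len(self.completed) == len(ORDER)

    def advance(self, step: Step) -> None:
        if self.done:
            raise ValueError("session already finished")
        expected = ORDER[len(self.completed)]
        if step is not expected:
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)
```

Rejecting out-of-order steps means, for instance, that a client cannot obtain the ciphered bitstream before both sides have authenticated.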

5. Results

The system was implemented on an RC203 Expert board from Celoxica with a Virtex-II FPGA of 3,000,000 equivalent gates. This device has a total of 28,672 look-up tables (LUTs). With TBUF bus macros, each slot must occupy an area whose width is a multiple of four slices. This restriction would not apply if the bus macros were designed with LUTs. Calculating the number of slice columns each slot needs requires knowing the area consumed by each module. The results are shown in Table 1, Table 2 and Table 3.

Table 1. SCP area
Element | LUTs
OR1200 | 5423
ICAP controller | 14
Ethernet controller | 66
RS232 | 761
RAM controller | 42
Total | 5428 (19%)

The total is not exactly the sum of the elements because the CAD tools optimize some logic.

Table 2. Security area
Element | LUTs
AES 128 bits | 1617
AES 192 bits | 2049
DES | 651
MD5 | 1691
RNG | 113

The worst case for the Security area occurs when AES with a 192-bit key, MD5 and the random number generator must be implemented at the same time. This requires a total of 3853 LUTs, about 13% of the FPGA area.

Table 3. Codec area
Element | LUTs
MP3 | 4673
AC97 interface | 684
RAM interface | 53
Wishbone interface | 36

The worst case for the Codec area occurs when all the elements are implemented, consuming about 20% of the FPGA area. According to [11], each column of the XC2V3000 device has 64 CLBs. Each CLB is made up of 4 slices in a 2-column arrangement, with 2 LUTs per slice. Each column therefore has a total of 64 × 4 × 2 = 512 LUTs. As Table 1 shows, the SCP area needs at least 11 CLB columns. Because the width must be a multiple of four slices, 12 CLB columns must be reserved. The Security area needs 8 CLB columns. The Codec area needs at least 11 columns, which becomes 12 when rounded to a multiple of four slice columns. Based on these figures, the FPGA floor-plan is shown in Table 4.

Table 4. Slots area distribution
Area | Columns/Total
SCP | 16/56
Security | 12/56
Codec | 28/56
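The column figures above and the 4.48 ms best-case reconfiguration time quoted later in this section can be reproduced with a few lines of arithmetic. The sketch assumes, as the text does, 512 LUTs per CLB column and one byte per ICAP clock cycle. It also reads 219 Kbytes as 219 × 1024 bytes, because that interpretation reproduces the 4.48 ms figure. The helper names are hypothetical.

```python
import math

LUTS_PER_CLB = 4 * 2                  # 4 slices per CLB, 2 LUTs per slice
CLBS_PER_COLUMN = 64                  # XC2V3000, from the text
LUTS_PER_COLUMN = LUTS_PER_CLB * CLBS_PER_COLUMN  # 512
TOTAL_COLUMNS = 56                    # from Table 4; 56 * 512 = 28,672 LUTs


def reserved_columns(luts: int) -> int:
    """CLB columns to reserve: enough for `luts`, rounded up to an even count.

    A CLB is two slices wide, so an even number of CLB columns gives a
    width that is a multiple of four slices.
    """
    columns = math.ceil(luts / LUTS_PER_COLUMN)
    return columns + (columns % 2)


def best_case_reconfig_ms(bitstream_bytes: int, icap_hz: float) -> float:
    """Time to push a bitstream through the ICAP at one byte per clock cycle."""
    return bitstream_bytes / icap_hz * 1e3
```

These helpers give 12 columns for the SCP (5428 LUTs) and 8 for the Security area (3853 LUTs). The Codec area (4673 + 684 + 53 + 36 = 5446 LUTs) also needs 12, and the ICAP transfer takes 224,256 bytes / 50 MHz ≈ 4.49 ms. The measured 2.3 s is dominated by what this model leaves out: the 25 MHz system clock, the deciphering and the driver overhead.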

The area assigned to the codec is the largest because more complex codecs will presumably be needed in the near future. The partial bitstream of the MP3 decoder is 219 KB. In the best case, the reconfiguration time is 4.48 ms. This assumes that the partial bitstream does not have to be downloaded and deciphered, and that the ICAP port runs at its highest possible clock frequency, 50 MHz. However, the system does not run at 50 MHz, and the SCP does not deliver a byte to the ICAP port every cycle. The SCP has to read the data from memory and write it to the ICAP through the uClinux driver, which adds significant overhead. In the real implementation, the system runs at 25 MHz, reprograms the Security slot to decipher the bitstream, and accesses the ICAP through the uClinux driver. Under these conditions, the measured reconfiguration time is 2.3 seconds. This time could be unacceptable for some applications, but a multimedia player user can wait a couple of seconds for the player to reconfigure before listening to the song. The player is connected directly to the IP Database with a crossover cable to minimize the data transmission overhead. The measured transmission time is around 2 seconds, but it could vary with the server's workload. These values were obtained using the uClinux time function.

6. Conclusions and future work

This work presents a novel implementation of a multimedia player that exploits the ability of modern FPGAs to reconfigure part of the device while the rest keeps working. The system architecture was implemented on a Celoxica RC203 Expert prototyping board. This implementation plays WAV files and, whenever the user requires it, securely downloads and programs an MP3 audio codec. The system was experimentally implemented and validated, and the results have been presented. In future work, the system will support more audio codecs, such as Ogg Vorbis, and video algorithms for playing video files. Another idea is to connect a camera and support on-demand image compression algorithms such as JPEG. The final system will be a complete multimedia player, like those currently on the market. Unlike them, it will be easy to upgrade and adapt to the user's requirements through the self-reconfiguration capabilities of the FPGA.

7. References

[1] S. Kaptain, “My Songs, My Format”, New York Times, Oct. 6, 2005.

[2] B. Blodget, P. James-Roxby, B. Keller, S. McMillan and P. Sundararajan, “Self-Reconfiguring Platform”, in Proc. of the 13th International Workshop on Field-Programmable Logic and Applications, LNCS 2778, Sept. 2003, pp. 565-574.

[3] H. Walder and M. Platzner, “Reconfigurable Hardware Operating Systems: From Design Concepts to Realizations”, in Proc. of the 3rd International Conference on Engineering of Reconfigurable Systems and Architectures, June 2003, pp. 284-287.

[4] I. Gonzalez, F.J. Gomez and S. Lopez, “Hardware Accelerated SSH on Self-Reconfigurable Systems”, in Proc. of the IEEE 2005 Conference on Field-Programmable Technology, Dec. 2005.

[5] OpenCores Web, http://www.opencores.org

[6] D. Lampret, “OpenRISC 1000 Architecture Manual”, Jan. 2003.

[7] Reference from authors

[8] MP3 Decoder project, http://www.es.lth.se/home/tlt/dicp/2002/index.html.

[9] J. Thorvinger, “Dynamic Partial Reconfiguration of an FPGA for Computational Hardware Support”, Lund Institute of Technology, June 2004.

[10] A. Astarloa, U. Bidarte, J. Jimenez, J. Arias and I. Kortabarria, “Bus-Macro compatible Wishbone para reconfiguración parcial Inter-Task”, in Proc. of the V Jornadas de Computación Reconfigurable y Aplicaciones, Sept. 2005, pp. 17-23 (in Spanish).

[11] Xilinx, “Virtex-II Platform FPGAs: Complete Datasheet v3.4”, March 2005.
