
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 14, NO. 1, FEBRUARY 2010


Customizable FPGA IP Core Implementation of a General-Purpose Genetic Algorithm Engine

Pradeep R. Fernando, Student Member, IEEE, Srinivas Katkoori, Senior Member, IEEE, Didier Keymeulen, Ricardo Zebulum, and Adrian Stoica, Senior Member, IEEE

Abstract—Hardware implementation of genetic algorithms (GAs) is gaining importance because of their proven effectiveness as optimization engines for real-time applications (e.g., evolvable hardware). Earlier hardware implementations suffer from major drawbacks such as the absence of GA parameter programmability, rigid predefined system architectures, and lack of support for multiple fitness functions. In this paper, we report the design of an IP core that implements a general-purpose GA engine and addresses these problems. Specifically, the proposed GA IP core can be customized in terms of the population size, number of generations, crossover and mutation rates, random number generator seed, and the fitness function. It has been successfully synthesized and verified on a Xilinx Virtex-II Pro field-programmable gate array device (xc2vp30-7ff896) with only 13% logic slice utilization, 1% block memory utilization for GA memory, and a clock speed of 50 MHz. The GA core has been used as a search engine for real-time adaptive healing, but can be tailored to any given application by interfacing with the appropriate application-specific fitness evaluation module as well as the required storage memory, and by programming the values of the desired GA parameters. The core is soft in nature, i.e., a gate-level netlist is provided that can be readily integrated with the user's system. The performance of the GA core was tested using standard optimization test functions. In the hardware experiments, the proposed core either found the globally optimal solution or found a solution that was within 3.7% of the value of the globally optimal solution. The experimental test setup including the GA core achieved a speedup of around 5.16x over an analogous software implementation.

Index Terms—Evolvable hardware, field programmable gate arrays, genetic algorithm, IP core.

Manuscript received August 8, 2008; revised January 4, 2009 and March 26, 2009; accepted April 27, 2009. First version published October 30, 2009; current version published January 29, 2010. This work was partially supported by the NASA/JPL-funded Project No. 21081010, entitled Self-Reconfigurable Electronics for Extreme Environments (SRE-EE), performed during 2006–2007. S. Katkoori's research sponsors include the National Science Foundation, Honeywell, NASA JPL, and the Florida I4 High Tech Corridor Initiative.

P. R. Fernando and S. Katkoori are with the Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: [email protected]; [email protected]).

D. Keymeulen, R. Zebulum, and A. Stoica are with the Jet Propulsion Laboratory, Pasadena, CA 91109 USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TEVC.2009.2025032

I. INTRODUCTION

A GENETIC algorithm (GA) [1], [2] is a stochastic optimization algorithm modeled on the theory of natural evolution. Since originally proposed by Holland [1], GAs have been successfully used for handling a wide variety of problems, including NP-complete problems such as the
traveling salesman problem [3], real-time problems such as reconfiguration of evolvable hardware (EHW) [4], and other optimization problems that have complex constraints [2]. GAs have been shown to be a robust search mechanism that can explore multiple regions of the problem's solution space concurrently, at the expense of significant runtimes.

A natural way to speed up the runtime of a GA is to implement the GA in hardware. Another advantage of a hardware implementation of a GA is the elimination of the complex, time- and resource-consuming communication protocols needed by an equivalent software implementation to interface with the main application. This is particularly advantageous for real-time applications such as reconfiguration of EHW. Moreover, with rapidly increasing field-programmable gate array (FPGA) logic density, efficiently designed hardware GAs can be implemented on a single FPGA alongside the target application, resulting in a less bulky apparatus.

In this paper, a robust parameterized GA IP core is proposed that is readily synthesizable using standard FPGA design tools at both the register-transfer (RT) level and gate level. The purpose of this programmable IP core is to be integrated with any application that requires a search engine; it can also be implemented in a system-on-a-chip configuration. The architecture of the IP core makes it extremely flexible and easy to integrate with target applications, allowing seamless integration of user-defined blocks such as fitness function modules. The core has been implemented on a Xilinx Virtex-II Pro FPGA device (xc2vp30-7ff896). It has a very small footprint (only 13% slice utilization) and runs at a high speed (50 MHz). This IP core is highly suitable for EHW [24] applications. It was one of the building blocks of the self-reconfigurable analog array architecture and was used to compensate for extreme temperature effects on VLSI electronics [34], [35].

More specifically, the novel contributions of this paper are that the proposed GA IP core:
1) is available at different design levels, RT level and gate level, to provide the end user flexibility in choosing the level at which to include the GA core;
2) can be used as a hard IP that can switch between eight different user-defined fitness functions without the need for re-synthesis of the entire design;
3) allows programming of the initial seed for the random number generator (RNG), enabling different convergence characteristics to be obtained for the same GA parameter settings;


4) provides PRESET modes, allowing the user to experiment readily with a varied set of predefined GA parameter settings; and
5) can be directly implemented as a digital ASIC using standard ASIC design tools, with simple scan-chain testability built into the core and with basic fault tolerance in the form of PRESET modes to bypass parameter-initialization failure in an ASIC implementation.

In addition, the proposed GA IP core has several highly desirable features, as given below.
1) Programmability—Values of important GA parameters including population size, number of generations, crossover rate, and mutation rate can be programmed to accommodate the requirements of a wide variety of applications.
2) High Probability of Convergence to the Optimal Solution—An elitist GA model is used, which has been shown to have the ability to converge to the global optimum [17].
3) Easy Interfacing—Simple two-way handshaking protocols to interface with the user-defined fitness evaluation module and the target application ensure ease of integration and ease of use.

The rest of the paper is organized as follows. Section II gives a brief introduction to GAs, reviews the various architectures and hardware implementations of GAs proposed in the literature, and gives a brief introduction to EHW and its applications in space systems. Section III describes in detail the design methodology and the hardware implementation details of the GA core. Section IV reports the simulations and experiments conducted at various design levels to verify the functionality of the GA and demonstrate its effectiveness. Section V concludes with a brief summary of the features and advantages of the proposed GA IP core and discusses future directions such as ASIC development.

II. BACKGROUND AND RELATED WORK

A. GAs

A GA [1], [2] is a robust stochastic optimization technique based on the principle of survival of the fittest in nature. GAs are attractive because, unlike simulated annealing, which considers a single solution at a time, GAs maintain a population of solutions, or individuals, throughout the optimization process. A GA begins its optimization cycle with an initial population of randomly generated individuals. Each individual is an encoding of a solution to the problem under consideration, and each is assigned a fitness measure that denotes the quality of the solution it encodes. GAs evolve the population over generations with the use of operators such as selection, crossover, and mutation. In each generation, highly fit individuals from the current generation are selected as parents to produce new individuals for the next generation. Genetic information encoded in the parents' representations is mixed to form one or more offspring using an operator called crossover. The offspring produced by the crossover operator replace individuals with the worst fitness in the current population. This ensures that the average fitness of the population increases over generations, as the fitter individuals will survive through the generations. To prevent premature convergence to the best individual in the initial population, an operator called mutation is applied that introduces random changes into the population. After a prespecified number of generations, the best individual found in the entire genetic optimization cycle is output as the solution to the problem.

B. Previous Hardware Implementations of GAs

There have been many hardware implementations of both general-purpose [5]–[10] and application-specific [9] GAs. This section reviews previous FPGA implementations of general-purpose GAs. Table I summarizes the existing work on FPGA implementations of general-purpose GAs.

TABLE I
Review of Existing Literature on FPGA Implementations of General-Purpose GAs ("N/A": not applicable; "—": unknown)

| Work | Elitist | Pop. size | No. gens | Selection | Crossover/mutation rates | Crossover operators | RNG/seed | Preset modes | Initialize mode | FPGA platform |
|---|---|---|---|---|---|---|---|---|---|---|
| [5] | N | Fixed (16) | Fixed | Roulette | Fixed | 1-Point | CA/fixed | None | None | BORG board |
| [6] | N | Fixed (32) | Fixed | Round robin | Fixed | 1-Point | LSHR/fixed | None | None | Altera |
| [7] | N | Fixed | Fixed | Survival | Fixed | 1-Point | CA/fixed | None | None | Aptix |
| [8] | N | Fixed | — | Simplified tourney | Fixed | 1-Point | CA/fixed | None | None | SFL (HDL) |
| [9] | — | Fixed (64 or 128) | Prog. | Roulette | Prog. | 1-Point, 4-Point, Uniform | Fixed | None | — | PCI card based system |
| [10] | N/A | Fixed (256) | N/A | N/A | N/A | N/A | CA/fixed | None | None | Xilinx Virtex1000 |
| Proposed | Y | Prog. (8-bit) | Prog. (32-bit) | Roulette | Prog. (4-bit) | 1-Point | CA/prog. | 3 diff. modes | Separate init. mode (two-way handshake) | Xilinx Virtex2Pro FPGA |

Several application-specific hardware implementations of GAs [16] exist in the literature. These implementations are tailored to the particular application in terms of chromosome encoding, crossover, and mutation operation. They will not be discussed here, as they cannot be reused with other applications even for prototyping purposes.

The first FPGA implementation of a general-purpose GA engine was proposed by Scott et al. [5], who described a modular hardware implementation of a simple GA that used roulette-wheel selection and one-point crossover and had a fixed population size of 16 and a member width of 3 bits. The GA was broken into simpler modules, and each module was described using behavioral Very High Speed Integrated Circuits Hardware Description Language (VHDL). The overall GA design was implemented on multiple Xilinx FPGAs on a BORG board. The main goal of that work was to illustrate the issues of hardware implementation of a general-purpose GA.

Tommiska and Vuori [6] implemented a general-purpose GA system with round-robin parent selection and one-point crossover and used a fixed population size of 32. The GA was implemented on Altera FPGAs mounted on Peripheral Component Interconnect (PCI) cards and connected to the host computer's CPU using high-performance PCI buses. Experimentation on various fitness functions involved rewriting the Altera Hardware Description Language (AHDL) code and reprogramming the FPGAs.

Shackleford et al. [7] implemented a survival-based, steady-state GA in VHDL to achieve higher performance and tested it on set-covering and protein-folding problems. The prototype GA machine used for the set-cover problem was designed using the Tsutsuji logic synthesis system and implemented on an Aptix AXB-MP3 field-programmable circuit board populated with six FPGAs.

Yoshida et al. [8] implemented a GA processor with a steady-state architecture that supports efficient pipelining and a simplified tournament selection.

Tang and Yip [9] implemented a PCI-based hardware GA system using two Altera FPGAs mounted on a PCI board. The PCI-based GA system has multiple crossover and mutation operators implemented with programmable crossover and mutation thresholds. Tang and Yip also discussed different parallel implementations of the PCI-based GA system.

In contrast to the simple GA, Aporntewan et al. [10] implemented a compact GA in Verilog HDL, as it is more amenable to a hardware implementation. But compact GAs suffer from a severe limitation: their convergence to the optimal solution is guaranteed only for the class of applications that possess tightly coded, nonoverlapping building blocks [10].

Hardware acceleration techniques such as pipelining and parallel architectures have also been applied to the design of hardware GAs [11]–[13].

The motivations of all previous FPGA implementations fall under one or more of the following categories.
1) Basic Hardware Acceleration—to obtain speedup over the corresponding software implementation [5]–[8];
2) Novel GA Templates—to propose a novel GA template or architecture [10] that is better suited to a hardware implementation; or
3) Advanced Hardware Acceleration Techniques—to accelerate a GA using pipelined and/or parallel implementations of GA architectures [11]–[13].

The primary goal of the above efforts is to demonstrate the speedup achievable by a hardware GA implementation. As a result, the prototypes developed in those efforts suffer from one or more of the following limitations.
1) Lack of Programmability of GA Parameters—Some or all of the GA parameters are fixed. To the best of the authors' knowledge, the only FPGA implementation that has customizable parameters is the GA machine proposed by Tang and Yip [9].
2) Lack of Scalability in Terms of the Number of Fitness Functions Supported—A general-purpose GA engine can be used to address many issues that arise within the same application, which requires the ability to switch between different fitness functions. All the previous hardware GA implementations support only a single fitness function: the entire design has to be re-synthesized after including a new fitness function, and considerable time and design effort must be spent again to make sure that the new design meets all the design specifications, including timing.


3) Predefined System Architecture/Organization—The architectures of many previous hardware GA implementations are defined on the basis of a specific development environment, imposing serious restrictions on the application using them. For example, in [9], the GA machine can only be implemented on a PCI card that contains two FPGAs, with a local PCI bus providing the communication between the different modules.

The proposed GA IP core overcomes all the above limitations using the following features.
1) Independent Parameter Initialization Phase—The proposed GA core has a separate initialization phase that enables the user to program the desired GA parameters online, including population size, number of generations, crossover threshold, mutation threshold, and the initial seed used by the RNG.
2) Compact IP Core with Simple Communication Interfaces—The proposed GA core does not impose any hardware requirements or system architecture on the user. It can be instantiated within the main application as a drop-in IP module and synthesized along with the application.
3) Support for Multiple Fitness Functions—The proposed core supports up to eight different fitness functions. Some of these fitness functions can be synthesized along with the GA engine and implemented on the same FPGA device. The proposed core also has additional I/O ports that allow the user to add more fitness functions implemented on a second FPGA device or some other external device. The user can then select between the existing internal fitness functions and the external fitness functions. This eliminates the need to resynthesize the entire design just to accommodate a new fitness function.

Besides the FPGA implementations, ASIC implementations of GAs [14], [15] have also been proposed to improve the performance of the GA using hardware acceleration.


Wakabayashi et al. [14] proposed a GA accelerator (GAA) chip that implements an elitist generational GA with on-the-fly adaptive selection between the two-point and uniform crossover operators. The GAA chip was fabricated using 0.5-µm standard CMOS technology. Chen et al. [15] developed a GA chip using a 0.18-µm Taiwan Semiconductor Manufacturing Company cell library. They developed a software tool called Smart GA to tailor the GA module to the user's specifications. Smart GA accepts user values for the programmable GA parameters through a front-end graphical user interface and synthesizes a custom GA netlist using these fixed GA parameter values. A major disadvantage of such customization is that once an ASIC is obtained from a custom netlist, the GA parameters cannot be changed. Any reprogramming of the GA parameters involves resynthesizing the GA netlist from scratch using the Smart GA application and repeating the physical design process to obtain the ASIC. This is a critical problem, as, in most cases, the user cannot predict the best GA parameter settings for his/her application. If the current settings do not offer the best performance, the user has to resynthesize the entire GA netlist with a new combination of GA parameter settings and re-design the entire ASIC. The GA IP core proposed in this paper eliminates the need for such resynthesis.

C. Pseudo-Random Number Generation and GA Performance

A GA needs an RNG for many purposes: generating the initial population, determining whether crossover and mutation should be performed, and determining the crossover and mutation cut points. "True" random numbers can be generated using specialized hardware that extracts the random numbers from a nondeterministic source such as clock jitter in digital circuits [18]. Pseudo-random number generators (PRNGs) use a deterministic algorithm to generate the random numbers; hence, the sequence of random numbers can be predicted if the initial seed is known. The choice of RNG depends upon the application at hand. Applications that require high security will use true RNGs; applications that require a quick response and cannot afford the high area overhead of true RNGs will use PRNGs.

The effect of the quality of the RNG on the performance of GAs has been studied previously [19]–[21]. A high-quality RNG is generally characterized by a long period (before repetition of the random numbers), uniformly distributed random numbers, absence of correlations between consecutive numbers, and structural properties such as organization of the numbers in lattices. Meysenburg [19] and Meysenburg and Foster [20] reported little or no improvement in the performance of GAs using good PRNGs over those using poor PRNGs. But later, Cantu-Paz [21] found significant improvements in the performance of a simple binary GA when using a good PRNG. Cantu-Paz found that the quality of the random numbers used to generate the initial population had a major impact on the performance of the GA, while it did not significantly affect the performance of the GA for the other operations such as crossover and mutation.
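Since the studies above evaluate PRNGs empirically, a concrete software model helps fix ideas. The sketch below implements a 16-cell hybrid rule-90/150 cellular-automaton PRNG in Python, the general style of CA generator used in GA hardware; the alternating rule assignment, null boundaries, and output tap are illustrative assumptions, not the exact generator of [5] or of the proposed core.

```python
# Minimal model of a 16-cell hybrid rule-90/150 cellular-automaton PRNG.
# The rule assignment and boundary handling are illustrative assumptions.

def ca_prng(seed: int, n_bits: int = 16):
    """Yield successive 16-bit pseudo-random words from a hybrid CA."""
    state = [(seed >> i) & 1 for i in range(n_bits)]
    # 1 = rule 150 (left XOR self XOR right), 0 = rule 90 (left XOR right).
    rule150 = [1, 0] * (n_bits // 2)            # alternating hybrid (assumed)
    while True:
        yield sum(bit << i for i, bit in enumerate(state))
        nxt = []
        for i in range(n_bits):
            left = state[i - 1] if i > 0 else 0          # null boundary
            right = state[i + 1] if i < n_bits - 1 else 0
            nxt.append(left ^ right ^ (state[i] if rule150[i] else 0))
        state = nxt

rng = ca_prng(seed=45890)        # one of the seeds used later in Table V
print([next(rng) for _ in range(4)])
```

Two different seeds put the automaton on different orbits of the same state space, which is exactly why the proposed core exposes the seed as a programmable parameter.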

The seed used by an RNG influences the sequence of numbers generated. Although RNG characteristics such as cycle length and uniform distribution remain the same with a different seed, the sequence of random numbers generated will differ. A poorly chosen seed can lead to poor results even for random numbers generated by a good RNG. Guidelines for choosing a good seed can be found in [22]. The performance of algorithms using random numbers has been shown to vary with the RNG seed. Elsner [23] studied the influence of RNG seeds on the performance of four different graph-partitioning algorithms; in one instance, Elsner observed that performance worsened by a factor of five when the RNG seed was changed.

Theoretically, a good RNG will produce random numbers that are uniformly distributed for a large enough sample size. But due to the time constraints of real-time applications, the distribution of the random numbers actually generated might be nonuniform. This might lead to poor results for a resource-constrained hardware GA. It has been observed by Meysenburg and Foster [20] and Cantu-Paz [21] that poor RNGs can sometimes outperform good RNGs for particular seeds. Thus, a user has to experimentally determine the RNG seed value for his/her particular application. This is particularly necessary for hardware implementations, where simple RNG implementations are preferred due to tight resource and response-time requirements. The proposed IP core is the only hardware GA IP core that allows the user to program the RNG seed, in addition to providing three built-in seeds to select from.

D. Basics of EHW

Space applications are increasingly employing EHW [24] to adapt on-board electronics to changing environmental conditions. EHW is a class of hardware that adapts its functionality/characteristics using evolutionary algorithms (EAs). There are two major divisions of EHW, namely, extrinsic EHW and intrinsic EHW. Extrinsic EHW refers to hardware that is evolved using software simulations (and behavioral models of the hardware); the best configuration found in the simulations is then downloaded onto the actual hardware. Intrinsic EHW refers to the adaptation and reconfiguration of previously configured hardware because of changes observed or required in the actual hardware.

The growing number of remote space applications has increased the demand for intelligent and adaptive space systems [32]. Thus, intrinsic EHW is becoming popular in space applications and has been targeted for different platforms including ASICs [25]–[28] and specialized platforms [29]. Due to the flexibility and scalability requirements of space applications, most of the existing work on intrinsic EHW has been implemented on FPGAs [37], [38]. Intrinsic EHW can be classified into four different classes based on the location of the reconfigurable hardware and the evolutionary algorithm, as proposed by Lambert et al. [30].

1) PC-Based Intrinsic EHW—In this class, the reconfigurable hardware application is located on an FPGA and the monitoring system is located in the PC. The reconfiguration of the EHW is done from the PC. This system suffers from slow speeds, because communications with the PC are typically very slow.
2) Complete Intrinsic EHW—In this class, both the reconfigurable hardware and the EA are situated on the same FPGA chip. This system yields the best performance, as the communication delays are due to intra-chip wires. But due to the finite logic resources available on a single FPGA chip, the complexity and number of fitness functions supported are limited.
3) Multichip Intrinsic EHW—The reconfigurable hardware and the EA are located on different FPGA chips. The performance of this system is worse than that of complete intrinsic EHW, as the communication delays are due to inter-chip wires.
4) Multiboard Intrinsic EHW—The reconfigurable hardware and the EA are located in FPGA chips on different boards. The performance of this system is generally slower than that of the complete intrinsic and multichip intrinsic EHW classes, as the communication delays are due to inter-board wires.

Although the multichip and multiboard intrinsic EHW implementations suffer from high communication delays, they are useful in applications where the fitness evaluation time dominates the communication time between the FPGA chips. Moreover, these implementations can take advantage of the dynamic reconfiguration features available in FPGAs to provide more flexibility (see [30]).

Although the complete intrinsic EHW implementation (especially on an ASIC) yields the best performance and smallest form factor, it is not widely adopted, as it suffers from low scalability. When building a complete intrinsic EHW system on an ASIC, all the design details have to be fixed at a very early stage, including the different fitness functions that need to be supported; these details cannot be changed later. The proposed core alleviates this problem by supporting the interfacing of fitness functions housed on other chips (or boards) to the existing system. It can thus realize a hybrid system, as shown in Fig. 5, combining the complete and multichip (or multiboard) intrinsic EHW implementations, and allows the user to select between internal and external fitness functions. Hence, even if the existing system is implemented on an ASIC, new fitness functions can be added externally. It is to be noted that the proposed IP core can realize all classes of intrinsic EHW systems (except PC-based) both on an FPGA and on an ASIC without losing the flexibility of supporting multiple fitness functions.

Fig. 1. Design flow used by the AUDI HLS tool.

III. PROPOSED GA FPGA IP CORE

This section describes in detail the design methodology as well as the implementation and interfacing details of the proposed core.

A. Design Methodology

The proposed GA IP core was synthesized using a high-level circuit synthesis design flow, as shown in Fig. 1. High-level synthesis (HLS) is the process of automatically synthesizing an RT-level description of a system from its behavioral description. This process consists of extracting the dataflow graph (DFG) of the system from its behavioral description, scheduling all the operations in the DFG, allocating functional-unit resources to handle the various types of DFG operations, binding each operation in the DFG to an allocated resource, and generating control signals to enable correct operation of the synthesized datapath. The RT-level description of the GA core was synthesized using the automatic design instantiation (AUDI) HLS tool [39]. The AUDI HLS tool takes as input a behavioral (VHDL) description of the system. The output of the AUDI circuit synthesis tool consists of the following.
1) Datapath—Structural VHDL description of the circuit's datapath in terms of simple components.
2) Controller—Behavioral VHDL description of a finite state machine that provides the control signals to all the datapath components for proper operation.
3) Top-Level Design—Structural VHDL description instantiating the datapath and controller with appropriate port mappings.

The advantages of using an HLS tool such as AUDI are the following.
1) Easy Synthesis Using Different Tools—The synthesized circuit descriptions are structural descriptions that use simple components such as adders, multiplexers, etc. This ensures that these netlists will synthesize easily using tools from many vendors, such as Altera and Xilinx.
2) Easy Addition of New Features to an Existing Design—The entire circuit can be resynthesized within a few minutes if the behavioral description is changed to include new features. Although features such as pipelining cannot be automatically generated by the current version of the AUDI HLS tool, there exist HLS tools such as SEHWA [36] that can automatically generate a pipelined datapath from the behavioral description of a system.
3) Ease of ASIC Implementation—The AUDI HLS tool also contains circuit-flattening scripts that can generate a gate-level netlist from the RT-level netlist.


This gate-level netlist can be directly used by industry-standard place-and-route tools to generate an ASIC, as shown in Fig. 1.

The proposed GA core implements the GA optimization cycle shown in Fig. 2.

Fig. 2. High-level view of the GA optimization cycle.

The behavior of the GA optimizer was modeled in VHDL and simulated to test its correctness. An RT-level VHDL model of the GA core was synthesized from the behavioral model using the AUDI HLS tool. The RT-level VHDL model was simulated thoroughly to test the correctness of the synthesized netlist. A gate-level Verilog model was then synthesized from the RT-level model using in-house flattening scripts and the Berkeley SIS tool [31], as shown in Fig. 1. The gate-level Verilog model uses simple Boolean gates such as NAND, NOR, AND, OR, XOR, and SCAN_REGISTER. The gate-level Verilog model was also simulated using Cadence NC-Verilog to verify the functionality and timing of the design. The design methodology adopted ensures that the RT-level and gate-level netlists are completely synthesizable by standard synthesis tools such as the Xilinx ISE tool. Thus, a synthesizable GA FPGA IP core is available to the user at two levels of design abstraction, namely RT-level and gate-level.
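For readers who want the control flow of Fig. 2 in one place, the following Python sketch mirrors the cycle behaviorally: initialize, evolve for the programmed number of generations with elitism, and output the best individual. It is a reading aid under simplifying assumptions (a software RNG instead of the CA generator, one offspring kept per crossover, inlined operators), not a model of the synthesized netlist.

```python
import random

def ga_cycle(fitness, pop_size=32, n_gens=32, xover_threshold=10,
             mut_threshold=1, n_bits=16, seed=45890):
    """Behavioral sketch of the Fig. 2 optimization cycle (not the RTL)."""
    rng = random.Random(seed)
    curr = [rng.getrandbits(n_bits) for _ in range(pop_size)]  # initial pop
    for _ in range(n_gens):
        fits = [fitness(c) for c in curr]
        new = [curr[fits.index(max(fits))]]        # elitism: copy the best
        while len(new) < pop_size:
            # proportionate parent selection (detailed in Section III-B.2)
            parents = []
            for _ in range(2):
                threshold = rng.randrange(sum(fits) + 1)
                acc, pick = 0, curr[-1]
                for f, c in zip(fits, curr):
                    acc += f
                    if acc >= threshold:
                        pick = c
                        break
                parents.append(pick)
            child = parents[0]
            # single-point crossover, gated by a 4-bit threshold test
            if rng.getrandbits(4) < xover_threshold:
                mask = (1 << rng.randrange(1, n_bits)) - 1
                child = (parents[0] & mask) | (parents[1] & ~mask)
            # single-bit mutation, gated by a 4-bit threshold test
            if rng.getrandbits(4) < mut_threshold:
                child ^= 1 << rng.randrange(n_bits)
            new.append(child)
        curr = new                                 # currPop <-> newPop swap
    return max(curr, key=fitness)                  # output best solution

best = ga_cycle(lambda c: 8 * (c >> 8) + 4 * (c & 0xFF))  # F3 of Section IV
```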

B. Behavioral Modeling and Implementation Details

In this section, the implementation of the various features of the proposed GA core in the behavioral model is discussed in detail.

1) Initial Population: An initial population of randomly generated individuals is formed. A 16-bit cellular-automaton-based PRNG, similar to the implementation in [5], is used to generate all the random numbers required. In each generation, a new population of candidate solutions is generated using the crossover and mutation operators. Elitism is provided by copying the best individual of the current population into the next population.

2) Parent Selection: The parent individuals required for crossover are selected from the current population using the proportionate selection scheme [2]. In proportionate selection, a threshold fitness value is computed by scaling down the sum of the fitness values of all the individuals in the current population by a random number. A cumulative sum of the fitness values of the individuals in the current population is then computed and compared to the threshold fitness value; the individual whose fitness causes the cumulative fitness sum to exceed the scaled fitness threshold is selected as the parent. This ensures that highly fit individuals have a selection probability proportional to their fitness. To speed up the computations, the sum of the fitness values of the new population is accumulated as the fitness of each offspring is computed. A sketch of this scheme follows.
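In the compact model below, the multiply-and-shift used to scale the fitness sum by a 16-bit random number is an assumption about the arithmetic; the text above only states that the sum is scaled down by a random number.

```python
import random

def proportionate_select(population, fitness_sum, rand16):
    """Proportionate (roulette-wheel) selection as described above: pick
    the first individual whose cumulative fitness exceeds a randomly
    scaled-down fraction of the total fitness."""
    threshold = (rand16 * fitness_sum) >> 16   # scale total by rand/65536
    cumulative = 0
    for chromosome, fitness in population:
        cumulative += fitness
        if cumulative > threshold:
            return chromosome
    return population[-1][0]                   # guard for the top edge

rng = random.Random(10593)                         # a Table V seed
pop = [(0xBEEF, 40), (0xCAFE, 10), (0xF00D, 50)]   # (chromosome, fitness)
parent = proportionate_select(pop, fitness_sum=100,
                              rand16=rng.getrandbits(16))
```

Maintaining fitness_sum incrementally as each offspring is evaluated, as the core does, removes a full pass over the population from every selection.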

3) Crossover: The GA core implements the single-point binary crossover technique [2], illustrated in Fig. 3, to combine parents from the current generation and produce offspring for the next generation. The GA core performs crossover only if the random number generated is less than the specified crossover threshold. Since the 4-bit crossover threshold is user-programmable, the user can control the crossover rate of the GA core. The single-point crossover is implemented using a bit-mask vector that takes the first portion of the offspring from the first parent and the remaining portion from the second parent. A random number n is generated to denote the random cut point on the parents' chromosomes. A mask is then generated with 1s in positions 0 to n - 1 and 0s from position n upward. This mask is ANDed with the first parent's chromosome to obtain the first part of the offspring chromosome; the mask is then logically inverted, and the inverted mask is ANDed with the second parent's chromosome to obtain the second part of the offspring chromosome. The crossover operation produces two offspring, as illustrated in Fig. 3.

Fig. 3. Example illustrating single-point crossover.

4) Mutation: The mutation operator in the proposed GA core inverts a single bit of the offspring obtained after crossover. The GA core generates a 4-bit random number and compares it with the programmed mutation threshold to decide whether mutation should be performed. If the random number is smaller than the mutation threshold, a random-bit mutation is performed: a randomly chosen mutation point dictates the bit mask to be used in an XOR operation with the candidate solution, and this XOR operation flips the bit at the mutation point. Both operators are sketched below.
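The mask mechanics of both operators translate directly into integer bit operations. The sketch below assumes 16-bit chromosomes and reproduces the AND-mask crossover of Fig. 3 and the XOR bit flip, with the 4-bit threshold test shown for completeness.

```python
import random

WIDTH = 16
FULL = (1 << WIDTH) - 1

def single_point_crossover(parent1, parent2, cut):
    """Mask-based crossover of Fig. 3: positions 0..cut-1 come from one
    parent, the remaining positions from the other; two offspring result."""
    mask = (1 << cut) - 1                        # 1s in positions 0..cut-1
    child1 = (parent1 & mask) | (parent2 & ~mask & FULL)
    child2 = (parent2 & mask) | (parent1 & ~mask & FULL)
    return child1, child2

def mutate(chromosome, mut_threshold, rng):
    """Flip one randomly chosen bit iff a 4-bit random number is below
    the programmed mutation threshold."""
    if rng.getrandbits(4) < mut_threshold:
        chromosome ^= 1 << rng.randrange(WIDTH)  # XOR with a one-hot mask
    return chromosome

rng = random.Random(1567)                        # a Table V seed
cut = rng.randrange(1, WIDTH)                    # random cut point
o1, o2 = single_point_crossover(0xAAAA, 0x0F0F, cut)
o1 = mutate(o1, mut_threshold=1, rng=rng)        # rate 1/16 = 0.0625
```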

5) Fitness Evaluation: The fitness of the resultant offspring is computed using simple two-way handshaking communication between the GA core and the fitness evaluation module. The GA core can handle up to eight different fitness evaluation modules, and the user selects the required module by providing a 3-bit fitness-selection value. The candidate and its fitness are stored in the GA memory as part of the new population. The above cycle of parent selection, crossover, and mutation is repeated in every generation until the new population is completely filled. The GA optimization ends when the generation index equals the user-programmed number of generations; the GA core then exits the optimization cycle and outputs the best individual found.

6) Programmable GA Parameters: The proposed GA core has the port interface shown in Table II.

TABLE II
Port Interface of the Proposed GA Core

| No. | Port | Input/output | Width (bits) | Functionality |
|---|---|---|---|---|
| 1 | reset | I | 1 | System reset |
| 2 | sys_clock | I | 1 | System clock |
| 3 | ga_load | I | 1 | Load GA parameters |
| 4 | index | I | 3 | Index of GA parameter |
| 5 | value | I | 16 | Initialization value bus |
| 6 | data_valid | I | 1 | Initialization handshake signal |
| 7 | data_ack | O | 1 | Initialization handshake signal |
| 8 | fit_value | I | 16 | Fitness value |
| 9 | fit_request | O | 1 | Fitness request signal |
| 10 | fit_valid | I | 1 | Fitness value validity signal |
| 11 | candidate | O | 16 | Candidate solution bus |
| 12 | mem_address | O | 8 | GA memory address |
| 13 | mem_data_out | O | 32 | Data to GA memory |
| 14 | mem_wr | O | 1 | GA memory write signal |
| 15 | mem_data_in | I | 32 | Data from GA memory |
| 16 | start_GA | I | 1 | GA start signal |
| 17 | GA_done | O | 1 | GA completion signal |
| 18 | test | I | 1 | Scan-chain test enable |
| 19 | scanin | I | 1 | Scan-chain input |
| 20 | scanout | O | 1 | Scan-chain output |
| 21 | preset | I | 2 | Preset mode selector |
| 22 | rn | I | 16 | Random number |
| 23 | fitfunc_Select | I | 3 | Fitness module select signal |
| 24 | fit_value_ext | I | 16 | Fitness value from external FEM |
| 25 | fit_valid_ext | I | 1 | fit_valid signal from external FEM |

The performance and runtime of a GA depend on the GA parameters, namely population size, number of generations, crossover rate, and mutation rate. Large values of the population size and number of generations generally yield the best results, at the expense of long runtimes. But if the target application is simple, a few generations and a small population size may suffice to find the best solution. An efficient GA core implementation must be able to cater to the needs of individual applications by allowing the user to change these parameters. The crossover and mutation rates that produce the best results in the shortest time also vary with the application. Hence, the proposed GA core also has the capability to program both the crossover and mutation rates. The quality of the random numbers generated for the execution of the genetic operators also has an impact on the performance of the GA. The proposed core therefore allows the user to program the initial seed of the RNG, which enables the user to obtain different sequences of random numbers from the same RNG module.

All the programmable parameters of the GA core must be initialized before it can be used. Initialization of the GA core is done using a simple two-way handshake. The user first asserts the init_GA signal to put the GA core in the initialization mode. Then, all the programmable parameters can be initialized using a simple handshaking process. Each programmable parameter has an index associated with it, as shown in Table III.

TABLE III
Index Values of the GA Core's Programmable Parameters

| Index | Programmable parameter |
|---|---|
| 0 | Number of generations [15:0] |
| 1 | Number of generations [31:16] |
| 2 | Population size |
| 3 | Crossover rate |
| 4 | Mutation rate |
| 5 | RNG seed |

The user places the value of the programmable parameter on the fit_value bus and the corresponding index value on the index bus, and asserts the data_valid signal. The GA core reads the fit_value bus, decodes the index, and stores the value in the appropriate register. It then asserts the data_ack signal and waits for data_valid to be de-asserted before de-asserting data_ack. This sequence is modeled in the sketch below.
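The index map of Table III and the data_valid/data_ack sequence can be captured in a short driver model. The MockCore object below is a hypothetical stand-in that acknowledges immediately; only the index assignments, the 16-bit packing, and the handshake order follow the text.

```python
PARAM_INDEX = {                  # index values from Table III
    "n_gens_lo": 0,              # number of generations [15:0]
    "n_gens_hi": 1,              # number of generations [31:16]
    "pop_size": 2,
    "crossover_rate": 3,
    "mutation_rate": 4,
    "rng_seed": 5,
}

def program_parameter(core, index, value16):
    """One two-way handshake: drive index/value, raise data_valid, wait
    for data_ack, drop data_valid, wait for data_ack to fall."""
    core.index, core.value = index, value16 & 0xFFFF
    core.data_valid = 1
    core.wait_until(lambda: core.data_ack == 1)
    core.data_valid = 0
    core.wait_until(lambda: core.data_ack == 0)

def program_generations(core, n_gens):
    """The 32-bit generation count spans two 16-bit writes (indices 0, 1)."""
    program_parameter(core, PARAM_INDEX["n_gens_lo"], n_gens & 0xFFFF)
    program_parameter(core, PARAM_INDEX["n_gens_hi"], n_gens >> 16)

class MockCore:
    """Immediate-ack stand-in for the real port interface (illustration)."""
    index = value = data_valid = data_ack = 0
    def wait_until(self, condition):
        self.data_ack = self.data_valid      # ack mirrors valid at once
        assert condition()

program_generations(MockCore(), 100000)      # needs both 16-bit halves
```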

7) Interfacing Details of the GA Core: The overall GA optimizer consists of three modules, namely, the GA core, the GA memory, and the RNG. Additionally, the GA core communicates with a fitness evaluation module and the actual application using simple two-way handshaking operations. A typical system setup with the communication between all the modules is shown in Fig. 4.

Fig. 4. Typical system showing the communication between the different modules and the GA core (signal numbers are with reference to Table II).

The GA memory module is a single-port memory that stores both the individuals and their fitness values. To store an individual and its fitness, the GA core places them on the memory bus and asserts the memory write signal. To read an individual and its fitness, the GA core places the memory address on the address bus and reads the memory contents in the next clock cycle. The GA memory module is synthesized using the dedicated block RAM modules present in the Xilinx Virtex-II Pro device.

The RNG module implements a cellular-automaton-based RNG. It is to be noted that the operation of the GA core is independent of the RNG implementation. The initial seed of the RNG module can either be provided by the user or selected from one of three preset initial seeds. The GA core reads the output register of the RNG module whenever it needs a random number; based on the number of random bits needed, it selects bits from predefined positions.

The GA core uses a simple two-way handshaking protocol for its communication with the fitness evaluation module. When the GA core requires the fitness of a candidate solution, it places the individual on the candidate bus and asserts the fit_request signal. The fitness evaluation module of the target application should then read the candidate port and compute the fitness of the individual. The computed fitness is placed on the fit_value port of the GA core, and the fit_valid signal is asserted by the target application. On assertion of the fit_valid signal, the GA core reads the fitness value and de-asserts the fit_request signal. The simplicity of all the interfacing protocols is a major advantage to the user, as it reduces timing issues during implementation of the entire application. The evaluation-side half of this protocol is sketched below.

8) Usage Details of the GA Core: The GA core starts its optimization cycle when it receives the start_GA pulse from the target application. If the programmable parameters of the GA have been initialized, it uses those values; otherwise, the GA core can use one of the three available preset modes. During its optimization cycle, the GA core requests fitness computations for candidate individuals from the fitness evaluation module. Once the GA core has computed the best candidate, the candidate is placed on the candidate bus and the GA_done signal is asserted.
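Seen from the application side, the fitness evaluation module is the responder on this handshake. The port bag below is a hypothetical stand-in for the real wires; the transaction order follows the description above, and test function F2 from Section IV serves as the example evaluator.

```python
class Ports:
    """Bag of signal values standing in for the real ports (illustration)."""
    fit_request = fit_valid = 0
    candidate = fit_value = 0

def fem_respond(ports, fitness_fn):
    """Responder half of the fit_request/fit_valid handshake: on a request,
    read the candidate, publish its fitness, and assert fit_valid.  The GA
    core then reads fit_value and de-asserts fit_request."""
    if ports.fit_request:
        ports.fit_value = fitness_fn(ports.candidate) & 0xFFFF
        ports.fit_valid = 1

ports = Ports()
ports.candidate, ports.fit_request = 0xFF00, 1    # x = 255, y = 0
fem_respond(ports, lambda c: 8 * (c >> 8) - 4 * (c & 0xFF) + 1020)  # F2
print(ports.fit_value)                            # 3060, the F2 optimum
```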

C. Design Considerations for ASIC Implementation and Space Applications

The proposed GA core is well suited for ASIC development, as it is available as a Verilog gate-level netlist. A digital ASIC can be implemented from the proposed GA core using industry-standard place-and-route tools, given a standard-cell layout library, as shown in Fig. 1.

1) Preset Modes: The proposed GA core has three preset modes, as shown in Table IV. The values of the programmable GA parameters in the preset modes have been set so that the GA core can be used for a varied set of applications without compromising performance or runtime. The user can select any one of the three preset modes based on the target application. When the 2-bit preset signal is set to "00," the GA core uses the user-programmed values for all the programmable parameters.

TABLE IV
Preset Modes Available in the Proposed GA Core

| Mode | preset | Pop. size | No. of gens. | Xover threshold | Mutn. threshold |
|---|---|---|---|---|---|
| User | 00 | < 256 | < 2^32 | 0–15 | 0–15 |
| Preset | 01 | 32 | 512 | 12 | 1 |
| Preset | 10 | 64 | 1024 | 13 | 2 |
| Preset | 11 | 128 | 4096 | 14 | 3 |

The preset modes are useful for the following purposes.
a) Limited fault tolerance in digital ASIC applications—When building a digital ASIC application using the proposed GA core, failure of the GA parameter initialization logic can be tolerated by running the GA core in one of the three preset modes.
b) User experimentation—GAs are widely used to find solutions for problems with unknown characteristics. For such problems, and many others, the end user has to experimentally determine the best values for all the GA parameter settings. The preset modes in the proposed GA core provide three sets of GA parameter values that can be used as effective starting points for such experimentation.

2) Scan-Chain Testing: The AUDI HLS system can also automatically insert scan-chain testability when synthesizing the IP core. The proposed GA core has a scan chain connecting all the registers in the design. A scan-chain test can be run on the core by asserting the test signal and feeding the user test pattern into the scanin port; the output of the scan chain can be observed on the scanout port. This scan chain can also be connected to a top-level scan chain in a system-level design.

3) Design Considerations for Space Applications: The increasing number of remote space missions has necessitated autonomous spacecraft that are capable of handling unexpected situations and adapting to new environments [32]. This requires the deployment of intrinsically evolvable hardware whose adaptation and reconfiguration are controlled by on-board evolutionary algorithms. In [32], Stoica et al. identify the following characteristics as the most critical in space-oriented EHW.
a) Systems approach to EHW design—The EHW system should help reconfigure and adapt the higher-level application to the changing environment.
b) Flexibility in fitness calculation—The means of computing a candidate solution's fitness value, and the context in which it is computed, need to be considered.
c) Response time—Most space applications are time-critical and must adapt to the changing environment quickly, before serious damage is done to the system and/or the mission itself.
d) Safety—Space systems are very expensive and highly sensitive to even small errors and/or environmental changes. Hence, safety is the most critical characteristic, as these systems can be permanently lost or damaged by the slightest of problems.

The proposed GA core addresses these issues in the following ways.

a) The design of the GA core as a compact drop-in IP module, and its capability to be integrated at various design levels, enables a systems approach to the design of the EHW application.
b) The proposed GA core supports external fitness functions, as described earlier.
c) By allowing the user to program the number of generations, the runtime of the GA can be controlled. Moreover, the best candidate of every generation is always output to the application, for use in case of an emergency.

4) Implementation of a Hybrid Intrinsic EHW System: The proposed GA core can be used to implement a hybrid intrinsic EHW system, as shown in Fig. 5. The GA core allows the user to multiplex between internal and external fitness functions. The fitness value (shown in bold in Fig. 5) and the handshaking signals are available to the external fitness function module. The external fitness function, which may be on a different chip or board, can be added later to expand the system functionality. It is possible to support up to seven external fitness functions in addition to the internal fitness function, if the required multiplexers are added in the external hardware or if the external fitness functions are housed on a reconfigurable fabric such as an FPGA.

Fig. 5. Implementation of a hybrid EHW system (with internal and external fitness modules) using the proposed GA core.

D. Scalability of the GA Core

The synthesized RT-level and gate-level netlists of the proposed GA core support chromosome encodings of length up to 16 bits. In this section, two methods are discussed that allow the proposed GA core to support chromosome lengths of more than 16 bits.
a) Re-synthesis of the GA core netlists—The most efficient method of obtaining a GA core that supports chromosome lengths of more than 16 bits is to modify the behavioral description of the GA core, available in VHDL, and resynthesize the entire netlist using an HLS tool.


The entire process of resynthesis using the AUDI HLS tool takes only a few minutes. The only disadvantage of such resynthesis is that the functionality and timing details of the newly synthesized GA core need to be verified using extensive simulations.
b) Supporting longer chromosomes using multiple GA cores—If resynthesis of the entire netlist is undesirable, two or more GA cores, each supporting chromosome lengths of 16 bits, can be used together to support chromosome lengths of more than 16 bits. The following section describes how to obtain a GA core that supports chromosome lengths of up to 32 bits using two instances of the proposed GA core.

1) Implementation of a 32-Bit GA Core Using Two 16-Bit GA Cores: A GA core that supports chromosome lengths of up to 32 bits can be built from two of the proposed GA cores using the setup shown in Fig. 6. Below is a discussion of how each operation of the 32-bit GA core is affected by this setup.

Fig. 6. Implementation of a 32-bit GA core using two 16-bit cores.

a) Initial population—The 32-bit individuals of the initial population are generated by concatenating the 16-bit individuals produced by each of the 16-bit GA cores. As shown in Fig. 6, each of the GA cores has its own RNG that produces the random 16-bit vectors forming the least significant bits (LSBs) and most significant bits (MSBs) of the 32-bit individuals.

b) Parent selection—If both GA cores were allowed to perform independent parent selection, the resulting 32-bit individual might be the concatenation of 16 LSBs and 16 MSBs that belong to two different individuals. Hence, only GA_Core1 is allowed to perform parent selection, while GA_Core2 is forced to complete its parent selection when a valid parent is chosen by GA_Core1. This is accomplished by using external logic, called scalingLogic_parSel in Fig. 6, to interact with GA_Core2 when it requests the current individual's fitness from the GA memory. Instead of supplying the stored fitness of the individual, the scalingLogic_parSel module supplies a fitness of zero until GA_Core1 has found the parent individual. This ensures that both GA_Core1 and GA_Core2 select the same individual as the parent; hence, the MSBs and LSBs passed on to the crossover phase belong to the same individual selected as the parent by GA_Core1.

c) Crossover—Both GA cores are allowed to perform independent one-point crossover operations on their 16-bit halves. This corresponds to a three-point crossover operation on the entire 32-bit individual. Since both crossover operations are independent of each other, the crossover probability of the 32-bit GA core, xovProb32, is given by

xovProb32 = xovProb16(MSB) + xovProb16(LSB) - xovProb16(MSB) * xovProb16(LSB).

The individual crossover probabilities of the two GA cores, xovProb16(MSB) and xovProb16(LSB), should be programmed according to the equation above. It should be noted that, since three-point crossover can be more disruptive than one-point crossover, lower crossover probabilities should be used to prevent the loss of good schemata.

d) Mutation—Both GA cores are allowed to perform independent mutation operations. This implies that at most 2 bits can be flipped in the 32-bit chromosome due to mutation. Since both mutation operations are independent of each other, the mutation probability of the 32-bit GA core, mutProb32, is given by

mutProb32 = mutProb16(MSB) + mutProb16(LSB) - mutProb16(MSB) * mutProb16(LSB).

The individual mutation probabilities of the two GA cores, mutProb16(MSB) and mutProb16(LSB), should be programmed according to the equation above.

e) Fitness evaluation—The MSBs and LSBs from the two GA cores are concatenated and sent to the fitness evaluation module. The fit_request signal is generated by GA_Core1. The fit_valid signal is sent to both GA cores, while the fitness value is sent only to GA_Core1.

f) Candidate and fitness storage—The MSBs and LSBs from the two GA cores are concatenated to form the 32-bit candidate that is stored in the GA memory. The write signal for the GA memory is generated by GA_Core1. The fitness value is written only by GA_Core1.
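The two equations above are the standard inclusion-exclusion identity for independent events. The helper below computes the combined rate and inverts it for the symmetric case where both cores are programmed equally; the mapping of a rate to a 4-bit threshold (rate roughly equal to threshold/16) is an assumption consistent with the 4-bit rate registers described in Section III-B.

```python
def combined_rate(p_msb: float, p_lsb: float) -> float:
    """xovProb32 / mutProb32: probability that at least one core fires."""
    return p_msb + p_lsb - p_msb * p_lsb

def per_core_rate(target32: float) -> float:
    """Equal per-core rate p such that combined_rate(p, p) == target32."""
    return 1.0 - (1.0 - target32) ** 0.5

def rate_to_threshold(rate: float) -> int:
    """Quantize a rate to the 4-bit threshold register (assumed /16 map)."""
    return max(0, min(15, round(rate * 16)))

p16 = per_core_rate(0.60)                  # target 60% overall crossover
print(round(combined_rate(p16, p16), 3))   # 0.6
print(rate_to_threshold(p16))              # 6: program this in each core
```

The same helper applies to the mutation thresholds; as noted above, a lower target is advisable because the induced three-point crossover is more disruptive.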


IV. EXPERIMENTS AND RESULTS

The GA core was simulated and tested thoroughly at each level of design abstraction (behavioral, RT-level, and gate-level). This section discusses in detail the various experiments conducted at each design level and the simulation and hardware results obtained.

A. RT-Level Simulations

At the RT-level, the GA core was simulated using Cadence NC-Launch to verify its functionality. The effectiveness of the GA core was tested by optimizing the three maximization test functions shown below under various parameter settings. All the experiments use a chromosome length of 16. Hence, all the single-variable experiments have an x-variable range of 0 to 65535, and the two-variable experiments have equal ranges (0 to 255). All the experiments used a mutation rate of 0.0625 and ran for 32 generations. The values of the other GA parameters were varied to obtain different convergence characteristics and are listed in Table V along with the results of the corresponding experiments.

TABLE V
RT-Level Simulation Results Obtained for the Three Test Functions (BF6, F2, and F3) Under Various GA Parameter Settings

| Test function | Run number | RNG seed | Population size | Crossover threshold | Best fitness | Best found at gen. | Convergence (gen. num.) |
|---|---|---|---|---|---|---|---|
| BF6 | 1 | 45890 | 32 | 10 | 4047 | 1 | 8 |
| BF6 | 2 | 45890 | 64 | 10 | 4271 | 14 | 30 |
| BF6 | 3 | 10593 | 32 | 10 | 4271 | 3 | 16 |
| BF6 | 4 | 1567 | 32 | 10 | 4146 | 2 | 26 |
| BF6 | 5 | 1567 | 32 | 12 | 4047 | 2 | 10 |
| F2 | 6 | 45890 | 32 | 10 | 3060 | 15 | 18 |
| F2 | 7 | 45890 | 64 | 10 | 2096 | 1 | 10 |
| F2 | 8 | 10593 | 64 | 10 | 3060 | 10 | 26 |
| F3 | 9 | 10593 | 32 | 12 | 3060 | 5 | 12 |
| F3 | 10 | 1567 | 32 | 10 | 3060 | 16 | 20 |

1) Test Function #1 (Binary F6):

BF6(x) = ((x^2 + x) * cos(x) / 4000000) + 3200.

This is a very difficult test function that has numerous local maxima, as can be seen in Fig. 7, and exactly one global maximum, with a value of 4271 at x = 65522. This is a standard test function used to test the effectiveness of GAs and other optimization algorithms [33].

Fig. 7. (Zoomed-in) plot of the modified Binary F6 [33] test function.

2) Test Function #2:

F2(x, y) = 8x - 4y + 1020.

This is a simple mini-max test function that has to maximize one variable (x) and minimize the other variable (y) to obtain the optimal objective function value of 3060.

3) Test Function #3:

F3(x, y) = 8x + 4y.

This is a simple maxi-max test function that has to maximize both variables (x and y) to obtain the optimal objective function value of 3060.

The convergence plots for the three test functions under different parameter settings are shown in Figs. 8–12. In the plots, the x-axis denotes the generation number and the y-axis denotes an individual's fitness. Each point P(i, j) is a population member in generation i with a fitness value

of j. For the sake of clarity, the plots show only one of multiple members with the same fitness in any generation. Hence, as the population converges to the best few candidates in the later generations, the number of points decreases. Although many inferior members are present in the initial random population, the final generations contain highly fit individuals and very few inferior individuals (due to mutation).

Table V summarizes the best results obtained for the three test functions under various parameter settings. The "convergence" column indicates the generation number at which the difference in average fitness between the current generation and the next generation falls below 5%. It can be clearly seen that the proposed GA core finds the optimal values for all the test functions. But the optimum is found only for certain parameter settings, underlining the need for programmability of the GA core's parameters. It can also be seen that the random numbers used by the GA play a vital role in determining performance. For instance, in run #1 of Table V, the GA core converges prematurely to a local optimum. But when the RNG seed is changed from 45890 to 10593 (run #3), the convergence of the GA is better and the global optimum is found under exactly the same settings for the other parameters.
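The convergence criterion used in Table V is easy to restate operationally. Given the average fitness of each generation, the helper below returns the first generation at which the change falls below 5% (the zero-average guard is an added detail).

```python
def convergence_generation(avg_fitness, tol=0.05):
    """First generation i with |avg[i+1] - avg[i]| < tol * avg[i], i.e.,
    the convergence criterion used for Table V."""
    for i in range(len(avg_fitness) - 1):
        if avg_fitness[i] > 0 and \
           abs(avg_fitness[i + 1] - avg_fitness[i]) < tol * avg_fitness[i]:
            return i
    return None                      # no convergence within the run

print(convergence_generation([900, 2000, 2900, 3000, 3055]))   # -> 2
```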

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 14, NO. 1, FEBRUARY 2010

4000

4000

3500

3500

3000

3000

2500

2500

Fitness

Fitness

144

2000 1500

1500

1000

1000

500

500

0

0

3

6

9

12

15 18 21 Generation

24

27

30

33

Fig. 8. Convergence plot for the BF6 test function using number of generations = 32—Run#3 of Table V.


Fig. 9. Convergence plot for the BF6 test function with initial seed for RNG = 1567—Run #4 of Table V.

Fig. 11 shows the convergence plot for test function #2, the mini-max objective function, and Fig. 12 shows the convergence plot for test function #3, the maxi-max objective function. Figs. 11 and 12 show that small population sizes and few generations are sufficient to solve simple problems. The convergence characteristics for the "hard" binary F6 test function are shown in Figs. 8-10. The experimental runs show that finding the optimal parameter settings for a difficult problem can be nontrivial and that the initial seed of the RNG module is an important factor in GA convergence. For instance, changing the initial RNG seed from 45890 in run #1 to 10593 in run #3 (while keeping the same values for all the other programmable parameters) improved the best solution found for the BF6 test function by about 5.5%, whereas for test function F2 the optimal result was found quickly with the initial RNG seed set at 45890. It is thus clear that the optimal GA parameter settings differ widely from function to function, reiterating the need for a customizable GA core.

B. FPGA Implementation Results

The gate-level Verilog netlist of the GA core was synthesized using the Xilinx ISE 10.1i tool and mapped to a Virtex2Pro (xc2vp30-ff896, speed grade -7) device. Table VI


Fig. 10. Convergence plot for the BF6 test function with crossover rate = 0.75—Run #5 of Table V.

shows the area utilization, clock speed, and block memory utilization for the placed-and-routed GA core on this Xilinx device. Block memory utilization is reported separately, as it is not included in the logic utilization figure. Note that the dedicated block memory in the Xilinx Virtex II Pro device implements both the GA memory module and the lookup-based fitness evaluation module. The post place-and-route simulation model of the designed GA IP core was extracted and simulated using ModelSim to verify the functionality and timing. The design was then downloaded onto the FPGA device and its functionality was verified using the Chipscope Pro 10.1i tools.

The effectiveness of the GA core was then tested by optimizing the three difficult maximization test functions shown below. For the FPGA experiments, complex test functions were used to test the effectiveness of the GA core. These functions have been modified to enable easy hardware implementation (e.g., floating-point coefficients have been changed so that they can be realized using shift-and-add operations).

1) Modified and Scaled Binary F6: mBF6_2(x) = 4096 + ([(x^2 + x) * cos(x)] / 2^20), 0 <= x <= 65535. This function is a modified and scaled version of the maximization test function from Haupt and Haupt [33]. It has a single globally optimal solution at x = 65521 with a value of 8183.

2) Modified Binary F7: mBF7_2(x, y) = 32768 + 56 * [x * sin(4x) + 1.25 * y * sin(2y)], 0 <= x, y <= 255. This function is a modified version of the minimization function BF7 in [33]. It has been turned into a maximization function with a single optimal solution of value 63904 at x = 247 and y = 249.
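To show how the hardware-friendly scaling works (the divide by 2^20 becomes a plain right shift once the product is available in integer form), here is a C reference model of the two functions. The models are our reading of the formulas above, not code taken from the paper.

```c
#include <math.h>
#include <stdint.h>

/* Modified, scaled Binary F6; global optimum 8183 at x = 65521.
 * 1048576.0 = 2^20: in integer hardware this divide is a 20-bit
 * right shift, which is why the constant was chosen. */
int32_t mbf6_2(uint16_t x)
{
    double xd = (double)x;
    return 4096 + (int32_t)(((xd * xd + xd) * cos(xd)) / 1048576.0);
}

/* Modified Binary F7; global optimum 63904 at x = 247, y = 249. */
int32_t mbf7_2(uint8_t x, uint8_t y)
{
    double xd = (double)x, yd = (double)y;
    return 32768 + (int32_t)(56.0 * (xd * sin(4.0 * xd)
                                     + 1.25 * yd * sin(2.0 * yd)));
}
```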


TABLE VI
Post Place-and-Route Statistics for the Proposed GA Core and Fitness Module on the Virtex II Pro Device (xc2vp30-7ff896)

Design attribute                                    Value
Logic utilization (% slices used)                   13%
Clock speed                                         50 MHz
Block memory utilization (GA memory)                1%
Block memory utilization (fitness lookup module)    48%

Fig. 11. Convergence plot for the test function F2 with population size = 32—Run #6 of Table V.

Fig. 12. Convergence plot for the test function F3 with initial seed for RNG = 1567—Run #10 of Table V.

TABLE VII
Best Fitness Values Obtained by the GA for the mBF6_2 Function for Different Parameter Settings (XR = Crossover Rate)

RNG_Seed      PopSize = 32          PopSize = 64
(hex)         XR = 10   XR = 12     XR = 10   XR = 12
2961          7999      7813        7824      7819
061F          6175      7578        8134      8129
B342          7612      7497        7612      7719
AAAA          7534      7534        7578      7864
A0A0          8104      7406        8135      8039
FFFF          7291      7623        7847      7669


3) Modified 2-D Shubert Function: mShubert2D(x1, x2) = 65535 - 174 * (150 + Π_{k=1}^{2} Σ_{i=1}^{5} i * cos[(i + 1) * x_k + i]), 0 <= x1, x2 <= 255. This function is a minimization function (derived from [15]) modified into a maximization function with a globally optimal value of 65535. The function has 48 globally optimal solutions and numerous local maxima.

The experimental setup is similar to the one shown in Fig. 4. The Xilinx ISE 10.1i tool achieved a clock speed of 50 MHz for the GA module (GA core, RNG module, and GA memory). The initialization module and the application (fitness) module are separate entities that communicate with the GA module using handshaking; for these modules, the Xilinx ISE tool achieved a clock speed of 200 MHz. A digital clock manager core is used to generate the two clocks from the on-board 100 MHz clock.

The initialization module consists of a simple finite state machine (FSM) that performs the two-way handshaking operation using the "data valid" and "data ack" signals to initialize the various GA parameters one by one. The application module contains a simple FSM to perform the two-way handshaking with the GA core, together with the hardware implementation of the fitness function. A lookup-based implementation was used for the fitness functions, as this resulted in better operational speed than a combinational implementation. In the lookup-based fitness computation method, block ROMs within the FPGA device are populated with the fitness values corresponding to each solution encoding. This approach is used only to demonstrate the effectiveness of the proposed GA IP core in optimizing difficult maximization test functions without having to implement the actual test functions in hardware.

The entire experimental setup was implemented on the Xilinx Virtex2Pro (xc2vp30-7ff896) FPGA device. Chipscope Pro 10.1 tools were used to build cores to observe and record the "best fitness" and "sum of fitness" values for each generation on the FPGA.

The proposed GA core was run with 12 different parameter settings, as shown in Tables VII-IX. The mutation rate and the number of generations were set to 0.0625 and 64, respectively, for all the experiments; 64 generations sufficed because the population converged within 64 generations for all three fitness functions.

Table VII shows the results obtained by the GA core for the mBF6_2 test function using different settings of the programmable GA parameters. In the experiments conducted, the best solution found by the proposed GA core for the mBF6_2 test function was 65345. This solution evaluates to a fitness of 8135, which is approximately 0.59% less than the globally optimal fitness value of 8183; the solution itself lies within approximately 0.27% of the globally optimal solution in the solution space.
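Tying this back to the lookup-based evaluation described above: the ROM contents can be produced offline by sweeping all 2^16 encodings through a software model of the fitness function. The sketch below does this for mShubert2D under two assumptions of ours, namely that the chromosome packs x1 in the upper byte and x2 in the lower byte, and that the table is written in a $readmemh-style hex format; the paper does not specify either detail.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Software model of mShubert2D; x1 and x2 are assumed to occupy the
 * upper and lower byte of the 16-bit chromosome, respectively. */
static uint16_t mshubert2d(uint16_t chrom)
{
    const double x[2] = { (double)(chrom >> 8), (double)(chrom & 0xFFu) };
    double p = 1.0;
    for (int k = 0; k < 2; k++) {
        double s = 0.0;
        for (int i = 1; i <= 5; i++)
            s += i * cos((i + 1) * x[k] + i);
        p *= s;   /* product over the two dimensions, as in Shubert */
    }
    double v = 65535.0 - 174.0 * (150.0 + p);
    if (v < 0.0)     v = 0.0;       /* clamp to the 16-bit fitness range */
    if (v > 65535.0) v = 65535.0;
    return (uint16_t)v;
}

/* Emit one fitness word per encoding for block-ROM initialization. */
int dump_fitness_rom(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    for (uint32_t c = 0; c <= 0xFFFFu; c++)
        fprintf(f, "%04X\n", mshubert2d((uint16_t)c));
    return fclose(f);
}
```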


TABLE VIII
Best Fitness Values Obtained by the GA for the mBF7_2 Function for Different Parameter Settings (XR = Crossover Rate)

RNG_Seed      PopSize = 32          PopSize = 64
(hex)         XR = 10   XR = 12     XR = 10   XR = 12
2961          56835     56835       48135     56456
061F          59648     53432       59648     60656
B342          55000     59928       59480     57184
AAAA          55560     52704       55000     61496
A0A0          58136     53040       58024     56624
FFFF          60880     61384       56344     60768

Fig. 13. Convergence plot for the test function mBF6_2(x) with initial RNG seed = (061F)16, crossover threshold = 10, and popSize = 64 (data collected from hardware execution).

TABLE IX
Best Fitness Values Obtained by the GA for the Shubert Function for Different Parameter Settings (XR = Crossover Rate; * marks a globally optimal value of 65535)

RNG_Seed      PopSize = 32          PopSize = 64
(hex)         XR = 10   XR = 12     XR = 10   XR = 12
2961          56835     56835       48135     56835
061F          56835     55095       65535*    58227
B342          56487     56487       54051     63795
AAAA          63795     56487       65535*    65535*
A0A0          56835     63795       65535*    53355
FFFF          53355     65535*      48135     56835


Table VIII shows the results obtained by the GA core for the mBF7_2 test function using different settings of the programmable GA parameters. The best candidate found by the proposed GA core for the mBF7_2 test function was 65516; the corresponding solution is x = (EC)16 and y = (FF)16, with a fitness of 61496. This is approximately 3.7% less than the globally optimal fitness value of 63904, and the solution lies within 4.3% and 2.35% of the globally optimal solution along the x-direction and y-direction, respectively, of the solution space.

Table IX shows the results obtained by the GA core for the mShubert2D test function using different settings of the programmable GA parameters. The proposed GA core found globally optimal solutions for several different parameter settings, marked with an asterisk in Table IX. The GA core found two different globally optimal solutions, (x1, x2) = ((C2)16, (4A)16) and ((DB)16, (4A)16), during the experimental run with RNG seed = (AAAA)16, population size = 64, and crossover threshold = 10.

Figs. 13-16 plot the data collected from the hardware measurements and illustrate the convergence of the GA optimizer for the three test functions; both the best fitness and average fitness values are plotted for every generation. It can be seen that the GA core finds its best solution within a small number of generations for all three test functions. From Figs. 13 and 14, it can be seen that the GA core evaluates at most (10 generations + 1 initial population) x (population size of 64) = 704 candidate solutions before finding the best solution.

Fig. 14. Convergence plot for the test function mBF6_2(x) with initial RNG seed = (A0A0)16 , crossover threshold = 10, and popSize = 64 (data collected from hardware execution).

Although the size of the entire solution space is only 65536, the GA core evaluates less than 1.1% of the solution space before finding the best solution. This is a major gain over an exhaustive search and is very important for real-time applications and for applications with time-consuming fitness evaluation procedures, such as EHW. From Fig. 15, it can be seen that the GA core evaluates at most (18 generations + 1 initial population) x (population size of 64) = 1216 candidate solutions before finding the best solution for the mBF7_2 test function, and from Fig. 16, at most (12 generations + 1 initial population) x (population size of 64) = 832 candidate solutions for the mShubert2D test function. Thus, the GA core quickly converges to a good solution after evaluating just a small fraction of the solution space, even for very difficult test functions. This is expected: the GA's population-based search mechanism finds good solutions quickly, then converges toward them and searches for better solutions in their vicinity. It must be noted, however, that the GA core finds the optimal solutions only for certain settings of the GA parameters, which again underscores the need for the ability to change these parameter values according to the application.


Fig. 15. Convergence plot for the test function mBF7_2(x, y) with initial RNG seed = (AAAA)16, crossover threshold = 12, and popSize = 64 (data collected from hardware execution).

Fig. 16. Convergence plot for the test function mShubert2D(x1, x2) with initial RNG seed = (AAAA)16, crossover threshold = 10, and popSize = 64 (data collected from hardware execution).

C. Runtime Comparison With Software Implementation

A software implementation of a GA optimizer, similar to the GA optimization algorithm in the IP core, was developed in the C programming language. GAs, when used for hardware applications such as EHW, need to communicate with the application to evaluate the fitness of the candidate solutions. This communication overhead can be effectively modeled on the Xilinx Virtex2Pro board, as it contains PowerPC processor cores that can execute software programs. The experimental setup consists of the GA software running on a PowerPC processor in the Xilinx Virtex2Pro board and the fitness evaluation module (implemented as the same block-RAM lookup table) on the Xilinx Virtex2Pro FPGA. This setup gives a fair comparison between the software and hardware implementations, as both are implemented using the same technology node. The runtime, averaged over six runs, of the GA program optimizing the modified binary F6 (mBF6_2) function with a population size of 32, a crossover rate of 0.625, a mutation rate of 0.0625, and 32 generations was 37.615 ms. In hardware, a 32-bit counter clocked by the 50 MHz GA IP core clock was implemented, and the GA execution time was computed as the product of the counter value and the clock period. The hardware GA implementation achieved a speedup of approximately 5.16x over the software implementation.
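Converting the counter readout to a runtime is a single multiplication by the clock period; the sketch below illustrates the bookkeeping. The cycle count shown is hypothetical (back-calculated from the published 5.16x figure and the 37.615 ms software runtime), not a measured value from the paper.

```c
#include <stdint.h>
#include <stdio.h>

#define GA_CLOCK_HZ 50000000.0            /* 50 MHz GA core clock */

int main(void)
{
    uint32_t cycles = 364000;             /* hypothetical counter readout */
    double t_hw = cycles / GA_CLOCK_HZ;   /* counter value x clock period */
    double t_sw = 37.615e-3;              /* measured PowerPC GA runtime */

    printf("hardware: %.3f ms, speedup: %.2fx\n",
           t_hw * 1e3, t_sw / t_hw);      /* prints ~7.28 ms, ~5.17x */
    return 0;
}
```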

V. CONCLUSION

A readily synthesizable, robust GA core for FPGAs has been designed that is easy to interface and use with a wide range of applications. The efficiency of the designed core is illustrated by its low area utilization and high clock speed, and its effectiveness is evident from the convergence characteristics obtained for the standard test functions. The gate-level Verilog implementation of the GA core is advantageous in that it can be used directly by commercial layout tools for chip layout generation. The availability of preset modes and scan-chain testability provides some basic fault tolerance to an ASIC designed using the proposed GA core. The proposed core has been used as part of a self-reconfigurable analog array architecture to compensate for extreme-temperature effects on VLSI electronics [34], [35]. The GA module, an internal (slew-rate) fitness function, and the other monitoring and controlling modules were implemented as a digital ASIC. The digital ASIC passed all the design rule check, electrical rule check, antenna, and soft-power tests and has been successfully fabricated in a radiation-hardened SOI technology. The fabricated chips remain to be tested and incorporated into the Self-Reconfigurable Analog Array architecture.

REFERENCES

[1] J. H. Holland, Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press, 1992.
[2] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Boston, MA: Addison-Wesley, 1989.
[3] J. J. Grefenstette, R. Gopal, B. J. Rosmaita, and D. Van Gucht, "Genetic algorithms for the traveling salesman problem," in Proc. 1st Int. Conf. Genetic Algorithms, 1985, pp. 160–168.
[4] M. Vellasco, R. Zebulum, and M. Pacheco, Evolutionary Electronics: Automatic Design of Electronic Circuits and Systems by Genetic Algorithms. Boca Raton, FL: CRC Press, Dec. 2001.
[5] S. D. Scott, A. Samal, and S. Seth, "HGA: A hardware-based genetic algorithm," in Proc. ACM/SIGDA 3rd Int. Symp. Field Programmable Gate Arrays, 1995, pp. 53–59.
[6] M. Tommiska and J. Vuori, "Implementation of genetic algorithms with programmable logic devices," in Proc. 2nd Nordic Workshop Genetic Algorithms, 1996, pp. 71–78.
[7] B. Shackleford, G. Snider, R. Carter, E. Okushi, M. Yasuda, K. Seo, and H. Yasuura, "A high-performance, pipelined, FPGA-based genetic algorithm machine," Genetic Algorithms Evolvable Mach., vol. 2, no. 1, pp. 33–60, Mar. 2001.
[8] N. Yoshida and T. Yasuoka, "Multi-GAP: Parallel and genetic algorithms in VLSI," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC '99), 1999, pp. 571–576.
[9] W. Tang and L. Yip, "Hardware implementation of genetic algorithms using FPGA," in Proc. 47th Midwest Symp. Circuits Syst., 2004, pp. 549–552.
[10] C. Aporntewan and P. Chongstitvatana, "A hardware implementation of the compact genetic algorithm," in Proc. Congr. Evol. Comput., 2001, pp. 624–629.
[11] P. Graham and B. E. Nelson, "Genetic algorithms in software and hardware—a performance analysis of workstation and custom machine implementations," in Proc. IEEE Symp. FPGAs Custom Comput. Mach., 1996, pp. 216–225.
[12] M. S. Jelodar, M. Kamal, S. M. Fakhraie, and M. N. Ahmadabadi, "SOPC-based parallel genetic algorithm in evolutionary computation," in Proc. IEEE Congr. Evol. Comput., Vancouver, BC, 2006, pp. 2800–2806.


[13] N. Nedjah and L. de Macedo Mourelle, "Massively parallel hardware architecture for genetic algorithms," in Proc. 8th Euromicro Conf. Digital Syst. Design, 2005, pp. 231–234.
[14] S. Wakabayashi, T. Koide, K. Hatta, Y. Nakayama, M. Goto, and N. Toshine, "GAA: A VLSI genetic algorithm accelerator with on-the-fly adaptation of crossover operators," in Proc. IEEE Int. Symp. Circuits Syst., 1998, pp. 268–271.
[15] P.-Y. Chen, R.-D. Chen, Y.-P. Chang, L.-S. Shieh, and H. Maliki, "Hardware implementation for a genetic algorithm," IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 699–705, Apr. 2008.
[16] K. Sai Mohan and B. Nowrouzian, "A diversity controlled genetic algorithm for optimization of FRM digital filters over DBNS multiplier coefficient space," in Proc. IEEE Int. Symp. Circuits Syst., 2007, pp. 2331–2334.
[17] G. Rudolph, "Evolutionary search for minimal elements in partially ordered finite sets," in Proc. 7th Int. Conf. Evol. Programming, 1998, pp. 345–353.
[18] J. Holleman, B. Otis, S. Bridges, A. Mitros, and C. Diorio, "A 2.92 μW hardware random number generator," in Proc. IEEE ESSCIRC, Montreux, Switzerland, 2006, pp. 134–137.
[19] M. Meysenburg, "The effect of pseudo-random number generation quality on the performance of a simple genetic algorithm," M.S. thesis, Univ. Idaho, Moscow, ID, 1997.
[20] M. Meysenburg and J. A. Foster, "Randomness and GA performance, revisited," in Proc. Genetic Evol. Comput. Conf., vol. 1, 1999, pp. 425–432.
[21] E. Cantú-Paz, "On random numbers and the performance of genetic algorithms," in Proc. Genetic Evol. Comput. Conf., Jul. 2002, pp. 311–318.
[22] S. Garfinkel and G. Spafford, Practical UNIX and Internet Security, 2nd ed. Sebastopol, CA: O'Reilly, Apr. 1996.
[23] U. Elsner, "Influence of random number generators on graph partitioning algorithms," Electron. Trans. Numerical Anal., vol. 21, pp. 125–133, 2005.
[24] H. de Garis, "Evolvable hardware: Genetic programming of a Darwin machine," in Proc. Int. Conf. Artificial Neural Networks Genetic Algorithms, New York: Springer-Verlag, 1993, pp. 441–449.
[25] I. Kajitani, T. Hoshino, D. Nishikawa, H. Yokoi, S. Nakaya, T. Yamauchi, T. Inuo, N. Kajihara, M. Iwata, D. Keymeulen, and T. Higuchi, "A gate-level EHW chip: Implementing GA operations and reconfigurable hardware on a single LSI," in Proc. Int. Conf. Evolvable Syst.: From Biology to Hardware, LNCS vol. 1478, 1998, pp. 1–12.
[26] N. J. Macias, "The PIG paradigm: The design and use of a massively parallel fine grained self-reconfigurable infinitely scalable architecture," in Proc. 1st NASA/DoD Workshop Evolvable Hardware, 1999, pp. 175–180.
[27] A. Stoica, D. Keymeulen, D. Vu, R. Zebulum, I. Ferguson, T. Daud, T. Arslan, and G. Xin, "Evolutionary recovery of electronic circuits from radiation induced faults," in Proc. Congr. Evol. Comput., vol. 2, 2004, pp. 1786–1793.
[28] J. Langeheine, K. Meier, J. Schemmel, and M. Trefzer, "Intrinsic evolution of digital-to-analog converters using a CMOS FPTA chip," in Proc. NASA/DoD Conf. Evolvable Hardware, 2004, pp. 18–25.
[29] V. Baumgarte, G. Ehlers, F. May, A. Nückel, M. Vorbach, and M. Weinhardt, "PACT XPP—a self-reconfigurable data processing architecture," J. Supercomputing, vol. 26, no. 2, pp. 167–184, 2003.
[30] C. Lambert, T. Kalganova, and E. Stomeo, "FPGA-based systems for evolvable hardware," in Proc. World Academy Sci., Eng. Technol., vol. 12, Mar. 2006, pp. 123–129.
[31] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Sequential circuit design using synthesis and optimization," in Proc. IEEE Int. Conf. Comput. Design: VLSI Comput. Processors, Oct. 1992, pp. 328–333.
[32] A. Stoica, A. Fukunaga, K. Hayworth, and C. Salazar-Lazaro, "Evolvable hardware for space applications," in Proc. ICES, LNCS vol. 1478, 1998, pp. 166–173.
[33] R. Haupt and S. E. Haupt, Practical Genetic Algorithms, 2nd ed. Hoboken, NJ: Wiley, 2004.
[34] A. Stoica, D. Keymeulen, M. Mojarradi, R. Zebulum, and T. Daud, "Progress in the development of field programmable analog arrays for space applications," in Proc. IEEE Aerospace Conf., Mar. 2008, pp. 1–9.
[35] A. Stoica, D. Keymeulen, R. Zebulum, S. Katkoori, P. Fernando, H. Sankaran, M. Mojarradi, and T. Daud, "Self-reconfigurable analog array integrated circuit architecture for space applications," in Proc. NASA/ESA Conf. Adaptive Hardware Syst., Noordwijk, The Netherlands, Jun. 2008, pp. 83–90.

[36] N. Park and A. C. Parker, "Sehwa: A software package for synthesis of pipelines from behavioral specifications," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 7, no. 3, pp. 356–370, Mar. 1988.
[37] A. Thompson, "Exploring beyond the scope of human design: Automatic generation of FPGA configurations through artificial evolution," in Proc. 8th Annu. Advanced PLD FPGA Conf., 1998, pp. 5–8.
[38] L. Sekanina and S. Friedl, "On routine implementation of virtual evolvable devices using COMBO6," in Proc. NASA/DoD Conf. Evolvable Hardware, 2004, pp. 63–70.
[39] C. Gopalakrishnan, "High level techniques for leakage power estimation and optimization in VLSI ASICs," Ph.D. dissertation, Dept. Comput. Sci. Eng., Univ. South Florida, Tampa, FL, 2003.

Pradeep Fernando (S'02–M'09) received the B.E. degree in computer science and engineering from Sri Venkateswara College of Engineering, University of Madras, India, in 2002, and the M.S. degree in computer engineering from the University of South Florida, Tampa, in 2006, where he is currently working toward the Ph.D. degree in computer science and engineering.
From 2002 to 2008, he served as a Graduate Teaching Associate and a Graduate Research Associate at the University of South Florida. He has assisted in teaching many hardware courses and labs, has instructed the undergraduate VLSI Design Automation course several times, and was awarded a certificate of recognition for outstanding teaching by the USF Provost in 2006. He has co-authored several technical papers in refereed conferences and has served as a reviewer for many technical conferences and journals, including the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. His research interests include VLSI circuit design and design automation, genetic algorithms, multiobjective optimization algorithms, and evolvable hardware.

Srinivas Katkoori (S'92–M'97–SM'03) received the B.E. degree in electronics and communication engineering from Osmania University, Hyderabad, India, in 1992, and the Ph.D. degree in computer engineering from the University of Cincinnati, Cincinnati, OH, in 1998.
In 1997, he joined the Department of Computer Science and Engineering, University of South Florida, Tampa, as an Assistant Professor. In 2004, he was tenured and promoted to Associate Professor. His research interests are in the general area of VLSI CAD algorithms and design methodologies, including high-level synthesis, low-power synthesis, FPGA synthesis, and radiation-tolerant CAD for FPGAs. To date, he has published over 50 peer-reviewed journal and conference papers. He holds one U.S. patent.
Dr. Katkoori is a recipient of the 2001 NSF CAREER Award. He is a Senior Member of the ACM. He serves on the technical committees of several VLSI conferences and as a peer reviewer for several VLSI journals. Since 2006, he has been serving as an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. He is the recipient of the inaugural 2002–2003 University of South Florida Outstanding Faculty Research Achievement Award and of the 2005 Outstanding Engineering Educator Award from the IEEE Florida Council (Region 3).

Didier Keymeulen received the B.S.E.E., M.S.E.E., and Ph.D. degrees in electrical engineering and computer science from the Free University of Brussels, Brussels, Belgium.
In 1996, he joined the computer science division of the Japanese National Electrotechnical Laboratory as a Senior Researcher. Currently, he is a Principal Member of the Technical Staff of the NASA Jet Propulsion Laboratory (JPL), Pasadena, CA, in the Bio-Inspired Technologies Group. At JPL, he is responsible for DoD and NASA applications of evolvable hardware for adaptive computing, leading to the development of fault-tolerant electronics and autonomous and adaptive sensor technology. He also participated as Test Electronics Lead for the Tunable Laser Spectrometer instrument on the Mars Science Laboratory. He has served as Chair, Co-Chair, and Program Chair of the NASA/ESA Conferences on Adaptive Hardware and Systems.


Ricardo Zebulum received the Bachelor's, Master's, and Ph.D. degrees in electrical engineering from the Catholic University of Rio, Brazil, in 1992, 1995, and 1999, respectively. From 1997 to 1999, he spent two years at Sussex University, U.K.
He is currently a member of the Jet Propulsion Laboratory (JPL), California Institute of Technology, Pasadena. He has been involved in research on evolvable hardware and genetic algorithms since 1996 and has authored more than 50 papers in this area. Besides evolvable hardware research, he also works as a Reliability Engineer at JPL.


Adrian Stoica (SM'06) received the B.S. and M.S.E.E. degrees from the Technical University of Iasi, Iasi, Romania, in 1986, and the Ph.D. degree from Victoria University of Technology, Melbourne, Australia, in 1996.
He is currently a Senior Research Scientist at the NASA Jet Propulsion Laboratory, Pasadena, CA. His main research interests are learning and adaptation in autonomous systems. He has six patents, has published over 100 papers, and has been a keynote speaker at several conferences. He has started four conferences, one of which, the NASA/ESA Conference on Adaptive Hardware and Systems (previously the NASA/DoD Conference on Evolvable Hardware), has its 10th anniversary in 2009.
