A Pattern Recognition Demonstrator Based on a Silicon Neural Chip

D. Del Corso, F. Gregoretti, L.M. Reyneri, A. Allasia
Dipartimento di Elettronica, Politecnico di Torino
C.so Duca degli Abruzzi, 24 - 10129 Torino, Italy

Abstract

This paper describes a self-standing hardware pattern recognition system based on neural algorithms. The system uses a dedicated VLSI neural chip which implements a vector-matrix multiplier built from an array of 16 × 8 multiplying D/A converters, each with an 8-bit digital storage cell. The conversion principle is based on an aperiodic clock which rotates data through a weighting shift register. A prototype chip has been fabricated, tested and assembled together with an array of photodetectors for simple image recognition purposes. The system has been conceived as a stand-alone demonstrator of the pattern recognition capabilities of Artificial Neural Networks.

1 INTRODUCTION

Image recognition may benefit from the theory of Artificial Neural Networks (ANN). The power of the method derives from a large number of simple and identical computing elements, called neurons, which primarily compute the weighted sum of a number of input values. This method finds practical applications only when the number of neurons is very large. Silicon technology seems well suited to the implementation of large arrays of neurons because of the high integration density it allows. Although the literature presents several VLSI chips for ANN [2, 3, 4, 5], most of them are still at the development stage. This paper presents a VLSI neural chip developed by the authors. The chip has been used in a small image recognition system, where it is tightly interfaced to an array of photodetectors. A simple stand-alone prototype board containing an array of 4 × 4 detectors has been developed and fully tested (see Fig. 1). A loose connection to a Personal Computer provides a cost-effective user interface.

2 IMAGE PROCESSING USING NEURAL NETWORKS

A raster image can be described as a vector of N pixels $I = (1, i_1, i_2, \ldots, i_N)$. An elementary $N \times M$ neural image transform is a non-linear correlation operator:

$$Y = F(W \cdot I) \qquad (1)$$

where the output vector $Y = (y_1, y_2, \ldots, y_M)$. This transform is completely defined by a weight matrix W and by a vector F of identical squashing functions $F(x_j)$. Several works [2, 4] have shown that a number of image processing tasks can be efficiently performed by cascading an appropriate number of elementary image transforms (usually from 2 to 5), possibly with different sizes of the input and output domains.
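In software terms, eq. (1) and the cascading of transforms amount to the computation below. This is a minimal NumPy sketch for illustration only: the logistic squashing function and the random weights are assumptions, not the chip's actual F(x) (see Section 4) or trained weights.

```python
import numpy as np

def squash(x):
    return 1.0 / (1.0 + np.exp(-x))   # illustrative sigmoid-like F, an assumption

def neural_transform(W, I):
    """Elementary N-to-M neural image transform of eq. (1): Y = F(W . I)."""
    return squash(W @ I)

rng = np.random.default_rng(0)
I  = rng.random(16)                   # 16 pixel activations, normalized to [0 ... 1]
W1 = rng.uniform(-1.0, 1.0, (8, 16))  # 16-to-8 weight matrix, as in the demonstrator
W2 = rng.uniform(-1.0, 1.0, (4, 8))   # a second, cascaded 8-to-4 transform
Y  = neural_transform(W2, neural_transform(W1, I))
```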

Since neural transformations are rather regular operations (i.e. vector-matrix multiplications followed by vectors of identical transfer functions), they are very well suited to massive VLSI implementations. The next section presents such an implementation, developed by the authors as a mixed analog/digital CMOS VLSI chip which uses a very efficient computation technique based on Pulse Stream Modulations [3, 5, 6]. Measured performance of the board allows a complete vector-matrix multiplication (i.e. 128 operations) in about 200 µs, which is equivalent to about 600 kFLOPS with an accuracy of about 7 bits. This value is not better than that of more traditional implementations, because of the limited size of the chip; the delay introduced by the photodetector array is obviously negligible. An improved version of the chip, about 200 times faster and based on another Pulse Stream modulation technique, is currently under fabrication. These results allow an average throughput of 5,000 image transforms per second (16 to 8 pixels). Learning is currently performed off-chip (an interface to a Personal Computer has been included for this purpose). An on-chip self-test facility (not described further) has also been included.
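As a consistency check on these figures, taking one 16-to-8 transform as $16 \times 8 = 128$ multiply-accumulate operations completed in the measured 200 µs:

$$\frac{128~\text{ops}}{200~\mu\text{s}} = 6.4 \times 10^{5}~\text{ops/s} \approx 600~\text{kFLOPS}, \qquad \frac{1~\text{transform}}{200~\mu\text{s}} = 5{,}000~\text{transforms/s}$$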

Figure 1: Photograph of the image recognition board

3 PULSE STREAM MODULATIONS

Input and output signals are encoded as streams of binary pulses (see Fig. 3), because this allows continuous information to be processed using binary signals. Pulse Rate Modulation (PRM) has been used in this work, although several other pulse modulation techniques have been used in the literature [3]. PRM activation signals are encoded as asynchronous streams of pulses of fixed width $T_{on} = 100$ ns and average frequency $f_i$. The frequency is made proportional to the activation $a_i$ (either an input $i_i$ or an output $y_j$):

$$f_i = f_{\max}\, a_i \qquad (2)$$

where $a_i$ and $f_i$ are normalized to the ranges $[0 \ldots 1]$ and $[0 \ldots f_{\max}]$, respectively. In the proposed prototype $f_{\max} = 500$ kHz. To improve the response time of the circuit, which is proportional to the inverse of the minimum input frequency, all frequencies below $f_{\min} \approx 10$ kHz are considered as zero.
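A minimal software model of this encoding is sketched below. It is an illustration, not the chip circuitry: pulses are regularly spaced here for simplicity, whereas the actual streams are asynchronous.

```python
import numpy as np

F_MAX = 500e3    # fmax = 500 kHz
F_MIN = 10e3     # frequencies below about 10 kHz are treated as zero
T_ON  = 100e-9   # fixed pulse width Ton = 100 ns (width only, not used for timing)

def prm_pulse_times(activation: float, duration: float) -> np.ndarray:
    """Start times of the pulses encoding one activation in [0 ... 1]."""
    f = F_MAX * activation           # eq. (2): f = fmax * activation
    if f < F_MIN:
        return np.array([])          # below fmin the activation is considered zero
    return np.arange(0.0, duration, 1.0 / f)

# Example: an activation of 0.5 gives a pulse every 4 us, i.e. about 50 pulses in 200 us.
times = prm_pulse_times(0.5, 200e-6)
```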

4 SILICON IMPLEMENTATION

This section describes the VLSI implementation of an aggregate of neurons (see Fig. 5), which has been used in the pattern recognition demonstrator. Figs. 2 and 3 show the simplified block diagram of a PRM neuron and the corresponding timing diagram.

Figure 2: Block diagram of a PRM neuron

Figure 3: Timing diagram of the PRM neuron

The ANN chip contains three major blocks: an input conditioner (not shown), which receives the input activation signals from outside and produces a few internal signals (MMK and MEN); a synaptic matrix, which is the kernel of the aggregate and computes the product of the input activations I by the synaptic weights W; and an output generator (not shown), which computes the squashing functions F and converts the computed results into output signals compatible with the selected modulation (PRM). The synaptic matrix is a two-dimensional array of synapses, where all synapses of a column receive the same input activation, while all synapses in a row cooperate to compute the same output.

Working Principle of a Synapsis

The purpose of a synapsis is to store a weight and to multiply it by the input activation, producing the result in a form (a current) that can be directly summed with the contributions of the other synapses. The working principle of a synapsis is based on a charge-modulation multiplication technique. The weight is stored in an 8-bit shift register (digital weight storage) which is divided into two halves (RW1 and RW2). At every input pulse, the input conditioner generates a sequence of 8 pulses of the aperiodic clock MMK (see Fig. 3), whose periods follow the fixed sequence $T_o, 2T_o, 4T_o, 8T_o, T_o, T_o, T_o, T_o$. Since the shift register is connected in a closed loop, its content is intrinsically restored after the 8th pulse. The first four pulses are active (signal MEN), while the other four are needed to complete the weight restoration. The outputs of the two shift registers switch ON or OFF two current generators of intensity $I_d$ and $16 I_d$, respectively. It can be proven that, owing to the particular shape of the clock MMK, at the end of the cycle a quantity of charge

$$q_{ij} = I_d T_o w_{ij} = K_w w_{ij} \qquad (3)$$

is generated, where $K_w = I_d T_o$. In the prototype $w_{ij}$ is in the range $[-127 \ldots +127]$, while $100\,\mathrm{nA} < I_d < 10\,\mu\mathrm{A}$ and $T_o = 100$ ns, therefore $10^{-14}\,\mathrm{C} < K_w < 10^{-12}\,\mathrm{C}$.

Each time a synapsis receives a pulse, it generates an electric charge $q_{ij} = K_w w_{ij}$ proportional to the synaptic weight $w_{ij}$. This charge is injected into a common summing node S in the neuron kernel (see Fig. 2) at a rate $f_i$ proportional to the input activation $i_i$ (in practice two summing nodes are used, for positive and negative weights respectively). This process is equivalent to the injection of a (noisy) current of average value $a_{ji}$ equal to the amount of charge per unit time, namely:

$$a_{ji} = q_{ij} f_i = K_w w_{ij} f_{\max} i_i = (K_w f_{\max})\, w_{ij} i_i \qquad (4)$$

The current $a_{ji}$ is therefore proportional to the product of the input activation by the synaptic weight. An additional serial link (not shown) is used to load the weight values from an external source.
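The charge-modulation multiplication can be modelled behaviourally as in the sketch below. It is not the authors' circuit model; it assumes that the two register halves are tapped at bit 0 and bit 4, that only the four MEN-active MMK periods contribute charge, and uses an illustrative $I_d = 1\,\mu$A.

```python
# Behavioural sketch of the charge-modulation multiplier (assumptions as stated above).

T_O = 100e-9   # elementary clock period To = 100 ns
I_D = 1e-6     # unit current Id (assumed value within the stated 100 nA .. 10 uA range)

def synapse_charge(weight_magnitude: int) -> float:
    """Charge injected for one input pulse, for an 8-bit weight magnitude (0..255)."""
    bits = [(weight_magnitude >> k) & 1 for k in range(8)]    # closed-loop register content
    mmk_periods = [T_O, 2 * T_O, 4 * T_O, 8 * T_O, T_O, T_O, T_O, T_O]
    charge = 0.0
    for step, period in enumerate(mmk_periods):
        if step < 4:                                  # MEN active: generators enabled
            charge += bits[0] * I_D * period          # lower-half tap drives the Id generator
            charge += bits[4] * 16 * I_D * period     # upper-half tap drives the 16*Id generator
        bits = bits[1:] + bits[:1]                    # rotate; content restored after 8 shifts
    return charge

# The accumulated charge equals Kw * w with Kw = Id * To, as in eq. (3):
w = 137
assert abs(synapse_charge(w) - I_D * T_O * w) < 1e-18
```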

Working Principle of a Neuron

The neuron kernel sums the contributions (the currents $a_{ji}$ of all synapses), squashes the resulting value through a non-linear activation function F(x) and produces an output signal which is a function of the neural activation:

$$y_j = F\Big(\sum_i w_{ij}\, i_i\Big) \qquad (5)$$

The currents are summed by a transimpedance integrator (i.e. a CMOS inverter with an 8 pF capacitive feedback; see Fig. 2). The circuits connected to its output behave as a non-linear VCO which converts the output voltage back into a PRM signal. The summing nodes receive the synaptic currents, which are integrated on the capacitor C. The output voltage $V_o$ drives two threshold comparators. As long as the total current $I_t$ is positive, the voltage $V_o$ increases; when it crosses the upper threshold $V_u$, the corresponding comparator triggers and fully discharges the capacitor C, and an output pulse is generated on the output line $y_j$. The VCO transfer function F(x) is non-linear and approximates a sigmoid function with an output range $[0 \ldots 1]$. The actual shape of F(x) can be derived from a detailed analysis of the behavior of the integrator and the output oscillator; the result is given here directly, while the detailed analysis can be found in [6]. The output frequency (and thus the output activation) can be related to the internal activation $\sum_i w_{ij} i_i$ by means of the following relationship:

$$x_j = \frac{1}{w_{\max}} \sum_i w_{ij}\, i_i, \qquad y_j = \frac{1}{R\,C\,f_{\max}\,\log\!\left(\dfrac{V_{dd}/2 - V_u + R\,I_d\,T_o\,f_{\max}\,x_j}{V_{dd}/2 - V_b + R\,I_d\,T_o\,f_{\max}\,x_j}\right)} \qquad (6)$$

The parameters $C = 8$ pF, $V_u = 4$ V, $V_b = 1$ V, $T_o = 100$ ns, $f_{\max} = 500$ kHz, $V_{dd} = 5$ V, R and $I_d$ allow the actual shape of F(x) to be tailored to the specific network requirements. The six former parameters are fixed, while the last two (R and $I_d$) are trimmable and can be varied continuously by means of external currents. Fig. 4 shows the theoretical shape of F(x) together with some measurements performed on a set of 10 samples. See [6] for more details on the working principle of the device.
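Neglecting the trimmable resistance R and the lower threshold $V_b$ (so that the integrator is treated as ideal and the saturation of eq. (6) is lost), the integrate-and-fire conversion can be sketched as follows; the value $I_d = 1\,\mu$A and the hard clipping at $y_j = 1$ are illustrative assumptions, not part of the original design.

```python
# Simplified sketch of the neuron kernel's integrate-and-fire conversion
# (an illustration under the assumptions stated above, not the authors' model).

C_F   = 8e-12           # feedback capacitance C = 8 pF
V_U   = 4.0             # upper comparator threshold Vu = 4 V
F_MAX = 500e3           # fmax = 500 kHz
K_W   = 1e-6 * 100e-9   # Kw = Id * To, with the assumed Id = 1 uA

def neuron_output(weights, inputs) -> float:
    """Normalized output activation y_j under an idealized-integrator model."""
    # Average current into the summing node: the sum of eq. (4) over all synapses.
    current = K_W * F_MAX * sum(w * i for w, i in zip(weights, inputs))
    if current <= 0.0:
        return 0.0                           # non-positive net current: no output pulses
    period = C_F * V_U / current             # time for Vo to ramp from 0 V to Vu
    return min(1.0, 1.0 / (F_MAX * period))  # output frequency normalized to fmax

# Example: 16 inputs at 0.5 with weight 10 each give y of about 0.25.
y = neuron_output([10] * 16, [0.5] * 16)
```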

Figure 4: Theoretical and measured transfer function of a neuron: a) theor. $I_d$ = 10 µA, R = 330 kΩ; b) theor. $I_d$ = 10 µA, R = 600 kΩ; c) chip1 $I_d$ = 10 µA, R = 330 kΩ; d) chip2 $I_d$ = 10 µA, R = 330 kΩ; e) chip3 $I_d$ = 10 µA, R = 600 kΩ.

5 CONCLUSION

A Pulse Rate neural chip with digital memory has been fabricated and tested. It has been used in a demonstrator of a neural image recognition system. Although the performance is so far only comparable with that of other technologies, the modulation method (Pulse Streams) and the implementation technique are promising. A much higher-performance version of the chip has been developed and is currently under fabrication, with an expected performance of about 100 MFLOPS per chip (with about 8-bit accuracy).

References

[1] P. D. Wasserman, "Neural Computing: Theory and Practice", Van Nostrand Reinhold, New York, 1989.
[2] J. C. Lupo, "Defense Applications of Neural Networks", IEEE Communications Magazine, Vol. 27, No. 11, pp. 82-88, November 1989.
[3] A. F. Murray, D. Del Corso and L. Tarassenko, "Pulse-Stream VLSI Neural Networks Mixing Analog and Digital Techniques", IEEE Trans. on Neural Networks, Vol. 2, No. 2, March 1991.
[4] -, IEEE Computer, Special Issue on "Artificial Neural Systems", March 1988.
[5] A. F. Murray and A. V. W. Smith, "Asynchronous VLSI Neural Networks using Pulse Stream Arithmetic", IEEE Journal of Solid-State Circuits, Vol. 23, No. 3, pp. 688-697, June 1988.
[6] D. Del Corso, F. Gregoretti, L. M. Reyneri and C. Pellegrini, "A Pulse Stream Synapsis Based on a Closed-Loop Shift Register", in Parallel Architectures and Neural Networks, World Scientific, pp. 231-244, May 1990.
[7] L. M. Reyneri and M. Sartori, "A Neural Vector Matrix Multiplier Using Pulse Width Modulation Techniques", in Proc. 3rd Italian Workshop on Neural Networks, Vietri sul Mare, May 1990.

Figure 5: Microphotograph of the Neural chip
