# Hardware Realization of a General Purpose FFT Processor in a Distributed Processing Configuration K. Ganesan, T. S. Govardhanarajan, A. Dhurkadas & S. Veerabhadraiah Naval Physical & Oceanographic Laboratory, Cochin-682004 Received 21 October 1980 Abstract. An efficient hardware implementation of a 1024-point sequential FFT processor is described. The processor has been configured in a distributed processing system with an INTEL 8086 microprocessor. In case of real signals, it performs a 1024-point transform of two real signals, simultaneously within the same period and subsequently does spectral separation in 2.6 msec. Its general purpose capability and the case of interfacing it with standard mini/micro computers are highlighted with a specific illustrations using the INTEL 8086 microprocessor. #### 1. Introduction Fourier transform has long been a principal analytical tool for signal processing applications in speech, telephony, radar, sonar etc. and the Fast Fourier Transform (FFT) has thus evolved as an efficient algorithm for computing the discrete Fourier transform. An efficient hardware implementation of a 1024 point FFT processor has been described. The processor has been realised as a 'sequential' processor using the Cooley-Tukey algorithm, with the data input arranged in bit-reversed form. It can transform a 1024-point time slice into a 1024-point complex frequency spectrum in 26.3 millisec. The processor is configured in a distributed processing system with an INTEL 8086 microprocessor. ## 2. System Description The FFT processor is connected to a common bus system into which the other functional components of the system are (Fig. 1). The microprocessor acting as the system controller does the scheduling and in a predetermined sequence hands over the control of the bus to the individual units as required. Figure 1. FFT system configuration. In the first phase, data is written into the memory from the Data Acquisition Unit (DAU) in a scrambled (bit-reversed) order. The data segment of the memory is divided into two $1k \times 16$ pages. The first page is for the real data and the second for the imaginary data. The quadrature components of a complex data occupy the same location in the two pages. In the second phase, the memory control is handed over to the FFT processor. After the computation of the 1024-point complex transform, the real and the imaginary parts of the complex spectrum are stored in place and the bus control is then handed over to the microprocessor. In the third phase, the microprocessor computes the magnitude spectrum and displays the same on a CRT or a raster scan TV monitor. #### 3. Fast Fourier Transform Processor The FFT processor unit consists of input latches, arithmetic data, twiddle factor, and butterfly control units as shown in Fig. 2. The input latches are used as temporary storage locations for the data to perform in-place-of computation. The data and the twiddle factor addressing units generate the proper addressing sequence for the computation of the butterfly operation to be performed in the arithmetic unit. The butterfly control unit generates the arithmetic unit controls, the data memory controls and the input latch controls to perform the required butterfly operation. The data memory addresses, the twiddle factors and the butterfly controls are stored in various bipolar PROMs addressed from a chain of counters which is operated at a basic clock frequency of 12.5 MHz. ## (a) Butterfly Equations The output nodes of the butterfly shown in Fig. 3 are defined as in Eqns. (1) and (2) for a radix-2 decimation-in-time algorithm $$X_m(k) = X_{m-1}(k) + W_p X_{m-1} \left( k + \frac{N}{2^m} \right)$$ (1) Figure 2. FFT processor unit. Figure 3. Dual node (butterfly) computation. $$X_{m}\left(k+\frac{N}{2^{m}}\right)=X_{m-1}(k)-W^{p}X_{m-1}\left(k+\frac{N}{2^{m}}\right) \tag{2}$$ In general the input and the output nodes of the butterfly are complex nodes. Therefore, if $$X_{m}(k) = X'_{1} + jY'_{1}; X_{m}\left(k + \frac{N}{2^{m}}\right) = X'_{2} + jY'_{2}$$ $$X_{m-1}(k) = X_{1} + jY_{1}; X_{m-1}\left(k + \frac{N}{2^{m}}\right) = X_{2} + jY_{2}$$ and $$W^{p} = e^{-j} (2/N) nk = e^{-j\theta} = \cos \theta - j \sin \theta$$ then $$X_1' = X_1 + X_2 \cos \theta + Y_2 \sin \theta \tag{3}$$ $$Y_1' = Y_1 - X_2 \sin \theta + Y_2 \cos \theta \tag{4}$$ $$X_2' = X_1 - X_2 \cos \theta - Y_2 \sin \theta \tag{5}$$ $$Y_2' = Y_1 + X_2 \sin \theta - Y_2 \cos \theta \tag{6}$$ A TRW (1003J) $12 \times 12$ Bit Multiplier-cum-Accumulator LSI chip is used as the arithmetic unit for the computation of the butterfly Eqns. (3) – (6). The processor computes each butterfly operation in 64 basic clock periods, which corresponds to 5.12 microsecs (Fig. 4). ## (b) Spectrum Computation The FFT processor discussed here computes only the real and the imaginary parts of the spectrum. To compute the magnitude function of the spectrum, one has to combine the real and the imaginary parts as $$X(W) = \sqrt{X_{rs}^2 + X_{im}^2} \tag{7}$$ Figure 4. Input latches and arithmetic unit of the processor. frequency: 10 kHz. Figure 5a. 1 kHz sine wave with its spectrum sampling Figure 5c. 200 Hz square wave with its spectrum—sampling frequency: 10 kHz. Figure 5b. 3.3 kHz sine wave with its spectrum—sampling frequency: 10 kHz. Figure 5d. Spectrum of an LFM waveform with centre frequency 3.5 kHz and bandwidth 250 Hz—sampling frequency: 10 kHz, SNR: 10 dB. The INTEL 8086 based microprocessor system is used to compute the spectrum. Out of several possible approximations of Eqn. (7) to reduce computational efforts<sup>1</sup>, the following equation is chosen $$X(W) = L + \frac{3}{8}S \tag{8}$$ Where L is the largest and S the smallest of $X_{re}$ and $X_{im}$ . The spectrum thus computed is output to the display unit. ## 4. Experimental Results The system incorporating the FFT processor was tested with various types of waveforms. The results of these experiments are presented in Figs. 5(a) - 5(d) for performance evaluation. The FFT processor works in real time upto 20 kHz sampling frequency. The present limitation is due to the 12.5 MHz basic clock rate and since the bit reversing algorithm has been implemented in software. The maximum limit of 12.5 MHz basic clock rate is due to the TTL gates used in the implementation. #### 5. Discussion The applications of FFT in digital signal processing are enormous. Some of the applications are: power spectrum estimation, filter simulation, convolution, correlation, passive detection and classification of radar and sonar signals, and Doppler measurement of radar and sonar signals. The frequencies encountered in sonar are less than 15 kHz. Hence this processor enables real time applications for sonar signal processing. The FFT processor has been used in some of the sonar systems developed at Naval Physical and Oceanographic Laboratory. ## Acknowledgements The authors express their deep sense of gratitude to Dr. D. Srinivasan, Director, Naval Physical and Oceanographic Laboratory, Cochin for his continuous encouragement towards the progress of the work. ### Reference 1. Filip, A. E., IEEE, Trans on Aerospace and Electronics System., AES-12 No. 1 (1976), 87-89.