AN OPTIMIZED SQUARE ROOT ALGORITHM FOR IMPLEMENTATION IN FPGA HARDWARE

This paper presents an optimized digit-by-digit calculation method for computing square roots in hardware, proposed as a simple algorithm for implementation in a field programmable gate array (FPGA). The main principle of the proposed method is two-bit shifting combined with subtract-multiplex operations, which gives a simpler implementation and faster calculation. The proposed algorithm has been used to implement FPGA-based unsigned 32-bit and 64-bit binary square root units successfully. The results show that the proposed method is the most efficient in hardware resource usage among the methods compared. In addition, the strategy can easily be extended to larger operands.


INTRODUCTION
It is well known that the direct torque control (DTC) method for AC motors has a simple structure and good behavior, such as fast torque response, no requirement for PWM pulse generation, no requirement for coordinate transformation, and no need for a position encoder or current regulators [1][2][3][4][5][6][7].
The DTC algorithm is usually implemented as serial calculations on a microcontroller or digital signal processor (DSP) [8][9][10][11]. These are software-based platforms and are not adequate for control methods that require very high speed response. As a suitable solution, the FPGA has been proposed to support the execution of very fast tasks [12][13][14]. However, it is not easy to implement DTC in FPGA hardware. One of the main problems is the complicated square root calculation, which is hard to implement on an FPGA [15][16][17].
Many algorithms have been proposed to compute the square root, such as rough estimation [18], the Babylonian method [19], the exponential identity [20], the Taylor series expansion algorithm [21], the Newton-Raphson method [22][23][24], and the sequential algorithm (digit-by-digit calculation method) [25][26][27][28][29]. Nevertheless, these methods usually do not focus on the square root problem in FPGA-based DTC implementation. This paper proposes the digit-by-digit calculation method as a simple strategy for computing the square root. The proposed implementation strategy differs from the strategies in [25][26][27][28][29]. An optimization is also made by eliminating circuitry that is not needed. The work is intended to support DTC implementation in FPGA hardware, in the hope that it leads to a simpler implementation and faster calculation.

DIGIT-BY-DIGIT CALCULATION METHOD
In the digit-by-digit calculation method, the digits of the square root are found in sequence, with only one digit of the square root generated at each iteration [29]. The method has several advantages: every digit of the root that is found is known to be correct and does not have to be changed later; if the square root has a terminating expansion, the algorithm terminates after the last digit is found; and the algorithm works for any number base (although the details of the process depend on the base). In general, this method can be divided into two classes, namely the restoring and the non-restoring digit-by-digit algorithms [29]. In the restoring algorithm, each step takes the square root obtained so far, appends 01 to it and subtracts it, properly shifted, from the current remainder. The 0 in 01 corresponds to multiplying by 2; the 1 is a new guess bit. The new root bit is 1 if the resulting remainder is positive or zero; otherwise it is 0, and the remainder must be restored by adding back the quantity just subtracted. In contrast, the non-restoring algorithm does not restore the subtraction if the result was negative. Instead, it appends 11 to the root developed so far and performs an addition on the next iteration. If the addition causes an overflow, then the next iteration goes back to subtraction mode [30]. Figure 1 gives an example that takes the binary square root of 01011101 (equivalent to 93 in decimal).
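The two classes described above can be illustrated with a small word-level Python sketch. It is only a software model under stated assumptions, not the circuits of [29] or [30]: the function names are hypothetical, the radicand width `bits` is assumed to be even, and the final correction of a negative remainder in the non-restoring variant is the usual textbook step rather than something stated in this paper.

```python
def isqrt_restoring(radicand: int, bits: int) -> tuple[int, int]:
    """Restoring digit-by-digit square root; returns (root, remainder).

    Assumption: bits is even, so the radicand splits into 2-bit groups.
    """
    root, rem = 0, 0
    for shift in range(bits - 2, -1, -2):
        rem = (rem << 2) | ((radicand >> shift) & 0b11)  # bring down two bits
        trial = (root << 2) | 0b01                       # root with 01 appended
        rem -= trial
        if rem >= 0:
            root = (root << 1) | 1                       # new root bit is 1
        else:
            rem += trial                                 # restore the remainder
            root <<= 1                                   # new root bit is 0
    return root, rem


def isqrt_nonrestoring(radicand: int, bits: int) -> tuple[int, int]:
    """Non-restoring variant: a negative remainder is kept, and the next
    iteration adds the root with 11 appended instead of subtracting."""
    root, rem = 0, 0
    for shift in range(bits - 2, -1, -2):
        pair = (radicand >> shift) & 0b11
        if rem >= 0:
            rem = (rem << 2) + pair - ((root << 2) | 0b01)  # subtract mode
        else:
            rem = (rem << 2) + pair + ((root << 2) | 0b11)  # add mode
        root = (root << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += (root << 1) | 1   # textbook final correction (assumption)
    return root, rem


print(isqrt_restoring(93, 8))     # (9, 12), since 9*9 + 12 = 93
print(isqrt_nonrestoring(93, 8))  # (9, 12)
```

Both functions produce the same root; they differ only in how a failed trial subtraction is handled, which is the distinction drawn between Figure 1 (a) and Figure 1 (b).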
Slightly different from the non-restoring digit-by-digit algorithm in Figure 1 (b), a modification as shown in Figure 2 can be made to give a simpler implementation and faster calculation. This modification uses only the subtract operation with 01 appended; the add operation with 11 appended is not used. This paper adopts this modification to implement an unsigned 64-bit binary square root on an FPGA.

PROPOSED SQUARE ROOT ALGORITHM
Samavi et al. [29] improved the classical non-restoring digit-by-digit square root circuit by eliminating redundant blocks. Their circuit is referred to as the reduced-area non-restoring circuit. However, it is still based on the constant digits 01 or 11 and uses add-subtract as the main building block (see Figure 1 (b)). This paper offers a simple alternative that uses only the subtract operation with 01 appended. As a consequence, the subtract-multiplex is used as the main building block (see Figure 2). The principle of the proposed algorithm is described in Figure 3.
Step 0. Start
Step 1. Initialize the radicand (the n-bit number whose square root is to be taken), the quotient (the resulting square root), and the remainder. A radicand of n bits (n even) needs n/2 pipeline stages to implement the proposed algorithm.
Step 2. Beginning at the binary point, divide the radicand into groups of two digits in both directions.
Step 3. Beginning on the left (most significant bit), select the first group of one or two digits (if n is odd the first group has one digit, otherwise it has two).
Step 4. Take 1 squared as the first guess and subtract it from the first group. The first root bit developed is "1" if the result of the subtraction is positive or zero, and "0" otherwise.
Step 5. Shift in the next two bits and subtract the root developed so far with 01 appended. The Nth root bit is "1" if the result of the subtraction is positive or zero, in which case the subtraction is kept; otherwise the Nth root bit is "0" and the subtraction is not performed.
Step 6. Go to step 5 until the last group of two digits has been processed.
Step 7. End
Figure 3. The principle of the proposed algorithm to solve square root

A simple hardware implementation of the non-restoring digit-by-digit algorithm for an unsigned 6-bit square root, built as an array structure, is shown in Figure 4. The radicand is P (P5,P4,P3,P2,P1,P0), the quotient is U (U2,U1,U0), and the remainder is R (R4,R3,R2,R1,R0). The implementation needs three pipeline stages. The main building blocks of the array are called controlled subtract-multiplex (CSM) blocks. Figure 5 presents the details of a CSM. The inputs of the building block are x, y, b and u, and its outputs are bo (borrow) and d (result). If u=0, then d<=x-y-b else d<=x.
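The behaviour of the CSM block and of a 6-bit array built from it can be sketched at bit level in Python. This is only a model under assumptions: the wiring of Figure 4 is not reproduced in the text, so the row arrangement here (ripple borrow from the least significant cell, with the final borrow of a row fed back as u) is a guess at one plausible structure, the borrow equation is the standard full-subtractor borrow, and all function names are hypothetical.

```python
def csm(x: int, y: int, b: int, u: int) -> tuple[int, int]:
    """Controlled subtract-multiplex cell on single bits; returns (bo, d).

    If u == 0 the cell outputs the difference bit of x - y - b,
    otherwise it passes x through unchanged.
    """
    diff = x ^ y ^ b
    bo = ((1 - x) & (y | b)) | (y & b)   # standard full-subtractor borrow (assumption)
    d = diff if u == 0 else x
    return bo, d


def csm_row(x_bits: list[int], y_bits: list[int]) -> tuple[int, list[int]]:
    """One array row (assumed wiring). Bit lists are LSB first, equal length."""
    # The borrow chain does not depend on u, so it can be resolved first.
    b = 0
    for x, y in zip(x_bits, y_bits):
        b, _ = csm(x, y, b, 0)
    u = b                     # final borrow 1 means x < y: do not subtract
    b, d_bits = 0, []
    for x, y in zip(x_bits, y_bits):
        b, d = csm(x, y, b, u)
        d_bits.append(d)
    return 1 - u, d_bits      # root bit is the inverted final borrow


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def sqrt_csm_array(radicand: int, bits: int = 6) -> tuple[int, int]:
    """Square root of an even-width radicand using one CSM row per stage."""
    width = bits // 2 + 2     # wide enough for every partial remainder
    root, rem = 0, 0
    for shift in range(bits - 2, -1, -2):
        x = (rem << 2) | ((radicand >> shift) & 0b11)  # shift in two bits
        y = (root << 2) | 0b01                         # root with 01 appended
        q, d_bits = csm_row(to_bits(x, width), to_bits(y, width))
        rem = from_bits(d_bits)
        root = (root << 1) | q
    return root, rem


print(sqrt_csm_array(37))  # (6, 1), since 6*6 + 1 = 37
```

For a 6-bit radicand the row width works out to five bits, which agrees with the five remainder signals R4 to R0 named above.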
The generalization of the simple implementation of the non-restoring digit-by-digit algorithm to an unsigned n-bit square root, built as an array structure, is shown in Figure 6. Each row (stage) of the circuit in Figure 6 executes one iteration of the non-restoring digit-by-digit square root algorithm, using only the subtract operation with 01 appended. To reduce the hardware resources used by this implementation, specialized entities can be created as building-block components, which eliminates circuitry that is not needed. As an example, the implementation in Figure 6 for an unsigned 6-bit square root can be optimized as shown in Figure 7 (in this case the remainder is ignored, because it is not required in the DTC drive). The specialized entities A, B, C, D and E are minimized CSMs for the inputs ybu=100, yu=00, u=0, yu=10, and y=0 respectively, with the remainder ignored.
The generalization of the optimized simple implementation of the non-restoring digit-by-digit algorithm to an unsigned n-bit square root is shown in Figure 8.
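A word-level reference model of the generalized array can be useful when checking simulation output for wide operands. The Python sketch below is such a model under assumptions: the function name is hypothetical, each loop iteration stands in for one row (stage), and, like the optimized design, it returns only the quotient. It does not model the specialized entities A to E.

```python
def sqrt_subtract_multiplex(radicand: int, bits: int) -> int:
    """Quotient of the unsigned square root of a `bits`-wide radicand.

    Each iteration models one stage: shift in two bits, try subtracting
    the root with 01 appended, then select (multiplex) either the
    difference or the unchanged value.
    """
    if radicand < 0 or radicand >> bits:
        raise ValueError("radicand does not fit in the given width")
    stages = (bits + 1) // 2          # odd widths: the leading group has one digit
    root, rem = 0, 0
    for stage in range(stages - 1, -1, -1):
        group = (radicand >> (2 * stage)) & 0b11
        shifted = (rem << 2) | group  # two-bit shift
        trial = (root << 2) | 0b01    # root developed so far with 01 appended
        diff = shifted - trial
        keep = diff < 0               # borrow: the subtraction is not applied
        rem = shifted if keep else diff
        root = (root << 1) | (0 if keep else 1)
    return root


print(sqrt_subtract_multiplex(93, 8))          # 9
print(sqrt_subtract_multiplex(2**32 - 1, 32))  # 65535
print(sqrt_subtract_multiplex(2**64 - 1, 64))  # 4294967295
```

The number of iterations grows linearly with the operand width, which mirrors the row-per-stage structure of the array.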

RESULTS AND ANALYSIS
The previous sections explained the optimized simple hardware implementation of the non-restoring digit-by-digit square root algorithm and the difficulty of calculating the square root in DTC. In this section, simulation results for the 32-bit and 64-bit square root on an Altera APEX 20KE FPGA using this method are presented, as shown in Figure 9. In this simulation, P is the radicand and U is the quotient. The results show that the implementation works properly.
According to the compilation report, implementing the 32-bit and 64-bit square root with the optimized simple hardware implementation of the non-restoring digit-by-digit algorithm requires 256 and 1023 logic elements (LEs) respectively. A comparison of the results obtained with different implementation methods is shown in Table 1. The LE or logic cell (LC) usage listed in this comparison is based on references [29] and [30]. The number of LEs used indicates the size of the implemented circuit, that is, its hardware resource usage. Table 1 shows that the proposed method is the most efficient in hardware resource usage. As Figure 8 indicates, the strategy is easy to extend to larger operands to solve the square root problem in FPGA implementations.

CONCLUSION
This contribution presented the digit-by-digit calculation method as a simple strategy for implementing the square root in field programmable gate array (FPGA) hardware. The main principle of the proposed method is two-bit shifting combined with subtract-multiplex operations. The proposed strategy has been used to implement FPGA-based unsigned 32-bit and 64-bit binary square root units successfully. The results show that the proposed method is the most efficient in hardware resource usage among the methods compared. The method can also easily be extended to larger operands to solve the square root problem in FPGA implementations.

Figure 1. The example of digit-by-digit calculation to solve square root: (a) restoring algorithm; (b) non-restoring algorithm

Figure 4. A simple hardware implementation of the non-restoring digit-by-digit algorithm for unsigned 6-bit square root

Figure 7. Optimized simple hardware implementation of the non-restoring digit-by-digit algorithm for unsigned 6-bit square root

Figure 9. Simulation result of n-bit square root using optimized simple hardware implementation method of the non-restoring digit-by-digit algorithm: (a) 32-bit in decimal display, (b) 32-bit in binary display, (c) 64-bit in decimal display, (d) 64-bit in binary display

Table 1. The comparison of logic element usage