# New High-Performance Full Adders Using an Alternative Logic Structure

# Nuevos Sumadores de Alto Desempeño Utilizando una Estructura Lógica Alternativa

Mónico Linares Aranda<sup>1</sup> and Mariano Aguirre Hernández<sup>2</sup>

<sup>1</sup>Instituto Nacional de Astrofísica, Óptica y Electrónica Apartado postal 51 y 216, C.P. 72000, Puebla, Pue. México mlinares@inaoep.mx, <sup>2</sup>Intel Corporation, Communications Research Center, México mariano.aguirre@intel.com

Article received on March 24, 2009; accepted on October 20, 2009

**Abstract.** This paper presents two new high-speed, low-power 1-bit full-adder cells based on an alternative logic structure and the DPL and SR-CPL logic styles. The adders were designed using the electrical parameters of a 0.35 $\mu$m Complementary Metal-Oxide-Semiconductor (CMOS) process and were compared with several previously published adders in terms of power-delay product. To validate the simulated performance of one of the proposed adders, an 8-bit pipelined multiplier was fabricated in a 0.35 $\mu$m CMOS technology; the fabricated circuit showed superior performance.

**Keywords:** Full-adder, Low-power, Multiplier, Pipeline.

**Resumen.** En este artículo se presentan dos nuevos sumadores de 1-bit de alta velocidad y bajo consumo de potencia, utilizando en su diseño una estructura lógica alternativa y los estilos lógicos de circuitos DPL y SR-CPL. Los nuevos sumadores fueron comparados con diversos sumadores recientemente publicados en la literatura considerando el producto potencia-retardo, principal figura de mérito de circuitos aritméticos. Con el fin de validar los resultados obtenidos de simulación, uno de los sumadores fue aplicado al diseño y fabricación de un multiplicador en "pipeline" de 8-bits utilizando la tecnología CMOS de 0.35µm. Los resultados experimentales obtenidos mostraron un desempeño superior.

**Palabras clave:** Sumador completo, Baja potencia, Multiplicador, Pipeline.

# 1 Introduction

Addition is a fundamental arithmetic operation widely used in many VLSI systems, such as application-specific DSP architectures and microprocessors. The adder module is the core of operations such as subtraction, multiplication, division and address generation. In most of these systems, the adder is part of the critical path that determines the overall system performance. Therefore, enhancing the speed of the full-adder cell is of great interest [Nishitani T. *et al.*, *1999*].

On the other hand, the ever-increasing demand for mobile products with high throughput and a limited power source makes the design of low-power adder cells another significant goal. There are three major components of power dissipation in CMOS circuits [Him C. *et al., 2001*]: switching power, short-circuit power and leakage power. Reducing any of these components lowers the power consumption of the whole system.

This paper presents two new high-speed low-power adder cells using an alternative logic structure and two different logic styles. The resulting full adders are more efficient in terms of power consumption and delay than other adders recently reported as good candidates for building low-power arithmetic modules.

The rest of this paper is organized as follows. Section 2 reviews the published work on the design of 1-bit full adders and the logic structure adopted as the standard to implement those cells. Section 3 presents an alternative logic scheme to build full adders. Section 4 presents two new 1-bit full adders using the proposed alternative internal logic structure. Section 5 describes the simulation environment used to obtain the power-delay performance of the full adders being compared, and analyzes the simulation results of the comparison. Section 6 shows the implementation of a 2's-complement 8-bit pipelined array multiplier used to validate the results for one of the proposed adders. Finally, Section 7 concludes this work.

### 2 Previous CMOS 1-bit full adder cells

Several papers have been published regarding the design of low-power full adders, dealing with the logic style (Standard CMOS: *CMOS* [Weste N. *et al.*, 1988], Differential Cascode Voltage Switch: *DCVS* [Chu K. *et al.*, 1987], Complementary Pass-Transistor Logic: *CPL* [Yano K. *et al.*, 1990], Double Pass-Transistor Logic: *DPL* [Suzuki M. *et al.*, 1993], Swing Restored CPL: *SR-CPL* [Zimmerman R. *et al.*, 1997], etc.), and with the logic structure used to build the adder module [Shams A. *et al.*, 2002], [Goel S. *et al.*, 2006], [Moalemi V. *et al.*, 2007].

The internal logic structure shown in Figure 1(a) has been adopted as the standard configuration in most of the enhancements developed for the 1-bit full-adder module. This structure is based on the *transmission function theory* [Zhuang N. *et al., 1992*]. The adder module is formed by three main logic blocks: an XOR-XNOR gate to obtain $A \oplus B$ and its complement (Block 1), and XOR blocks or multiplexers to obtain the SUM (*So*) and CARRY (*Co*) outputs (Blocks 2 and 3).





| C | B | A | So | Co |
|---|---|---|----|----|
| 0 | 0 | 0 | 0  | 0  |
| 0 | 0 | 1 | 1  | 0  |
| 0 | 1 | 0 | 1  | 0  |
| 0 | 1 | 1 | 0  | 1  |
| 1 | 0 | 0 | 1  | 0  |
| 1 | 0 | 1 | 0  | 1  |
| 1 | 1 | 0 | 0  | 1  |
| 1 | 1 | 1 | 1  | 1  |

Fig. 1. Full-adder cell: (a) standard logic structure, (b) truth table

Computación y Sistemas Vol. 14 No. 3, 2011 pp 213-223 ISSN 1405-5546

Since the proposal presented in [Zhuang N. et al., 1992], several papers have introduced new full-adder cells proposing different realizations for the three logic blocks of Figure 1(a). Chronologically, some of them are: 14TA [Shams A. et al., 1995], 14TB [Shams E. et al., 1995], Wu\_Ng [Wu A. et al., 1997], 16T [Shams A. et al., 1998], 10TA [Mahmoud H. et al., 1999], 10TB [Shams A. et al., 2001], Full\_Rest [Rhadakrishnan D. et al., 2001], Mux\_Based [Alhalabi B. et al., 2001], Wey\_Chow [Wey I. et al., 2002], Chang-Gu [Chang C. et al., 2003], and Goel-Kumar [Goel S. et al., 2006]. According to the detailed comparative study presented in [Aguirre M. et al., 2004], the most efficient realization of Block 1 is the one implemented with the SR-CPL logic style. However, the same paper pointed out another important conclusion: "the major problem regarding the propagation delay for a full adder built with the logic structure shown in Figure 1, is that it is necessary to obtain intermediate $A \oplus B$ and $\overline{A \oplus B}$ signals, which are then used to drive other blocks in order to generate the final outputs". Thus, the propagation delay and power consumption of the full adder depend on the delay and voltage swing of the $A \oplus B$ and $\overline{A \oplus B}$ signals generated within the cell. Moreover, this internal logic structure produces unbalanced delay paths for the generation of the *So* and *Co* signals, which causes unwanted glitches in tree-structured circuits [Agarwal S. *et al., 2008*]. Therefore, to increase the operating speed of the full adder, it is necessary to develop a logic structure that does not require intermediate signals to control the selection or transmission of other signals located on the critical path, and that avoids unbalanced delay paths.

# 3 An alternative logic structure for a full adder cell

Examining the full adder's truth table in Figure 1(b), it can be seen that the *So* output is equal to $A \oplus B$ when C=0, and to $\overline{A \oplus B}$ when C=1. Thus, a multiplexer can be used to obtain the respective value, taking the *C* input as the selection signal. Following the same criterion, the *Co* output is equal to $A \cdot B$ when C=0, and to $A + B$ when C=1. Again, *C* can be used to select the respective value by driving a multiplexer. Hence, an alternative logic scheme to design a full-adder cell [Aguirre, 2005] can be formed by a logic block that generates the $A \oplus B$ and $\overline{A \oplus B}$ signals, another block that generates the $A \cdot B$ and $A + B$ signals, and two multiplexers driven by the *C* input to generate the *So* and *Co* outputs, as shown in Figure 2.
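The selection scheme described above can be checked exhaustively with a small behavioral model. The following Python sketch is only illustrative: the function names `mux`, `full_adder_alt` and `matches_truth_table` are assumptions introduced here, and the model captures the logic function only, not the transistor-level circuits of Figure 2.

```python
from itertools import product


def mux(sel, in0, in1):
    """2-to-1 multiplexer: returns in0 when sel is 0 and in1 when sel is 1."""
    return in1 if sel else in0


def full_adder_alt(a, b, c):
    """Behavioral model of the alternative structure: C only drives the multiplexers."""
    xor_ab = a ^ b            # A xor B
    xnor_ab = 1 - xor_ab      # complement of A xor B
    and_ab = a & b            # A . B
    or_ab = a | b             # A + B
    so = mux(c, xor_ab, xnor_ab)
    co = mux(c, and_ab, or_ab)
    return so, co


def matches_truth_table():
    """Compare the model against arithmetic addition for all eight input rows."""
    for c, b, a in product((0, 1), repeat=3):
        total = a + b + c
        if full_adder_alt(a, b, c) != (total & 1, total >> 1):
            return False
    return True


print(matches_truth_table())
```

Because neither multiplexer waits for an internally generated select signal, the model makes explicit that *So* and *Co* are each just a choice between two functions of *A* and *B*.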



Fig. 2. Alternative internal logic structure for full adder cells

The features and advantages of this alternative logic structure are:

- There are no internally generated signals controlling the selection of the output multiplexers. Instead, the *C* input signal is used to drive the multiplexers, thus reducing the overall propagation delay.
- The capacitive load for the *C* input has been reduced, as it is connected only to some transistor gates and no longer to some drain or source terminals, where the diffusion capacitance is very large for sub-micrometer technologies. Therefore, the overall delay for larger modules where the *C* signal falls on the critical path can be reduced.
- The propagation delay for the So and Co outputs can be tuned up individually by adjusting the XOR/XNOR and the AND/OR gates; this feature is advantageous for applications where the skew between arriving signals is critical for a proper operation (e.g. wave-pipelining).
- Buffers can be included at the full-adder outputs by interchanging the XOR and XNOR signals and replacing the AND/OR gates with NAND/NOR gates at the inputs of the multiplexers, thus improving the performance for load-sensitive applications.
- With this scheme, there is no need to wait for the computation of one output signal (e.g. the carry signal) to obtain the other one, as each multiplexer simply selects between the outputs of simple logic functions, leading to faster operation.
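The buffered variant mentioned in the list above can be sketched behaviorally as follows. This is a minimal sketch under one stated assumption: the output buffers are taken to be inverters, which is why the multiplexer inputs are complemented (XOR and XNOR interchanged, NAND/NOR instead of AND/OR). The names `inv`, `mux` and `full_adder_buffered` are hypothetical.

```python
from itertools import product


def inv(x):
    """Logic inverter for 0/1 values."""
    return 1 - x


def mux(sel, in0, in1):
    """2-to-1 multiplexer: returns in0 when sel is 0 and in1 when sel is 1."""
    return in1 if sel else in0


def full_adder_buffered(a, b, c):
    """Alternative structure with complemented mux inputs and inverting output buffers (assumed)."""
    xor_ab = a ^ b
    xnor_ab = inv(xor_ab)
    nand_ab = inv(a & b)
    nor_ab = inv(a | b)
    so_n = mux(c, xnor_ab, xor_ab)   # XOR/XNOR interchanged: complement of So
    co_n = mux(c, nand_ab, nor_ab)   # NAND/NOR in place of AND/OR: complement of Co
    return inv(so_n), inv(co_n)      # inverting buffers restore So and Co


def buffered_matches_truth_table():
    """Check the buffered variant against arithmetic addition for all inputs."""
    for c, b, a in product((0, 1), repeat=3):
        total = a + b + c
        if full_adder_buffered(a, b, c) != (total & 1, total >> 1):
            return False
    return True


print(buffered_matches_truth_table())
```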

### 4 New 1-bit full adder cells

Based on [Aguirre, 2004] and the analysis presented in Section 3, two new full adders have been developed using the logic styles *DPL* and *SR-CPL* and the logic structure presented in Figure 2. Figures 3 and 4 show the schematics of the proposed adders. Figure 3 presents a full adder that uses the *DPL* logic style to build the XOR/XNOR gates, and Figure 4 shows a full adder that uses the *SR-CPL* logic style to build these gates. In both cases, the AND and OR gates are built using powerless and groundless pass-transistor configurations, respectively, and pass-gate-based multiplexers generate the *So* and *Co* outputs.

We used pass-transistor logic families because these designs allow good control over leakage paths. Intuitively, because logic operations are carried out by passing charge from one node to another without dumping any to ground, pass-transistor logic is expected to yield lower energy consumption. Moreover, since the connections to the supply voltage or to ground lie at the boundaries of the logic in a purely pass-transistor-based circuit, the possible leakage paths are limited to these locations. This is very useful, as leakage power is becoming an increasingly difficult problem to address. In most processes, transistors with higher threshold voltages can be placed at either the head or the foot of a circuit to reduce leakage; as long as they are not in the critical path, this provides more effective leakage control. With pass-transistor-based logic families this procedure is further simplified, since the number of paths to power and ground is limited [Mandeep S. et al., 2007].

Regarding propagation delay, it is worth mentioning the influence that the signal feed-through phenomenon has on these full adders when they are designed in modern deep-submicron technologies. Although the feed-through effect is reduced at the output nodes by the use of pass-gate-based multiplexers, in which the positive and negative feed-through cancel each other, more care must be taken in the design of the XOR/XNOR and AND/OR gates to reduce this effect, by ensuring that the $C_{gs}$ and $C_{gd}$ coupling capacitances are smaller than the capacitances at the gates' output nodes, as is the case in our proposals [Hodges, 2003].



Fig. 3. Full adder designed with the proposed logic structure and a *DPL* logic style (*Ours1*)



Fig. 4. Full adder designed with the proposed logic structure and an *SR-CPL* logic style (*Ours2*)

### 5 Simulation results and analysis

Several full adders were compared in terms of power consumption and delay. They were named: *Cmos26* and *Cmos28* [Weste N. *et al.*, 1988], *Cpl* [Yano K. *et al.*, 1990], *Cpl\_sr* [Zimmerman R. *et al.*, 1997], *Dcvs* [Chu K. *et al.*, 1987], *Bay10a* [Mahmoud H. *et al.*, 1999], *Bay10b* [Shams A. *et al.*, 2001], *Bay14a* [Shams A. *et al.*, 1995], *Bay14b* [Shams E. *et al.*, 1995], *Bay16* [Shams A. *et al.*, 1998], *Full\_Rest* [Rhadakrishnan D. *et al.*, 2001], *Mux\_Based* [Alhalabi B. *et al.*, 2001], *Tran\_Funct* [Zhuang N. *et al.*, 1992], *Wey\_Chow* [Wey I. *et al.*, 2002], *Wu\_Ng* [Wu A. *et al.*, 1997], our first proposal (Figure 3), which uses an XOR/XNOR gate designed with the DPL logic style (*Ours1*), and our second proposal (Figure 4), which uses an XOR/XNOR gate designed with the SR-CPL logic style (*Ours2*). The full adders were designed in an AMS 0.35 µm CMOS technology, simulated using the BSIM3v3 model (level 49), and supplied with 3.3 V. Simulations were carried out using Nanosim [Nanosim, 2008] to determine the power consumption, and Hspice [Hspice, 2007] to measure the propagation delay of the output signals (the longest time required for an output signal to reach 50% of its voltage swing, measured from the moment when one of the input signals reached 50% of its voltage swing). Figure 5 shows the test bed used for the comparison of the adders. This simulation environment has been commonly used to compare the performance of the full adders analyzed in [Shams A. et al., 1998] and [Shams A. et al., 2002].
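The 50%-to-50% delay definition used above can be illustrated with a short Python sketch that post-processes sampled waveforms. It is not the Hspice measurement itself: the helper names `crossing_time` and `propagation_delay`, the linear interpolation between samples, and the sample data are all assumptions made for illustration. The sketch measures a single transition; the figure reported for a cell is the longest such delay over the transitions of interest.

```python
def crossing_time(times, volts, level):
    """Return the first time at which the sampled waveform reaches `level`.

    Linear interpolation is used between consecutive samples.
    Returns None when the waveform never reaches the level.
    """
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 == level:
            return times[i]
        if (v0 - level) * (v1 - level) < 0:
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    if volts and volts[-1] == level:
        return times[-1]
    return None


def propagation_delay(times, v_in, v_out, vdd=3.3):
    """Delay between the 50% crossings of an input and an output waveform."""
    half = vdd / 2
    t_in = crossing_time(times, v_in, half)
    t_out = crossing_time(times, v_out, half)
    if t_in is None or t_out is None:
        raise ValueError("waveform does not cross 50% of the swing")
    return t_out - t_in


# Hypothetical sampled waveforms (time in ns, voltage in V).
times = [0.0, 0.2, 0.7, 0.9, 1.2]
v_in = [0.0, 3.3, 3.3, 3.3, 3.3]     # rising input edge
v_out = [3.3, 3.3, 3.3, 0.0, 0.0]    # falling output edge
print(propagation_delay(times, v_in, v_out))
```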



Fig. 5. Simulation setup

| Scheme     | Row | Avg. power | Pwr. supply | Dynamic | Static | Short-circuit | Delay | Pwr\*Delay | ∑width | Vdd<sub>min</sub> |
|------------|-----|------------|-------------|---------|--------|---------------|-------|------------|--------|--------------------|
| Cmos26     | top | 1286.3     | 1286.3      | 875.2   | 4.3    | 406.9         | 0.703 | 904.3      | 67.8   | 1.3                |
|            | add | 1051.1     | 811.5       | 683.8   | 4.3    | 363.0         |       |            |        |                    |
| Cmos28     | top | 1736.8     | 1736.8      | 1420.7  | -      | 315.8         | 0.984 | 1709.2     | 184.8  | 1.3                |
|            | add | 1024.7     | 1024.7      | 746.8   | -      | 277.9         |       |            |        |                    |
| Cpl        | top | 2975.9     | 2975.9      | 984.4   | 1991.6 | 37.6          | 0.781 | 2324.3     | 113.2  | 1.8                |
|            | add | 2650.2     | 2504.4      | 702.6   | 37.6   | 1910.0        |       |            |        |                    |
| Cpl_sr     | top | 2264.1     | 2264.1      | 1097.9  | -      | 1165.9        | 0.812 | 1838.4     | 116.4  | 1.4                |
|            | add | 1937.8     | 1804.8      | 810.2   | -      | 1127.6        |       |            |        |                    |
| Cpl uye    | top | 2179.9     | 2179.7      | 1190.3  | -      | 989.3         | 0.853 | 1859.5     | 116.4  | 1.4                |
|            | add | 1946.0     | 1688.6      | 982.7   | -      | 963.3         |       |            |        |                    |
| Dcvs       | top | 2965.5     | 2965.4      | 1579.7  | -      | 1385.3        | 1.107 | 3282.8     | 182.4  | 1.3                |
|            | add | 2515.3     | 2515.6      | 1179.1  | -      | 1336.2        |       |            |        |                    |
| Bay10a     | top | 1576.4     | 1576.4      | 632.0   | 55.1   | 889.4         | 1.955 | 3081.9     | 51.4   | 2.8                |
|            | add | 1321.7     | 1237.5      | 436.9   | 55.1   | 829.6         |       |            |        |                    |
| Bay10b     | top | 1565.5     | 1565.5      | 960.6   | 2.0    | 602.6         | 1.157 | 1811.3     | 84.8   | 2.4                |
|            | add | 1336.2     | 953.7       | 773.2   | 2.0    | 561.0         |       |            |        |                    |
| Bay14a     | top | 1221.0     | 1221.0      | 848.8   | 0.7    | 371.6         | 1.220 | 1489.6     | 68.7   | 2.4                |
|            | add | 989.7      | 684.8       | 658.0   | 0.7    | 331.0         |       |            |        |                    |
| Bay14b     | top | 1290.6     | 1290.6      | 975.5   | 0.7    | 314.2         | 1.366 | 1763.0     | 84.0   | 2.4                |
|            | add | 1061.6     | 663.3       | 785.7   | 0.7    | 275.2         |       |            |        |                    |
| Bay16      | top | 1343.8     | 1343.8      | 919.4   | 0.3    | 423.7         | 1.688 | 2268.3     | 80.4   | 2.8                |
|            | add | 1118.0     | 699.6       | 724.4   | 0.3    | 393.4         |       |            |        |                    |
| Full_Rest  | top | 1795.5     | 1795.5      | 894.3   | 266.3  | 634.6         | 1.022 | 1835.0     | 72.7   | 1.8                |
|            | add | 1569.2     | 1027.0      | 687.7   | 266.3  | 615.1         |       |            |        |                    |
| Mux_Based  | top | 1560.2     | 1560.2      | 750.8   | 4.6    | 804.5         | 1.362 | 2125.0     | 65.6   | 2.4                |
|            | add | 1329.2     | 1100.2      | 566.6   | 4.6    | 758.0         |       |            |        |                    |
| Tran_Funct | top | 1225.0     | 1225.0      | 796.6   | -      | 428.0         | 0.932 | 1141.7     | 63.0   | 1.3                |
|            | add | 992.0      | 808.0       | 612.2   | -      | 379.8         |       |            |        |                    |
| Wey_Chow   | top | 1346.4     | 1346.4      | 942.2   | -      | 404.3         | 1.024 | 1378.7     | 75.8   | 1.7                |
|            | add | 1119.4     | 739.9       | 747.1   | -      | 372.2         |       |            |        |                    |
| Wu_Ng      | top | 1161.6     | 1161.6      | 959.6   | -      | 202.0         | 1.067 | 1239.4     | 79.4   | 1.7                |
|            | add | 910.5      | 728.6       | 741.8   | -      | 168.6         |       |            |        |                    |
| Ours1      | top | 843.8      | 843.8       | 751.1   | -      | 92.7          | 0.716 | 604.2      | 52.8   | 1.5                |
|            | add | 567.2      | 280.2       | 510.8   | -      | 56.4          |       |            |        |                    |
| Ours2      | top | 835.6      | 835.6       | 710.8   | -      | 124.7         | 0.734 | 613.3      | 50.4   | 1.5                |
|            | add | 556.4      | 364.7       | 466.6   | -      | 89.8          |       |            |        |                    |

Table 1. Comparison of simulation results for the adders (power in $\mu$W, delay in ns, width in $\mu$m, V<sub>DD</sub> in volts)


Table 1 shows the simulation results from the full adder's performance comparison regarding power dissipation and delay. The meanings of the columns are:

- *scheme*: it indicates the device under test (DUT), and separates the values measured for the whole test bed (row "top"), including the inverters connected at the inputs and outputs, from those of the adder cell (DUT) alone (row "add").
- *avg. power*: it shows the average power taken from the power supply and the DUT inputs.
- *pwr. supply*: it shows the portion of the average power taken only from the power supply.
- *dynamic*: it indicates the power dissipation due to charging and discharging the capacitances within the DUT.
- *static*: it refers to the power consumed when the input signals are fixed but some turned-on transistors create a direct path from Vdd to Gnd.
- *short-circuit*: it reflects the power dissipation due to direct paths from Vdd to Gnd that are created momentarily when input signals are transitioning with a finite slope.
- *delay*: it indicates the longest propagation delay of the adder cell under test.
- *pwr\*delay*: this metric combines the power consumption and the propagation delay of a cell, providing the energy required to perform the logic function.
- *$\Sigma$width*: it is the sum of all transistor widths for each circuit, giving an idea of the required implementation area; all the transistor lengths are kept at the minimum (0.35 µm).
- *Vdd min*: the minimum supply voltage that maintains the correct functionality of the full adder, being able to drive the buffers connected at the outputs with proper logic values.
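As a worked illustration of the *pwr\*delay* column, the tabulated values appear to be consistent with the product of the top-row average power and the delay. The Python sketch below recomputes a few entries of Table 1 under that assumption; the dictionary `TABLE1` and the function name are introduced here for illustration only. With power in µW and delay in ns, the product is an energy in femtojoules, although the table itself does not state a unit for this column.

```python
# (average power of the whole test bed in uW, delay in ns), copied from Table 1
TABLE1 = {
    "Cmos26": (1286.3, 0.703),
    "Tran_Funct": (1225.0, 0.932),
    "Ours1": (843.8, 0.716),
    "Ours2": (835.6, 0.734),
}


def power_delay_product(power_uw, delay_ns):
    """Power-delay product: uW * ns = 1e-6 W * 1e-9 s = 1e-15 J (femtojoules)."""
    return power_uw * delay_ns


for name, (power_uw, delay_ns) in TABLE1.items():
    print(name, round(power_delay_product(power_uw, delay_ns), 1))
```

For these four rows the recomputed products agree with the tabulated column to within rounding.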

The following statements about power consumption and delay can be extracted from the results in Table 1:

- The full adders designed with pass-transistor logic styles (*Cmos26*, *Cpl*, *Cpl\_sr*, *Ours1* and *Ours2*) exhibit shorter delays than the others; this is expected because of the reduced internal parasitic capacitances of these logic styles, as stated in [Zimmerman, 1997].
- On the other hand, the full adders designed using a logic structure different from the one shown in Figure 1 (*Bay10a*, *Bay10b*, *Bay14a*, *Bay14b*, *Bay16*, *Full\_Rest*, *Tran\_Funct*, *Wey\_Chow*) have larger propagation delays (around or exceeding 1 ns), as expected, because the internal XOR/XNOR gates that generate the intermediate signals used to control the output blocks introduce an extra delay.

- The full adders with an incomplete voltage swing (*Bay14a*, *Bay14b*, *Bay16*) show lower power consumption than others (*Cmos26*, *Tran\_Funct*), but only when the dissipation of the surrounding circuitry is neglected (row "add"). If the dissipation of the whole test bed (row "top") is considered, those proposals no longer perform better than the others. Moreover, the propagation delay of those adders is longer, because the current-driving capability of the multiplexers is degraded when they are controlled by nodes with an incomplete voltage swing, which makes their power-delay product even worse than that of the other adders.
- Furthermore, the minimum supply voltage that maintains correct operation (column "Vdd min") is higher for those circuits than for the ones whose internal nodes have a full voltage swing.
- Regarding the proposals in this work, the advantage of the alternative logic structure derived above is clear, since both realizations designed with this scheme (*Ours1* and *Ours2*) exhibit the lowest power consumption and power-delay product.
- In addition, since these realizations have neither static dissipation nor internal direct paths from Vdd to Gnd (except for the inverters at the inputs, which could be avoided if the inputs come from flip-flops with complementary outputs), they are good candidates for battery-operated applications where modules with low dissipation in stand-by mode are required. Moreover, the power consumption of these circuits can be further reduced, as they operate properly with supply voltages as low as 1.5 V.
- Regarding the required implementation area (column "$\Sigma$width"), the pass-transistor-based circuits occupy less area than the static ones. In particular, the areas of the proposed full adders are among the smallest, which can also be considered one of the reasons for their lower delay and power consumption, as it implies that smaller parasitic capacitances are being driven. In this analysis, as there are no layouts available, the sum of transistor widths "$\Sigma$width" is preferred over the transistor count to compare relative implementation area, because it takes into account that for some logic styles the same speed can be achieved with smaller transistors because of the reduced parasitic capacitances.

The importance of the simulation setup and of including the power dissipation of the surrounding circuitry is now evident, as some realizations previously reported as low-power cells do not perform better than others when the dissipation of the whole test bed is considered.

Figure 6 shows the *So* and *Co* output waveforms of the proposed full adder *Ours1* for an input pattern corresponding to the adder truth table shown in Figure 1. Given the power and delay features shown in the simulations, the proposed adders are good candidates for building low-power high-speed arithmetic modules.



Fig. 6. Sum (So) and Carry (Co) output waveforms for the full adder Ours1

Table 2 shows the power consumption at 125 MHz and the highest operating frequency attained for five 8-bit carry-ripple adders built with some of the 1-bit full adders compared. As can be seen, the results for the power dissipation and maximum operating frequency of each 8-bit adder are consistent with the results obtained for the individual 1-bit full-adder cells. It is worth pointing out that the first proposal in this paper (*Ours1*) exhibits the lowest power dissipation and the highest operating frequency of the five, as expected.

| Adder      | Power@125MHz<br>(mW) | F <sub>max</sub><br>(MHz) |
|------------|----------------------|---------------------------|
| Cmos26     | 3.73                 | 416                       |
| Bay16      | 4.38                 | 166                       |
| Full_Rest  | 4.56                 | 384                       |
| Tran_Funct | 3.68                 | 384                       |
| Ours1      | 3.42                 | 454                       |

Table 2. Simulation results for 8-bit carry-ripple adders

# 6 Application: An 8×8-bit pipelined multiplier

To validate the full-adder simulation results with a more complex structure, the proposed adder *Ours1* was used to implement a 2's-complement 8-bit pipelined multiplier (Figure 7). This architecture was selected because the registers placed between the rows of the array allow the maximum operating frequency to be determined by the longest delay of the full-adder cell used. Figure 8 shows the layout of the full adder *Ours1*. The outputs of the adder are latched (C<sup>2</sup>MOS inverting latches), as required by the pipelined scheme.

The multiplier was designed in an AMS 0.35 $\mu$m CMOS technology, simulated using the *BSIM3v3* model (level 49), and supplied with 3.3 V. Several Hspice and Nanosim simulations were carried out to obtain the performance of the pipelined multiplier for the following scenarios:

- With transitioning data at the inputs, and the clock signals (*CLK* and  $\overline{CLK}$ ) activated, to simulate a normal operation.
- With stable data at the inputs, and the clock signals activated, to simulate a *module isolation* technique applied to the multiplier and its surrounding circuitry.
- With stable data at the inputs, and the clock signals stopped, to simulate a *clock gating* technique applied to the multiplier.




Fig. 7. Block diagram of an 8×8 bits pipelined multiplier



Fig. 8. Layout of the 1-bit full-adder cell *Ours1*

Table 3 summarizes the results obtained from the Nanosim simulations. The meanings of the columns are the same as explained in Section 5. It is worth mentioning that the multiplier using the proposed full adder was able to operate at up to 1.2 GHz, with a power dissipation of 195 mW.

Table 3. Power consumption results from Nanosim simulation for the pipelined multiplier (mW @ 1.0 GHz)

| Operation<br>mode | Average<br>power | Power<br>supply | Dynamic<br>power | Static<br>power | Short-<br>circuit<br>power |
|-------------------|------------------|-----------------|------------------|-----------------|----------------------------|
| Normal            | 161              | 161             | 151              | 0               | 10                         |
| Stable<br>inputs  | 130              | 30              | 121              | 0               | 9                          |
| Clock<br>gated    | 26               | 26              | 14               | 0               | 6                          |

The pipelined multiplier using the proposed adder *Ours1* was fabricated in a 0.35 $\mu$m CMOS technology from Austria Micro Systems. A photograph of the fabricated chip is shown in Figure 9. The whole chip size, including the I/O pads, is 1.2 mm<sup>2</sup>, while the core area is 0.6 mm<sup>2</sup>.

The core logic includes the following components [Aguirre, *2006*]:

- An 8×8-bit array multiplier.
- A parallel-to-serial converter; this is a 20-bit shift register that takes the multiplier's parallel output and converts it into a serial stream. This circuit was incorporated in order to reduce the I/O pin count. Once the shift register is loaded with the multiplier's result, its content is rotated from right to left, and the most-significant bit (MSB) is connected to one of the output pads to be monitored with an external oscilloscope. Four guard bits were included in the serial stream to distinguish the beginning of the 16-bit result between rotations through the shift register.
- A Voltage-Controlled Oscillator (VCO) [Pacheco, 2004]; this circuit generates a high-frequency clock signal in the range of 200MHz to 1.6GHz for control voltages between 0.8V and 1.6V, which is enough to exercise the multiplier operation at its highest operation frequency (1.2GHz).
- A Clock distribution network; it is a set of large inverters that drive the *CLK* and  $\overline{CLK}$  signals through every other row of the multiplier array.
- A Clock divider; this circuit divides the frequency of the clock signal that is being applied to the multiplier and to the parallel-to-serial converter. It was included mainly to reduce the frequency of the clock signal that the VCO is generating, so it can be measured using an oscilloscope, as the maximum frequency response for I/O pads is 250 MHz.
- A Clock signal multiplexer; due to the limited maximum frequency response of the I/O pads, it is not possible to get the serial stream with the multiplication result through an output pin when the VCO is generating frequencies higher than 250 MHz. So, this multiplexer is used to change the operating frequency of the multiplier array and the parallel-to-serial converter by switching to an externally supplied clock signal, once the multiplication result has been captured into the shift register.
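The behavior of the parallel-to-serial converter listed above can be sketched in Python as a 20-bit rotating register. The source does not specify the value or the position of the four guard bits, so the pattern `0b1010` placed above the 16-bit result is purely a hypothetical choice, as are the function names.

```python
def load_register(result, guard=0b1010, result_bits=16, guard_bits=4):
    """Pack the guard bits (hypothetical pattern) above the 16-bit two's-complement result."""
    word = result & ((1 << result_bits) - 1)
    return ((guard & ((1 << guard_bits) - 1)) << result_bits) | word


def rotate_left(reg, width=20):
    """Rotate the register one position to the left; return (new register, bit seen at the MSB pad)."""
    msb = (reg >> (width - 1)) & 1
    reg = ((reg << 1) & ((1 << width) - 1)) | msb
    return reg, msb


def serial_stream(reg, width=20):
    """Collect the bits observed at the MSB output during one full rotation."""
    bits = []
    for _ in range(width):
        reg, msb = rotate_left(reg, width)
        bits.append(msb)
    return bits, reg


register = load_register(29 * 19)
stream, after = serial_stream(register)
print("".join(str(bit) for bit in stream))
```

After 20 clock cycles the register holds its original content again, so the same word keeps repeating at the output and the guard bits mark where each 16-bit result begins.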

Table 4 presents the power consumption values measured for the multiplier array in the normal operation scenario. It was not possible to get consistent measurements for the other operation scenarios, as the shift register exhibited a very high short-circuit current in the clock-gating condition, due to its internal construction.

Figures 10 and 11 present experimental waveforms of the multiplier's function. The operations were performed with the multiplier array running at 1.2 GHz, and the result was stored in the shift register by activating its load signal. Then, the clock multiplexer was switched to a 50 MHz clock signal applied at the CLK<sub>ext</sub> pin, in order to get the result as a serial stream at the Z<sub>serial</sub> pin. The CLK<sub>div</sub> wave is the clock signal divided by 16, which marks the boundaries of the 16-bit result word.



| Frequency<br>(GHz) | Average power<br>(mW) |
|--------------------|-----------------------|
| 1.0                | 153                   |
| 1.2                | 180                   |

Table 4. Measured power consumption of the multiplier array in normal operation



Fig. 9. Photograph of the fabricated chip



Fig. 10. Measured waveforms for a multiplication of $29 \times 19 = 551$ $(0000\ 0010\ 0010\ 0111)_2$ @ 50 MHz



Fig. 11. Measured waveforms for a multiplication of $-65 \times 91 = -5915$ $(1110\ 1000\ 1110\ 0101)_2$ @ 50 MHz
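The 16-bit words quoted in the captions of Figures 10 and 11 can be reproduced with a short two's-complement conversion. The following Python sketch is a plain arithmetic check, not part of the measurement flow; the function name `to_twos_complement` is introduced here for illustration.

```python
def to_twos_complement(value, bits=16):
    """Return the two's-complement bit string of `value`, grouped in nibbles."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the requested word length")
    word = value & ((1 << bits) - 1)          # wrap negative values modulo 2**bits
    text = format(word, "0{}b".format(bits))
    return " ".join(text[i:i + 4] for i in range(0, bits, 4))


print(to_twos_complement(29 * 19))    # product shown in Fig. 10
print(to_twos_complement(-65 * 91))   # product shown in Fig. 11
```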


## 7 Conclusions

The design and performance comparison of two high-speed low-power full adder cells based upon an alternative logic approach, and *DPL* and *SR-CPL* logic styles, have been presented. The full adders exhibit a delay around 720ps and power consumption around 840 $\mu$ W, for an overall reduction of 30% respect to the best featured one of the other adders been compared, but in general about 50% respect to the other ones.

To validate the adder's performance, a 2's complement 8×8-bit pipelined multiplier was designed and fabricated using a 0.35 µm CMOS technology. Hspice and Nanosim simulations showed that this multiplier can operate at up to 1.2 GHz, with a power dissipation of around 195 mW from a 3.3 V supply. The multiplier also exhibits savings of up to 20%, 25% and 80% when operating at 1 GHz in the normal, stable-data-input, and clock-gated modes, respectively, compared with three other pipelined multipliers designed using previously reported full-adder cells. The fabricated chip proved functional when clocked at 1.2 GHz, with a power consumption of 180 mW. The measured power when operating under normal conditions at 1 GHz was around 153 mW.

## Acknowledgment

This work was partially supported by Conacyt-Mexico under grant #51511-Y and scholarship #112784.

### References

- Abu-Shama E., Elchouemi A., Sayed S. & Bayoumi M. (1995). An efficient low power basic cell for adders, *IEEE International Symposium on Circuits and Systems*, Rio de Janeiro, Brazil, 306-308.
- Agarwal S., Pavankumar V. K., & Yokesh R. (2008). Energy-efficient, high performance circuits for arithmetic units, *21st International Conference on VLSI Design*, Hyderabad, India, 371-376.
- Aguirre-Hernández, M. & Linares-Aranda, M. (2004). Low-power low-voltage 1-bit CMOS full adder for energy-efficient multimedia applications. *ICED/CASTOUR*, Veracruz, México. Retrieved from: http://www-elec.inaoep.mx/iced04/eng/program.htm
- Aguirre H. M. (2006). Diseño de un codificador fractal de imágenes con uso eficiente de energía, Doctoral thesis, Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Pue., México.
- Aguirre M. & Linares M. (2005). An alternative logic approach to implement high-speed low-power full adder cells, *18th Symposium on Integrated Circuits and Systems Design SBCCI'05*, Florianopolis, Brazil, 166-171.
- Alhalabi B. & Al-Sheraidah A. (2001). A novel low-power multiplexer-based full adder cell, *8<sup>th</sup> IEEE International Conference on Electronics, Circuits and Systems*, Malta, 1433-1436.
- Chang C. H., Zhang M., & Gu J. (2003). A novel low power low voltage full adder cell. *3<sup>rd</sup> International Symposium on Image and Signal Processing and Analysis*, Rome, Italy, 454-458.
- Chu K. M. & Pulfrey D. L. (1987). A comparison of CMOS circuit techniques: differential cascode voltage switch logic versus conventional logic, *IEEE Journal of Solid-State Circuits*, 22(4), 528-532.
- Fayed A. & Bayoumi M. (2001). A low-power 10-transistor full adder cell for embedded architectures, *IEEE International Symposium on Circuits and Systems ISCAS 2001*, Sydney, NSW, Australia, 226-229.
- Goel S., Kumar A., & Bayoumi M. A. (2006). Design of robust, energy-efficient full adders for deep-submicrometer design using hybrid-CMOS logic style, *IEEE Transactions on Very Large Scale Integration Systems*, 14(12), 1309-1321.
- Him C., Kim H. & Ha S. (2001). Dynamic voltage scheduling technique for low-power multimedia applications using buffers, *International Symposium on Low Power Electronics and Design ISLPED'01*, Huntington Beach, CA, USA, 34-39.
- Hodges, D. A., Jackson H. G. & Saleh, R. A. (2003). *Analysis and Design of Digital Integrated Circuits in Deep Submicron Technology* (Third edition). Boston: McGraw-Hill Higher Education.
- *HSPICE User Guide: Simulation and analysis*, Version B-2008.09. Synopsys, September 2008. Retrieved from: http://cseweb.ucsd.edu/classes/wi10/cse241a/assign/hspice_sa.pdf
- Mahmoud H. A. & Bayoumi M. A. (1999). A 10-transistor low-power high-speed full adder cell, *IEEE International Symposium on Circuits and Systems ISCAS 1999*, Orlando, FL, USA, 43-46.
- Moalemi V. & Afzali-Kusha A. (2007). Subthreshold 1-bit full adder cells in sub-100 nm technologies, *IEEE Computer Society Annual Symposium on VLSI*, Porto Alegre, Brazil, 514-515.
- *Nanosim User Guide*, Version V-2004.06. Synopsys, June 2004. Retrieved from: http://www.utdallas.edu/~poras/courses/ee6325/lab/nanosim/nanosimug.pdf
- Nishitani T. (1999). VLSI for digital signal processing low-power architectures for programmable multimedia processors, *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, E82-A(2), 184-196.
- Pacheco B. D. & Linares A. M. (2004). A low power and high speed CMOS voltage-controlled ring oscillator, *IEEE International Symposium on Circuits and Systems ISCAS 2004*, Vancouver, Canada, (4), 752-755.
- Radhakrishnan D. (2001). Low-voltage low-power CMOS full adder, *IEE Proceedings on Circuits, Devices and Systems*, 148(1), 19-24.
- Shams A. M. & Bayoumi M. A. (1998). A structured approach for designing low power adders, *Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers*, Pacific Grove, CA, USA, 1, 757-761.
- Shams, A. M. & Bayoumi M. A. (1998). A novel low-power building block CMOS cell for adders, *IEEE International Symposium on Circuits and Systems ISCAS '98*, Monterey, CA, USA, 2, 153-156.
- Shams A. M., Darwish T. K. & Bayoumi M. A. (2002). Performance analysis of low-power 1-bit CMOS full adder cell, *IEEE Transactions on Very Large Scale Integration Systems (VLSI)*, 10(1), 20-29.
- Singh M., Giacomotto C., Zeydel B. & Oklobdzija V. (2007). Logic style comparison for ultra low-power operation in 65nm technology, *International Workshop on Power and Timing Modeling, Optimization and Simulation*, Gothenburg, Sweden, 181-190.
- Suzuki M., Ohkubo N., Yamanaka T., Shimizu A. & Sasaki K. (1993). A 1.5ns 32-b CMOS ALU in double pass-transistor logic, *IEEE Journal of Solid-State Circuits*, 28(11), 1145-1151.
- Weste, N. & Eshraghian, K. (1993). *Principles of CMOS VLSI design: a system perspective* (Second edition). Addison-Wesley.
- Wey I., Huang C. & Chow H. (2002). A new low-voltage CMOS 1-bit full adder for high performance applications, *IEEE Asia-Pacific Conference on ASIC 2002*, Taipei, Taiwan, 21-24.
- Wu A. & Ng C. K. (1997). High performance low power low voltage adder, *Electronics Letters*, 33(8), 681-682.
- Yano K., Yamanaka, T., Nishida, T., Saito, M., Shimohigashi, K., & Shimizu, A. (1990). A 3.8ns CMOS 16 × 16-b multiplier using complementary pass-transistor logic, *IEEE Journal of Solid-State Circuits*, 25(2), 388-395.
- Zhuang N. & Wu H. (1992). A new design of the CMOS full adder, *IEEE Journal of Solid-State Circuits*, 27(5), 840-844.
- Zimmerman R. & Fichtner W. (1997). Low-power logic styles: CMOS versus pass-transistor logic, *IEEE Journal of Solid-State Circuits*, 32(7), 1079-1090.



#### Mónico Linares Aranda

(M'01) received the B.S. degree in electronics engineering from the Autonomous University of Puebla, Mexico, and the Ph.D. degree in electrical engineering from the Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico, in 1986 and 1996, respectively. He joined the National Institute for Astrophysics, Optics and Electronics, Mexico, in 1986, where he is a Titular Professor. He has published more than 30 papers in international conferences and journals in the field of VLSI circuit design. His research interests include the development of new techniques, architectures and methodologies for high-performance CMOS integrated circuits, and the verification of signal integrity.



#### Mariano Aguirre Hernández

(S'04–M'06) received the B.S. degree in electronics and communications engineering from the National Polytechnic Institute (IPN), Mexico, in 1996, and the M.S. and Ph.D. degrees from the National Institute for Astrophysics, Optics and Electronics (INAOE), Mexico, in 2000 and 2006, respectively. He joined Intel Corporation's Guadalajara Design Center, Mexico, as a Hardware Engineer in 2000, and rejoined in 2006 as a Research Scientist working for the Communications Research Center-Mexico (CRCM). His research interests include high-performance and low-power data-path circuits, and reconfigurable DSP architectures for wireless communications systems.