New Architectures for Arithmetic Circuits

CAD for datapath circuits is an area that has historically not received a great deal of attention. In our work in this area, we have developed a routing approach for datapath designs which was shown to be 6X faster for routing commercial datapath circuits, than a commercial tool.

The second half of this work focused on CAD for arithmetic Sums of Products (SOPs), a very general class of arithmetic expressions. We have developed efficient techniques to determine the number and width of sub-adders in the final adder stage of a SOP computation. We have also developed efficient datapath synthesis tools for barrel shifters and mutually exclusive multipliers, MACs, adders and subtractors. In addition to SOPs, we have also focused on POS expressions. In addition, we have developed a specialized adder which is comparable to a Kogge-Stone adder in speed, with the area comparable to a Brent-Kung adder. All our experimental comparisons in this body of work are performed against commercial tools, and demonstrate significant improvements.

In addition, we have developed design approaches for Multiple Constant Multiplication (MCM) problems, as well as for FPGA based fast matrix multiplication and logarithm/antilogarithm computation.

Publications, patents and artefacts:

“Efficient Arithmetic Sum-of-Product (SOP) Based Multiple Constant Multiplication (MCM) for FFT”, Karkala, Wanstrath, Lacour, Khatri. International Conference on Computer-Aided Design (ICCAD) 2010, San Jose, CA. pp.735~738. This paper utilizes a SOP based approach to design a MCM circuit. The partial products are reduced using a partial Max-SAT formulation by exploring MSD based alternatives to realize coefficients. The area and delay results are significantly better than the adder-cascade based existing approaches
“A Fast Hardware Approach for Approximate, Efficient Logarithm and Antilogarithm Computations”, Paul, Jayakumar, Khatri. IEEE Transactions on Very Large Scale Integration Systems, vol. 17, number 2, Feb 2009, pp. 269-277. Also appeared in the International Workshop on Logic and Synthesis (IWLS) 2007, May 30 – June 1, San Diego, CA. This paper reports an FPGA based log/antilog engine, which is based on table lookup followed by interpolation (without needing to perform multiplication or division). Presentation slides.
“An Efficient and Regular Routing Methodology for Datapath Designs Using Net Regularity Extraction”. Das, Khatri. Short paper, IEEE Transactions on CAD, Vol 21, Number 1. Special issue on Physical Design, Jan 2002, pp 93-101. This paper reports a fast datapath router which routes datapath designs by first extracting routing patterns that are repeated, After these patterns are routed, they are propagated throughout the design, yielding significant speedup over commercial routers.
“A Regularity-driven Fast Gridless Detailed Router for High Frequency Datapath Designs”, Das, Khatri. Presented at International Symposium on Physical Design (ISPD-01), April 1-4, 2001, pp 130-135, Sonoma, CA, pp. 130-135. This paper reports a datapath router which routes only a single bit slice, and then propagates the routes across other slices. It achieves 7X speedup over a commercial router, on commercial datapath routing instances. Presentation slides.
Invited paper “A Routing Technique for Structured Designs which Exploits Regularity”. Khatri, Das. VLSI Design and Test Workshop (VDAT-2001), Aug 2001. This paper presents a router which first extracts net regularity in a structured design, and then routes a significantly smaller subset of the nets of the design. In a final phase, it propagates these routed nets across the design to complete the routing task. A significant speedup is obtained over a commercial router, with similar wirelength and via count. Presentation slides.
U.S. Patent 06598215: “Datapath Design Methodology and Routing Apparatus”. Das, Khatri. Issued July 2003.
“A Timing-Driven Approach to Synthesize Fast Barrel Shifters”, Das, Khatri. IEEE Transactions on Circuits and Systems II (TCAS-II), vol. 55, number 1, Jan 2008, pp. 31-3. In this paper, we implement fast barrel shifters which merge stages of a traditional barrel shifter, yielding a 10% faster shift operation. Presentation slides.
“A Novel Hybrid Parallel-Prefix Adder Architecture with Efficient Timing-Area Characteristic”, Das, Khatri. IEEE Transactions on Very Large Scale Integration, vol. 16, number 3, March 2008, pp. 326-33. This paper reports a parallel-prefix adder which has the speed of a Kogge-Stone adder, with the area similar to a Brent-Kung adder. Presentation slides.
“Resource Sharing among Mutually Exclusive Sum-of-Product Blocks for Area Reduction”, Das, Khatri. ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 13, number 3, July 2008, pp. 51:1-51:7. This paper reduces the area of SOP based realization of multiple operations (which are mutually exclusive) by resource sharing. Our approach attains 37% area improvement and 18% power improvement over the result of a commercial datapath synthesis tool.
“Generation of the Optimal Bit-Width Topology of the Fast Hybrid Adder in a Parallel Multiplier”, Das, Khatri. International Conference on Integrated Circuit Design and Technology (ICICDT) 2007, May 30 – June 1, Austin, TX, pp. 1-6. In a datapath synthesis problem, the final step requires a fast two-operand adder. This paper finds the best realization of sub-adders to exploit the variation in the input arrival times of the two operands. For industrial examples, our method finds the best sub-adder configuration in 6 minutes compared to 19 hours for the competing approach. Presentation slides.
“Area-reducing Sharing of Mutually Exclusive Multiplier, MAC, Adder and Subtractor blocks”, Das, Khatri. IASTED Fifth International Conference on Circuits, Signals and Systems (CSS) 2007, July 2-4, Banff, Alberta. In this paper, multiple arithmetic operations are realized in a manner that shares logic between them, resulting in a reduced area. Presentation slides.
“A Timing-Driven Hybrid-Compression Algorithm for Faster Sum-of-Products”, Das, Khatri. IASTED Fifth International Conference on Circuits, Signals and Systems (CSS) 2007, July 2-4, Banff, Alberta. This paper implements a fast compressor approach for partial product reduction in SOP based datapath synthesis. Presentation slides.
“Timing-Driven Decomposition of a Fast Barrel Shifter”, Das, Khatri. IEEE International Midwest Symposium on Circuits and Systems (MWSCAS) 2007, August 5-8, Montreal, Canada, pp. 574-577. This paper implements a fast barrel shifter by merging the different stages of a traditional barrel shifter. Our result is 11% faster than that obtained by a commercial tool. Presentation slides.
“A Merged Synthesis Technique for Fast Arithmetic Blocks Involving Sum-of-Products and Shifters”, Das, Khatri. 21st International Conference on VLSI Design 2008, January 4-8, Hyderabad, India, pp. 572-579. In this paper, we perform operator merging of the SOP and shifter blocks, yielding a delay improvement of about 13% over that obtained by a commercial tool, with a 3.2% area overhead.
“A Timing-Driven Synthesis Technique for Arithmetic Product-of-Sum Expressions”, Das, Khatri. 21st International Conference on VLSI Design 2008, January 4-8, Hyderabad, India, pp. 635-640. This paper reports our work on POS synthesis, which yields a delay improvement by about 11% over a commercial datapath synthesis tool.
“An Inversion-Based Synthesis Approach for Area and Power Efficient Arithmetic Sum-of-Products”, Das, Khatri. 21st International Conference on VLSI Design 2008, January 4-8, Hyderabad, India, pp. 653-659. This paper generates inverted partial products for the SOP operation, yielding an area improvement of about 8.5% and a power improvement of 6% over the result of a commercial datapath synthesis tool.
“A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products”, Das, Khatri. IEEE Midwest Symposium on Circuits and Systems (MWSCAS) 2008, August 10-13, 2008, Knoxville, TN, pp. 507-510. In this paper, the final adder in an SOP block is synthesized using 4 sub-adders. Our approach automatically determines the bit-widths of these 4 sub-adders for optimal delay, achieving a 14% faster result than that obtained by a commercial datapath synthesis tool. Presentation slides.
“Resource and Delay Efficient Matrix Multiplication using Newer FPGA Devices”, Campbell, Khatri. IEEE/ACM Great Lakes Symposium on VLSI (GLSVLSI), April 30 – May 2, 2006, Philadelphia, PA, pp. 308-311. This paper presents a delay and resource efficient matrix multiplication engine based on new features present in modern FPGAs
“Sorting Binary Numbers in Hardware – a Novel Algorithm and its Implementation”, Alaparthi, Gulati, Khatri. International Symposium on Circuits and Systems (ISCAS) 2009, Taipei, Taiwan. May 24-27, 2009. In this paper, we present a fast special function unit for sorting, which is based on a column scan, and is significantly faster than the best known existing approach, with lower area (for larger numbers).