Network-on-Chip and Special Function Units
Contemporary computer architectures increasingly use multiple computing cores. This trend is driven primarily by the hard constraint on the total power consumption of the chip. Future designs are expected to integrate a very large number of heterogeneous cores on the same die.
This leads to two key problems. The first problem is to design a fast, power-efficient Network-on-Chip (NoC) to interconnect these cores. The second problem arises from an opportunity: with a large number of available cores, special function units can be tuned for specific computational tasks. Our work addresses the first problem with a source-synchronous, ring-based NoC. Data on the rings is transmitted source-synchronously, strobed by an extremely high-speed, low-power, ring-based standing-wave resonant clock. Our results indicate a 4.5X improvement in bandwidth and about a 7.5X improvement in contention-free latency compared to the best existing approach. Ongoing work includes exploring network topologies and benchmarking the NoC against real and synthetic traffic.
To address the second problem, my group has developed special-purpose units for computational tasks such as sorting, comparison of two numbers, logarithm and antilogarithm computation, cryptographic key generation, and Boolean satisfiability.
Publications, patents and artefacts:
- “A Fast, Source-synchronous Ring-based Network-on-Chip Design”, Mandal, Khatri, Mahapatra. Design Automation and Test in Europe (DATE) conference 2012. Mar 12-16, Dresden, Germany. In this paper, we report an extremely fast NoC design based on source-synchronous data transfer. The clock used is an extremely fast, low-power resonant clock.
- “CMOS Comparators for High-Speed and Low-Power Applications”, Menendez, Maduike, Garg, Khatri. IEEE International Conference on Computer Design (ICCD), Oct 1-4, 2006, San Jose, CA, pp. 76-81. We present two novel ways to design hardware comparators, yielding about 37% improvement over competing approaches.
- “Sorting Binary Numbers in Hardware – a Novel Algorithm and its Implementation”, Alaparthi, Gulati, Khatri. International Symposium on Circuits and Systems (ISCAS) 2009, Taipei, Taiwan. May 24-27, 2009. In this paper, we present a fast special function unit for sorting, which is based on a column scan, and is significantly faster than the best known existing approach, with lower area (for larger numbers).
- “A Novel Cryptographic Key Exchange Scheme using Resistors”, Lin, Ivanov, Johnson, Khatri. IEEE International Conference on Computer Design (ICCD) 2011, Amherst, MA, Oct 2011. pp 451-452. In this paper, we report a practical FPGA-based implementation of the Kish cipher, intended for use over the internet. Starting from a single shared secret, Alice and Bob are both able to generate a new shared secret (cryptographic key).
- “VLSI Implementation of a Non-Linear Feedback Shift Register for High-Speed Cryptography Applications”, Lin, Khatri. Great Lakes Symposium on VLSI (GLS-VLSI) 2010. Providence, RI May 16-18, 2010. This paper presents an extremely fast NLFSR-based cryptographic key generator that can operate at rates matching OC-768 optical fiber communication rates.
- “A Fast Hardware Approach for Approximate, Efficient Logarithm and Antilogarithm Computations”, Paul, Jayakumar, Khatri. IEEE Transactions on Very Large Scale Integration Systems, vol. 17, number 2, Feb 2009, pp. 269-277.
- “An Efficient, Scalable Hardware Engine for Boolean Satisfiability and Unsatisfiable Core Extraction”, Gulati, Waghmode, Khatri, Shi. IET Computers and Digital Techniques, vol. 2, number 3, May 2008, pp. 214-229. This paper presents a custom-IC-based hardware implementation of a SAT solver. Boolean constraint propagation is performed in hardware in a fast, scalable manner.
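
For readers unfamiliar with Boolean constraint propagation (BCP), the step that the SAT engine in the last entry performs in hardware, the following is a minimal software sketch of the textbook technique (unit propagation). It is not the hardware design from the paper. The function name `unit_propagate`, the `Clause` alias, and the DIMACS-style integer encoding of literals are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Assumed encoding (DIMACS-style): 3 means variable x3, -3 means NOT x3.
Clause = List[int]


def unit_propagate(
    clauses: List[Clause], assignment: Dict[int, bool]
) -> Optional[Dict[int, bool]]:
    """Extend `assignment` with every value forced by a unit clause.

    Returns the extended assignment, or None if some clause has all of
    its literals falsified (a conflict).
    """
    assignment = dict(assignment)  # do not mutate the caller's dict
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var in assignment:
                    # A literal is true when its polarity matches the value.
                    if assignment[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                # Unit clause: the remaining literal must be made true.
                forced = unassigned[0]
                assignment[abs(forced)] = forced > 0
                changed = True
    return assignment
```

Calling `unit_propagate([[1], [-1, 2], [-2, 3]], {})` forces x1, then x2, then x3 to true, while `unit_propagate([[1], [-1]], {})` returns `None` because the two clauses conflict. The software loop rescans every clause after each forced assignment, which is the kind of repeated work a dedicated hardware engine aims to avoid.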