A hybrid nano-CMOS architecture for defect and fault tolerance

by Muzzafer Simsir, Srihari Cadambi, Franjo Ivancic, Martin Rötteler, and Niraj Jha

As the end of the semiconductor roadmap for CMOS approaches, architectures based on nanoscale molecular devices are attracting attention. Among several alternatives, silicon nanowires and carbon nanotubes are the two most promising nanotechnologies according to the ITRS. These technologies may enable scaling deep into the nanometer regime. However, they suffer from very defect-prone manufacturing processes. Although the reconfigurability property of the nanoscale devices can be used to tolerate high defect rates, it may not be possible to detect all defects. With very high device densities, testing each component may not be possible because of time or technology restrictions. This points to a scenario in which even though the devices are tested, the tests are not very comprehensive at locating defects, and hence the chips are still defective. Moreover, the devices in the nanometer range will be susceptible to transient faults which can produce arbitrary soft errors. Despite these drawbacks, it is possible to make nanoscale architectures practical and realistic by introducing defect and fault tolerance. In this paper, we propose and evaluate a hybrid nanowire-CMOS architecture that addresses all three problems -- namely high defect rates, undetected defects and transient faults -- at the same time. This goal is achieved by using multiple levels of redundancy and majority voters. A key aspect of the architecture is that it contains a judicious balance of both nanoscale and traditional CMOS components. Companion to the architecture is a compiler with heuristics to quickly determine if logic can be mapped onto partially defective nanoscale elements. The heuristics make it possible to introduce defect-awareness in placement and routing. The architecture and compiler are evaluated by applying the complete design flow to several benchmarks.

ACM Journal on Emerging Technologies in Computing Systems, vol. 5, no. 3, pp. 14:2-14:26, 2009.



Using hardware transactional memory for data race detection

Shantanu Gupta, Florin Sultan, Srihari Cadambi, Franjo Ivancic, and Martin Rötteler

Widespread emergence of multicore processors will spur development of parallel applications, exposing programmers to degrees of hardware concurrency hitherto unavailable. Dependable multithreaded software will have to rely on the ability to dynamically detect nondeterministic and notoriously hard to reproduce synchronization bugs manifested through data races. Previous solutions to dynamic data race detection have required specialized hardware, at additional power, design and area costs. We propose RaceTM, a novel approach to data race detection that exploits hardware that will likely be present in future multiprocessors, albeit for a different purpose. In particular, we show how emerging hardware support for transactional memory can be leveraged to aid data race detection. We propose the concept of lightweight debug transactions that exploit the conflict detection mechanisms of transactional memory systems to perform data race detection. We present a proof-of-concept simulation prototype, and evaluate it on data races injected into applications from the SPLASH-2 suite. Our experiments show that this technique is effective at discovering data races and has low performance overhead.

Proceedings IEEE International Parallel and Distributed Processing Symposium (IPDPS'09), pp. 1-11, 2009.



Algebraic signal processing theory: Cooley-Tukey type algorithms on the 2-D hexagonal spatial lattice

by Markus Püschel and Martin Rötteler

Recently, we introduced the framework for signal processing on a nonseparable 2-D hexagonal spatial lattice including the associated notion of Fourier transform called discrete triangle transform (DTT). Spatial means that the lattice is undirected in contrast to earlier work by Mersereau introducing hexagonal discrete Fourier transforms. In this paper we derive a general-radix algorithm for the DTT of an n x n 2-D signal, focusing on the radix-2 x 2 case. The runtime of the algorithm is O(n^2 log(n)), which is the same as for commonly used separable 2-D transforms. The DTT algorithm derivation is based on the algebraic signal processing theory. This means that instead of manipulating transform coefficients, the algorithm is derived through a stepwise decomposition of its underlying polynomial algebra based on a general theorem that we introduce. The theorem shows that the obtained DTT algorithm is the precise equivalent of the well-known Cooley-Tukey fast Fourier transform, which motivates the title of this paper.

Applicable Algebra in Engineering, Communication and Computing, vol. 19, Issue 3, pp. 259-292. Springer, 2008.



RaceTM: detecting data races using hardware transactional memory (brief announcement)

by S. Gupta, F. Sultan, S. Cadambi, F. Ivancic, and M. Rötteler

Widespread emergence of multicore processors will spur development of parallel applications, exposing programmers to more hardware concurrency. Dependable multithreaded software will have to rely on the ability to dynamically detect data races, which are non-deterministic and notoriously hard to reproduce symptoms of synchronization bugs. In this paper, we propose RaceTM, a novel approach that exploits transactional memory support to detect data races. We introduce the concept of lightweight debug transactions that exploit the conflict detection mechanisms of transactional memory systems to perform data race detection. Debug transactions differ from regular transactions in that they do not need to be rolled back, and therefore require no versioning or checkpointing support. Debug transactions do not overlap with a regular transaction, thus providing a transparent mechanism to leverage existing transactional memory support for data race detection.

Proceedings 20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'08), pp. 104-106, 2008.



Fault-tolerant computing using a hybrid nano-CMOS architecture

by M. Simsir, S. Cadambi, F. Ivancic, M. Rötteler, and N. Jha

Architectures based on nanoscale molecular devices are attracting attention for replacing CMOS architectures at the end of the semiconductor roadmap. The two most promising nanotechnologies, according to ITRS, are silicon nanowires and carbon nanotubes. Although they offer unmatched densities for building logic, interconnect and memory, they do suffer from very defect-prone manufacturing processes. This is further exacerbated by testing complexities where it is nearly impossible to detect and localize all faults in a large nanoscale chip. As a result, fault tolerance is necessary to make nanoscale architectures practical and realistic. Although reconfiguration can be used to compile around known defects on a chip, a large fraction of defects may remain hidden if they cannot be discovered by testing. Furthermore, the small structures in nanoscale architectures are particularly susceptible to transient faults which can produce arbitrary soft errors. For the first time, we propose an architecture that can tolerate a large number of undetected manufacturing faults as well as a large rate of transient faults. Our architecture is characterized by multiple levels of redundancy and majority voting to correct errors caused by such faults. A key aspect of the architecture is that it contains a judicious balance of both nanoscale and traditional CMOS components. A companion to the architecture is a compiler with heuristics tailored to quickly and compactly map logic onto partially defective components. Experimental results demonstrate the efficacy of the architecture.

Proceedings 21th Conference on VLSI Design (VLSI'08), pp. 435-440, 2008.



Algebraic signal processing theory: 2-D spatial hexagonal lattice

by Markus Püschel and Martin Rötteler

We develop the framework for signal processing on a spatial, or undirected, 2-D hexagonal lattice for both an infinite and a finite array of signal samples. This framework includes the proper notions of z-transform, boundary conditions, filtering or convolution, spectrum, frequency response, and Fourier transform. In the finite case, the Fourier transform is called discrete triangle transform (DTT). Like the hexagonal lattice, this transform is nonseparable. The derivation of the framework makes it a natural extension of the algebraic signal processing theory that we recently introduced. Namely, we construct the proper signal models, given by polynomial algebras, bottom-up from a suitable definition of hexagonal space shifts using a procedure provided by the algebraic theory. These signal models, in turn, then provide all the basic signal processing concepts. The framework developed in this paper is related to Mersereau's early work on hexagonal lattices in the same way as the discrete cosine and sine transforms are related to the discrete Fourier transform - a fact that is made rigorous in this paper.

IEEE Transactions on Image Processing, vol. 16, no. 6, pp. 1506-1521, 2007.



Fourier transform for the spatial quincunx lattice

by Markus Püschel and Martin Rötteler

We derive a new, two-dimensional nonseparable signal transform for computing the spectrum of spatial signals residing on a finite quincunx lattice. The derivation uses the connection between transforms and polynomial algebras, which has long been known for the discrete Fourier transform (DFT), and was extended to other transforms in recent research. We also show that the new transform can be computed with O(n^2 log(n)) operations, which puts it in the same complexity class as its separable counterparts.

Proceedings of the International Conference on Image Processing, (ICIP'05), vol. 2, pp. 494-497, Genova, September 11-14, 2005.



Fourier transform for the directed quincunx lattice

by Markus Püschel and Martin Rötteler

We introduce a new signal transform for computing the spectrum of a signal given on a two-dimensional directional quincunx lattice. The transform is non-separable, but closely related to a two-dimensional (separable) discrete Fourier transform. We derive the transform using recently discovered connections between signal transforms and polynomial algebras. These connections also yield several important properties of the new transform.

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05), vol. 4, pp. 401-404, Philadelphia, March 18-23, 2005.



Cooley-Tukey FFT like algorithm for the discrete triangle transform

by Markus Püschel and Martin Rötteler

The discrete triangle transform (DTT) was recently introduced in the paper
"The discrete triangle transform" (Proceedings of ICASSP'04) as an example of a non-separable transform for signal processing on a two-dimensional triangular grid. The DTT is built from Chebyshev polynomials in two variables in the same way as the DCT, type III, is built from Chebyshev polynomials in one variable. We show that, as a consequence, the DTT has, as the DCT, type III, a Cooley-Tukey FFT type fast algorithm. We derive this algorithm and an upper bound for the number of complex operations it requires. Similar to most separable two-dimensional transforms, the operations count of this algorithm is O(n^2 log(n)) for an input of size n x n.

Proceedings of the 11th Digital Signal Processing Workshop 3rd Signal Processing Education Workshop, Taos, August 1-4, 2004.



The discrete triangle transform

by Markus Püschel and Martin Rötteler

We introduce the discrete triangle transform (DTT), a non-separable transform for signal processing on a two-dimensional equispaced triangular grid. The DTT is, in a strict mathematical sense, a generalization of the DCT, type III, to two dimensions, since the DTT is built from Chebyshev polynomials in two variables in the same way as the DCT, type III, is built from Chebyshev polynomials in one variable. We provide boundary conditions, signal extension, and diagonalization properties for the DTT. Finally, we give evidence that the DTT has Cooley-Tukey FFT like algorithms that enable its efficient computation.

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'04), vol. 3, pp. 44-48, Montreal, May 17-21, 2004.