

Srihari Cadambi
NEC Laboratories America
4 Independence Way, Suite 200
Princeton, NJ 08540
Phone: (609) 951-2835
Research Interests

Selected Current / Past Projects

The Sunburst Project (Current)

Sunburst focuses
on
developing non-intrusive runtimes and programming models for
heterogeneous clusters where the nodes are built using multi-core CPUs
coupled with manycore graphics
processors (GPUs). The broad goals of the runtimes and middleware are
to achieve performance scalability, virtualize the hardware and manage
data placement effectively. We have proposed techniques to enable
legacy applications to transparently avail themselves of heterogeneous hardware, and
are looking into deploying programming models such as MapReduce on such
clusters.

We have also
developed special-purpose, massively
parallel architectures for
recognition and mining (RM) applications, including an FPGA-based universal
learning engine that has demonstrated significant speedups for
applications such as semantic text search and face recognition. We are
currently building a low-power cloud component for RM applications
where a low-end CPU is coupled with an accelerator to achieve a system
with high performance per watt.

Architectures and Algorithms for Pattern Matching and Retrieval

Matching
and retrieving data is a common operation in network routers, intrusion
detection systems, internet search engines and databases. With
increasing
amounts of data and the growing need for security, fast, scalable and
intelligent matching and retrieval mechanisms are required. This project pursues research in pattern
matching algorithms as well as supporting hardware
architectures across the application domains of networking, databases
and content search. The
current focus is on the networking domain.

Security is a key concern in networking
and is provided by host-based intrusion detection
systems (IDSs) that
detect viruses and Trojan horses within a host, and network
flow-based intrusion detection systems that stop internet worms from
propagating and detect intrusion attempts. The core of an IDS is a pattern
matching engine. Beyond security, matching and
retrieval are used extensively in networking appliances
for deep packet inspection, for example, in the identification of P2P
traffic,
content-based billing and policy management. Changing applications,
user demands and service convergence not only require pattern
matching engines to operate
at very high speeds, but also require their performance
to scale with the size of the
security and policy rules.
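The page does not say which algorithm the matching engine implements. As a hedged illustration of the core operation, finding every occurrence of many signatures in a byte stream in a single pass, here is a minimal Python sketch of Aho-Corasick, a standard textbook multi-pattern matching algorithm. It is a swapped-in technique, not the engine or hardware pursued in this project, and `build_automaton` and `search` are hypothetical names.

```python
from collections import deque

# Textbook Aho-Corasick multi-pattern matcher, used here only as a stand-in
# for "a pattern matching engine"; build_automaton/search are hypothetical names.


def build_automaton(patterns):
    """Build a trie of all patterns, then add failure links breadth-first."""
    goto = [{}]    # goto[state][char] -> child state in the trie
    fail = [0]     # fail[state] -> state for the longest proper suffix in the trie
    output = [[]]  # output[state] -> patterns that end when this state is reached

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pat)

    # Phase 2: BFS so every failure target is finished before it is used.
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # A match at the failure state is also a match here (it is a suffix).
            output[child] = output[child] + output[fail[child]]
    return goto, fail, output


def search(text, automaton):
    """Scan text once; return (start_offset, pattern) for every match."""
    goto, fail, output = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]  # fall back instead of rescanning the input
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

Searching "ushers" for the signatures "he", "she", "his" and "hers" reports "she" at offset 1 and both "he" and "hers" at offset 2. The failure links let overlapping signatures be reported without ever moving backwards in the input, a property that matters when matching has to keep up with line rate.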

Longest Prefix Matching (LPM) is a fundamental operation in many network
processing tasks, most notably IP route lookup.
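To make concrete what LPM computes, and why the trailing "don't care" (wildcard) bits of a prefix are awkward for hashing, here is a minimal Python sketch of the textbook baseline: one exact-match hash table per prefix length, probed from the longest length to the shortest. It is not the architecture described on this page; `build_tables` and `lookup` are hypothetical names, and the sketch assumes IPv4 prefixes written in CIDR notation.

```python
import ipaddress

# Naive per-length hashing baseline for LPM (hypothetical helper names);
# not the Bloomier-filter / prefix-collapsing design described on this page.


def build_tables(routes):
    """Store IPv4 prefixes in one exact-match hash table (dict) per length."""
    tables = {}
    for cidr, next_hop in routes.items():
        net = ipaddress.IPv4Network(cidr)  # e.g. "10.1.0.0/16"
        tables.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop
    return tables


def lookup(tables, addr):
    """Return the next hop of the longest prefix matching an IPv4 address."""
    key = int(ipaddress.IPv4Address(addr))
    # A hash function cannot skip over wildcard bits, so each prefix length
    # is probed separately, longest first; the first hit is the longest match.
    for length in sorted(tables, reverse=True):
        shift = 32 - length
        masked = (key >> shift) << shift  # zero out the wildcard bits
        next_hop = tables[length].get(masked)
        if next_hop is not None:
            return next_hop
    return None  # no matching prefix and no default route
```

For example, with 10.0.0.0/8 mapped to "A" and 10.1.0.0/16 mapped to "B", `lookup(tables, "10.1.9.9")` hits at length 16 and returns "B", while `lookup(tables, "10.200.0.1")` misses at length 16 and returns "A" at length 8. In the worst case this baseline needs one probe per distinct prefix length in the table (up to 33 for IPv4 and 129 for IPv6), which illustrates why wildcard support is a central problem for hash-based LPM.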
Previously proposed approaches for LPM result in prohibitive cost and
power
dissipation (TCAMs) or in large memory requirements and long lookup
latencies
(tries), when considering future line-rates, table sizes and key
lengths (e.g.,
IPv6). Hash-based approaches appear to be excellent candidates for
LPM, offering the possibility of low power, compact storage, and O(1)
latencies. However,
there
are two key problems that hinder their practical deployment as LPM
solutions.
First, naïve hash tables incur collisions and resolve them using
chaining,
adversely affecting worst-case lookup-rate guarantees that routers must
provide.
Second, hash functions cannot directly operate on wildcard bits, a
requirement
for LPM, and current solutions require either considerably complex
hardware or large
storage space. We proposed a novel architecture that, for the first time,
addresses both key problems in hash-based LPM, making the following
contributions: (1) We architected an LPM solution based on a
collision-free hashing scheme, the Bloomier filter, eliminating its
false positives in a storage-efficient way. (2) We proposed a novel
scheme
called prefix collapsing, which provides support for wildcard bits with
small
additional storage and reduced hardware complexity. (3) We exploited
prefix
collapsing and key characteristics found in real update traces to
support fast
and incremental updates, a feature generally not available in
collision-free
hashing schemes.

Architectures for Accelerating Functional Simulation

Functional simulation
and verification are often the bottlenecks during chip design. Hardware
accelerators for functional simulation are prohibitively expensive. We
introduced a novel approach to accelerating functional simulation
characterized by high performance, low cost, scalability and low
turnaround time (TAT). Significant speedups over zero-delay
event-driven simulation and cycle-based simulation on benchmark
and industrial circuits were demonstrated while maintaining the cost,
scalability and TAT
advantages of simulation. Owing to these attributes, such an
approach has the potential for wide deployment as a replacement for, or
enhancement to, existing simulators. Our technology relies on a
VLIW-style virtual simulation processor (SimPLE) mapped to a single
FPGA on a PCI board. The processor is paired with the very fast
SimPLE compiler. This architecture
plugs naturally into any existing HDL simulation environment.

Professional Activities

I serve on the Program Committees of: