Heterogeneous Cluster Computing

Traditional enterprise applications currently run on platforms that are complicated and expensive to build and maintain, and operating these platforms often requires a team of experts. New IT solutions that use dynamically scalable, virtualized clusters of shared computing resources have the potential to dramatically cut the cost of delivering enterprise IT services. However, many challenges must first be overcome. Here we discuss the major challenges, the key technologies we are developing to overcome them, and the role of benchmarks and metrics in our research program.

Major Challenges

We outline four major challenges that must be overcome before heterogeneous computing clusters can emerge as the preferred platform for executing a wide variety of enterprise workloads.

First, most enterprise applications in use today were not designed to run on such dynamic, open, heterogeneous computing clusters. Migrating these applications to heterogeneous clusters, especially in a way that substantially improves performance or energy efficiency, is an open problem. Second, building new enterprise applications from the ground up for the new heterogeneous platform is also daunting: writing high-performance, energy-efficient programs for these architectures is extremely challenging because of the unprecedented scale of parallelism and the heterogeneity of the computing, interconnect and storage units. Third, the cost savings promised by a shared infrastructure for consuming and delivering IT services are only possible when multiple enterprise applications can share resources amicably (multi-tenancy) in the heterogeneous cluster. Enabling multi-tenancy without compromising the stringent quality-of-service (QoS) targets of each application, however, requires dynamic scalability and virtualization of a wide variety of computing, storage and interconnect units, and this is yet another unsolved problem. Finally, enterprise applications encounter highly variable user loads, with occasional spikes of unusually heavy load. Meeting QoS targets across such varying loads calls for an elastic computing infrastructure that automatically provisions computing resources for an application, increasing or decreasing them in response to user demand. Currently, no satisfactory solution to this challenge exists.
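To make the elasticity requirement concrete, here is a minimal sketch of a reactive, threshold-based provisioning policy. It is one common textbook approach, not necessarily the one we pursue. It assumes each node can serve a known number of requests per second within its QoS target; the names `ScalingPolicy`, `desired_nodes` and `provision`, and all parameter values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All fields are hypothetical tuning knobs chosen for illustration.
    capacity_per_node: float          # requests/s one node can serve within its QoS target
    target_utilization: float = 0.7   # leave headroom so a sudden spike does not breach QoS
    min_nodes: int = 1                # never scale to zero
    max_nodes: int = 64               # cap set by the tenant's share of the cluster


def desired_nodes(load: float, policy: ScalingPolicy) -> int:
    """Smallest node count that serves `load` (requests/s) at the target utilization."""
    usable_per_node = policy.capacity_per_node * policy.target_utilization
    needed = math.ceil(load / usable_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def provision(current_nodes: int, load: float, policy: ScalingPolicy) -> int:
    """Signed change to apply: positive means add nodes, negative means release them."""
    return desired_nodes(load, policy) - current_nodes


if __name__ == "__main__":
    policy = ScalingPolicy(capacity_per_node=100.0)
    for load in (50.0, 1000.0, 300.0):
        print(f"load={load:>7.1f} req/s -> {desired_nodes(load, policy)} nodes")
```

For example, with `capacity_per_node=100` and the default 70% target utilization, a load of 1,000 requests per second calls for 15 nodes. A production autoscaler would typically also add hysteresis or cooldown periods so that it does not oscillate around a threshold.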

Key Technologies

Our main objective is to develop new technologies that solve the four open problems above. These technologies will make it easier to understand, analyze, create and optimize a wide variety of enterprise applications on new cloud-based, shared, heterogeneous computing architectures. Several new technologies are needed to address the challenges of hosting enterprise applications on heterogeneous computing clusters; our current focus is on the following themes:
  1. Parallel programming models and runtimes: Programming parallel computing systems is a very difficult problem. Over the past few decades, most approaches to parallel programming have relied either on parallelizing compilers, which extract parallelism from sequential code, or on special-purpose parallel programming languages. Unfortunately, both approaches have fundamental drawbacks and are unlikely to be suitable for widespread multi-core or cluster programming.
     Parallel programming models that let programmers specify computation and communication patterns independently of the target architecture are expected to gain wide use. These models expose the concurrency in an application, and a runtime system manages the execution of the parallel computations on the target platform. Our goal is to create new parallel programming models, or enhance existing ones, so that programmers can easily express maximal concurrency in the computation and communication tasks of an application. Our programming models are also designed to exploit novel characteristics of emerging enterprise applications.
  2. Runtimes for adapting legacy applications: Programming models are useful for designing new applications from scratch. However, a large number of business applications in use today were not designed for heterogeneous computing clusters, and there is an urgent need for technologies that can automatically retarget these legacy applications to heterogeneous clusters. Our goal is to design new runtime technologies that accelerate existing applications on heterogeneous computing clusters without requiring changes to the applications.
  3. Virtualization: Automatically provisioning (increasing or decreasing) computing resources in a shared cluster in response to the dynamically varying requirements of applications requires that the heterogeneous resources themselves be dynamically scalable and virtualized. Our goal is to develop new technologies that enable virtualization of a wide range of heterogeneous computing, storage and interconnect resources in the cluster.
  4. Custom accelerators: Many emerging applications, ranging from automobiles to data centers, require intelligent real-time processing of large amounts of data. Such applications have critical performance constraints, and with the push toward green computing, power consumption is an important consideration as well. We investigate novel special-purpose systems for these applications with the goal of achieving high performance at low power. In particular, we are investigating custom many-core architectures as domain-specific accelerators, as well as power-reducing runtimes for systems with multiple accelerators.
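The separation described in the programming-models theme, between what is parallel and how it is executed, can be illustrated with a small sketch built on Python's standard `concurrent.futures` module. This shows the general map-and-combine pattern, not one of our programming models, and `map_reduce` and `word_count` are hypothetical names. The programmer declares independent tasks plus an associative combine step; the executor passed in plays the role of the runtime that decides how the tasks are scheduled.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_reduce(executor: Executor, fn: Callable[[T], R], items: Iterable[T],
               combine: Callable[[R, R], R], initial: R) -> R:
    """Declare *what* is parallel (independent calls to fn, then a combine step);
    the executor decides *how* those calls are scheduled on the hardware."""
    partials = executor.map(fn, items)          # independent tasks, scheduled by the executor
    return reduce(combine, partials, initial)   # communication pattern: merge partial results


def word_count(line: str) -> int:
    # Module-level task function, so a process pool could also pickle it.
    return len(line.split())


if __name__ == "__main__":
    lines = ["the quick brown fox", "jumps over", "the lazy dog"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        total = map_reduce(pool, word_count, lines, lambda a, b: a + b, 0)
    print(total)  # 9
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` changes where the tasks run without touching `word_count` or `map_reduce`, which is the kind of architecture independence such models aim for.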

Benchmarks and Metrics

Workloads guide the design choices we make in building new computing system architectures. Yet benchmarks that represent a wide variety of enterprise applications, all hosted on the same shared computing infrastructure, do not exist. Our goal is to create open, cluster-level enterprise application benchmarks that can drive the design of future heterogeneous computing cluster architectures and parallel programming models. The importance of suitable metrics and figures of merit for evaluating new technologies cannot be overstated: the wrong metrics can give an incomplete or irrelevant view of a new technology's significance. For heterogeneous computing clusters, traditional metrics such as performance and energy efficiency must be related directly to the cost of delivering IT services. Such figures of merit will provide the optimization context for new heterogeneous cluster technologies.
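As an illustration of a figure of merit that ties performance and energy efficiency to cost, the sketch below computes the dollar cost of serving one million transactions while meeting a QoS target. The cost model (amortized capital cost plus hourly energy cost, divided by sustained throughput) and every name and number in it are assumptions for illustration, not measurements or metrics from our program.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    # Hypothetical description of one cluster configuration under a given workload.
    name: str
    throughput_tps: float    # sustained transactions/s while meeting the QoS target
    avg_power_kw: float      # average power draw in kilowatts
    hourly_capex_usd: float  # amortized hardware and facility cost per hour


def cost_per_million_tx(cfg: ClusterConfig, usd_per_kwh: float) -> float:
    """Dollars needed to serve one million transactions on this configuration."""
    hourly_cost = cfg.hourly_capex_usd + cfg.avg_power_kw * usd_per_kwh  # capital + energy
    tx_per_hour = cfg.throughput_tps * 3600
    return hourly_cost / tx_per_hour * 1_000_000


if __name__ == "__main__":
    # Illustrative numbers only: a fast, power-hungry cluster vs. a slower, efficient one.
    configs = [
        ClusterConfig("fast", throughput_tps=10_000, avg_power_kw=20.0, hourly_capex_usd=30.0),
        ClusterConfig("efficient", throughput_tps=6_000, avg_power_kw=8.0, hourly_capex_usd=15.0),
    ]
    for cfg in configs:
        print(f"{cfg.name}: ${cost_per_million_tx(cfg, usd_per_kwh=0.10):.3f} per million transactions")
```

With these made-up inputs the slower configuration comes out cheaper per transaction (about $0.73 versus $0.89 per million), a conclusion that a raw throughput comparison alone would miss.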
