Autonomic Management

The complexity of large-scale systems and networks has raised unprecedented
challenges for system management. Many mission-critical systems are deployed
by integrating thousands of heterogeneous components. The numerous software
components running on these large systems often obscure the dependencies
and interactions among system components, dramatically increasing system
complexity. Furthermore, many complex systems are not static and keep
evolving with numerous changes such as software or hardware upgrades
and configuration modifications. System scale, heterogeneity,
and dynamics, as well as hidden dependencies, all contribute to today’s
difficulties in complexity management. The unmanageable system complexity
has significantly increased operational costs in recent years and also
affected system reliability and availability.
The objectives of the ASDS project are to:
- understand the nature of system complexity, especially for
large and complex systems;
- introduce automation and intelligence into the lifecycle
of system and service management including system design, deployment,
operation and maintenance.
The ASDS group creates novel management technology that applies control
theory, statistical learning and inference, and information theory in
addition to experimental computer science methodologies to address the
growing complexity in large systems and networks.
Currently, the ASDS group focuses on several research topics, including
system data mining and analysis, IP service management, and next-generation
data centers.
Grid Storage

Growing data volumes lead to an ever-increasing need for storage capacity, outpacing the gains from higher disk capacity and density. More hardware means growing complexity of the storage infrastructure and higher management overhead.
Our research focuses on grid storage, with the objective of providing a single scalable storage system based on commodity hardware that is self-managing, distributed across multiple locations, able to recover from multiple failures, and efficient, with integrated data services for data protection.