Anomaly Diagnosis is the process of identifying and understanding abnormalities or deviations from expected behavior within a system or dataset. In various fields, including cybersecurity, healthcare, and industrial monitoring, anomaly diagnosis involves detecting patterns or events that differ significantly from the norm. This can be done using statistical methods, machine learning algorithms, or other analytical techniques to distinguish between regular patterns and unusual occurrences.

Posts

Incident Diagnosing and Reporting System based on Retrieval Augmented Large Language Model

The Internet-of-Things (IoT) is widely used in many applications such as smart cities, transportation, healthcare, and environmental monitoring. A key task of IoT maintenance is to analyze abnormal sensor records and generate incident reports. Traditionally, domain experts perform these labor-intensive tasks. Recent advances in Large Language Models (LLMs) have sparked interest in developing AI-based systems to automate them. However, two critical problems hinder the effective application of LLMs to IoT systems: (1) LLMs lack background knowledge of the deployed IoT systems; and (2) incidents are complex events involving many sensors and components, so an LLM needs to understand the sensor relationships for accurate diagnosis. In this study, we propose a Retrieval Augmented language model based Incident Diagnosing and Reporting system (RAIDR) for IoT applications. RAIDR retrieves related system documents based on the incident features and leverages an LLM to analyze anomalies, identify root causes, and automatically generate incident reports. The automated incident reporting process streamlines end users’ decision making for system maintenance and troubleshooting.
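The retrieve-then-prompt idea described in this abstract can be illustrated with a minimal sketch. This is not RAIDR's implementation: the abstract does not say how documents are scored or how the prompt is built, so the bag-of-words cosine similarity, the function names (`retrieve`, `build_prompt`), the example documents, and the prompt wording below are all assumptions chosen for illustration. No LLM is called; the sketch only assembles the text that would be sent to one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the names of the k documents most similar to the query.

    Assumption: plain bag-of-words similarity stands in for whatever
    retriever a real system would use.
    """
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), name)
        for name, text in documents.items()
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(incident, documents, k=2):
    """Assemble retrieved documents and the incident into one prompt string."""
    names = retrieve(incident, documents, k)
    context = "\n".join(f"[{n}] {documents[n]}" for n in names)
    return (
        f"System documents:\n{context}\n\n"
        f"Incident:\n{incident}\n\n"
        "Identify the likely root cause and draft an incident report."
    )


# Hypothetical system documents and incident description.
DOCS = {
    "pump_manual": "Pump P1 feeds the cooling loop; pressure sensor S3 "
                   "monitors pump outlet pressure.",
    "hvac_manual": "Fan F2 ventilates the server room; temperature sensor S7 "
                   "monitors room temperature.",
    "network_guide": "Gateway G1 forwards sensor packets to the cloud collector.",
}
INCIDENT = "Abnormal readings on pressure sensor S3 near pump P1"

if __name__ == "__main__":
    print(build_prompt(INCIDENT, DOCS, k=1))
```

In this toy setup the incident text shares the most terms with the hypothetical pump manual, so that document is placed in the prompt ahead of the incident description. A production retriever would more likely use learned embeddings, but the control flow would look similar.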

Temporal Graph-Based Incident Analysis System for Internet of Things (ECML)

Internet-of-Things (IoT) systems deploy a massive number of sensors to monitor the system and its environment. Anomaly detection on sensor data is an important task for IoT maintenance and operation. In real applications, a system-level incident usually involves hundreds of abnormal sensors, making manual verification impractical. Users require an efficient and effective tool that conducts incident analysis and provides critical information such as (1) which parts suffered the most damage and (2) which parts caused the incident. Unfortunately, existing methods are inadequate for these requirements because of the complex sensor relationships and latent anomaly influences in IoT systems. To bridge the gap, we design and develop a Temporal Graph based Incident Analysis System (TGIAS) to help users diagnose and react to reported anomalies. TGIAS trains a temporal graph to represent the anomaly relationships and computes a severity ranking and a causality score for each sensor. TGIAS outputs the list of top-k most severe sensors and root causes and illustrates the evidence in a graphical view. The system does not need any incident data for training and delivers highly accurate analysis results online. TGIAS is equipped with a user-friendly interface, making it an effective tool for a broad range of IoT systems.
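To make the two outputs named in this abstract concrete (a severity ranking and a causality score per sensor), here is a toy sketch. It is not the TGIAS algorithm, which the abstract does not specify: the graph here is built by a simple hand-written rule rather than trained, "severity" is assumed to be a given per-sensor anomaly score, and "causality" is assumed to be the number of later-anomalous sensors reachable from a sensor. All names (`build_temporal_edges`, `rank`, `max_lag`) and the data are hypothetical.

```python
from collections import defaultdict


def build_temporal_edges(onsets, links, max_lag):
    """Add a directed edge a -> b when a and b are linked, a became
    anomalous before b, and the lag is at most max_lag time steps."""
    edges = defaultdict(set)
    for u, v in links:
        for a, b in ((u, v), (v, u)):
            if a in onsets and b in onsets and 0 < onsets[b] - onsets[a] <= max_lag:
                edges[a].add(b)
    return edges


def reachable(edges, start):
    """Return every sensor reachable from start along directed edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def rank(onsets, scores, links, max_lag=5, k=2):
    """Return (top-k sensors by severity, top-k sensors by causality)."""
    edges = build_temporal_edges(onsets, links, max_lag)
    # Assumed causality score: how many anomalous sensors lie downstream.
    causality = {s: len(reachable(edges, s)) for s in onsets}
    top_severe = sorted(onsets, key=lambda s: (-scores[s], s))[:k]
    top_causes = sorted(onsets, key=lambda s: (-causality[s], onsets[s], s))[:k]
    return top_severe, top_causes


# Hypothetical incident: anomaly onset time step and anomaly score per sensor,
# plus undirected physical or logical links between sensors.
ONSETS = {"S1": 0, "S2": 2, "S3": 3, "S4": 4}
SCORES = {"S1": 0.4, "S2": 0.9, "S3": 0.7, "S4": 0.5}
LINKS = [("S1", "S2"), ("S2", "S3"), ("S1", "S4")]

if __name__ == "__main__":
    severe, causes = rank(ONSETS, SCORES, LINKS)
    print("most severe:", severe)
    print("likely root causes:", causes)
```

In this example S1 has a low anomaly score but turns anomalous first and sits upstream of every other sensor, so it leads the root-cause list while S2 leads the severity list. The point of the sketch is only that the sensor that looks worst and the sensor that started the incident can differ, which is why the abstract reports the two rankings separately.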

Temporal Graph based Incident Analysis System for Internet of Things

Internet-of-Things (IoT) systems deploy a massive number of sensors to monitor the system and its environment. Anomaly detection on sensor data is an important task for IoT maintenance and operation. In real applications, a system-level incident usually involves hundreds of abnormal sensors, making manual verification impractical. Users require an efficient and effective tool that conducts incident analysis and provides critical information such as (1) which parts suffered the most damage and (2) which parts caused the incident. Unfortunately, existing methods are inadequate for these requirements because of the complex sensor relationships and latent anomaly influences in IoT systems. To bridge the gap, we design and develop a Temporal Graph based Incident Analysis System (TGIAS) to help users diagnose and react to reported anomalies. TGIAS trains a temporal graph to represent the anomaly relationships and computes a severity ranking and a causality score for each sensor. TGIAS outputs the list of top-k most severe sensors and root causes and illustrates the detailed evidence in a graphical view. The system does not need any incident data for training and delivers highly accurate analysis results online. TGIAS is equipped with a user-friendly interface, making it an effective tool for a broad range of IoT systems.