Diagram Analysis is the computational interpretation of structured visual information such as flowcharts, schematics, and graphs. At NECLA, this supports media analytics, scientific document understanding, and multimodal AI systems. By enabling machines to parse diagrams, researchers improve scientific knowledge extraction, automated reasoning, and technical document retrieval. This work strengthens NECLA’s broader goals in explainable AI and knowledge systems that can bridge text, image, and symbolic data.

Posts

Chain-of-region: Visual Language Models Need Details for Diagram Analysis

Visual Language Models (VLMs) like GPT-4V have broadened the scope of LLM applications, yet they face significant challenges in accurately processing visual details, particularly in scientific diagrams. This paper explores the necessity of meticulous visual detail collection and region decomposition for enhancing the performance of VLMs in scientific diagram analysis. We propose a novel approach that combines traditional computer vision techniques with VLMs to systematically decompose diagrams into discernible visual elements and aggregate essential metadata. Our method employs techniques from the OpenCV library to identify and label regions, followed by a refinement process using shape detection and region merging algorithms, which are particularly suited to the structured nature of scientific diagrams. This strategy not only improves the granularity and accuracy of visual information processing but also extends the capabilities of VLMs beyond their current limitations. We validate our approach through a series of experiments that demonstrate enhanced performance in diagram analysis tasks, setting a new standard for integrating visual and language processing in a multimodal context.
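To make the "identify and label regions, then merge" idea in the abstract concrete, here is a minimal sketch. It is not the paper's implementation: the abstract says the authors use OpenCV, whereas this example substitutes a dependency-free pure-Python flood fill for connected-component labeling so it runs anywhere. The function names (`label_regions`, `boxes_close`, `merge_regions`), the tiny text-based diagram, and the `gap` threshold are all hypothetical choices made for illustration, and the shape detection step is left out.

```python
from collections import deque

def label_regions(grid):
    """Return bounding boxes (x0, y0, x1, y1), inclusive, of 4-connected foreground blobs."""
    h = len(grid)
    w = len(grid[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def boxes_close(a, b, gap):
    """True if boxes overlap or are separated by at most `gap` empty pixels on each axis."""
    dx = max(a[0] - b[2] - 1, b[0] - a[2] - 1, 0)
    dy = max(a[1] - b[3] - 1, b[1] - a[3] - 1, 0)
    return dx <= gap and dy <= gap

def merge_regions(boxes, gap=1):
    """Repeatedly merge any two nearby boxes into their union until nothing changes."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_close(merged[i], merged[j], gap):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged

# Hypothetical toy "diagram": '#' is ink, '.' is background.
DIAGRAM = [
    "##.##.....",
    "##.##.....",
    "..........",
    "..........",
    "......###.",
    "......###.",
]
grid = [[ch == "#" for ch in row] for row in DIAGRAM]

raw_boxes = label_regions(grid)                  # three separate blobs
merged_boxes = merge_regions(raw_boxes, gap=1)   # the two top blobs fuse into one region
print(raw_boxes)
print(merged_boxes)
```

In a pipeline like the one the abstract describes, each merged box would presumably be cropped or annotated and passed to the VLM together with its metadata. With OpenCV available, a call such as `cv2.connectedComponentsWithStats` could take the place of `label_regions`.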