DATA SCIENCE & SYSTEM SECURITY
Deep Document Analysis and Large Language Models
Unstructured data is growing at an unprecedented rate, and valuable knowledge, including findings, observations, business demands, and opportunities, is widely recorded as text in documents. We are developing advanced analysis engines that mine the text in documents, with the aim of discovering valuable knowledge in large-scale document collections and supporting informed decision-making for users.
This project focuses on discovering knowledge in documents using advanced natural language processing, deep learning, and machine learning techniques. It builds novel analytic engines that model the large volumes of document data generated in various scenarios. The engines provide interpretable knowledge in low-resource settings across different languages and domains, helping customers understand and optimize their decision-making processes.
In addition, this project aims to advance the state of the art in NLP for document understanding. Toward this goal, we have worked on tasks such as information extraction, language modeling, domain adaptation, intention detection, and contrastive augmentation. We also provide solutions for different industries (e.g., finance, business, security, and systems) that help customers with operations management and decision-making optimization. Examples include document-based business matching, business process optimization, threat intelligence discovery, and log-based system management.