Generative AI (Artificial Intelligence) (GenAI) is a branch of AI focused on creating systems that generate new content, such as images, text, audio, or video, that is indistinguishable from content created by humans. Generative AI models learn the underlying patterns and structures of a dataset and use this knowledge to generate novel, realistic outputs.

Posts

Driving the Future of Scene Editing with HorizonForge

HorizonForge introduces a new approach to driving scene generation, enabling precise control over both vehicle behavior and identity. By allowing arbitrary trajectories and flexible vehicle insertion, it creates realistic, scalable simulations for autonomous driving, digital twins, and advanced AI development.

Logical Guidance for the Exact Composition of Diffusion Models

We propose LOGDIFF (Logical Guidance for the Exact Composition of Diffusion Models), a guidance framework for diffusion models that enables principled constrained generation with complex logical expressions at inference time. We study when exact score-based guidance for complex logical formulas can be obtained from guidance signals associated with atomic properties. First, we derive an exact Boolean calculus that provides a sufficient condition for exact logical guidance. Specifically, if a formula admits a circuit representation in which conjunctions combine conditionally independent subformulas and disjunctions combine subformulas that are either conditionally independent or mutually exclusive, exact logical guidance is achievable. In this case, the guidance signal can be computed exactly from atomic scores and posterior probabilities using an efficient recursive algorithm.Moreover, we show that, for commonly encountered classes of distributions, any desired Boolean formula is compilable into such a circuit representation. Second, by combining atomic guidance scores with posterior probability estimates, we introduce a hybrid guidance approach that bridges classifier guidance and classifier-free guidance, applicable to both compositional logical guidance and standard conditional generation. We demonstrate the effectiveness of our framework on multiple image and protein structure generation tasks.

TacTool: Tactical Tool usage in Agentic AI Systems

Large language models (LLMs) are becoming the centerpiece in the design and deployment of Agentic artificial intelligence (AI) systems. AI agents typically have (a) reasoning ability to analyze and think through the given task, (b) context/memory to remember things in the short-term and long-term, and (c) tools at their disposal to interact with the outsideworld. While solving the given task, it must decide whether tool use is required; if so, it must then select the appropriate tool and invoke it with the correct parameters. Although LLMs have advanced considerably in recent years, their tool-use capabilities remain limited. Even OpenAI’s most capable model to date, GPT-5, continues to struggle with reliable tool usage. In this paper, we propose TacTool, which empowers AI agents with improved tool selection and tool call formulation using different LLMs. We conduct experiments using Nestful and Berkeley Function Calling Leaderboard version 3 (BFCLv3) benchmarks and show that TacTool achieves ?27% and ?3% improvement over GPT- 4o on Nestful and BFCL v3 dataset, respectively.

TalentScout: Multimodal AI-Driven Expert Finding in Organizations

Identifying subject-matter experts within organizations remains a challenging task due to the scale, heterogeneity, and unstructured nature of enterprise knowledge assets. We present TalentScout, an AI-driven expert identification system that constructs a unified, skill-centric knowledge graph by ingesting and analyzing diverse media, including research papers, reports, presentations, transcripts, and supervisor recommendations. TalentScout’s modular architecture integrates document parsing, audio/video transcription, metadata extraction, large language model-based skill extraction, multi-factor author disambiguation, and evidence-weighted skill attribution. At query time, TalentScout decomposes natural language queries into canonical skill requirements, traverses the constructed knowledge graph, and ranks experts based on aggregated skill weights, document quality, and endorsement signals, providing document-level justifications for each recommendation. We evaluate TalentScout on multiple public and internal enterprise datasets, including DBLP, TREC Enterprise, Tilburg, and ManConCorpus. Using standard information retrieval metrics such as Precision@ 5, Recall@5, nDCG@5, and Mean Reciprocal Rank (MRR), TalentScout consistently outperforms leading baselines, achieving up to 24% higher Precision@ 5 in early expert retrieval. The results highlight TalentScout’s scalability, transparency, and accuracy, establishing it as a practical solution for evidence-based expert discovery and organizational talent management.

SlideCraft: Context-aware Slides Generation Agent

Creating effective slide presentations requires adapting both content and structure to match the communication context e.g. whether the presentation is for summarizing to executives, or reporting progress to research supervisors. In research and enterprise environments, this need for context-sensitive presentations often leads to repeated, manual reformatting of the same material to suit different audiences. Existing generative systems support slide creation but typically rely on structured inputs, assume a fixed format, and offer limited ability to iteratively refine outputs through natural language feedback. Moreover, they rarely accommodate organizational constraints such as formatting guidelines, domain-specific terminology, or branding requirements. We present SlideCraft, a context-aware generative agent that autonomously creates and edits slide presentations based on natural language instructions. SlideCraft infers the intended presentation context, such as an executive-facing or a project review summary for technical oversight, and selects the appropriate slide template. It then synthesizes content from input documents, enriches it with external knowledge and internal assets, assembles it into a structured intermediate representation, and generates a validated slide deck. SlideCraft supports both first-time slide creation and iterative updates, operating through familiar natural language interfaces like email or messaging tools. Our experiments demonstrate that SlideCraft consistently produces high-quality, context-aware presentations tailored to diverse communication settings, with minimal human input and reliable adherence to enterprise constraints.

Energy-based Generative Models for Distributed Acoustic Sensing Event Classification in Telecom Networks

Distributed fiber-optic sensing combined with machine learning enables continuous monitoring of telecom infrastructure. We employ generative modeling for event classification, supporting semi­ supervised learning, uncertainty calibration, and noise resilience. Our approach offers a scalable, data-efficient solution for real-world deployment in complex environments.

EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs

Enterprises are increasingly adopting Generative AI applications to extract insights from large volumes of multimodal documents in domains such as finance, law, healthcare, and industry. These documents contain structured and unstructured data (images, charts, handwritten texts, etc.) requiring robust AI systems for effective retrieval and comprehension. Recent advancements in Retrieval-Augmented Generation (RAG) frameworks and Vision-Language Models (VLMs) have improved retrieval performance on multimodal documents by processing pages as images. However, large-scale deployment remains challenging due to the high cost of LLM API usage and the slower inference speed of image-based processing of pages compared to text-based processing. To address these challenges, we propose EcoDoc, a cost-effective multimodal document processing system that dynamically selects the processing modalities for each page as an image or text based on page characteristics and query intent. Our experimental evaluation on TAT-DQA and DocVQA benchmarks shows that EcoDoc reduces average query processing latency by up to 2.29× and cost by up to 10×, without compromising accuracy.

Re-ranking the Context for Multimodal Retrieval Augmented Generation

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context with improved accuracy and reduced hallucinations. However, multi-modal RAG systems face unique challenges: (i) the retrieval process may select irrelevant entries to user query (e.g., images, documents), and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate RAG output. In this paper, we aim to address the first challenge, i.e, improving the selection of relevant context from the knowledge-base in retrieval phase of the multi-modal RAG. Specifically, we leverage the relevancy score (RS) measure designed in our previous work for evaluating the RAG performance to select more relevant entries in retrieval process. The retrieval based on embeddings, say CLIP-based embedding, and cosine similarity usually perform poorly particularly for multi-modal data. We show that by using a more advanced relevancy measure, one can enhance the retrieval process by selecting more relevant pieces from the knowledge-base and eliminate the irrelevant pieces from the context by adaptively selecting up-to-?? entries instead of fixed number of entries. Our evaluation using COCO dataset demonstrates significant enhancement in selecting relevant context and accuracy of the generated response.

Efficient Semantic Communication Through Transformer-Aided Compression

Transformers, known for their attention mechanisms, have proven highly effective in focusing on critical elements within complex data. This feature can effectively be used to address the time-varying channels in wireless communication systems. In this work, we introduce a channel-aware adaptive framework for semantic communication, where different regions of the image are encoded and compressed based on their semantic content. By employing vision transformers, we interpret the attention mask as a measure of the semantic contents of the patches and dynamically categorize the patches to be compressed at various rates as a function of the instantaneous channel bandwidth. Our method enhances communication efficiency by adapting the encoding resolution to the content’s relevance, ensuring that even in highly constrained environments, critical information is preserved. We evaluate the proposed adaptive transmission framework using the TinyImageNet dataset, measuring both reconstruction quality and accuracy. The results demonstrate that our approach maintains high semantic fidelity while optimizing bandwidth, providing an effective solution for transmitting multiresolution data in limited bandwidth conditions.

DiffOptics: A Conditional Diffusion Model for Fiber Optics Sensing Data Imputation

We present a generative AI framework based on a conditional diffusion model for distributed acoustic sensing (DAS) data imputation. The proposed DiffOptics model generates high-quality DAS data of various acoustic events using telecom fiber cables.