LLM-SQL-Mind is a framework for continual learning that replaces isolated text-to-SQL tasks with a dynamic, self-improving knowledge cycle. The system architecture integrates an LLM Query Generator with a SQL Executor & Validator to run candidate queries and capture critical feedback from both RDBMS error messages and execution results. By parsing signals such as syntax errors, missing joins, and unexpected cardinalities, the system identifies latent domain insights—including table relationships, field semantics, and hidden schema constraints—that are typically lost in traditional one-shot generation.
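As a rough illustration of the execute-and-capture-feedback step, the sketch below runs a candidate query against SQLite and returns structured feedback; the helper name `run_candidate_query`, the feedback fields, and the choice of SQLite are illustrative assumptions, not part of the published framework.

```python
import sqlite3

def run_candidate_query(db_path: str, sql: str, max_rows: int = 50) -> dict:
    """Execute a candidate query and return structured feedback for the learner.

    Captures both RDBMS error messages (syntax errors, missing columns, ...)
    and execution-result signals (empty or unexpectedly large results).
    """
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(sql)
        rows = cur.fetchmany(max_rows)
        feedback = {
            "status": "ok",
            "row_count": len(rows),
            "columns": [d[0] for d in cur.description] if cur.description else [],
            # Cardinality hints: an empty result or a suspiciously wide one can
            # signal a missing join condition or a wrong filter.
            "warnings": ["empty result set"] if not rows else [],
        }
    except sqlite3.Error as exc:
        # The raw error text is the signal later distilled into a natural-language insight.
        feedback = {"status": "error", "error_message": str(exc)}
    finally:
        conn.close()
    return feedback
```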
These insights are codified into natural language and stored in a persistent vector memory, creating a structured repository of past query pairs, column descriptions, and procedural query patterns. Through Retrieval-Augmented Generation (RAG), the system performs a similarity search to inject relevant historical facts and discovered rules (such as "ignore NULL names in filtering") directly into the LLM's prompt for future queries. This grounding in accumulated database-specific knowledge mimics the way human experts learn a data system over time, allowing the model to progressively reduce errors and achieve significantly higher execution accuracy.
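One way the persistent vector memory and its retrieval step could be realized is sketched below; `embed` stands in for any sentence-embedding model, and the example insight text is invented for illustration only.

```python
import numpy as np

class InsightMemory:
    """Toy persistent vector memory: natural-language insights plus their embeddings."""

    def __init__(self, embed):
        self.embed = embed          # callable: str -> unit-normalized np.ndarray
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, insight: str) -> None:
        self.texts.append(insight)
        self.vectors.append(self.embed(insight))

    def retrieve(self, question: str, k: int = 3) -> list[str]:
        """Cosine-similarity search used to ground the next prompt (RAG)."""
        if not self.texts:
            return []
        q = self.embed(question)
        sims = np.stack(self.vectors) @ q
        top = np.argsort(-sims)[:k]
        return [self.texts[i] for i in top]

# Usage: insights learned from execution feedback are injected into the next prompt.
# memory.add("orders.customer_id joins to customers.id, not customers.cust_no")
# prompt = "\n".join(memory.retrieve(user_question)) + "\n" + user_question
```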
This research project addresses the critical operational vulnerability posed by small, autonomous unmanned aerial vehicles (UAVs) to dismounted soldiers, for whom traditional, heavy detection systems like radar are logistically unfeasible. To overcome these limitations, the project introduces the Dynamic-Array Acoustic World Model (DAWM), a wearable sensing framework that utilizes miniature microphones distributed across a soldier's uniform to detect and localize threats. Unlike conventional acoustic arrays that require static, time-synchronized sensors, DAWM applies world-model and latent-dynamics paradigms from generative AI to adapt to the continuous, complex motion of the human body and the changing geometry of the sensors. By encoding sensor observations and environmental states into a compact latent space, the model predicts the temporal evolution of Acoustic Energy Maps (AEMs), enabling the system to proactively "imagine" and track acoustic sources, such as approaching drones, despite the noise and motion inherent to combat maneuvering.
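A compact PyTorch sketch of the world-model idea follows: encode microphone observations together with the current array geometry into a latent state, roll that state forward with a recurrent dynamics model, and decode a predicted Acoustic Energy Map. All layer sizes, names, and the 36-sector AEM grid are illustrative assumptions rather than the project's actual architecture.

```python
import torch
import torch.nn as nn

class AEMWorldModel(nn.Module):
    """Encode (audio features + mic geometry) -> latent state -> predicted AEM grid."""

    def __init__(self, obs_dim: int, geom_dim: int, latent_dim: int = 64, aem_cells: int = 36):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + geom_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        # Latent dynamics: predicts how the compact state evolves between frames.
        self.dynamics = nn.GRU(latent_dim, latent_dim, batch_first=True)
        # Decoder "imagines" the acoustic energy map (here a 36-sector azimuth grid).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, aem_cells)
        )

    def forward(self, obs_seq: torch.Tensor, geom_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); geom_seq: (batch, time, geom_dim)
        z = self.encoder(torch.cat([obs_seq, geom_seq], dim=-1))
        h, _ = self.dynamics(z)           # latent trajectory across frames
        return self.decoder(h)            # (batch, time, aem_cells) predicted AEMs
```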
To facilitate the training of this AI model, the project proposes a comprehensive simulation-based methodology to generate the necessary large-scale datasets, which currently do not exist for such dynamic sensing configurations. The research team will combine open-source human body kinematics simulators (such as OpenSIM) with acoustic simulation platforms (like SonicSim) to synthesize realistic audio streams that account for limb trajectories, torso orientation, and environmental reverberation. This data-generation pipeline serves as the foundation for training the DAWM architecture to disentangle threat signatures from the soldier’s motion and background noise. The project scope encompasses modeling these dynamic trajectories, generating massive synthetic datasets, and training a baseline model to validate the feasibility of providing continuous, short-range situational awareness using low-power, body-worn commercial sensors.
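A highly simplified NumPy sketch of the data-generation idea: given simulated limb trajectories (time-varying microphone positions from a kinematics simulator) and a drone source signal, render per-microphone streams via time-varying propagation delay and 1/r attenuation. A realistic pipeline would add reverberation, body shadowing, and sensor noise; everything below, including the free-field assumption, is illustrative only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def synthesize_moving_array(source_sig: np.ndarray, fs: int,
                            mic_traj: np.ndarray, src_pos: np.ndarray) -> np.ndarray:
    """Render what each body-worn mic hears as the wearer moves.

    source_sig: (T,) drone audio at the source
    mic_traj:   (T, n_mics, 3) microphone positions per sample (from a kinematics sim)
    src_pos:    (3,) static source position
    returns:    (T, n_mics) per-microphone signals (free field, no reverb)
    """
    T, n_mics, _ = mic_traj.shape
    out = np.zeros((T, n_mics))
    dists = np.linalg.norm(mic_traj - src_pos, axis=-1)   # (T, n_mics) source-mic distances
    delays = dists / SPEED_OF_SOUND * fs                  # propagation delay in samples
    t_idx = np.arange(T)
    for m in range(n_mics):
        src_times = t_idx - delays[:, m]                  # fractional emission-time index
        valid = src_times >= 0
        # Linear interpolation of the source signal at emission times, with 1/r spreading loss.
        out[valid, m] = np.interp(src_times[valid], t_idx, source_sig) / np.maximum(dists[valid, m], 0.1)
    return out
```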
Micro-learning offers flexible, bite-sized education but often lacks the structural coherence required for deep understanding, necessitating the integration of hyperlinked prerequisite information. To overcome the labor-intensive and unscalable nature of manually identifying these dependencies, this project introduces an automated Prerequisite Information Retrieval (PIR) system. By leveraging Large Language Models (LLMs), the system analyzes textual dependencies to autonomously detect prerequisite relationships and generate contextual hyperlinks, thereby reducing educator workload and enhancing the pedagogical quality of learning modules.
Unlike traditional prerequisite concept learning, which classifies binary relationships between terms, PIR focuses on retrieving specific documents that explain the foundational concepts needed to comprehend a given query text. The research advances this field by establishing a domain-agnostic benchmark dataset via synthetic generation and Wikipedia repurposing, and by evaluating novel LLM-based retrieval models. These models employ vector-space proximity techniques, such as embedding adaptation and concept extraction, to match queries with documents that provide the necessary background knowledge.
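A minimal sketch of the vector-space retrieval step is shown below, assuming a generic off-the-shelf sentence-embedding model; the model name, the candidate-document setup, and the top-k cutoff are illustrative, not the project's actual configuration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any embedding model would do

def retrieve_prerequisites(query_text: str, candidate_docs: list[str],
                           k: int = 3, model_name: str = "all-MiniLM-L6-v2") -> list[str]:
    """Return the k candidate documents most likely to supply the background
    knowledge needed for query_text, ranked by cosine proximity in embedding space."""
    model = SentenceTransformer(model_name)
    doc_vecs = model.encode(candidate_docs, normalize_embeddings=True)
    query_vec = model.encode([query_text], normalize_embeddings=True)[0]
    scores = doc_vecs @ query_vec
    ranked = np.argsort(-scores)[:k]
    return [candidate_docs[i] for i in ranked]

# The retrieved documents can then be surfaced as contextual hyperlinks
# attached to the micro-learning module.
```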
This project addresses the critical challenge of ensuring reliability in Cyber-Physical Systems (CPS), which integrate physical processes with digital monitoring in safety-critical domains like predictive maintenance and healthcare. Because these complex systems exhibit multiple operational regimes and generate heterogeneous sensor data, detecting anomalies is difficult, particularly given the intrinsic scarcity of real anomalous samples for training and testing. Existing methods often lack reliable evaluation frameworks, making it risky to deploy models in mission-critical environments where standard average-performance metrics fail to capture worst-case risks or regime shifts.
To overcome these limitations, the proposed framework utilizes generative models, such as Variational Autoencoders (VAEs), to map multivariate time-series data into a latent space that reveals implicit system operating regimes. This approach enables three key innovations: the generation of physically plausible synthetic anomalies controlled via latent-space interpolation, a latent-space cross-validation strategy that treats held-out regimes as proxy anomalies to stress-test robustness, and latent-stratified evaluation metrics that provide stochastic bounds on failure risk. By benchmarking models against specific latent clusters rather than global averages, the framework offers a principled methodology for model selection and evaluation in the absence of abundant real-world anomaly data.
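A sketch of the latent-space interpolation step appears below, assuming a VAE encoder and decoder already trained on normal multivariate windows; the function names, interpolation weights, and the notion of an "edge" window are placeholders for illustration.

```python
import numpy as np

def synthesize_anomalies(encode, decode, x_normal: np.ndarray, x_edge: np.ndarray,
                         alphas=(0.25, 0.5, 0.75)) -> np.ndarray:
    """Generate synthetic anomalies by interpolating between latent codes.

    encode:   maps a window (T, channels) to a latent mean vector (d,)
    decode:   maps a latent vector (d,) back to a window (T, channels)
    x_normal: a typical window from the dominant operating regime
    x_edge:   a window near a regime boundary (e.g., from a rarely visited latent cluster)
    """
    z_a, z_b = encode(x_normal), encode(x_edge)
    # Points along the latent path leave the normal manifold gradually, producing
    # graded, physically plausible departures rather than random noise.
    return np.stack([decode((1.0 - a) * z_a + a * z_b) for a in alphas])

# Latent-space cross-validation follows the same idea: cluster the latent codes,
# hold out one cluster (regime) during training, and score it as proxy anomalies.
```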
Project CoEval is a novel framework designed to address the limitations of traditional Large Language Model (LLM) benchmarks, specifically the risks of benchmark leakage and the difficulty of distinguishing genuine reasoning from simple memorization. To achieve this, the project utilizes an ensemble of models that dynamically rotate through the roles of question generator, response evaluator, and target model, generating questions on the fly so that no model has prior exposure to the test data. This approach allows for the construction of a diverse evaluation environment by varying inference-time configurations—such as temperature, sampling methods, and prompts—which exposes variability in reasoning paths and produces a multidimensional "analytic cube" of performance metrics tailored to specific use cases rather than generic scenarios.
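A schematic sketch of the role rotation and configuration sweep is given below; the model names, temperature and prompt settings, and the `call_model` stub are placeholders for whatever inference backend the ensemble actually uses.

```python
from itertools import permutations, product

MODELS = ["model_a", "model_b", "model_c"]      # illustrative ensemble
TEMPERATURES = [0.2, 0.7, 1.0]                  # inference-time configurations
PROMPT_STYLES = ["direct", "chain_of_thought"]

def call_model(model, role, payload, temperature, prompt_style):
    """Placeholder for the actual inference backend (returns dummy output here)."""
    if role == "evaluate":
        return 0.0  # a real evaluator would return a rubric-based score
    return f"[{role} output from {model} @ T={temperature}, {prompt_style}]"

analytic_cube = []  # one record per (generator, evaluator, target, configuration) cell
for generator, evaluator, target in permutations(MODELS, 3):
    for temperature, prompt_style in product(TEMPERATURES, PROMPT_STYLES):
        question = call_model(generator, "generate", {}, temperature, prompt_style)
        answer = call_model(target, "respond", {"question": question}, temperature, prompt_style)
        score = call_model(evaluator, "evaluate",
                           {"question": question, "answer": answer}, temperature, prompt_style)
        analytic_cube.append({
            "generator": generator, "evaluator": evaluator, "target": target,
            "temperature": temperature, "prompt_style": prompt_style, "score": score,
        })
```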
The framework evaluates models using two primary pillars of measurement: consensus, which assesses the stability of rankings across different generators, and dispersion, which quantifies a test's ability to differentiate between varying levels of model performance. Beyond simple scoring, CoEval profiles models on their distinct capabilities as generators, evaluators, and responders, revealing systematic biases and strengths that single-evaluator methods might miss. Furthermore, the project serves as a robust pipeline for synthetic data generation; by filtering outputs for high consensus and differentiation, CoEval creates "Golden Datasets" that provide reliable supervision for offline validation and targeted fine-tuning, independent of static or contaminated public benchmarks.
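One plausible operationalization of the two pillars is sketched below: consensus as the average pairwise rank correlation (Kendall's tau) between target rankings produced under different generators, and dispersion as the spread of target-model scores on an item. CoEval's exact definitions may differ; this is only an interpretation of the described metrics.

```python
from itertools import combinations
import numpy as np
from scipy.stats import kendalltau

def consensus(rankings: dict[str, list[str]]) -> float:
    """Average pairwise Kendall's tau between target rankings from different
    question generators (1.0 = perfectly stable rankings across generators)."""
    models = rankings[next(iter(rankings))]
    taus = []
    for g1, g2 in combinations(rankings, 2):
        r1 = [rankings[g1].index(m) for m in models]
        r2 = [rankings[g2].index(m) for m in models]
        tau, _ = kendalltau(r1, r2)
        taus.append(tau)
    return float(np.mean(taus))

def dispersion(scores: dict[str, float]) -> float:
    """Spread of target-model scores on a test item; near-zero dispersion means
    the item cannot differentiate strong models from weak ones."""
    return float(np.std(list(scores.values())))
```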