CORDIAL-AI investigates how recent advances in artificial intelligence can be integrated with structured data services to support new forms of interaction with complex datasets. In particular, it explores how generative AI can query structured statistical systems in ways that remain traceable, reliable, and compatible with research data infrastructures.
The project combines several technologies and methodological components.
Large language models
Large language models are used to interpret natural-language queries submitted by users. These models analyse the intent of the query and identify the relevant datasets, variables, and geographic parameters required to retrieve the requested data.
AI agents
The system employs a set of specialised AI agents designed around the taxonomy of census flow datasets. These agents help decompose complex queries into structured components and coordinate the process of translating user requests into valid API calls.
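The coordination pattern described here can be illustrated with a minimal sketch. It assumes two flow categories ("migration" and "commuting") and a simple registry that routes each part of a decomposed query to the matching agent. The category names, the `SubTask` type, and the agent functions are hypothetical and do not describe the project's actual agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubTask:
    """One component of a decomposed user query (hypothetical structure)."""
    flow_type: str  # assumed taxonomy category, e.g. "migration"
    text: str       # the fragment of the question this component covers


def migration_agent(task: SubTask) -> dict:
    # Placeholder: a real agent would work out the API parameters here.
    return {"agent": "migration", "handles": task.text}


def commuting_agent(task: SubTask) -> dict:
    # Placeholder: a real agent would work out the API parameters here.
    return {"agent": "commuting", "handles": task.text}


# Registry keyed by the assumed taxonomy categories.
AGENTS: Dict[str, Callable[[SubTask], dict]] = {
    "migration": migration_agent,
    "commuting": commuting_agent,
}


def coordinate(tasks: List[SubTask]) -> List[dict]:
    """Route each sub-task to its agent, failing loudly on unknown categories."""
    results = []
    for task in tasks:
        agent = AGENTS.get(task.flow_type)
        if agent is None:
            raise ValueError(f"no agent for flow type {task.flow_type!r}")
        results.append(agent(task))
    return results
```

Rejecting unknown categories, rather than falling back to a default agent, keeps the routing step easy to audit.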
Census flow data API
The project builds on an API-driven platform for census flow data that provides advanced subsetting capabilities to extract tailored datasets. The API performs all data processing, filtering, and aggregation.
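Because the API does the processing, a client only needs to describe the subset it wants. The sketch below shows one way such a request could be assembled. The base URL and the parameter names (`dataset`, `origin`, `destination`, `variables`) are placeholders chosen for illustration, not the real service's interface.

```python
from typing import List
from urllib.parse import urlencode

# Placeholder endpoint; the real service URL is not given in the text.
BASE_URL = "https://example.org/api/flows"


def build_request_url(
    dataset_id: str,
    origins: List[str],
    destinations: List[str],
    variables: List[str],
) -> str:
    """Describe a subset as query parameters; no data is processed client-side."""
    params = {
        "dataset": dataset_id,
        "origin": ",".join(origins),
        "destination": ",".join(destinations),
        "variables": ",".join(variables),
    }
    # safe="," keeps the list separators readable in the final URL.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"
```

For example, `build_request_url("FLOW01", ["A1", "A2"], ["B1"], ["total"])` yields a single URL that fully specifies the requested subset, which is also what makes the request easy to show to the user later.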
Natural language to structured queries
The project's main focus is the methodological challenge of translating natural-language questions into structured operations. This involves mapping user intent to the precise dataset identifiers, geographies, and variables required by the underlying data service.
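One way to picture this mapping step is as a lookup from loosely worded intent to exact identifiers, with unknown terms rejected rather than guessed. The sketch below assumes the language model has already produced a small intent dictionary. The lookup tables, identifiers, place names, and the `StructuredQuery` type are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical catalogues; real identifiers would come from the data service.
DATASETS: Dict[str, str] = {
    "commuting": "FLOW_COMMUTE",
    "migration": "FLOW_MIGRATION",
}
GEOGRAPHIES: Dict[str, str] = {
    "northfield": "AREA_001",
    "eastbrook": "AREA_002",
}


@dataclass(frozen=True)
class StructuredQuery:
    dataset_id: str
    origin_codes: List[str]
    destination_codes: List[str]


def _lookup(table: Dict[str, str], name: str, kind: str) -> str:
    """Return the exact identifier for a name, or raise if it is not known."""
    key = name.strip().lower()
    if key not in table:
        raise ValueError(f"unknown {kind}: {name!r}")
    return table[key]


def resolve(intent: dict) -> StructuredQuery:
    """Map an extracted intent onto the identifiers the service expects."""
    return StructuredQuery(
        dataset_id=_lookup(DATASETS, intent["topic"], "topic"),
        origin_codes=[_lookup(GEOGRAPHIES, n, "geography") for n in intent["origins"]],
        destination_codes=[
            _lookup(GEOGRAPHIES, n, "geography") for n in intent["destinations"]
        ],
    )
```

Raising an error on an unrecognised name means a misread place or topic surfaces as a visible failure instead of a plausible but wrong query.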
Explainability and transparency
The prototype includes features designed to support transparency and data provenance. These allow users to inspect the API request generated from their query, explore the reasoning steps taken by the system, and verify the datasets used to produce the results.
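As an illustration of what such a provenance trail might contain, the sketch below bundles the user's question, the reasoning steps, the generated API request, and the datasets used into one record that can be serialised for inspection. The `ProvenanceRecord` type and its field names are assumptions and do not reflect the prototype's actual format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical record of how one answer was produced."""
    question: str               # the original natural-language query
    reasoning_steps: List[str]  # steps the system reports having taken
    api_request: str            # the exact request sent to the data service
    dataset_ids: List[str]      # datasets the results were drawn from

    def to_json(self) -> str:
        """Serialise the record so it can be shown to, or saved by, the user."""
        return json.dumps(asdict(self), indent=2)
```

Keeping the exact request string in the record lets a user re-run it independently and confirm that the results match.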