MIT’s GenSQL bridges AI and SQL for easy database queries

By Sophia Cantiller

Published 11 Jul 2024

Statistical analysis of database records has become simpler thanks to a new generative artificial intelligence (AI) system for databases developed by MIT researchers.

GenSQL builds on the existing structured query language (SQL) and lets users run complex statistical analyses of tabular data with just a few keystrokes.

Since its launch in the late 1970s, SQL has been used by many developers worldwide as a programming language for storing and manipulating information in a database. Using SQL to ask questions about data requires knowledge of established keywords, which are unfamiliar to non-experts.

With the integration of advanced AI modeling into this language, users may express their inquiries and receive answers without needing a wide technical background. This is what senior author Vikash Mansinghka hopes to achieve with his team, stating, “We think that, when we move from just querying data to asking questions of models and data, we are going to need an analogous language that teaches people the coherent questions you can ask a computer that has a probabilistic model of the data.”

Personalizing and improving database queries

The MIT researchers designed GenSQL so that users can query both a dataset and a probabilistic model of that data. Motivated by the lack of effective methods for combining probabilistic AI models with SQL, and by their goal of helping databases understand human language, they built the tool to give users personalized insights about their data.

For example, a developer who wants to know whether they are underpaid can use GenSQL to ask how the salary data relates to their own situation, rather than sifting through raw numbers in database records.
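To make the salary example concrete, here is a minimal Python sketch of what it means to ask a question of a probabilistic model rather than of raw records. It does not use GenSQL or its query syntax. The salary figures, the function names, and the choice of a simple normal distribution are all assumptions made for illustration.

```python
from statistics import NormalDist

# Hypothetical salary records: (role, salary in USD). Not real data.
records = [
    ("developer", 82000),
    ("developer", 90000),
    ("developer", 95000),
    ("developer", 88000),
    ("developer", 101000),
    ("designer", 70000),
    ("designer", 76000),
]


def fit_salary_model(rows, role):
    """Fit a normal distribution to the salaries recorded for one role."""
    salaries = [salary for r, salary in rows if r == role]
    return NormalDist.from_samples(salaries)


def share_earning_more(model, my_salary):
    """Probability, under the model, that a peer earns more than my_salary."""
    return 1.0 - model.cdf(my_salary)


model = fit_salary_model(records, "developer")
print(f"Modeled mean developer salary: {model.mean:.0f}")
print(f"Chance a peer earns more than 70000: {share_earning_more(model, 70000):.3f}")
```

A plain SQL query could return the average salary, but the model-based question returns a probability that relates the data to one person's situation, which is the kind of answer the article describes.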

Lead author Matthieu Huot said, “Looking at the data and trying to find some meaningful patterns by just using some simple statistical rules might miss important interactions. You really want to capture the correlations and the dependencies of the variables, which can be quite complicated in a model. With GenSQL, we want to enable a large set of users to query their data and their model without having to know all the details.”

In addition, the probabilistic models used in GenSQL are auditable, so users can see which data informed a decision, and each answer comes with a measure of uncertainty.

Increased speed and accuracy

The researchers also reported that GenSQL was between 1.7 and 6.8 times faster than other AI-supported technologies, returning results in a few milliseconds, and that its results were more accurate.

Moreover, the generative AI tool showed promising performance in two case studies. In the first, it detected mislabeled clinical trial information, pointing to its ability to spot anomalies and make predictions. In the second, it generated synthetic data that closely matched the real data, which makes it useful when the real records are sensitive and otherwise hard to access.
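The idea behind model-based anomaly detection can be illustrated with a short Python sketch: a record is flagged when it looks very unlikely under a model fitted to the remaining records. This is not the researchers' method or GenSQL code. The dose values, the threshold, and the leave-one-out normal model are assumptions chosen to keep the example small.

```python
from statistics import NormalDist

# Hypothetical recorded doses in mg; the 500 stands in for a mislabeled entry.
doses = [50, 52, 49, 51, 50, 48, 500, 51]


def flag_unlikely(values, tail_prob=0.01):
    """Return (index, value, p) for values that are improbable given the rest.

    Each value is scored against a normal model fitted to all other values,
    so a single extreme entry cannot hide itself by inflating the spread.
    """
    flagged = []
    for i, value in enumerate(values):
        others = values[:i] + values[i + 1:]
        model = NormalDist.from_samples(others)
        if model.stdev == 0:
            continue  # no spread in the other values, so no score is defined
        z = abs(value - model.mean) / model.stdev
        # Two-sided probability of landing at least this far from the mean.
        p = 2 * (1 - NormalDist().cdf(z))
        if p < tail_prob:
            flagged.append((i, value, p))
    return flagged


for index, value, p in flag_unlikely(doses):
    print(f"Row {index} with value {value} looks anomalous (p = {p:.4f})")
```

A richer probabilistic model would consider several columns at once, but the principle is the same: low probability under the model is the signal that a record deserves a second look.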

Looking ahead, Mansinghka and Huot, together with their team, plan to extend GenSQL’s coverage by doing large-scale modeling of human populations. With their eyes set on new optimizations and automation of the system, they hope to turn GenSQL into an AI expert that users can talk to about any database using plain language.