SQL Query Equivalence Evaluator
The SQL Query Equivalence evaluator assesses whether two SQL queries are semantically equivalent despite potential syntactic differences. It determines if a generated SQL query would produce the same results as a reference query when executed against the same database.

SQL Query Equivalence component interface and configuration
Evaluation Notice: SQL queries may be syntactically different but semantically equivalent. This evaluator looks beyond surface-level syntax to determine if two queries would produce the same result set when executed.
Component Inputs
- Generated SQL: The SQL query generated by the RAG system to be evaluated
Example: "SELECT name, salary FROM employees WHERE department = 'Sales' AND salary > 50000;"
- Reference SQL: The ground truth or expected SQL query for comparison
Example: "SELECT name, salary FROM employees WHERE department = 'Sales' AND salary > 50000 ORDER BY salary DESC;"
- Table Schema / Context: Database schema information to help evaluate query validity and equivalence
Example: { "employees": ["id", "name", "department", "salary", "hire_date"] }
Component Outputs
- Evaluation Result: Qualitative explanation of the equivalence assessment, highlighting differences between the queries
Example: "The generated query is semantically equivalent to the reference query in terms of data selection, but differs in ordering. Both queries select the same rows, but the reference query sorts results by salary in descending order."
Score Interpretation
High Equivalence (0.7-1.0)
Queries are semantically equivalent and would produce the same results despite potential syntactic differences
Example Score: 0.95
This indicates that the queries would return essentially the same data with possibly minor differences like column order or sorting
Partial Equivalence (0.3-0.7)
Queries have substantial overlap but differ in some aspects like sorting, column selection, or minor filtering
Example Score: 0.50
This indicates partial equivalence where the queries retrieve similar but not identical result sets
Low Equivalence (0.0-0.3)
Significant semantic differences between the queries that would produce different results
Example Score: 0.15
This indicates that the queries would return substantially different results
Implementation Example
from ragas.metrics import SQLQueryEquivalence
# Create the metric
sql_equivalence = SQLQueryEquivalence()
# Use in evaluation
from datasets import Dataset
from ragas import evaluate
eval_dataset = Dataset.from_dict({
"question": ["Show me all employees in the Sales
department with a salary above 50000"],
"contexts": [["The employees table has columns:
id, name, department, salary, hire_date."]],
"answer": ["SELECT name, salary FROM employees
WHERE department = 'Sales' AND salary > 50000;"],
"reference_answer": ["SELECT name, salary FROM
employees WHERE department = 'Sales' AND salary > 50000
ORDER BY salary DESC;"]
})
result = evaluate(
eval_dataset,
metrics=[sql_equivalence]
)
print(result)
Use Cases
- Text-to-SQL Evaluation: Assess natural language interfaces that translate user queries to SQL statements
- SQL Rewriting: Evaluate systems that optimize or rewrite SQL queries while preserving semantics
- Database Tools: Test business intelligence tools that generate SQL from user interactions or natural language
- SQL Learning Systems: Assess educational platforms teaching SQL by evaluating student solutions against reference answers
- Query Generation: Evaluate LLM capabilities to generate SQL queries based on natural language descriptions
Best Practices
- Provide comprehensive schema information to enable accurate query analysis
- Treat ordering and formatting differences differently than semantic differences
- Consider database-specific SQL dialects and syntax variations
- Review low equivalence scores manually to identify specific discrepancies
- Use in combination with query execution tests when possible to verify actual result equivalence