Fine-tuning FunctionGemma for SQL

Author

Chris von Csefalvay

Teaching FunctionGemma-270M to be good at text-to-SQL through SFT, GRPO and APO.

Environment setup

# Install dependencies
!pip install transformers torch accelerate bitsandbytes datasets trl peft sqlparse python-dotenv

Requirement already satisfied: transformers in /usr/local/lib/python3.12/dist-packages (4.57.3)

Requirement already satisfied: torch in /usr/local/lib/python3.12/dist-packages (2.9.0+cu126)

Requirement already satisfied: accelerate in /usr/local/lib/python3.12/dist-packages (1.12.0)

Collecting bitsandbytes

  Downloading bitsandbytes-0.49.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)

Requirement already satisfied: datasets in /usr/local/lib/python3.12/dist-packages (4.0.0)

Collecting trl

  Downloading trl-0.26.2-py3-none-any.whl.metadata (11 kB)

Requirement already satisfied: peft in /usr/local/lib/python3.12/dist-packages (0.18.0)

Requirement already satisfied: sqlparse in /usr/local/lib/python3.12/dist-packages (0.5.4)

Requirement already satisfied: python-dotenv in /usr/local/lib/python3.12/dist-packages (1.2.1)

Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from transformers) (3.20.0)

Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.36.0)

Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2.0.2)

Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (25.0)

Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from transformers) (6.0.3)

Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2025.11.3)

Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from transformers) (2.32.4)

Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.22.1)

Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.7.0)

Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.12/dist-packages (from transformers) (4.67.1)

Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.12/dist-packages (from torch) (4.15.0)

Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch) (75.2.0)

Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch) (1.14.0)

Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch) (3.6.1)

Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch) (3.1.6)

Requirement already satisfied: fsspec>=0.8.5 in /usr/local/lib/python3.12/dist-packages (from torch) (2025.3.0)

Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)

Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)

Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.80)

Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch) (9.10.2.21)

Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.4.1)

Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in /usr/local/lib/python3.12/dist-packages (from torch) (11.3.0.4)

Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in /usr/local/lib/python3.12/dist-packages (from torch) (10.3.7.77)

Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in /usr/local/lib/python3.12/dist-packages (from torch) (11.7.1.2)

Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in /usr/local/lib/python3.12/dist-packages (from torch) (12.5.4.2)

Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch) (0.7.1)

Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /usr/local/lib/python3.12/dist-packages (from torch) (2.27.5)

Requirement already satisfied: nvidia-nvshmem-cu12==3.3.20 in /usr/local/lib/python3.12/dist-packages (from torch) (3.3.20)

Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)

Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.85)

Requirement already satisfied: nvidia-cufile-cu12==1.11.1.6 in /usr/local/lib/python3.12/dist-packages (from torch) (1.11.1.6)

Requirement already satisfied: triton==3.5.0 in /usr/local/lib/python3.12/dist-packages (from torch) (3.5.0)

Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from accelerate) (5.9.5)

Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (18.1.0)

Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.3.8)

Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (from datasets) (2.2.2)

Requirement already satisfied: xxhash in /usr/local/lib/python3.12/dist-packages (from datasets) (3.6.0)

Requirement already satisfied: multiprocess<0.70.17 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.70.16)

Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (3.13.2)

Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (1.2.0)

Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.4.4)

Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.11)

Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2.5.0)

Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2025.11.12)

Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch) (1.3.0)

Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch) (3.0.3)

Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2.9.0.post0)

Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)

Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.3)

Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2.6.1)

Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.4.0)

Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (25.4.0)

Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.8.0)

Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (6.7.0)

Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (0.4.1)

Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.22.0)

Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)

Downloading bitsandbytes-0.49.0-py3-none-manylinux_2_24_x86_64.whl (59.1 MB)

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59.1/59.1 MB 20.0 MB/s eta 0:00:00:00:0100:01

Downloading trl-0.26.2-py3-none-any.whl (518 kB)

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 518.9/518.9 kB 40.6 MB/s eta 0:00:00

Installing collected packages: bitsandbytes, trl

Successfully installed bitsandbytes-0.49.0 trl-0.26.2

import json
import sqlite3
import os
from pathlib import Path
from typing import Optional
from dataclasses import dataclass
from collections import defaultdict
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from tqdm.auto import tqdm
import pandas as pd

Download Spider benchmark

Spider is the standard text-to-SQL benchmark with 200 databases across 138 domains. We download it with the actual SQLite database files so we can execute generated queries.

!pip install gdown -q

import gdown
import zipfile
from pathlib import Path

SPIDER_DIR = Path("spider")

if not SPIDER_DIR.exists():
    # Download Spider dataset from Google Drive
    # File ID: 1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J
    url = "https://drive.google.com/uc?id=1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J"
    output = "spider.zip"
    
    print("Downloading Spider dataset...")
    gdown.download(url, output, quiet=False)
    
    print("Extracting...")
    with zipfile.ZipFile(output, 'r') as zip_ref:
        zip_ref.extractall(".")
    
    # Clean up
    os.remove(output)
    print(f"Spider dataset ready at {SPIDER_DIR}")
else:
    print(f"Spider dataset already exists at {SPIDER_DIR}")

Downloading Spider dataset...

Downloading...
From (original): https://drive.google.com/uc?id=1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J
From (redirected): https://drive.google.com/uc?id=1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J&confirm=t&uuid=f9d13177-b23e-4010-a6dd-d6469fc01931
To: /content/spider.zip
100%|██████████| 206M/206M [00:02<00:00, 74.2MB/s]

Extracting...
Spider dataset ready at spider

# Verify Spider structure
SPIDER_DIR = Path("spider_data")
spider_db_dir = SPIDER_DIR / "database"
spider_train = SPIDER_DIR / "train_spider.json"
spider_dev = SPIDER_DIR / "dev.json"
spider_tables = SPIDER_DIR / "tables.json"

print(f"Database directory exists: {spider_db_dir.exists()}")
print(f"Train file exists: {spider_train.exists()}")
print(f"Dev file exists: {spider_dev.exists()}")
print(f"Tables file exists: {spider_tables.exists()}")

if spider_db_dir.exists():
    databases = list(spider_db_dir.iterdir())
    print(f"\nNumber of databases: {len(databases)}")
    print(f"Sample databases: {[d.name for d in databases[:5]]}")

Database directory exists: True
Train file exists: True
Dev file exists: True
Tables file exists: True

Number of databases: 166
Sample databases: ['race_track', 'entrepreneur', 'flight_1', 'hr_1', 'battle_death']

SQL Benchmarker

The benchmarker evaluates text-to-SQL models using execution accuracy: whether the generated SQL produces the same results as the gold SQL when run against the actual database.

@dataclass
class BenchmarkResult:
    """Result of a single benchmark query."""
    question: str
    db_id: str
    gold_sql: str
    predicted_sql: str
    gold_result: Optional[list] = None
    predicted_result: Optional[list] = None
    execution_match: bool = False
    syntax_valid: bool = False
    error: Optional[str] = None


class SpiderBenchmarker:
    """Benchmarker for text-to-SQL using Spider dataset with execution accuracy."""
    
    def __init__(self, spider_dir: Path, split: str = "dev"):
        self.spider_dir = Path(spider_dir)
        self.db_dir = self.spider_dir / "database"
        
        # Load the appropriate split
        if split == "dev":
            data_file = self.spider_dir / "dev.json"
        else:
            data_file = self.spider_dir / "train_spider.json"
        
        with open(data_file) as f:
            self.data = json.load(f)
        
        # Load table schemas
        with open(self.spider_dir / "tables.json") as f:
            tables_data = json.load(f)
            self.schemas = {t["db_id"]: t for t in tables_data}
        
        print(f"Loaded {len(self.data)} examples from {split} split")
        print(f"Loaded schemas for {len(self.schemas)} databases")
    
    def get_schema_prompt(self, db_id: str) -> str:
        """Generate a schema description for prompting."""
        schema = self.schemas.get(db_id)
        if not schema:
            return ""
        
        lines = [f"Database: {db_id}", "Tables:"]
        
        table_names = schema["table_names_original"]
        column_names = schema["column_names_original"]
        column_types = schema["column_types"]
        primary_keys = schema.get("primary_keys", [])
        foreign_keys = schema.get("foreign_keys", [])
        
        # Group columns by table
        table_columns = defaultdict(list)
        for i, (table_idx, col_name) in enumerate(column_names):
            if table_idx >= 0:  # Skip the special "*" column
                col_type = column_types[i] if i < len(column_types) else "unknown"
                is_pk = i in primary_keys
                pk_marker = " (PK)" if is_pk else ""
                table_columns[table_idx].append(f"{col_name} {col_type}{pk_marker}")
        
        for idx, table_name in enumerate(table_names):
            cols = table_columns.get(idx, [])
            lines.append(f"  {table_name}: {', '.join(cols)}")
        
        # Add foreign keys
        if foreign_keys:
            lines.append("Foreign keys:")
            for fk in foreign_keys:
                if len(fk) == 2:
                    from_col = column_names[fk[0]]
                    to_col = column_names[fk[1]]
                    if from_col[0] >= 0 and to_col[0] >= 0:
                        from_table = table_names[from_col[0]]
                        to_table = table_names[to_col[0]]
                        lines.append(f"  {from_table}.{from_col[1]} -> {to_table}.{to_col[1]}")
        
        return "\n".join(lines)
    
    def execute_sql(self, db_id: str, sql: str, timeout: float = 5.0) -> tuple[Optional[list], Optional[str]]:
        """Execute SQL against a database and return results."""
        db_path = self.db_dir / db_id / f"{db_id}.sqlite"
        
        if not db_path.exists():
            return None, f"Database not found: {db_path}"
        
        try:
            conn = sqlite3.connect(str(db_path), timeout=timeout)
            conn.text_factory = str
            cursor = conn.cursor()
            cursor.execute(sql)
            results = cursor.fetchall()
            conn.close()
            return results, None
        except Exception as e:
            return None, str(e)
    
    def results_match(self, result1: Optional[list], result2: Optional[list]) -> bool:
        """Check if two query results match (order-independent for sets)."""
        if result1 is None or result2 is None:
            return False
        
        # Convert to sets of tuples for order-independent comparison
        try:
            set1 = set(tuple(row) if isinstance(row, (list, tuple)) else (row,) for row in result1)
            set2 = set(tuple(row) if isinstance(row, (list, tuple)) else (row,) for row in result2)
            return set1 == set2
        except (TypeError, ValueError):
            # Fall back to list comparison if unhashable
            return sorted(str(r) for r in result1) == sorted(str(r) for r in result2)
    
    def evaluate_single(self, example: dict, predicted_sql: str) -> BenchmarkResult:
        """Evaluate a single prediction."""
        question = example["question"]
        db_id = example["db_id"]
        gold_sql = example["query"]
        
        result = BenchmarkResult(
            question=question,
            db_id=db_id,
            gold_sql=gold_sql,
            predicted_sql=predicted_sql
        )
        
        # Execute gold SQL
        gold_result, gold_error = self.execute_sql(db_id, gold_sql)
        result.gold_result = gold_result
        
        if gold_error:
            result.error = f"Gold SQL error: {gold_error}"
            return result
        
        # Execute predicted SQL
        pred_result, pred_error = self.execute_sql(db_id, predicted_sql)
        result.predicted_result = pred_result
        
        if pred_error:
            result.error = f"Predicted SQL error: {pred_error}"
            result.syntax_valid = False
        else:
            result.syntax_valid = True
            result.execution_match = self.results_match(gold_result, pred_result)
        
        return result
    
    def run_benchmark(self, model_fn, num_samples: Optional[int] = None, 
                      verbose: bool = True) -> list[BenchmarkResult]:
        """
        Run the benchmark with a model function.
        
        Args:
            model_fn: Function that takes (question, schema_prompt) and returns SQL
            num_samples: Limit number of samples (None for all)
            verbose: Print progress
        
        Returns:
            List of BenchmarkResult objects
        """
        samples = self.data[:num_samples] if num_samples else self.data
        results = []
        
        iterator = tqdm(samples, desc="Benchmarking") if verbose else samples
        
        for example in iterator:
            schema_prompt = self.get_schema_prompt(example["db_id"])
            
            try:
                predicted_sql = model_fn(example["question"], schema_prompt)
            except Exception as e:
                predicted_sql = ""
                
            result = self.evaluate_single(example, predicted_sql)
            results.append(result)
        
        return results
    
    def compute_metrics(self, results: list[BenchmarkResult]) -> dict:
        """Compute aggregate metrics from results."""
        total = len(results)
        if total == 0:
            return {"total": 0}
        
        syntax_valid = sum(1 for r in results if r.syntax_valid)
        execution_match = sum(1 for r in results if r.execution_match)
        
        return {
            "total": total,
            "syntax_valid": syntax_valid,
            "syntax_accuracy": syntax_valid / total,
            "execution_match": execution_match,
            "execution_accuracy": execution_match / total,
        }


print("SpiderBenchmarker class defined")

SpiderBenchmarker class defined

Load FunctionGemma-270M

We load the base FunctionGemma-270M model to establish baseline performance before fine-tuning.

# Load FunctionGemma-270M
MODEL_ID = "google/functiongemma-270m-it"

print(f"Loading {MODEL_ID}...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    attn_implementation="eager",
    torch_dtype="auto",
)
model.eval()
print(f"Model loaded on {model.device}")

Loading google/functiongemma-270m-it...
Model loaded on cuda:0

# Define the SQL execution tool for FunctionGemma
SQL_TOOL = {
    "function": {
        "name": "execute_sql",
        "description": "Execute a SQL query to answer the user's question. The query parameter should contain a valid SELECT statement.",
        "parameters": {
            "type": "OBJECT",
            "properties": {
                "query": {
                    "type": "STRING",
                    "description": "A valid SQL SELECT statement"
                }
            },
            "required": ["query"]
        }
    }
}


def parse_function_call(output: str) -> dict:
    """Parse FunctionGemma's function call format.
    
    Format: <start_function_call>call:func_name{param:<escape>value<escape>}<end_function_call>
    """
    import re
    
    # Try to extract function call
    match = re.search(r'call:(\w+)\{(.+?)\}', output)
    if match:
        func_name = match.group(1)
        params_str = match.group(2)
        
        # Parse parameters: key:<escape>value<escape>
        params = {}
        param_matches = re.findall(r'(\w+):<escape>(.+?)<escape>', params_str)
        for key, value in param_matches:
            params[key] = value
        
        return {"name": func_name, "parameters": params}
    
    return {}


def generate_sql(question: str, schema: str, max_new_tokens: int = 256) -> str:
    """Generate SQL from a natural language question using FunctionGemma."""
    
    messages = [
        {
            "role": "developer",
            "content": "You are a model that can do function calling with the following functions"
        },
        {
            "role": "user",
            "content": f"""{schema}

Question: {question}

Call execute_sql with the appropriate SQL query."""
        }
    ]
    
    prompt = tokenizer.apply_chat_template(
        messages,
        tools=[SQL_TOOL],
        tokenize=False,
        add_generation_prompt=True
    )
    
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    
    # Decode WITHOUT skipping special tokens to preserve function call format
    full_output = tokenizer.decode(outputs[0], skip_special_tokens=False)
    new_output = full_output[len(tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=False)):]
        
    # Parse the FunctionGemma function call format
    func_call = parse_function_call(new_output)
    
    if func_call and "parameters" in func_call:
        sql = func_call["parameters"].get("query", "")
    else:
        # Fallback: try to extract SQL directly
        sql = new_output.strip()
    
    # Clean up
    if sql:
        sql = sql.split(";")[0].strip()
        if sql and not sql.endswith(";"):
            sql += ";"
    
    return sql


# Test the generation
test_schema = """Database: concert_singer
Tables:
  singer: Singer_ID (PK), Name, Country, Age"""

test_question = "How many singers are there?"

print("Test generation:")
result = generate_sql(test_question, test_schema)
print(f"Extracted SQL: {result}")

Test generation:
Extracted SQL: SELECT singer_id, name, country, age FROM singer;

Run Spider benchmark

We run the benchmark on the dev set to establish baseline performance.

# Initialise benchmarker
benchmarker = SpiderBenchmarker(SPIDER_DIR, split="dev")

# Run benchmark (use subset for initial testing)
NUM_SAMPLES = 100  # Set to None for full benchmark

print(f"Running benchmark on {NUM_SAMPLES or 'all'} samples...")
results = benchmarker.run_benchmark(generate_sql, num_samples=NUM_SAMPLES)

# Compute and display metrics
metrics = benchmarker.compute_metrics(results)
print("\n" + "="*50)
print("BENCHMARK RESULTS: FunctionGemma-270m-it (baseline)")
print("="*50)
print(f"Total samples:       {metrics['total']}")
print(f"Syntax valid:        {metrics['syntax_valid']} ({metrics['syntax_accuracy']*100:.1f}%)")
print(f"Execution match:     {metrics['execution_match']} ({metrics['execution_accuracy']*100:.1f}%)")
print("="*50)

Loaded 1034 examples from dev split
Loaded schemas for 166 databases
Running benchmark on 100 samples...

Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id, singer_id FROM concert_singer WHERE concert_id = concert_id AND singer_id = singer_id AND concert_id = stadium_id<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id FROM singer_data WHERE concert_id = concert_id<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text ORDER BY age DESC LIMIT 100000;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text ORDER BY singer.Age DESC LIMIT 10;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id:PK, singer_id:PK, age: AVG(age) AS min_age, age: AVG(age) AS max_age FROM concert_singer WHERE country = 'FR' AND song_name = 'concert_singer' GROUP BY singer_id ORDER BY BY age DESC LIMIT 10000 WHERE singer_id:PK;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id, singer_id, singer_name, age FROM concert_singer WHERE concert_id = 'stadium_singer_2019_en' AND singer_id = 'singer_french_female_100' AND concert_id = 'stadium_singer_2019_en' AND song_name = 'French_singers_2019' AND age BETWEEN 18 AND 60<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text ORDER BY singer.Singer_ID text ASC;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text ORDER BY singer.Singer_ID text ASC;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT countries WHERE singer_in_concert.Singer_ID = ? && singer.Singer_ID = ? && singer.Singer_ID = ? && concert.concert_ID -> stadium.Stadium_ID && singer.Singer_ID -> singer.Singer_ID && singer.Singer_ID -> concert.concert_ID<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT countries, singer_in_concert FROM concert_singer WHERE concert_ID = ? AND singer_ID = ? ORDER BY singer_ID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id. Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID<escape>, Name:<escape>Singer_ID<escape>, Capacity:5000, Highest number:<escape>Lowest number<escape>, Lowest number:<escape>Lowest number<escape>, Average number:<escape>Average number<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID<escape>, Name:<escape>Singer_ID<escape>, Capacity:5000, Highest number:<escape>Lowest number<escape>, Lowest number:<escape>Lowest number<escape>, Average number:<escape>Average number<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:PK, Location:<escape>Name text, Capacity number, Highest number, Lowest number, Average number<escape>? FROM stadium:Stadium_ID:PK ORDER BY Location:<escape>Name text, Capacity number, Highest number, Lowest number, Average number<escape> ORDER BY Capacity number:<escape>Stadium_ID:PK, Location:<escape>Name text, Capacity number, Highest number, Lowest number, Average number<escape> ORDER BY Singer_ID:<escape>Singer_ID<escape> INTO concert_singer.stadium FROM concert:concert_ID:PK WHERE singer_in_concert.singer_ID = singer.Singer_ID AND concert_ID:PK WHERE stadium:<escape>Stadium_ID:PK, Location:<escape>Name text, Capacity number, Highest number, Lowest number, Average number<escape><escape> ORDER BY Singer_ID:<escape>Singer_ID<escape> INTO concert_singer.stadium USING concert.Stadium_ID -> singer_in_concert.singer_ID INTO concert_singer.stadium WHERE stadium:<escape>Stadium_ID:PK, Location:<escape>Name text, Capacity number, Highest number, Lowest number, Average number<escape><escape> ORDER BY Singer_ID:<escape>Singer_ID<escape> INTO concert_
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID number (PK), Name text, Capacity number, Highest number, Lowest number, Average number<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID text, Name text, Capacity number, Highest number, Lowest number, Average number<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT Singer_ID:PK, Singer_ID:PK, Concert_Name text, Song_Name text, Song_release_year text, Age number, Is_male others FROM concert_singer WHERE Stadium_ID:12345<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT AVG(Capacity) AS Highest_Capacity FROM stadium WHERE Query = 'Average capacities' ORDER BY Query;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT AVG(Capacity) AS Highest_Capacity FROM stadium WHERE Query = 'Average capacities' ORDER BY Query;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id:PK, singer_id:PK, concert_id:PK, concert_name: 'Stadium_ID' AS singer_name, Song_name: 'Stadium_ID' AS song_name, Age: n AS age, Is_male others ? false ? true ? true<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(concert_ID) AS year2014 OR concert_ID IN (SELECT singer_id FROM singer_in_concert WHERE Singer_ID = 'your_singer_id') WHERE concert_ID IN (SELECT concert_id FROM concert_singer WHERE Stadium_ID = 'stadium_id') GROUP BY YEAR2014 OR YEAR2015 ORDER BY YEAR2014 OR YEAR2015 BY year2014;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(concert_ID) AS count FROM concert_singer WHERE concert_ID = ? AND year = ? AND Stadium_ID = ?<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID number (PK), Location text, Name text, Capacity number, Highest number, Lowest number, Average number<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Concert_Name:<escape>Concert_Name text, Stadium_ID:1, Year text, Singer_ID:<escape>Singer_ID text<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID text, Location text, Name text, Capacity number, Highest number, Lowest number, Average number<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(concert_id) AS num_concerts FROM concert_singer WHERE stadium_id = 'Stadium_ID_NUMBER' AND singer_id = 'Singer_ID_NUMBER' AND concert_id = 'concert_ID_NUMBER'<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID number (PK), Location text, Name text, Capacity number, Highest number, Lowest number, Average number<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT Stadium_ID:1, Location:<escape>Stadium_ID text, Location text, Name text, Capacity number, Highest number, Lowest number, Average number<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT Singer_ID:PK, Singer_ID:PK, Concert_Name text, Song_Name text, Song_release_year text, Age number, Is_male others FROM concert_singer WHERE Stadium_ID:12345<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_id text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(CASE WHEN Stadium_ID = 'Stadium_ID' THEN 1 ELSE 0 END) AS num_concerts FROM concert_singer WHERE query = 'Count all concerts in Stadium_ID = Stadium_ID'<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT YEAR(concert_singer.concert_ID) AS year_concerts<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT singer_id FROM singer_in_concert WHERE concert_id = stadium_id;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id FROM singer_data WHERE singer_id = ? AND age > 40 AND is_male:true and singer_id IN (SELECT singer_id FROM concert_singer WHERE concert_id = ? AND singer_id = ? AND concert_name = ? AND stadium_id = ? WHERE stadium_id = ? AND year = ?)<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id FROM singer_in_concert WHERE concert_id = ? AND concert_name = ? AND singer_id != ? AND song_name = ?<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT singer_id FROM singer_in_concert WHERE concert_id != stadium_id ORDER BY singer_id;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id, singer_name FROM concert_singer WHERE concert_id = 'stadium_id_1' AND concert_name = 'concert_singer' AND singer_id = 'singer_1' GROUP BY concert_id;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id, singer_name, concert_id FROM concert_singer WHERE concert_id = concert_id AND singer_id = singer_id AND concert_id = stadium_id<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id:PK, Singer_id:PK, concert_id:PK, concert_name: 'Stadium_ID' AS singer_name, Song_name: 'Stadium_ID' AS song_name, Age: n AS age, Is_male: true, others: false, Stadium_ID: Stadium_ID<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id:PK, Singer_id:PK, concert_id:PK, concert_name: 'Stadium_ID' AS singer_name, Song_name: 'Stadium_ID' AS song_name, Age: n AS age, Is_male: true, others: false, Stadium_ID: Stadium_ID<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_id text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_id text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_id text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text ORDER BY singer.Singer_ID text ASC;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text ORDER BY singer.Singer_ID text ASC;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_id text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT concert_id.concert_id text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT singer_id.Singer_ID text;<escape>,query:<escape>SELECT singer_id.Singer_ID text;<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECT singer_id. Singer_ID text;<escape>,query:<escape>SELECT singer_id. Singer_ID text;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(concert_id) AS num_concerts FROM concert_singer WHERE capacity = 'highest' AND stadium_id = 'stadium_id_1'<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(concert_ID) AS num_concerts FROM concert_singer WHERE capacity = 'Largest' AND stadium_id = 'Stadium_ID'<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID FROM Pets WHERE PetID = 12345 AND Weight > 10<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID FROM Pets WHERE PetID = 10 AND Has_Pet.StuID > 10<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID FROM Pets WHERE PetID = 12345 ORDER BY PetType ASC;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID FROM Pets WHERE PetID = 12345 AND PetType = 'Dog<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, LName, Fname, Age, Sex, Major, Advisor, city_code text
FROM Pets
WHEREStuID = 12345 AND PetID = 600
AND PetType = 'pet' GROUP BY PetID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetType, pet_age, weight NUMBER ORDER BY Has_Pet.StuID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, LName, Fname, Age, Sex, Major, Advisor, city_code, text FROM pets_1 WHEREStuID > 20<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID ORDER BYStuID ASC LIMIT 10;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, LName, Fname, Age, Sex, Major, Advisor, city_code text FROM Pets WHEREStuID:stuID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID FROM Pets WHERE PetID = '101' ANDStuID = '100' AND has_pet:true<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT Pets.PetType TEXT, COUNT(Has_Pet.StuID) AS num_distinct_type FROM Pets WHEREStuID IN (SELECT DISTINCTStuID FROM Student WHEREStuID IN (SELECT DISTINCTStuID FROM Has_Pet WHEREStuID IN (SELECT DISTINCTStuID FROM Pets WHERE PetID IN (SELECT PetID FROM Pets WHERE PetType = 'Pet')))) GROUP BY num_distinct_type ORDER BY num_distinct_type;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetType, pet_age, weight from Pets WHERE PetID = StuID and Has_Pet = true<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID.StuID + 1,StuID.StuID + 1, cat_pet.name text, dog_pet.name text, has_pet.stuID -> student.stuID + 1<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID,StuID FROM Pets WHEREStuID = 12345<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID.StuID + 1,StuID.StuID + 1 FROM Pets WHERE has_pet.StuID = StuID AND has_pet.PetID = 'cat' AND pet_age > 1000000000 Appleouse number<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT StuID.StuID + 1 AS first_name FROM Pets WHERE has_pet.StuID = StuID AND has_pet.PetID = PetID<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID:1.1, LName:<escape>Husband<escape>, Fname:<escape>Husband<escape>, Age:18, Sex:<escape>Male<escape>, Major:<escape>Student<escape>, Advisor:<escape>Husband<escape>, city_code:<escape>Tokyo<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECTStuID:1.1, LName:<escape>Husband<escape>, Fname:<escape>Husband<escape>, Age:19, Sex:<escape>Female<escape>, Major:<escape>Student<escape>, Advisor:<escape>Husband<escape>, city_code:<escape>Tokyo<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID:stuID, LName:text, Fname:text, Age:number, Sex:text, Major:text ORDER BYStuID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID FROM Pets WHEREStuID != StuID AND has_pet: no<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID FROM Student WHEREStuID != StuID and has_pet:CANDID:false<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID,StuID FROM Pets WHEREStuID != StuID AND has_pet: no<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID,StuID FROM Pets WHEREStuID != StuID AND has_pet: no<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID.StuID, PetID.PetType, pet_age:18000, weight:10000000000000000000000000000000, PetID:1 as PetID<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID:1, LName:<escape>PetID<escape>, Fname:<escape>PetType<escape>, Age:18, Sex:<escape>Female<escape>, Major:<escape>Major<escape>, Advisor:<escape>Advisor Number<escape>, city_code:<escape>City_Code<escape>}<end_function_call><start_function_call>call:execute_sql{query:<escape>SELECTStuID:1, LName:<escape>PetID<escape>, Fname:<escape>PetType<escape>, Age:18, Sex:<escape>Male<escape>, Major:<escape>Major<escape>, advisor:<escape>Advisor Number<escape>, city_code:<escape>City_Code<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID,StuID FROM Pets WHERE Age > 1 GROUP BY HAVING (has_pet);<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID,StuID FROM Pets WHEREStuID > 1;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT
    s.stuID,
    AVG(f.age) AS avg_age,
    MAX(f.age) AS max_age
FROM Pets.Has_Pet AS p
    WHERE p.stuID = s.stuID
    AND p.pet_age = f.age
    GROUP BY s.stuID
    ORDER BY s.stuID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, LName, Fname, Age, Sex, Major, Advisor, city_code text FROM Pets WHERE StuID = 12345;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT AVG(Weight) FOR StuID IN (SELECT Student.StuID FROM Pets WHEREStuID =Stu_ID) FOR PetID IN (SELECT PetID FROM Pets WHERE PetID = Pet_ID)<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, LName, Fname, Age, Sex, Major, Advisor, city_code text FROM Pets WHEREStuID =stu_id;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID ORDER BYStuID ASC LIMIT 10;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID:stuID, LName:text, Fname:text, Age:number, Sex:text, Major:number, Advisor:number, city_code:text[]<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuIDNUMBER, LName text, Fname text, Age number, Sex text, Major number, Advisor number, city_code text FROM pets_1 WHEREStuIDNUMBER = 'Smith'<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID:stuID, LName text, Fname text, Age number, Sex text, Major number, Advisor number, city_code text FROM Pets WHEREStuID:stuID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID.stuID, has_pet. StuID;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID.StuID NUMBER(100,2) AS numPeterscript, LName text, Fname text, Age number, Sex text, Major number, Advisor number, city_code text FROM pets_1. StuID NUMBER(100,2)<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, PetID ORDER BYStuID ASC LIMIT 1 LIMIT 1;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID,StuID FROM Pets WHEREStuID > 1, PetID: some_value<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID:1, LName:<escape>cat<escape>, Fname:<escape>dog<escape>, Age:3, Sex:<escape>male<escape>, Major:<escape>animal<escape>, Advisor:<escape>female<escape>, city_code:<escape>city<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID:1234, LName:<escape>cat<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, LName, Fname, Age, Sex, Major, Advisor, city_code FROM students WHEREStuID such thatStuID is not in any pet group<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECTStuID, LName, Fname, Age, Sex, Major, Advisor, city_code, text FROM students WHEREStuID such as: StuID != NULL AND has_pet such as: StuID != NULL AND not_owns_pet such as: StuID != NULL<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(continent)<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(continent)<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT continent_id, continent_name, COUNT(car_makers.Id) GROUP BY continent_id ORDER BY continent_id;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT continent.ContId, continent.CountryName, COUNT(car_makers.Id) AS count FROM car_makers WHERE continent.CountryName = 'Car_1' AND model.Name = 'Car_1' GROUP BY region.ContId ORDER BY region.ContId;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(DISTINCT ContId | Continent TEXT) GROUP BY ? ORDER BY ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT COUNT(DISTINCT ContId | Continent TEXT) GROUP BY ? ORDER BY ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT Maker TEXT, Make TEXT, Model TEXT, COUNT(Car_Names TEXT) AS Make TEXT, SUM(Cars_Data TEXT) AS MPG TEXT, COUNT(Car_Names TEXT) AS Model TEXT, COUNT(Car_Names TEXT) AS Car_Names TEXT ORDER BY Maker TEXT;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT Car_Maker_Name, Make_Id, Model_Text, Car_Name, MPG, Cylinders, Edispl, Horsepower, Weight, Accelerate, Year[] FROM car_1 WHERE Country = 'Car_1' AND Continent = 'Cont' AND Car_Maker = 'Model' AND Model = 'Car_makers' GROUP BY CAR_MAKER_ID ORDER BY YEAR DESC;<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT MIN(PMR)<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT CarName | MakerName | Make | Model | MPG | Cylinders | Edispl | Horsepower | YEAR |<escape> out o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o o
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT model FROM car_models WHERE weight < average_weight<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT Continent, CountryName, Car_makers.Id, Model, Car_names.Make, Car_names.Model, Cars_data.Id, MPG, Cylinders, Edispl, Horsepower, Weight, Accelerate, Year from car_1.contids.car_makers.Id? WHERE Car_makers.Id = 'YOUR_CAR_ID' AND Model = 'YOUR_CAR_MODEL' AND Car_names.Make = 'YOUR_CAR_TEAM_NAME' AND Car_names.Model = 'YOUR_CAR_MODEL_DESCRIPTION'<escape>}<end_function_call><start_function_response>
Raw output: <start_function_call>call:execute_sql{query:<escape>SELECT DISTINCT Maker TEXT, Make TEXT, Model TEXT, Car Name TEXT, Cars_Data TEXT WHERE Id = 1970<escape>}<end_function_call><start_function_response>

==================================================
BENCHMARK RESULTS: FunctionGemma-270M (baseline)
==================================================
Total samples:       100
Syntax valid:        0 (0.0%)
Execution match:     0 (0.0%)
==================================================

# Show 3 detailed examples
import difflib

def show_example(result, idx):
    print(f"\n{'='*60}")
    print(f"EXAMPLE {idx + 1}")
    print(f"{'='*60}")
    
    print(f"\nDatabase: {result.db_id}")
    print(f"Question: {result.question}")
    
    print(f"\n--- Generated SQL ---")
    print(result.predicted_sql)
    
    print(f"\n--- Gold SQL ---")
    print(result.gold_sql)
    
    # Show diff
    print(f"\n--- Diff (generated vs gold) ---")
    diff = difflib.unified_diff(
        result.gold_sql.splitlines(keepends=True),
        result.predicted_sql.splitlines(keepends=True),
        fromfile='gold',
        tofile='generated',
        lineterm=''
    )
    diff_text = ''.join(diff)
    if diff_text:
        print(diff_text)
    else:
        print("(identical)")
    
    # Show execution results
    print(f"\n--- Execution Results ---")
    print(f"Gold result:      {result.gold_result[:3] if result.gold_result else 'None'}{'...' if result.gold_result and len(result.gold_result) > 3 else ''}")
    print(f"Generated result: {result.predicted_result[:3] if result.predicted_result else 'None'}{'...' if result.predicted_result and len(result.predicted_result) > 3 else ''}")
    
    print(f"\n--- Verdict ---")
    print(f"Syntax valid:     {result.syntax_valid}")
    print(f"Execution match:  {result.execution_match}")
    if result.error:
        print(f"Error:            {result.error}")


# Show 3 examples: 1 success (if any), 1 wrong result, 1 syntax error
examples_to_show = []

# Find one successful match
for r in results:
    if r.execution_match:
        examples_to_show.append(r)
        break

# Find one wrong result (valid syntax but wrong output)
for r in results:
    if r.syntax_valid and not r.execution_match:
        examples_to_show.append(r)
        break

# Find one syntax error
for r in results:
    if not r.syntax_valid:
        examples_to_show.append(r)
        break

# If we don't have 3, just take first 3
while len(examples_to_show) < 3 and len(results) > len(examples_to_show):
    for r in results:
        if r not in examples_to_show:
            examples_to_show.append(r)
            break

for idx, result in enumerate(examples_to_show):
    show_example(result, idx)


============================================================
EXAMPLE 1
============================================================

Database: concert_singer
Question: How many singers do we have?

--- Generated SQL ---
SELECT singer_id, singer_id FROM concert_singer WHERE concert_id = concert_id AND singer_id = singer_id AND concert_id = stadium_id;

--- Gold SQL ---
SELECT count(*) FROM singer

--- Diff (generated vs gold) ---
--- gold+++ generated@@ -1 +1 @@-SELECT count(*) FROM singer+SELECT singer_id, singer_id FROM concert_singer WHERE concert_id = concert_id AND singer_id = singer_id AND concert_id = stadium_id;

--- Execution Results ---
Gold result:      [(6,)]
Generated result: None

--- Verdict ---
Syntax valid:     False
Execution match:  False
Error:            Predicted SQL error: no such table: concert_singer

============================================================
EXAMPLE 2
============================================================

Database: concert_singer
Question: What is the total number of singers?

--- Generated SQL ---
SELECT singer_id FROM singer_data WHERE concert_id = concert_id;

--- Gold SQL ---
SELECT count(*) FROM singer

--- Diff (generated vs gold) ---
--- gold+++ generated@@ -1 +1 @@-SELECT count(*) FROM singer+SELECT singer_id FROM singer_data WHERE concert_id = concert_id;

--- Execution Results ---
Gold result:      [(6,)]
Generated result: None

--- Verdict ---
Syntax valid:     False
Execution match:  False
Error:            Predicted SQL error: no such table: singer_data

============================================================
EXAMPLE 3
============================================================

Database: concert_singer
Question: Show name, country, age for all singers ordered by age from the oldest to the youngest.

--- Generated SQL ---
SELECT singer_id.Singer_ID text ORDER BY age DESC LIMIT 100000;

--- Gold SQL ---
SELECT name ,  country ,  age FROM singer ORDER BY age DESC

--- Diff (generated vs gold) ---
--- gold+++ generated@@ -1 +1 @@-SELECT name ,  country ,  age FROM singer ORDER BY age DESC+SELECT singer_id.Singer_ID text ORDER BY age DESC LIMIT 100000;

--- Execution Results ---
Gold result:      [('Joe Sharp', 'Netherlands', 52), ('John Nizinik', 'France', 43), ('Rose White', 'France', 41)]...
Generated result: None

--- Verdict ---
Syntax valid:     False
Execution match:  False
Error:            Predicted SQL error: no such column: singer_id.Singer_ID

# Save results for analysis
results_df = pd.DataFrame([
    {
        "question": r.question,
        "db_id": r.db_id,
        "gold_sql": r.gold_sql,
        "predicted_sql": r.predicted_sql,
        "syntax_valid": r.syntax_valid,
        "execution_match": r.execution_match,
        "error": r.error,
    }
    for r in results
])

results_df.to_csv("baseline_benchmark_results.csv", index=False)
print(f"Results saved to baseline_benchmark_results.csv")

# Summary by database
print("\nExecution accuracy by database (top 10):")
db_accuracy = results_df.groupby("db_id")["execution_match"].agg(["sum", "count"])
db_accuracy["accuracy"] = db_accuracy["sum"] / db_accuracy["count"]
print(db_accuracy.sort_values("accuracy", ascending=False).head(10))

Results saved to baseline_benchmark_results.csv

Execution accuracy by database (top 10):
                sum  count  accuracy
db_id                               
car_1             0     13       0.0
concert_singer    0     45       0.0
pets_1            0     42       0.0

So, unsurprisingly, we find what we’ve known all along: out of the box, FunctionGemma-270m-it is not a very good text-to-SQL model.

What can we do about that? Fine-tuning!

Round 1: Supervised Fine-Tuning (SFT)

Citation

BibTeX citation:

@misc{csefalvay,
  author = {{Chris von Csefalvay}},
  title = {Fine-Tuning {FunctionGemma} for {SQL}},
  url = {https://chrisvoncsefalvay.com/posts/babys-first-function-caller/finetuning-for-sql.html},
  langid = {en-GB}
}

For attribution, please cite this work as:

Chris von Csefalvay. n.d. “Fine-Tuning FunctionGemma for SQL.” https://chrisvoncsefalvay.com/posts/babys-first-function-caller/finetuning-for-sql.html.