Databricks SQL Agent

The Databricks SQL Agent is a component that lets you query Databricks SQL warehouses in natural language. It uses a language model to turn a request into SQL, runs the query on the configured warehouse, and returns the generated SQL, the result data, and execution metadata.

[Figure: Databricks SQL Agent interface and configuration options]

Configuration Parameters

Required Parameters

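  • Language Model (languageModel): The language model that interprets the request and generates SQL
  • Warehouse ID (warehouseId): Identifier of the Databricks SQL warehouse that runs the queries
  • Host URL (hostUrl): URL of the Databricks workspace, for example https://your-workspace.cloud.databricks.com
  • Access Token (accessToken): Token used to authenticate against the workspace
  • Schema (schema): The database and the tables the agent works with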
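
A configuration with a missing required value is easier to diagnose before the agent is constructed. The following is a minimal sketch of such a check, not part of the agent itself: the RequiredConfig interface and the validateRequiredConfig function are hypothetical names, the field names follow the example on this page, and the https:// check is an assumption based on the sample host URL.

```typescript
// Hypothetical shape mirroring the required parameters listed above.
interface RequiredConfig {
  languageModel: string;
  warehouseId: string;
  hostUrl: string;
  accessToken: string;
  schema: { database: string; tables: string[] };
}

// Returns a list of problems; an empty list means every required value is present.
function validateRequiredConfig(config: Partial<RequiredConfig>): string[] {
  const problems: string[] = [];
  const stringKeys = ["languageModel", "warehouseId", "hostUrl", "accessToken"] as const;

  for (const key of stringKeys) {
    const value = config[key];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`${key} is missing or empty`);
    }
  }

  // Assumption: workspace URLs are always served over HTTPS.
  const host = config.hostUrl;
  if (typeof host === "string" && host.trim() !== "" && !host.startsWith("https://")) {
    problems.push("hostUrl should start with https://");
  }

  if (!config.schema || config.schema.database.trim() === "") {
    problems.push("schema.database is missing or empty");
  }

  return problems;
}
```

Calling validateRequiredConfig before constructing the agent and stopping when the returned list is non-empty surfaces an unset access token as a clear message instead of a later authentication failure.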

Optional Configuration

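  • Query Options (queryOptions): Settings applied to each query
    • maxRows: Maximum number of rows to return
    • timeout: Query timeout
    • caching: Whether to cache query results (boolean)
  • Performance (performance): Settings that control throughput and resilience
    • concurrency: Maximum number of concurrent queries
    • retryPolicy: Retry configuration (maxAttempts, backoff)
    • pooling: Connection pool size limits (min, max)
  • Security (security): Security options
    • encryption: Whether to enable data encryption (boolean)
    • auditLogging: Whether to log executed queries (boolean)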
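
To make the retryPolicy setting more concrete, here is a rough sketch of how an exponential backoff schedule could be derived from maxAttempts. It illustrates the general technique only and does not describe how the agent schedules retries internally: retryDelays, baseDelayMs, maxDelayMs, and the "fixed" backoff mode are all assumptions introduced for this example.

```typescript
// Hypothetical shape of the retryPolicy option; field names follow the example on this page.
interface RetryPolicy {
  maxAttempts: number;
  backoff: "exponential" | "fixed"; // "fixed" is an assumed alternative, not documented here
}

// Returns the wait in milliseconds before each retry. The first attempt runs
// immediately, so maxAttempts = 3 yields two delays.
function retryDelays(policy: RetryPolicy, baseDelayMs = 500, maxDelayMs = 30000): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maxAttempts - 1; retry++) {
    // Exponential backoff doubles the wait on every retry; fixed keeps it constant.
    const raw = policy.backoff === "exponential" ? baseDelayMs * 2 ** retry : baseDelayMs;
    // Cap the wait so a long retry chain cannot grow without bound.
    delays.push(Math.min(raw, maxDelayMs));
  }
  return delays;
}
```

Under these assumptions, a policy of three attempts with exponential backoff and a 500 ms base waits 500 ms before the second attempt and 1000 ms before the third.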

Output Format

{
  "result": {
    "query": {
      "sql": string,
      "parameters": object
    },
    "data": {
      "columns": array,
      "rows": array,
      "rowCount": number
    },
    "metadata": {
      "executionTime": string,
      "warehouseId": string,
      "queryId": string
    }
  }
}
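
The output format above can be captured as TypeScript types so that downstream code is checked at compile time. This is a sketch under stated assumptions: the format only says that columns and rows are arrays, so treating columns as column-name strings and each row as an array of values in column order is a guess, and AgentOutput and rowsToRecords are hypothetical names.

```typescript
// Types mirroring the output format above. The element types of `columns` and
// `rows` are assumptions: column names as strings, each row as an array of
// values in column order.
interface AgentOutput {
  result: {
    query: { sql: string; parameters: Record<string, unknown> };
    data: { columns: string[]; rows: unknown[][]; rowCount: number };
    metadata: { executionTime: string; warehouseId: string; queryId: string };
  };
}

// Converts positional rows into objects keyed by column name.
function rowsToRecords(output: AgentOutput): Record<string, unknown>[] {
  const { columns, rows } = output.result.data;
  return rows.map((row) => {
    const record: Record<string, unknown> = {};
    columns.forEach((name, index) => {
      record[name] = row[index];
    });
    return record;
  });
}
```

Keyed records are usually easier to pass to charting or templating code than positional rows. If the agent already returns rows as objects, the conversion is unnecessary.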

Example Usage

// Configure the agent: required parameters first, then the optional groups.
const databricksSQLAgent = new DatabricksSQLAgent({
  languageModel: "gpt-4",
  warehouseId: "warehouse_id",
  hostUrl: "https://your-workspace.cloud.databricks.com",
  // Read the token from the environment instead of hard-coding it.
  accessToken: process.env.DATABRICKS_TOKEN,
  schema: {
    database: "sales_db",
    tables: ["orders", "customers", "products"]
  },
  queryOptions: {
    maxRows: 1000,
    timeout: 300,
    caching: true
  },
  performance: {
    concurrency: 5,
    retryPolicy: {
      maxAttempts: 3,
      backoff: "exponential"
    },
    pooling: {
      min: 1,
      max: 10
    }
  },
  security: {
    encryption: true,
    auditLogging: true
  }
});

// Send a natural language request; the response follows the Output Format above.
const result = await databricksSQLAgent.process({
  input: "Show total sales by region for last quarter"
});

Best Practices

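  • Size the warehouse to match the expected query volume and complexity
  • Set a query timeout (queryOptions.timeout) so long-running queries do not hold warehouse resources indefinitely
  • Enable result caching (queryOptions.caching) for requests that are repeated often
  • Monitor query performance, for example by tracking executionTime and queryId from the result metadata
  • Follow security guidelines: keep the access token out of source code, and enable encryption and auditLogging where required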