
Google Cloud Connectors

Google Cloud connectors integrate your workflows with Google's cloud services, giving them access to Google's data storage, processing, and management capabilities.

1.1 Bigtable Loader

The Bigtable Loader connects to Google Cloud Bigtable, a fully managed, high-performance NoSQL database service designed for large analytical and operational workloads. It enables you to extract data stored in Bigtable tables for use in your RAG applications.

Bigtable Loader Interface

Use Cases

  • Analyzing time series data for IoT applications
  • Processing financial data for market analysis
  • Working with massive datasets for machine learning training
  • Real-time analytics on high-volume data
  • Managing and querying wide-column formatted data

Inputs

  • Service Account Key Path: Path to the JSON service account key file used for authentication (required)

    Example: /path/to/service_account_key.json

  • Project Name: Google Cloud project ID (required)

    Example: my-project-id

  • Bigtable Instance ID: Bigtable instance identifier (required)

    Example: my-bigtable-instance

  • Table Name: Bigtable table name (required)

    Example: user-events

  • Filter Column: Column qualifier used to filter the returned results (optional)

    Example: user_id

Outputs

Results in tabular or JSON format containing the retrieved Bigtable data.

Example Output:

[
  {
    "row_key": "user123",
    "data": {
      "personal_info:name": "Jane Smith",
      "personal_info:email": "jane@example.com",
      "activity:last_login": "2023-06-15T14:22:30Z",
      "activity:total_sessions": "42"
    }
  },
  {
    "row_key": "user456",
    "data": {
      "personal_info:name": "John Doe",
      "personal_info:email": "john@example.com",
      "activity:last_login": "2023-06-14T09:15:22Z",
      "activity:total_sessions": "27"
    }
  }
]

Implementation Notes

  • Bigtable is optimized for high throughput and low latency at scale
  • Use row key prefixes to optimize data retrieval patterns
  • Consider row and column family design for efficient data organization
  • Implement pagination when dealing with large datasets
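
As an illustration, the read path described above can be sketched with the google-cloud-bigtable Python client. The function names and parameters below are hypothetical and simply mirror the loader inputs; treat this as a sketch under those assumptions, not the connector's actual implementation.

```python
# Hypothetical sketch: reading rows with the google-cloud-bigtable client.
# Parameter names mirror the loader inputs above.

def flatten_row(row_key, cells):
    """Flatten Bigtable's {family: {qualifier: [value, ...]}} structure
    into the "family:qualifier" mapping shown in the example output.
    Row keys, qualifiers, and values arrive as bytes; the newest cell
    version comes first."""
    data = {}
    for family, columns in cells.items():
        for qualifier, versions in columns.items():
            data[f"{family}:{qualifier.decode()}"] = versions[0].decode()
    return {"row_key": row_key.decode(), "data": data}


def load_bigtable_rows(key_path, project, instance_id, table_name,
                       filter_column=None, limit=100):
    # Imported here so flatten_row stays usable without the client library.
    from google.cloud import bigtable
    from google.cloud.bigtable import row_filters

    client = bigtable.Client.from_service_account_json(key_path,
                                                       project=project)
    table = client.instance(instance_id).table(table_name)

    row_filter = None
    if filter_column:
        # Optional Filter Column input: keep only matching qualifiers.
        row_filter = row_filters.ColumnQualifierRegexFilter(
            filter_column.encode("utf-8"))

    results = []
    for row in table.read_rows(limit=limit, filter_=row_filter):
        # row.cells maps family -> qualifier -> list of Cell objects.
        raw = {family: {q: [c.value for c in cells]
                        for q, cells in cols.items()}
               for family, cols in row.cells.items()}
        results.append(flatten_row(row.row_key, raw))
    return results
```

The `limit` argument is one simple way to bound large reads; for true pagination, restart subsequent `read_rows` calls from the last row key seen.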

1.2 GCP Firestore Loader

The Firestore Loader connects to Google Cloud Firestore, a flexible, scalable NoSQL cloud database for storing and syncing data for client- and server-side development. It allows you to retrieve document collections from Firestore for processing in your workflows.

Firestore Loader Interface

Use Cases

  • Real-time data synchronization for collaborative applications
  • Storing and retrieving user-generated content
  • Managing application state and configuration
  • Building serverless applications with real-time updates
  • Mobile and web application backend data storage

Inputs

  • Service Account Key File: JSON file for authentication (required)

    Example: Upload a service account key file

  • Project Name: Google Cloud project name (required)

    Example: my-firestore-project

  • Database: Firestore database name (optional; omit to use the default database)

    Example: app-database

  • Collection: Target collection to retrieve (required)

    Example: users

Outputs

Documents from the specified collection in JSON format.

Example Output:

[
  {
    "id": "user_1234",
    "name": "Jane Smith",
    "email": "jane@example.com",
    "account_type": "premium",
    "created_at": "2023-01-15T08:30:00Z",
    "preferences": {
      "theme": "dark",
      "notifications": true
    }
  },
  {
    "id": "user_5678",
    "name": "John Doe",
    "email": "john@example.com",
    "account_type": "basic",
    "created_at": "2023-02-22T14:15:30Z",
    "preferences": {
      "theme": "light",
      "notifications": false
    }
  }
]

Implementation Notes

  • Firestore supports automatic multi-region replication
  • Use collection group queries to search across subcollections
  • Consider implementing pagination for large document sets
  • Leverage transactions for operations that need atomicity
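
The pagination pattern recommended above can be sketched with the google-cloud-firestore Python client. The function names and page size are illustrative, and the `database` keyword assumes a recent client version with named-database support; this is a sketch, not the connector's actual code.

```python
# Hypothetical sketch: paginated collection reads with google-cloud-firestore.

def paginate(fetch_page, page_size):
    """Repeatedly call fetch_page(cursor) until a short page signals
    the end of the collection; the last item of each page becomes the
    cursor for the next call."""
    items, cursor = [], None
    while True:
        batch = fetch_page(cursor)
        items.extend(batch)
        if len(batch) < page_size:
            return items
        cursor = batch[-1]


def load_firestore_documents(key_path, project, collection,
                             database="(default)", page_size=100):
    # Imported here so paginate stays usable without the client library.
    from google.cloud import firestore

    client = firestore.Client.from_service_account_json(
        key_path, project=project, database=database)

    def fetch_page(cursor):
        # Order by document name so the pagination cursor is stable.
        query = (client.collection(collection)
                 .order_by("__name__")
                 .limit(page_size))
        if cursor is not None:
            query = query.start_after(cursor)
        return list(query.stream())

    snapshots = paginate(fetch_page, page_size)
    # Matches the example output: document id merged with its fields.
    return [{"id": snap.id, **snap.to_dict()} for snap in snapshots]
```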

Workflow Integration Tips

  • Use the Google OAuth Token connector to generate credentials for user-level access
  • Chain multiple Google connectors together to build comprehensive data pipelines
  • For file-based operations, consider using GCS File Loader for individual files and GCS Bucket Loader for multiple files
  • When working with email data, use Gmail Loader with appropriate label filtering
  • Combine BigQuery for structured data with Firestore/Datastore for document data to create rich applications

Authentication & Security

  • Create dedicated service accounts with the minimum permissions needed
  • Rotate service account keys regularly and store them securely
  • Enable audit logging for all Google Cloud services you connect to
  • Use VPC Service Controls when possible to enhance network security
  • For user data access, implement proper scopes when using OAuth tokens

Best Practices

  • Ensure appropriate API services are enabled in your Google Cloud Console
  • Implement request throttling to avoid reaching API quotas and limits
  • Use data filtering at the source to minimize data transfer and processing costs
  • Set up monitoring and alerting for your Google Cloud resources
  • Consider implementing caching for frequently accessed, rarely changing data
  • Follow the principle of least privilege when creating service accounts
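
To illustrate the caching recommendation, here is a minimal time-based cache using only the Python standard library. The class name and TTL are illustrative (and it is not thread-safe); wrap connector calls in `get_or_load` so repeated reads of rarely changing data skip the network.

```python
import time


class TTLCache:
    """Tiny time-based cache for frequently accessed, rarely changing
    data (e.g. a configuration collection). Not thread-safe."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # fresh entry: no remote call
        value = loader()           # e.g. a connector call
        self._store[key] = (now + self.ttl, value)
        return value
```

Usage might look like `cache.get_or_load("app-config", lambda: load_firestore_collection("config"))`, where the loader callable is whatever connector call you want to avoid repeating.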