
Google Cloud Connectors

Google Cloud connectors integrate your workflows with Google's cloud services, giving them access to Google's data storage, processing, and management capabilities.

1.1 Bigtable Loader

The Bigtable Loader connects to Google Cloud Bigtable, a fully managed, high-performance NoSQL database service designed for large analytical and operational workloads. It enables you to extract data stored in Bigtable tables for use in your RAG applications.

Bigtable Loader Interface

Use Cases

  • Analyzing time series data for IoT applications
  • Processing financial data for market analysis
  • Working with massive datasets for machine learning training
  • Real-time analytics on high-volume data
  • Managing and querying wide-column formatted data

Inputs

  • Service Account Key Path: Path to the JSON service account key file used for authentication (required)

    Example: /path/to/service_account_key.json

  • Project Name: Google Cloud project ID (required)

    Example: my-project-id

  • Bigtable Instance ID: Bigtable instance identifier (required)

    Example: my-bigtable-instance

  • Table Name: Bigtable table name (required)

    Example: user-events

  • Filter Column: Column qualifier used to filter the returned results (optional)

    Example: user_id

Outputs

Results in tabular or JSON format containing the retrieved Bigtable data.

Example Output:

[
  {
    "row_key": "user123",
    "data": {
      "personal_info:name": "Jane Smith",
      "personal_info:email": "jane@example.com",
      "activity:last_login": "2023-06-15T14:22:30Z",
      "activity:total_sessions": "42"
    }
  },
  {
    "row_key": "user456",
    "data": {
      "personal_info:name": "John Doe",
      "personal_info:email": "john@example.com",
      "activity:last_login": "2023-06-14T09:15:22Z",
      "activity:total_sessions": "27"
    }
  }
]

Implementation Notes

  • Bigtable is optimized for high throughput and low latency at scale
  • Use row key prefixes to optimize data retrieval patterns
  • Consider row and column family design for efficient data organization
  • Implement pagination when dealing with large datasets
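
As an illustration, the read path described above can be sketched with the google-cloud-bigtable Python client. The function names and parameters below are hypothetical and simply mirror the loader inputs; treat this as a sketch under those assumptions, not the connector's actual implementation.

```python
# Hypothetical sketch: reading rows with the google-cloud-bigtable client.
# Parameter names mirror the loader inputs above.

def flatten_row(row_key, cells):
    """Flatten Bigtable's {family: {qualifier: [value, ...]}} structure
    into the "family:qualifier" mapping shown in the example output.
    Row keys, qualifiers, and values arrive as bytes; the newest cell
    version comes first."""
    data = {}
    for family, columns in cells.items():
        for qualifier, versions in columns.items():
            data[f"{family}:{qualifier.decode()}"] = versions[0].decode()
    return {"row_key": row_key.decode(), "data": data}


def load_bigtable_rows(key_path, project, instance_id, table_name,
                       filter_column=None, limit=100):
    # Imported here so flatten_row stays usable without the client library.
    from google.cloud import bigtable
    from google.cloud.bigtable import row_filters

    client = bigtable.Client.from_service_account_json(key_path,
                                                       project=project)
    table = client.instance(instance_id).table(table_name)

    row_filter = None
    if filter_column:
        # Optional Filter Column input: keep only matching qualifiers.
        row_filter = row_filters.ColumnQualifierRegexFilter(
            filter_column.encode("utf-8"))

    results = []
    for row in table.read_rows(limit=limit, filter_=row_filter):
        # row.cells maps family -> qualifier -> list of Cell objects.
        raw = {family: {q: [c.value for c in cells]
                        for q, cells in cols.items()}
               for family, cols in row.cells.items()}
        results.append(flatten_row(row.row_key, raw))
    return results
```

The `limit` argument is one simple way to bound large reads; for true pagination, restart subsequent `read_rows` calls from the last row key seen.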

1.2 GCP Firestore Loader

The Firestore Loader connects to Google Cloud Firestore, a flexible, scalable NoSQL cloud database for storing and syncing data for client- and server-side development. It allows you to retrieve document collections from Firestore for processing in your workflows.

Firestore Loader Interface

Use Cases

  • Real-time data synchronization for collaborative applications
  • Storing and retrieving user-generated content
  • Managing application state and configuration
  • Building serverless applications with real-time updates
  • Mobile and web application backend data storage

Inputs

  • Service Account Key File: JSON file for authentication (required)

    Example: Upload a service account key file

  • Project Name: Google Cloud project name (required)

    Example: my-firestore-project

  • Database: Firestore database name (optional; omit to use the default database)

    Example: app-database

  • Collection: Target collection to retrieve (required)

    Example: users

Outputs

Documents from the specified collection in JSON format.

Example Output:

[
  {
    "id": "user_1234",
    "name": "Jane Smith",
    "email": "jane@example.com",
    "account_type": "premium",
    "created_at": "2023-01-15T08:30:00Z",
    "preferences": {
      "theme": "dark",
      "notifications": true
    }
  },
  {
    "id": "user_5678",
    "name": "John Doe",
    "email": "john@example.com",
    "account_type": "basic",
    "created_at": "2023-02-22T14:15:30Z",
    "preferences": {
      "theme": "light",
      "notifications": false
    }
  }
]

Implementation Notes

  • Firestore supports automatic multi-region replication
  • Use collection group queries to search across subcollections
  • Consider implementing pagination for large document sets
  • Leverage transactions for operations that need atomicity
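
The pagination pattern recommended above can be sketched with the google-cloud-firestore Python client. The function names and page size are illustrative, and the `database` keyword assumes a recent client version with named-database support; this is a sketch, not the connector's actual code.

```python
# Hypothetical sketch: paginated collection reads with google-cloud-firestore.

def paginate(fetch_page, page_size):
    """Repeatedly call fetch_page(cursor) until a short page signals
    the end of the collection; the last item of each page becomes the
    cursor for the next call."""
    items, cursor = [], None
    while True:
        batch = fetch_page(cursor)
        items.extend(batch)
        if len(batch) < page_size:
            return items
        cursor = batch[-1]


def load_firestore_documents(key_path, project, collection,
                             database="(default)", page_size=100):
    # Imported here so paginate stays usable without the client library.
    from google.cloud import firestore

    client = firestore.Client.from_service_account_json(
        key_path, project=project, database=database)

    def fetch_page(cursor):
        # Order by document name so the pagination cursor is stable.
        query = (client.collection(collection)
                 .order_by("__name__")
                 .limit(page_size))
        if cursor is not None:
            query = query.start_after(cursor)
        return list(query.stream())

    snapshots = paginate(fetch_page, page_size)
    # Matches the example output: document id merged with its fields.
    return [{"id": snap.id, **snap.to_dict()} for snap in snapshots]
```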

Workflow Integration Tips

  • Use the Google OAuth Token connector to generate credentials for user-level access
  • Chain multiple Google connectors together to build comprehensive data pipelines
  • For file-based operations, consider using GCS File Loader for individual files and GCS Bucket Loader for multiple files
  • When working with email data, use Gmail Loader with appropriate label filtering
  • Combine BigQuery for structured data with Firestore/Datastore for document data to create rich applications

Authentication & Security

  • Create dedicated service accounts with the minimum permissions needed
  • Rotate service account keys regularly and store them securely
  • Enable audit logging for all Google Cloud services you connect to
  • Use VPC Service Controls when possible to enhance network security
  • For user data access, implement proper scopes when using OAuth tokens

Best Practices

  • Ensure appropriate API services are enabled in your Google Cloud Console
  • Implement request throttling to avoid reaching API quotas and limits
  • Use data filtering at the source to minimize data transfer and processing costs
  • Set up monitoring and alerting for your Google Cloud resources
  • Consider implementing caching for frequently accessed, rarely changing data
  • Follow the principle of least privilege when creating service accounts
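
To illustrate the caching recommendation, here is a minimal time-based cache using only the Python standard library. The class name and TTL are illustrative (and it is not thread-safe); wrap connector calls in `get_or_load` so repeated reads of rarely changing data skip the network.

```python
import time


class TTLCache:
    """Tiny time-based cache for frequently accessed, rarely changing
    data (e.g. a configuration collection). Not thread-safe."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # fresh entry: no remote call
        value = loader()           # e.g. a connector call
        self._store[key] = (now + self.ttl, value)
        return value
```

Usage might look like `cache.get_or_load("app-config", lambda: load_firestore_collection("config"))`, where the loader callable is whatever connector call you want to avoid repeating.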