Merge Data Component

The Merge Data component combines multiple data sources into a single result using a configurable merge operation. It accepts arrays, objects, dataframes, and JSON structures, records any conflicts together with how each was resolved, and returns merge statistics alongside the merged data.

Merge Data Architecture

[Figure: Merge Data workflow and architecture]

Configuration Parameters

Required Parameters

  • dataInputs: Array of input data sources. Each source can be one of:
    • Arrays
    • Objects
    • Dataframes
    • JSON structures
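  • mergeOperation: Object describing the merge to perform. Its type field selects one of:
    • concat: Simple concatenation along an axis
    • join: SQL-style joins (inner, outer, left, right) on one or more keys
    • union: Set union, with optional deduplication
    • custom: User-defined merge function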
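
The two required parameters can be checked before a merge is attempted. The following is a minimal sketch, assuming mergeOperation is passed as an object with a type field (as in the usage examples on this page). validateMergeRequest is a hypothetical helper and not part of the component, and the rule that dataInputs must be non-empty is an assumption.

```javascript
// Hypothetical helper: NOT part of the Merge Data component.
// The four type names come from the parameter list above.
const MERGE_TYPES = ["concat", "join", "union", "custom"];

function validateMergeRequest(request) {
  const errors = [];

  // Assumption: an empty dataInputs array is treated as invalid.
  if (!Array.isArray(request.dataInputs) || request.dataInputs.length === 0) {
    errors.push("dataInputs must be a non-empty array of data sources");
  }

  const type = request.mergeOperation && request.mergeOperation.type;
  if (!MERGE_TYPES.includes(type)) {
    errors.push(`mergeOperation.type must be one of: ${MERGE_TYPES.join(", ")}`);
  }

  return errors; // an empty array means the request looks well-formed
}

const validErrors = validateMergeRequest({
  dataInputs: [[1, 2], [3, 4]],
  mergeOperation: { type: "concat", axis: 0 },
});

const invalidErrors = validateMergeRequest({
  dataInputs: [],
  mergeOperation: { type: "zip" },
});
```

Returning a list of messages rather than throwing on the first problem lets a caller report every configuration issue at once.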

Output Format

{
  "dataframe": {
    "data": Array<any>,
    "columns": string[],
    "index": Array<number | string>,
    "metadata": {
      "shape": [number, number],
      "dtypes": Object
    }
  },
  "data": {
    "merged_result": any,
    "format": string,
    "size": number
  },
  "statistics": {
    "input_counts": number[],
    "output_count": number,
    "merge_time": number,
    "memory_usage": number
  },
  "operations": {
    "type": string,
    "parameters": Object,
    "success": boolean
  },
  "conflicts": {
    "count": number,
    "resolved": number,
    "strategy": string,
    "details": Array<{
      "location": string,
      "values": any[],
      "resolution": string
    }>
  },
  "metadata": {
    "source_types": string[],
    "result_type": string,
    "processing_info": {
      "start_time": string,
      "end_time": string,
      "duration": number
    }
  }
}
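
A caller will typically read only a few of these fields. The sketch below shows one way to reduce the output to a short health summary. summarizeMergeOutput is a hypothetical helper, the sample object is hand-written to match the shape above rather than real component output, and treating count minus resolved as the number of unresolved conflicts is an assumption.

```javascript
// Hypothetical helper: NOT part of the Merge Data component.
function summarizeMergeOutput(output) {
  const { operations, conflicts, statistics } = output;

  // Assumption: conflicts.count includes both resolved and unresolved conflicts.
  const unresolved = conflicts.count - conflicts.resolved;
  const totalInput = statistics.input_counts.reduce((sum, n) => sum + n, 0);

  return {
    ok: operations.success && unresolved === 0,
    unresolved,
    totalInput,
    outputCount: statistics.output_count,
  };
}

// Hand-written sample in the documented shape (only the fields used here).
const sampleOutput = {
  statistics: { input_counts: [3, 2], output_count: 4, merge_time: 1, memory_usage: 0 },
  operations: { type: "union", parameters: { dedup: true }, success: true },
  conflicts: { count: 1, resolved: 1, strategy: "custom", details: [] },
};

const summary = summarizeMergeOutput(sampleOutput);
// summary.ok is true, summary.totalInput is 5, summary.outputCount is 4
```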

Features

  • Multiple merge types
  • Conflict resolution
  • Type handling
  • Performance optimization
  • Memory management
  • Error handling
  • Statistics tracking
  • Custom merge functions

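Note: Choose the merge operation to match the structure and size of the inputs, and decide in advance how type conflicts between sources should be resolved.

Tip: Prefer memory-efficient operations for large datasets, and handle merge conflicts explicitly, for example by checking operations.success and the conflicts block of the output, rather than assuming every merge succeeds.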
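
One possible reading of "memory-efficient" for concatenation is to avoid building the combined array at all. The sketch below is an illustration in plain JavaScript, not the component's implementation: concatLazy is a hypothetical generator that yields items from each source in turn, so a consumer can process them one at a time. Whether the component itself streams data this way is not stated on this page.

```javascript
// Hypothetical helper: NOT part of the Merge Data component.
// Lazily yields every item of every source, in order, without
// allocating a combined array.
function* concatLazy(...sources) {
  for (const source of sources) {
    yield* source; // works for any iterable: arrays, Sets, other generators
  }
}

// Consume the concatenation one item at a time.
let total = 0;
for (const value of concatLazy([1, 2, 3], [4, 5, 6], [7, 8, 9])) {
  total += value;
}
// total is 45
```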

Example Usage

const merger = new DataMerger();

// Simple array concatenation
const result1 = await merger.merge({
  dataInputs: [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
  ],
  mergeOperation: {
    type: "concat",
    axis: 0
  }
});

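// DataFrame join operation
// dataframe1 and dataframe2 are placeholders for previously loaded
// dataframes that both contain an "id" column.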
const result2 = await merger.merge({
  dataInputs: [
    dataframe1,
    dataframe2
  ],
  mergeOperation: {
    type: "join",
    on: "id",
    how: "left",
    suffixes: ["_1", "_2"]
  }
});

// Custom merge with conflict resolution
const result3 = await merger.merge({
  dataInputs: [
    { a: 1, b: 2 },
    { b: 3, c: 4 }
  ],
  mergeOperation: {
    type: "custom",
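    // Both inputs define "b" (2 vs 3). The resolver receives the list of
    // conflicts and returns one value per conflict, here the largest.
    resolver: (conflicts) => {
      return conflicts.map(c => Math.max(...c.values));
    },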
    options: {
      preserveTypes: true,
      ignoreNull: false
    }
  }
});
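
To make the resolver contract in the custom example more concrete, here is a standalone sketch of how such a merge could work on plain objects. It is an illustration under assumptions, not the component's implementation: mergeObjects is a hypothetical function, a conflict is assumed to mean that a key has more than one distinct value across inputs, and the resolver is assumed to return one value per conflict in the same order.

```javascript
// Hypothetical function: NOT part of the Merge Data component.
function mergeObjects(inputs, resolver) {
  // Collect every value seen for each key, in input order.
  const seen = new Map();
  for (const input of inputs) {
    for (const [key, value] of Object.entries(input)) {
      if (!seen.has(key)) seen.set(key, []);
      seen.get(key).push(value);
    }
  }

  // Assumption: a key conflicts when inputs supply more than one distinct value.
  const merged = {};
  const conflicts = [];
  for (const [key, values] of seen) {
    if (new Set(values).size > 1) {
      conflicts.push({ location: key, values });
    } else {
      merged[key] = values[0];
    }
  }

  // Assumption: the resolver returns one value per conflict, in the same order.
  const resolved = resolver(conflicts);
  conflicts.forEach((conflict, i) => {
    merged[conflict.location] = resolved[i];
  });

  return { merged, conflicts };
}

const { merged, conflicts } = mergeObjects(
  [{ a: 1, b: 2 }, { b: 3, c: 4 }],
  (cs) => cs.map((c) => Math.max(...c.values))
);
// merged.a is 1, merged.b is 3, merged.c is 4; one conflict, located at "b"
```

Passing all conflicts to the resolver in one call, as the example above does, lets a resolver apply a policy that depends on the whole set, such as rejecting the merge when too many keys disagree.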

Common Merge Operations (option shapes in type notation, not literal values):

// Concatenation
{
  type: "concat",
  axis: 0 | 1
}

// Join
{
  type: "join",
  on: string | string[],
  how: "inner" | "outer" | "left" | "right"
}

// Union
{
  type: "union",
  dedup: boolean,
  preserve_order: boolean
}
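
The effect of the join options can be illustrated on plain arrays of records. The following is a minimal sketch, not the component's implementation: joinRecords is a hypothetical function, unmatched rows are emitted without filling in missing columns, and overlapping non-key columns are not renamed (the suffixes option from the usage example is out of scope here, so right-hand values simply win).

```javascript
// Hypothetical function: NOT part of the Merge Data component.
function joinRecords(left, right, { on, how = "inner" }) {
  const keys = Array.isArray(on) ? on : [on];
  const keyOf = (row) => JSON.stringify(keys.map((k) => row[k]));

  // Index right-hand rows by their join key.
  const rightIndex = new Map();
  for (const row of right) {
    const k = keyOf(row);
    if (!rightIndex.has(k)) rightIndex.set(k, []);
    rightIndex.get(k).push(row);
  }

  const result = [];
  const matchedRightKeys = new Set();

  for (const l of left) {
    const k = keyOf(l);
    const matches = rightIndex.get(k);
    if (matches) {
      // Matched rows appear in every join type.
      matchedRightKeys.add(k);
      for (const r of matches) result.push({ ...l, ...r });
    } else if (how === "left" || how === "outer") {
      // Left-only rows are kept by left and outer joins.
      result.push({ ...l });
    }
  }

  if (how === "right" || how === "outer") {
    // Right-only rows are kept by right and outer joins.
    for (const [k, rows] of rightIndex) {
      if (!matchedRightKeys.has(k)) {
        for (const r of rows) result.push({ ...r });
      }
    }
  }

  return result;
}

const users = [{ id: 1, name: "Ada" }, { id: 2, name: "Lin" }];
const orders = [{ id: 2, total: 30 }, { id: 3, total: 12 }];

const inner = joinRecords(users, orders, { on: "id", how: "inner" });
const leftJoin = joinRecords(users, orders, { on: "id", how: "left" });
const outer = joinRecords(users, orders, { on: "id", how: "outer" });
// inner has 1 row, leftJoin has 2 rows, outer has 3 rows
```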