Tidy-TS Logo

Joining DataFrames

Combine data from multiple sources with different join types, multi-key support, and strong type safety. Learn how to merge datasets effectively.

Basic Join Types
Understanding the four fundamental join types

Joining DataFrames lets you combine data from multiple sources. tidy-ts supports all standard join types with multi-key support and strong type safety.

import { createDataFrame } from "@tidy-ts/dataframe";

const employees = createDataFrame([
  { emp_id: 1, name: "Alice", dept_id: 10 },
  { emp_id: 2, name: "Bob", dept_id: 20 },
  { emp_id: 3, name: "Charlie", dept_id: 10 },
]);

const departments = createDataFrame([
  { dept_id: 10, dept_name: "Engineering" },
  { dept_id: 20, dept_name: "Marketing" },
]);

const joined = employees.innerJoin(departments, "dept_id");
joined.print("Inner join result:");
import { createDataFrame } from "@tidy-ts/dataframe";

const employees = createDataFrame([
  { emp_id: 1, name: "Alice", dept_id: 10 },
  { emp_id: 2, name: "Bob", dept_id: 20 },
  { emp_id: 3, name: "Charlie", dept_id: 10 },
]);

const departments = createDataFrame([
  { dept_id: 10, dept_name: "Engineering" },
  { dept_id: 20, dept_name: "Marketing" },
]);

const joined = employees.innerJoin(departments, "dept_id");
joined.print("Inner join result:");
Single Key Joins
Join DataFrames using a single column as the key

Single key joins are the most common type. Each join type preserves different combinations of records from the left and right DataFrames.

const employees = createDataFrame([
  { emp_id: 1, name: "Alice", dept_id: 10 },
  { emp_id: 2, name: "Bob", dept_id: 20 },
  { emp_id: 3, name: "Charlie", dept_id: 30 }, // No matching department
]);

const departments = createDataFrame([
  { dept_id: 10, dept_name: "Engineering" },
  { dept_id: 20, dept_name: "Marketing" },
]);

const joined = employees.leftJoin(departments, "dept_id");
joined.print("Left join result:");
const employees = createDataFrame([
  { emp_id: 1, name: "Alice", dept_id: 10 },
  { emp_id: 2, name: "Bob", dept_id: 20 },
  { emp_id: 3, name: "Charlie", dept_id: 30 }, // No matching department
]);

const departments = createDataFrame([
  { dept_id: 10, dept_name: "Engineering" },
  { dept_id: 20, dept_name: "Marketing" },
]);

const joined = employees.leftJoin(departments, "dept_id");
joined.print("Left join result:");
Multi-Key Joins
Join using multiple columns for complex relationships

Multi-key joins are great for complex data relationships where you need to match on multiple criteria. tidy-ts provides strong typing for all join scenarios.

const sales = createDataFrame([
  { year: 2023, quarter: "Q1", product: "Widget A", sales: 1000 },
  { year: 2023, quarter: "Q2", product: "Widget B", sales: 1500 },
]);

const targets = createDataFrame([
  { year: 2023, quarter: "Q1", product: "Widget A", target: 1200 },
  { year: 2023, quarter: "Q2", product: "Widget B", target: 1400 },
]);

const joined = sales.innerJoin(targets, ["year", "quarter", "product"]);
joined.print("Multi-key join result:");
const sales = createDataFrame([
  { year: 2023, quarter: "Q1", product: "Widget A", sales: 1000 },
  { year: 2023, quarter: "Q2", product: "Widget B", sales: 1500 },
]);

const targets = createDataFrame([
  { year: 2023, quarter: "Q1", product: "Widget A", target: 1200 },
  { year: 2023, quarter: "Q2", product: "Widget B", target: 1400 },
]);

const joined = sales.innerJoin(targets, ["year", "quarter", "product"]);
joined.print("Multi-key join result:");
Different Column Names
Join DataFrames with different column names

Real-world data often has different column names for the same concept. tidy-ts allows you to specify different column names for left and right DataFrames.

const orders = createDataFrame([
  { order_id: 1, order_region: "North", order_product: "A", quantity: 10 },
  { order_id: 2, order_region: "South", order_product: "B", quantity: 20 },
]);

const inventory = createDataFrame([
  { inv_region: "North", inv_product: "A", stock: 100 },
  { inv_region: "South", inv_product: "B", stock: 200 },
]);

const joined = orders.innerJoin(inventory, {
  keys: {
    left: ["order_region", "order_product"],
    right: ["inv_region", "inv_product"],
  },
});

joined.print("Join with different column names:");
const orders = createDataFrame([
  { order_id: 1, order_region: "North", order_product: "A", quantity: 10 },
  { order_id: 2, order_region: "South", order_product: "B", quantity: 20 },
]);

const inventory = createDataFrame([
  { inv_region: "North", inv_product: "A", stock: 100 },
  { inv_region: "South", inv_product: "B", stock: 200 },
]);

const joined = orders.innerJoin(inventory, {
  keys: {
    left: ["order_region", "order_product"],
    right: ["inv_region", "inv_product"],
  },
});

joined.print("Join with different column names:");
Comprehensive Joins
Complex joins with suffixes and multiple keys

For complex data integration scenarios, you can use suffixes to distinguish between columns with the same name and join on multiple keys simultaneously.

const orders = createDataFrame([
  { order_id: 1, order_region: "North", order_product: "A", quantity: 10 },
  { order_id: 2, order_region: "South", order_product: "B", quantity: 20 },
]);

const inventory = createDataFrame([
  { inv_region: "North", inv_product: "A", stock: 100 },
  { inv_region: "South", inv_product: "B", stock: 200 },
]);

const joined = orders.innerJoin(inventory, {
  keys: {
    left: ["order_region", "order_product"],
    right: ["inv_region", "inv_product"],
  },
});

joined.print("Join with different column names:");
const orders = createDataFrame([
  { order_id: 1, order_region: "North", order_product: "A", quantity: 10 },
  { order_id: 2, order_region: "South", order_product: "B", quantity: 20 },
]);

const inventory = createDataFrame([
  { inv_region: "North", inv_product: "A", stock: 100 },
  { inv_region: "South", inv_product: "B", stock: 200 },
]);

const joined = orders.innerJoin(inventory, {
  keys: {
    left: ["order_region", "order_product"],
    right: ["inv_region", "inv_product"],
  },
});

joined.print("Join with different column names:");
Join Type Rules
Understanding how different join types affect your data structure

Join Type Rules

Join TypeResult Type PatternDescription
Inner JoinL ∪ (R \ K)All fields required - only matching records
Left JoinL ∪ (R \ K)?Right non-key fields: T | undefined
Right Join(L \ K)? ∪ RLeft non-key fields: T | undefined
Outer Join(L \ K)? ∪ (R \ K)?Both sides: T | undefined
Cross JoinL ∪ RAll fields required (Cartesian product)

Where: L = Left DataFrame, R = Right DataFrame, K = Join keys

Note: All joins use explicit undefined unions (T | undefined), never optional properties (T?)