Tidy-TS Logo

Grouping and Aggregation

Group your data and calculate summary statistics. Learn how to split your data into categories and get useful insights.

Basic groupBy and summarize
Group data by one or more columns and calculate summary statistics

groupBy lets you split your data into categories, then calculate summary statistics for each group. Great for understanding patterns and differences across categories.

import { createDataFrame, stats as s } from "@tidy-ts/dataframe";

const people = createDataFrame([
  { id: 1, name: "Luke", species: "Human", mass: 77, height: 172, year: 2023 },
  { id: 2, name: "C-3PO", species: "Droid", mass: 75, height: 167, year: 2023 },
  { id: 3, name: "R2-D2", species: "Droid", mass: 32, height: 96, year: 2023 },
  { id: 4, name: "Darth Vader", species: "Human", mass: 136, height: 202, year: 2024 },
  { id: 5, name: "Chewbacca", species: "Wookiee", mass: 112, height: 228, year: 2024 },
]);

// Basic species analysis
const speciesAnalysis = people
  .groupBy("species")
  .summarise({
    count: (group) => group.nrows(),
    avg_height: (group) => s.round(s.mean(group.height), 1),
    avg_mass: (group) => s.round(s.mean(group.mass), 1),
    max_height: (group) => s.max(group.height),
    min_mass: (group) => s.min(group.mass),
  })
  .arrange("avg_mass", "desc");

speciesAnalysis.print("Species analysis:");
import { createDataFrame, stats as s } from "@tidy-ts/dataframe";

const people = createDataFrame([
  { id: 1, name: "Luke", species: "Human", mass: 77, height: 172, year: 2023 },
  { id: 2, name: "C-3PO", species: "Droid", mass: 75, height: 167, year: 2023 },
  { id: 3, name: "R2-D2", species: "Droid", mass: 32, height: 96, year: 2023 },
  { id: 4, name: "Darth Vader", species: "Human", mass: 136, height: 202, year: 2024 },
  { id: 5, name: "Chewbacca", species: "Wookiee", mass: 112, height: 228, year: 2024 },
]);

// Basic species analysis
const speciesAnalysis = people
  .groupBy("species")
  .summarise({
    count: (group) => group.nrows(),
    avg_height: (group) => s.round(s.mean(group.height), 1),
    avg_mass: (group) => s.round(s.mean(group.mass), 1),
    max_height: (group) => s.max(group.height),
    min_mass: (group) => s.min(group.mass),
  })
  .arrange("avg_mass", "desc");

speciesAnalysis.print("Species analysis:");
Multiple Column Grouping
Group by multiple columns for more detailed analysis

You can group by multiple columns to create more detailed breakdowns and see how different factors interact in your data.

// Group by multiple columns
const multiGroupAnalysis = people
  .groupBy("species", "year")
  .summarise({
    count: (group) => group.nrows(),
    avg_mass: (group) => s.round(s.mean(group.mass), 1),
    avg_height: (group) => s.round(s.mean(group.height), 1),
    total_mass: (group) => s.sum(group.mass),
  })
  .arrange("species", "year");

multiGroupAnalysis.print("Multi-column grouping analysis:");
// Group by multiple columns
const multiGroupAnalysis = people
  .groupBy("species", "year")
  .summarise({
    count: (group) => group.nrows(),
    avg_mass: (group) => s.round(s.mean(group.mass), 1),
    avg_height: (group) => s.round(s.mean(group.height), 1),
    total_mass: (group) => s.sum(group.mass),
  })
  .arrange("species", "year");

multiGroupAnalysis.print("Multi-column grouping analysis:");
Complex Aggregation Patterns
Complex aggregation with conditional logic and custom calculations

You can create complex summary statistics with conditional logic, custom calculations, and derived metrics.

// Basic aggregation with conditional logic
const basicAnalysis = people
  .groupBy("species")
  .summarise({
    total_count: (group) => group.nrows(),
    heavy_count: (group) => group.filter((row) => row.mass > 100).nrows(),
    avg_mass: (group) => s.round(s.mean(group.mass), 1),
    top_performer: (group) => {
      return group.sliceMax("mass", 1).extractHead("name", 1) || "N/A";
    },
  })
  .arrange("avg_mass", "desc");

basicAnalysis.print("Basic species analysis:");
// Basic aggregation with conditional logic
const basicAnalysis = people
  .groupBy("species")
  .summarise({
    total_count: (group) => group.nrows(),
    heavy_count: (group) => group.filter((row) => row.mass > 100).nrows(),
    avg_mass: (group) => s.round(s.mean(group.mass), 1),
    top_performer: (group) => {
      return group.sliceMax("mass", 1).extractHead("name", 1) || "N/A";
    },
  })
  .arrange("avg_mass", "desc");

basicAnalysis.print("Basic species analysis:");
Common Aggregation Functions
Essential functions for group summaries

Count and Size

  • group.nrows() - Number of rows in group
  • group.length - Same as nrows()
  • s.countValue() - Count specific values
  • s.uniqueCount() - Count unique values

Central Tendency

  • s.mean() - Arithmetic mean
  • s.median() - Median value
  • s.mode() - Most frequent value
  • s.quantile() - Specific quantiles

Spread and Variation

  • s.stdev() - Standard deviation
  • s.variance() - Variance
  • s.range() - Range (max - min)
  • s.iqr() - Interquartile range

Extremes

  • s.min() - Minimum value
  • s.max() - Maximum value
  • s.first() - First value
  • s.last() - Last value