
Transforming Data with mutate()

The mutate() function is your primary tool for adding calculated columns and transforming data. Learn everything from basic calculations to async operations.

Basic Mutate
Let's start with the simplest case: adding one calculated column

The mutate() function adds new columns to your DataFrame. Each key becomes a column name, and each value is a function that receives the current row and returns that row's value for the new column.

import { createDataFrame, stats as s } from "@tidy-ts/dataframe";

const people = createDataFrame([
  { id: 1, name: "Luke", species: "Human", mass: 77, height: 172 },
  { id: 2, name: "C-3PO", species: "Droid", mass: 75, height: 167 },
  { id: 3, name: "R2-D2", species: "Droid", mass: 32, height: 96 },
  { id: 4, name: "Darth Vader", species: "Human", mass: 136, height: 202 },
  { id: 5, name: "Chewbacca", species: "Wookiee", mass: 112, height: 228 },
]);

// Add a single calculated column
const withBmi = people
  .mutate({
    bmi: (row) => row.mass / Math.pow(row.height / 100, 2),
  });

withBmi.print("Added BMI column:");
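
To make the row-by-row behavior concrete, here is a plain-TypeScript sketch of what mutate() does conceptually. For every row, it calls each column function with the row, the row's index, and all rows, then merges the results into a copy of the row. The `mutateRows` helper is hypothetical and written only for illustration. It does not reproduce tidy-ts's real implementation, return type, or column-access features. This sketch deliberately gives each function only the original row's fields.

```typescript
// Hypothetical helper for illustration only; this is NOT the tidy-ts API.
type Row = Record<string, unknown>;

// A column function receives the current row, its 0-based index, and all rows.
type ColumnFn<R> = (row: R, index: number, rows: readonly R[]) => unknown;

function mutateRows<R extends Row>(
  rows: readonly R[],
  fns: Record<string, ColumnFn<R>>,
): Row[] {
  return rows.map((row, index) => {
    const out: Row = { ...row }; // copy, so the input rows are left unchanged
    for (const [name, fn] of Object.entries(fns)) {
      // In this sketch every function sees the original row, not columns
      // added earlier in the same call.
      out[name] = fn(row, index, rows);
    }
    return out;
  });
}

const people = [
  { id: 1, name: "Luke", species: "Human", mass: 77, height: 172 },
  { id: 2, name: "C-3PO", species: "Droid", mass: 75, height: 167 },
  { id: 3, name: "R2-D2", species: "Droid", mass: 32, height: 96 },
  { id: 4, name: "Darth Vader", species: "Human", mass: 136, height: 202 },
  { id: 5, name: "Chewbacca", species: "Wookiee", mass: 112, height: 228 },
];

// Same BMI formula as the tidy-ts example above
const withBmi = mutateRows(people, {
  bmi: (row) => row.mass / Math.pow(row.height / 100, 2),
});
```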
More than just the current row
You're not limited to the current row. Each column function also receives the row's index (its 0-based position) and the entire DataFrame, which you can use in the calculation.

  • row: Current row's data
  • index: Row position (0-based)
  • df: Entire DataFrame

// The mutate function provides three parameters:
const withParameters = people
  .mutate({
    // row: Access current row's values
    bmi: (row) => row.mass / Math.pow(row.height / 100, 2),
    
    // index: Get the current row's position (0-based)
    in_first_half: (_row, index, df) => index < df.nrows() / 2,
    
    // df: Access the entire DataFrame for calculations across all rows
    is_above_average: (row, _index, df) => row.mass > s.mean(df.mass)
  });

withParameters.print("Using all three parameters:");
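
Because `df` gives each callback the whole DataFrame, an expression like `s.mean(df.mass)` inside the callback runs once for every row. The sketch below reproduces the `in_first_half` and `is_above_average` logic with plain arrays so it runs without tidy-ts. It computes the mean a single time before the per-row loop. You can apply the same idea to the tidy-ts example by computing the average before calling mutate(). Whether this matters in practice depends on the size of your data.

```typescript
// Plain-array illustration of the index and df parameters; not tidy-ts code.
const massColumn = [77, 75, 32, 136, 112]; // mass values from the people example

const meanOf = (xs: readonly number[]): number =>
  xs.reduce((acc, x) => acc + x, 0) / xs.length;

// Computed once, outside the per-row callback (432 / 5 = 86.4).
const averageMass = meanOf(massColumn);

const flags = massColumn.map((mass, index) => ({
  // index < 2.5 is true for rows 0, 1 and 2
  in_first_half: index < massColumn.length / 2,
  is_above_average: mass > averageMass,
}));
```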
Chaining Mutate Operations
Handling calculations that depend on other new columns
// When intermediate values are needed, you can always chain multiple mutate calls
const chainedExample = people
  .mutate({
    doubleMass: (row) => row.mass * 2,
  })
  .mutate({
    quadrupleMass: (row) => row.doubleMass * 2, // Now doubleMass exists
  })
  .mutate({
    massRatio: (row) => row.quadrupleMass / row.mass,
  });

chainedExample.print("Chained mutate operations:");
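
The chain works because each mutate() call returns a new DataFrame that already contains the previous call's columns. The plain-TypeScript sketch below mimics that data flow with object spread on two rows from the example. It is an analogy, not tidy-ts internals. TypeScript infers a new row type after each step, so `row.doubleMass` type-checks in the second step.

```typescript
// Analogy using plain arrays and object spread; not tidy-ts internals.
const rows = [
  { name: "Luke", mass: 77 },
  { name: "R2-D2", mass: 32 },
];

// Each step returns new objects that include every column from the step before.
const step1 = rows.map((row) => ({ ...row, doubleMass: row.mass * 2 }));
const step2 = step1.map((row) => ({ ...row, quadrupleMass: row.doubleMass * 2 }));
const step3 = step2.map((row) => ({ ...row, massRatio: row.quadrupleMass / row.mass }));

// massRatio is always 4, since quadrupleMass = mass * 2 * 2
```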
Using Stats Functions
Leverage the stats module for calculations

The stats module provides 25+ statistical functions including mean, median, standard deviation, quantiles, ranking, and more. All functions are fully typed and optimized for performance.

// Use the stats module for calculations
const withStats = people
  .mutate({
    // Calculate z-score for mass
    mass_zscore: (row, _index, df) => {
      const mean = s.mean(df.mass);
      const std = s.stdev(df.mass);
      return s.round((row.mass - mean) / std, 3);
    },
    
    // Calculate percentile rank
    mass_percentile: (row, _index, df) => {
      return s.round(s.percentileRank(df.mass, row.mass), 1);
    },
    
    // Use cumulative functions
    cumulative_mass: (_row, index, df) => s.cumsum(df.mass)[index],
  });

withStats.print("Added columns using stats functions:");
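
In the example above, `s.mean(df.mass)`, `s.stdev(df.mass)` and `s.cumsum(df.mass)` are called inside the callback, so each row repeats work over the whole column. The plain-TypeScript sketch below computes the same kinds of values once per column. It rests on two assumptions. First, it uses the sample standard deviation (n - 1 denominator), and s.stdev() may use a different formula. Second, its `roundTo` helper is built on Math.round, which may differ from s.round() for halfway cases.

```typescript
// Column-at-once z-scores and running total; plain TypeScript, not tidy-ts.
const masses = [77, 75, 32, 136, 112];

const massMean = masses.reduce((a, b) => a + b, 0) / masses.length;
// Assumption: sample standard deviation (divide by n - 1)
const sampleSd = Math.sqrt(
  masses.reduce((acc, x) => acc + (x - massMean) ** 2, 0) / (masses.length - 1),
);

// Round to a fixed number of decimal places using Math.round
const roundTo = (x: number, digits: number): number =>
  Math.round(x * 10 ** digits) / 10 ** digits;

const zScores = masses.map((x) => roundTo((x - massMean) / sampleSd, 3));

// Running total in one pass, instead of recomputing the cumulative sum per row
const cumulative: number[] = [];
let total = 0;
for (const x of masses) {
  total += x;
  cumulative.push(total);
}
// cumulative → [77, 152, 184, 320, 432]
```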
Async Operations
Handle asynchronous operations with full type safety

tidy-ts supports asynchronous operations across its functions, including mutate(), filter(), groupBy().summarise(), and more. Async operations are handled automatically, with built-in concurrency control and retry mechanisms.

📖 Learn more: See the Async Operations page for examples and patterns.
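
tidy-ts handles concurrency for you, so you do not need to write anything like this yourself. To illustrate what "concurrency control" generally means, here is a sketch of a helper that runs an async function over a list. It keeps at most `limit` calls in flight and returns results in input order. `mapWithConcurrency` is hypothetical: it is not how tidy-ts implements async mutate(), and it omits retries entirely.

```typescript
// Hypothetical helper for illustration; not tidy-ts code, and no retry logic.
async function mapWithConcurrency<T, U>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<U>,
): Promise<U[]> {
  const results = new Array<U>(items.length);
  let next = 0; // index of the next item that has not been started

  // Each worker keeps claiming the next unstarted item until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // stored by index, so order matches the input
}

// Usage: a fake async lookup with at most 2 calls running at once
const names = ["Luke", "C-3PO", "R2-D2", "Darth Vader", "Chewbacca"];
let inFlight = 0;
let maxInFlight = 0;

const lengthsPromise = mapWithConcurrency(names, 2, async (name) => {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated latency
  inFlight--;
  return name.length;
});
```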

Stats Module Reference
Statistical functions available in mutate operations

Basic Statistics

  • s.sum() - Sum of values
  • s.mean() - Arithmetic mean
  • s.median() - Median value
  • s.mode() - Most frequent value
  • s.min() - Minimum value
  • s.max() - Maximum value
  • s.product() - Product of values

Spread & Distribution

  • s.range() - Range (max - min)
  • s.variance() - Variance
  • s.stdev() - Standard deviation
  • s.iqr() - Interquartile range
  • s.quantile() - Quantiles and percentiles
  • s.quartiles() - First, second, and third quartiles
  • s.percentileRank() - Percentile rank
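
Quantile definitions vary between tools, and the list above does not say which rule s.quantile() uses. As an illustration only, this sketch implements linear interpolation between sorted values, the default rule in R (type 7) and NumPy. It then derives the IQR from those quantiles using the mass column from the earlier examples. `quantileLinear` is a hypothetical name, not part of tidy-ts.

```typescript
// Linear-interpolation quantile (R type 7 / NumPy default); illustration only.
function quantileLinear(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error("quantile of empty array");
  if (p < 0 || p > 1) throw new Error("p must be in [0, 1]");
  const sorted = [...values].sort((a, b) => a - b); // numeric sort, input untouched
  const h = (sorted.length - 1) * p; // fractional position in the sorted array
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

const massValues = [77, 75, 32, 136, 112]; // sorted: [32, 75, 77, 112, 136]
const q1 = quantileLinear(massValues, 0.25); // 75
const q2 = quantileLinear(massValues, 0.5); // 77 (the median)
const q3 = quantileLinear(massValues, 0.75); // 112
const iqr = q3 - q1; // 37
```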

Additional Functions

  • s.rank() - Ranking values
  • s.denseRank() - Dense ranking
  • s.cumsum() - Cumulative sum
  • s.cummean() - Cumulative mean
  • s.cumprod() - Cumulative product
  • s.cummin() - Cumulative minimum
  • s.cummax() - Cumulative maximum
  • s.lag() - Lag values
  • s.lead() - Lead values
  • s.normalize() - Normalize values
  • s.round() - Round to decimal places
  • s.floor() - Floor values
  • s.ceiling() - Ceiling values
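
Ranking and shifting functions differ mainly in how they treat ties and missing positions. The sketch below writes a few of them from scratch on a small toy array that contains a tie. It rests on stated assumptions that tidy-ts may not share: ranks are 1-based and ascending, tied values share the lowest rank ("min" ties, as in dplyr's min_rank()), and lag/lead fill missing positions with `undefined`. The function names mirror the list above but are hypothetical local implementations. minRank is quadratic, which is fine for illustration.

```typescript
// Illustrative implementations; tie handling and fill values are assumptions.
function minRank(xs: readonly number[]): number[] {
  // rank = 1 + number of values strictly smaller than x
  return xs.map((x) => 1 + xs.filter((y) => y < x).length);
}

function denseRank(xs: readonly number[]): number[] {
  const distinct = Array.from(new Set(xs)).sort((a, b) => a - b);
  // dense rank = 1 + position among distinct values, so ranks have no gaps
  return xs.map((x) => distinct.indexOf(x) + 1);
}

function lag<T>(xs: readonly T[], n = 1): (T | undefined)[] {
  // value from n rows earlier, undefined where none exists
  return xs.map((_, i) => (i - n >= 0 ? xs[i - n] : undefined));
}

function lead<T>(xs: readonly T[], n = 1): (T | undefined)[] {
  // value from n rows later, undefined where none exists
  return xs.map((_, i) => (i + n < xs.length ? xs[i + n] : undefined));
}

const scores = [10, 20, 20, 30]; // toy input with a tie
const ranks = minRank(scores); // [1, 2, 2, 4]
const dense = denseRank(scores); // [1, 2, 2, 3]
const lagged = lag(scores); // [undefined, 10, 20, 20]
const led = lead(scores); // [20, 20, 30, undefined]
```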

Bivariate Statistics

  • s.covariance() - Covariance
  • s.corr() - Correlation coefficient
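
For reference, here is the textbook definition behind these two entries, written as plain TypeScript and applied to the mass and height columns from the earlier examples. Two assumptions apply. s.covariance() may use an n rather than n - 1 denominator, and s.corr() is assumed to compute Pearson correlation; check the API reference before comparing numbers. The n - 1 factors cancel in the correlation, so r does not depend on that choice.

```typescript
// Sample covariance and Pearson correlation; illustration, not tidy-ts code.
function covariance(xs: readonly number[], ys: readonly number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two arrays of equal length >= 2");
  }
  const mx = xs.reduce((a, b) => a + b, 0) / xs.length;
  const my = ys.reduce((a, b) => a + b, 0) / ys.length;
  let sum = 0;
  for (let i = 0; i < xs.length; i++) sum += (xs[i] - mx) * (ys[i] - my);
  return sum / (xs.length - 1); // sample covariance
}

function corr(xs: readonly number[], ys: readonly number[]): number {
  // Pearson r = cov(x, y) / (sd(x) * sd(y)), with sd(x) = sqrt(cov(x, x))
  return covariance(xs, ys) / Math.sqrt(covariance(xs, xs) * covariance(ys, ys));
}

const mass = [77, 75, 32, 136, 112];
const height = [172, 167, 96, 202, 228];
const r = corr(mass, height); // positive for this data
```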

Count & Unique

  • s.unique() - Unique values
  • s.uniqueCount() - Count of unique values
  • s.countValue() - Count specific value