The mutate() function is your primary tool for adding calculated columns and transforming data. Learn everything from basic calculations to async operations.
The mutate() function adds new columns to your DataFrame. Each key becomes a column name, and the value is a function that uses the current row.
import { createDataFrame, stats as s } from "@tidy-ts/dataframe";

const people = createDataFrame([
  { id: 1, name: "Luke", species: "Human", mass: 77, height: 172 },
  { id: 2, name: "C-3PO", species: "Droid", mass: 75, height: 167 },
  { id: 3, name: "R2-D2", species: "Droid", mass: 32, height: 96 },
  { id: 4, name: "Darth Vader", species: "Human", mass: 136, height: 202 },
  { id: 5, name: "Chewbacca", species: "Wookiee", mass: 112, height: 228 },
]);

// Add a single calculated column
const withBmi = people
  .mutate({
    bmi: (row) => row.mass / Math.pow(row.height / 100, 2),
  });

withBmi.print("Added BMI column:");
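To make the row-wise model concrete, here is a minimal sketch in plain TypeScript, without tidy-ts, of what a single-column mutate() amounts to conceptually: map over the rows and return each row with one extra key. The `Person` type and the `addBmi` helper are hypothetical names used only for illustration; this is not the library's implementation.

```typescript
// Hypothetical row type mirroring part of the example data (assumption for this sketch).
type Person = { name: string; mass: number; height: number };

const plainRows: Person[] = [
  { name: "Luke", mass: 77, height: 172 },
  { name: "R2-D2", mass: 32, height: 96 },
];

// Conceptual equivalent of mutate({ bmi: (row) => ... }):
// every input row yields one output row with an added `bmi` key.
function addBmi(data: Person[]): (Person & { bmi: number })[] {
  return data.map((row) => ({
    ...row,
    // mass in kg divided by height in metres, squared
    bmi: row.mass / Math.pow(row.height / 100, 2),
  }));
}

const withBmiPlain = addBmi(plainRows);
console.log(withBmiPlain.map((r) => `${r.name}: ${r.bmi.toFixed(2)}`));
```

The input rows are left untouched; the result is a new array whose element type includes the added column.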
row: Current row's data • index: Row position (0-based) • df: Entire DataFrame
// The mutate function provides three parameters:
const withParameters = people
  .mutate({
    // row: Access current row's values
    bmi: (row) => row.mass / Math.pow(row.height / 100, 2),
    // index: Get the current row's position (0-based)
    in_first_half: (_row, index, df) => index < df.nrows() / 2,
    // df: Access the entire DataFrame for calculations across all rows
    is_above_average: (row, _index, df) => row.mass > s.mean(df.mass),
  });

withParameters.print("Using all three parameters:");
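The calling convention (row, index, whole dataset) can be sketched with a small generic helper in plain TypeScript. `mutateRows` and `meanOfMass` are hypothetical names, and the helper takes a single function returning all new columns rather than one function per column, so it illustrates the three parameters only and does not mirror the tidy-ts API.

```typescript
// Hypothetical helper, NOT the tidy-ts implementation.
// `compute` receives the current row, its 0-based index, and all rows.
function mutateRows<Row extends object, Added extends object>(
  data: readonly Row[],
  compute: (row: Row, index: number, all: readonly Row[]) => Added,
): (Row & Added)[] {
  return data.map((row, index) => ({ ...row, ...compute(row, index, data) }));
}

const massRows = [{ mass: 77 }, { mass: 75 }, { mass: 32 }, { mass: 136 }, { mass: 112 }];

// Whole-column statistic, standing in for s.mean(df.mass) in this sketch.
const meanOfMass = (all: readonly { mass: number }[]): number =>
  all.reduce((sum, r) => sum + r.mass, 0) / all.length;

const flagged = mutateRows(massRows, (row, index, all) => ({
  // index: position-based flag
  in_first_half: index < all.length / 2,
  // all: compare the current row against a statistic over every row
  is_above_average: row.mass > meanOfMass(all),
}));

console.log(flagged);
```

With five rows the mean mass is 86.4, so only the last two rows (136 and 112) are flagged as above average, while the first three rows fall in the first half.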
// When intermediate values are needed, you can always chain multiple mutate calls
const chainedExample = people
  .mutate({
    doubleMass: (row) => row.mass * 2,
  })
  .mutate({
    quadrupleMass: (row) => row.doubleMass * 2, // Now doubleMass exists
  })
  .mutate({
    massRatio: (row) => row.quadrupleMass / row.mass,
  });

chainedExample.print("Chained mutate operations:");
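The same idea can be sketched with plain arrays: each step returns rows whose type includes the column added by the previous step, so a later step can read it. The `MassRow` type and the `step1` to `step3` names are assumptions made for this illustration, not part of tidy-ts.

```typescript
// Hypothetical row type for this sketch.
type MassRow = { name: string; mass: number };

const baseRows: MassRow[] = [
  { name: "Luke", mass: 77 },
  { name: "Chewbacca", mass: 112 },
];

// Step 1 adds doubleMass; the result type now includes that key.
const step1 = baseRows.map((row) => ({ ...row, doubleMass: row.mass * 2 }));

// Step 2 can read doubleMass because step 1 has already produced it.
const step2 = step1.map((row) => ({ ...row, quadrupleMass: row.doubleMass * 2 }));

// Step 3 combines an original column with a derived one.
const step3 = step2.map((row) => ({ ...row, massRatio: row.quadrupleMass / row.mass }));

console.log(step3);
```

Because quadrupleMass is mass multiplied by 4, massRatio is 4 for every row, which makes a convenient sanity check for the chain.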
The stats module provides 25+ statistical functions including mean, median, standard deviation, quantiles, ranking, and more. All functions are fully typed and optimized for performance.
// Use the stats module for calculations
const withStats = people
  .mutate({
    // Calculate z-score for mass
    mass_zscore: (row, _index, df) => {
      const mean = s.mean(df.mass);
      const std = s.stdev(df.mass);
      return s.round((row.mass - mean) / std, 3);
    },
    // Calculate percentile rank
    mass_percentile: (row, _index, df) => {
      return s.round(s.percentileRank(df.mass, row.mass), 1);
    },
    // Use cumulative functions
    cumulative_mass: (_row, index, df) => s.cumsum(df.mass)[index],
  });

withStats.print("Added columns using stats functions:");
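For readers who want to see the arithmetic behind the z-score and cumulative-sum columns, here is a minimal plain-TypeScript sketch. The helper names (`meanOf`, `sampleStdev`, `zScores`, `cumulativeSum`) are hypothetical, and the sketch assumes a sample standard deviation (n - 1 denominator); whether s.stdev() uses the sample or population form is not stated here, so check the library before comparing numbers.

```typescript
const massValues = [77, 75, 32, 136, 112];

// Arithmetic mean of a numeric array.
const meanOf = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// ASSUMPTION: sample standard deviation (divides by n - 1).
const sampleStdev = (xs: number[]): number => {
  const m = meanOf(xs);
  const sumSquares = xs.reduce((acc, x) => acc + (x - m) * (x - m), 0);
  return Math.sqrt(sumSquares / (xs.length - 1));
};

// z-score: distance from the mean measured in standard deviations.
// Mean and standard deviation are computed once, then reused for every value.
const zScores = (xs: number[]): number[] => {
  const m = meanOf(xs);
  const sd = sampleStdev(xs);
  return xs.map((x) => (x - m) / sd);
};

// Cumulative sum: element i holds the total of elements 0..i.
const cumulativeSum = (xs: number[]): number[] => {
  let running = 0;
  return xs.map((x) => (running += x));
};

console.log(zScores(massValues).map((z) => z.toFixed(3)));
console.log(cumulativeSum(massValues)); // [77, 152, 184, 320, 432]
```

Computing the mean and standard deviation once, outside the per-value callback, avoids repeating the same whole-column pass for every row.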
tidy-ts supports asynchronous operations across all functions including mutate(), filter(), groupBy().summarise(), and more. Async operations are automatically handled with proper concurrency control and retry mechanisms.
📖 Learn more: See the Async Operations page for examples and patterns.
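As a generic illustration of what "concurrency control" and "retry" mean for per-row async work, the sketch below limits how many async calls run at once and retries failed ones. `withRetry`, `mapWithLimit`, and `flakyLookup` are hypothetical helpers written for this example; they are not tidy-ts functions and do not describe how the library implements these features.

```typescript
// Hypothetical helper: run `task`, retrying up to `retries` extra times on failure.
async function withRetry<T>(task: () => Promise<T>, retries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}

// Hypothetical helper: async map that keeps at most `limit` calls in flight.
async function mapWithLimit<In, Out>(
  items: readonly In[],
  limit: number,
  fn: (item: In, index: number) => Promise<Out>,
): Promise<Out[]> {
  const results = new Array<Out>(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index] as In, index);
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Simulated async lookup that fails on its first call for each name.
const attemptsByName = new Map<string, number>();
async function flakyLookup(name: string): Promise<number> {
  const attempts = (attemptsByName.get(name) ?? 0) + 1;
  attemptsByName.set(name, attempts);
  if (attempts === 1) throw new Error(`transient failure for ${name}`);
  return name.length;
}

const characterNames = ["Luke", "C-3PO", "R2-D2"];
const nameLengths = mapWithLimit(characterNames, 2, (name) =>
  withRetry(() => flakyLookup(name), 2),
);
nameLengths.then((lengths) => console.log(lengths));
```

Results are written by index, so the output order matches the input order even when calls finish out of order.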
s.sum() - Sum of values
s.mean() - Arithmetic mean
s.median() - Median value
s.mode() - Most frequent value
s.min() - Minimum value
s.max() - Maximum value
s.product() - Product of values
s.range() - Range (max - min)
s.variance() - Variance
s.stdev() - Standard deviation
s.iqr() - Interquartile range
s.quantile() - Quantiles and percentiles
s.quartiles() - First, second, third quartiles
s.percentileRank() - Percentile rank
s.rank() - Ranking values
s.denseRank() - Dense ranking
s.cumsum() - Cumulative sum
s.cummean() - Cumulative mean
s.cumprod() - Cumulative product
s.cummin() - Cumulative minimum
s.cummax() - Cumulative maximum
s.lag() - Lag values
s.lead() - Lead values
s.normalize() - Normalize values
s.round() - Round to decimal places
s.floor() - Floor values
s.ceiling() - Ceiling values
s.covariance() - Covariance
s.corr() - Correlation coefficient
s.unique() - Unique values
s.uniqueCount() - Count of unique values
s.countValue() - Count specific value