Group your data and calculate summary statistics. Learn how to split your data into categories and get useful insights.
groupBy lets you split your data into categories, then calculate summary statistics for each group. Great for understanding patterns and differences across categories.
import { createDataFrame, stats as s } from "@tidy-ts/dataframe";
const people = createDataFrame([
{ id: 1, name: "Luke", species: "Human", mass: 77, height: 172, year: 2023 },
{ id: 2, name: "C-3PO", species: "Droid", mass: 75, height: 167, year: 2023 },
{ id: 3, name: "R2-D2", species: "Droid", mass: 32, height: 96, year: 2023 },
{ id: 4, name: "Darth Vader", species: "Human", mass: 136, height: 202, year: 2024 },
{ id: 5, name: "Chewbacca", species: "Wookiee", mass: 112, height: 228, year: 2024 },
]);
// Basic species analysis
const speciesAnalysis = people
.groupBy("species")
.summarise({
count: (group) => group.nrows(),
avg_height: (group) => s.round(s.mean(group.height), 1),
avg_mass: (group) => s.round(s.mean(group.mass), 1),
max_height: (group) => s.max(group.height),
min_mass: (group) => s.min(group.mass),
})
.arrange("avg_mass", "desc");
speciesAnalysis.print("Species analysis:");
import { createDataFrame, stats as s } from "@tidy-ts/dataframe";
const people = createDataFrame([
{ id: 1, name: "Luke", species: "Human", mass: 77, height: 172, year: 2023 },
{ id: 2, name: "C-3PO", species: "Droid", mass: 75, height: 167, year: 2023 },
{ id: 3, name: "R2-D2", species: "Droid", mass: 32, height: 96, year: 2023 },
{ id: 4, name: "Darth Vader", species: "Human", mass: 136, height: 202, year: 2024 },
{ id: 5, name: "Chewbacca", species: "Wookiee", mass: 112, height: 228, year: 2024 },
]);
// Basic species analysis
const speciesAnalysis = people
.groupBy("species")
.summarise({
count: (group) => group.nrows(),
avg_height: (group) => s.round(s.mean(group.height), 1),
avg_mass: (group) => s.round(s.mean(group.mass), 1),
max_height: (group) => s.max(group.height),
min_mass: (group) => s.min(group.mass),
})
.arrange("avg_mass", "desc");
speciesAnalysis.print("Species analysis:");
You can group by multiple columns to create more detailed breakdowns and see how different factors interact in your data.
// Group by multiple columns
const multiGroupAnalysis = people
.groupBy("species", "year")
.summarise({
count: (group) => group.nrows(),
avg_mass: (group) => s.round(s.mean(group.mass), 1),
avg_height: (group) => s.round(s.mean(group.height), 1),
total_mass: (group) => s.sum(group.mass),
})
.arrange("species", "year");
multiGroupAnalysis.print("Multi-column grouping analysis:");
// Group by multiple columns
const multiGroupAnalysis = people
.groupBy("species", "year")
.summarise({
count: (group) => group.nrows(),
avg_mass: (group) => s.round(s.mean(group.mass), 1),
avg_height: (group) => s.round(s.mean(group.height), 1),
total_mass: (group) => s.sum(group.mass),
})
.arrange("species", "year");
multiGroupAnalysis.print("Multi-column grouping analysis:");
You can create complex summary statistics with conditional logic, custom calculations, and derived metrics.
// Basic aggregation with conditional logic
const basicAnalysis = people
.groupBy("species")
.summarise({
total_count: (group) => group.nrows(),
heavy_count: (group) => group.filter((row) => row.mass > 100).nrows(),
avg_mass: (group) => s.round(s.mean(group.mass), 1),
top_performer: (group) => {
return group.sliceMax("mass", 1).extractHead("name", 1) || "N/A";
},
})
.arrange("avg_mass", "desc");
basicAnalysis.print("Basic species analysis:");
// Basic aggregation with conditional logic
const basicAnalysis = people
.groupBy("species")
.summarise({
total_count: (group) => group.nrows(),
heavy_count: (group) => group.filter((row) => row.mass > 100).nrows(),
avg_mass: (group) => s.round(s.mean(group.mass), 1),
top_performer: (group) => {
return group.sliceMax("mass", 1).extractHead("name", 1) || "N/A";
},
})
.arrange("avg_mass", "desc");
basicAnalysis.print("Basic species analysis:");
group.nrows()
- Number of rows in groupgroup.length
- Same as nrows()s.countValue()
- Count specific valuess.uniqueCount()
- Count unique valuess.mean()
- Arithmetic means.median()
- Median values.mode()
- Most frequent values.quantile()
- Specific quantiless.stdev()
- Standard deviations.variance()
- Variances.range()
- Range (max - min)s.iqr()
- Interquartile ranges.min()
- Minimum values.max()
- Maximum values.first()
- First values.last()
- Last value