Missing Data Handling

Handle null and undefined values with replaceNull/replaceUndefined, which substitute defaults, or with removeNull/removeUndefined, which drop rows and narrow the inferred types. Stats functions accept options of the same names, and each approach is shown with an example.

Null and Undefined Support
tidy-ts naturally supports null and undefined values

DataFrames can contain null and undefined values in any column. These are treated as missing data (NA) and handled appropriately by all operations.

import { createDataFrame } from "@tidy-ts/dataframe";

// DataFrames naturally support null and undefined values
const data = createDataFrame([
  { id: 1, name: "Alice", age: 25, score: 85 },
  { id: 2, name: null, age: 30, score: undefined },
  { id: 3, name: "Charlie", age: null, score: 92 },
]);

data.print("Data with null and undefined values:");
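
To make the idea of "missing data (NA)" concrete, here is a minimal plain-TypeScript sketch that counts missing values per column in an array of row objects. It does not use tidy-ts; `isNA` and `countMissing` are hypothetical helpers written for this illustration, and it only assumes that a value is missing when it is null or undefined.

```typescript
// Plain-TypeScript sketch; isNA and countMissing are hypothetical helpers,
// not part of the tidy-ts API.
type Row = {
  id: number;
  name: string | null;
  age: number | null;
  score: number | undefined;
};

const rows: Row[] = [
  { id: 1, name: "Alice", age: 25, score: 85 },
  { id: 2, name: null, age: 30, score: undefined },
  { id: 3, name: "Charlie", age: null, score: 92 },
];

// A value counts as missing (NA) when it is null or undefined.
function isNA(value: unknown): value is null | undefined {
  return value === null || value === undefined;
}

// Count the missing values in each column of an array of row objects.
function countMissing<T extends Record<string, unknown>>(
  data: T[],
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of data) {
    for (const [key, value] of Object.entries(row)) {
      counts[key] = (counts[key] ?? 0) + (isNA(value) ? 1 : 0);
    }
  }
  return counts;
}

const missing = countMissing(rows);
console.log(missing); // { id: 0, name: 1, age: 1, score: 1 }
```

A count like this is a quick way to decide, column by column, whether to replace or drop missing values.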
Stats Functions Default Behavior
Statistical functions return null when NA values are present

By default, statistical functions such as sum, mean, and max return null when the data contains any NA value. Propagating null in this way keeps a missing value from being silently ignored in a result.

import { createDataFrame, stats as s } from "@tidy-ts/dataframe";

const data = createDataFrame([
  { id: 1, value: 10 },
  { id: 2, value: null },
  { id: 3, value: 20 },
  { id: 4, value: undefined },
]);

// By default, stats functions return null when NA values are present
const total = s.sum(data.value);
const average = s.mean(data.value);
const maximum = s.max(data.value);

console.log("Default behavior (with NA values):");
console.log("Sum:", total);        // null
console.log("Mean:", average);     // null  
console.log("Max:", maximum);      // null
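
The propagation rule described above can be sketched in plain TypeScript. This is an illustration of the behavior, not the tidy-ts implementation; `sumPropagatingNA` is a hypothetical helper. It also shows why a `number | null` result has to be checked before it is used in arithmetic or formatting.

```typescript
// Hypothetical helper illustrating NA propagation; not part of tidy-ts.
type MaybeNumber = number | null | undefined;

function sumPropagatingNA(values: MaybeNumber[]): number | null {
  let running = 0;
  for (const v of values) {
    // A single missing value makes the whole result missing.
    if (v === null || v === undefined) return null;
    running += v;
  }
  return running;
}

const withNA = sumPropagatingNA([10, null, 20, undefined]);
const complete = sumPropagatingNA([10, 20]);

console.log(withNA);   // null
console.log(complete); // 30

// Because the return type is number | null, the caller must handle null
// explicitly before treating the result as a number.
const label =
  withNA === null ? "sum unavailable (missing values)" : `sum = ${withNA}`;
console.log(label); // sum unavailable (missing values)
```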
Using removeNull / removeUndefined Options
Ignore NA values in statistical calculations

Pass { removeNull: true, removeUndefined: true } to calculate statistics on only the valid (non-NA) values. This is useful when you want to analyze available data despite missing values.

import { createDataFrame, stats as s } from "@tidy-ts/dataframe";

const data = createDataFrame([
  { id: 1, value: 10 },
  { id: 2, value: null },
  { id: 3, value: 20 },
  { id: 4, value: undefined },
]);

// Use removeNull and removeUndefined to ignore null and undefined values
const total = s.sum(data.value, { removeNull: true, removeUndefined: true });
const average = s.mean(data.value, { removeNull: true, removeUndefined: true });
const maximum = s.max(data.value, { removeNull: true, removeUndefined: true });

console.log("With removeNull/removeUndefined: true:");
console.log("Sum:", total);        // 30 (10 + 20)
console.log("Mean:", average);     // 15 ((10 + 20) / 2)
console.log("Max:", maximum);      // 20
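
As a rough mental model of the two options, the plain-TypeScript sketch below implements a mean with `removeNull` and `removeUndefined` flags. It is not the tidy-ts implementation: `meanWithOptions` is hypothetical, and two details are assumptions of this sketch rather than documented library behavior, namely that each flag skips only its own kind of missing value and that an input with no remaining values yields null.

```typescript
// Hypothetical sketch of option-driven NA handling; not part of tidy-ts.
type MaybeNumber = number | null | undefined;

interface NAOptions {
  removeNull?: boolean;
  removeUndefined?: boolean;
}

function meanWithOptions(
  values: MaybeNumber[],
  options: NAOptions = {},
): number | null {
  const kept: number[] = [];
  for (const v of values) {
    if (v === null) {
      if (options.removeNull) continue; // skip nulls when asked to
      return null;                      // otherwise propagate NA
    }
    if (v === undefined) {
      if (options.removeUndefined) continue; // skip undefined when asked to
      return null;                           // otherwise propagate NA
    }
    kept.push(v);
  }
  // Assumption of this sketch: nothing left to average means no result.
  if (kept.length === 0) return null;
  return kept.reduce((a, b) => a + b, 0) / kept.length;
}

const sample: MaybeNumber[] = [10, null, 20, undefined];

console.log(meanWithOptions(sample)); // null
console.log(meanWithOptions(sample, { removeNull: true })); // null (undefined is still present)
console.log(meanWithOptions(sample, { removeNull: true, removeUndefined: true })); // 15
```

In this model, data that may contain both kinds of missing value needs both flags, which matches the usage shown in the example above.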
Replace Missing Values
Replace null and undefined with defaults using replaceNull and replaceUndefined

Chain replaceNull and replaceUndefined to replace missing values with specific defaults. Use replaceNull for null and replaceUndefined for undefined; chaining both covers all NA values.

import { createDataFrame } from "@tidy-ts/dataframe";

const messyData = createDataFrame([
  { id: 1, name: "Alice", age: 25, score: 85 },
  { id: 2, name: null, age: 30, score: null },
  { id: 3, name: "Charlie", age: null, score: 92 },
]);

// Replace null and undefined with defaults (chain replaceNull and replaceUndefined)
const cleaned = messyData
  .replaceNull({ name: "Unknown", age: 0, score: -1 })
  .replaceUndefined({ name: "Unknown", age: 0, score: -1 });

cleaned.print("After replaceNull/replaceUndefined:");
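
The reason for chaining two calls can be seen in a plain-TypeScript sketch that does the replacement in two passes over an array of row objects. It does not use tidy-ts; the data, `defaults`, and variable names are made up for illustration. The first pass replaces only null and the second replaces only undefined, so a value missing in either way ends up with a default.

```typescript
// Plain-TypeScript sketch of two-pass replacement; not the tidy-ts implementation.
type RawPerson = {
  id: number;
  name: string | null | undefined;
  score: number | null | undefined;
};

const rawPeople: RawPerson[] = [
  { id: 1, name: "Alice", score: 85 },
  { id: 2, name: null, score: undefined },
  { id: 3, name: undefined, score: null },
];

const defaults = { name: "Unknown", score: -1 };

// Pass 1: replace null only. Fields that are undefined are left untouched,
// so the inferred types become string | undefined and number | undefined.
const noNulls = rawPeople.map((p) => ({
  ...p,
  name: p.name === null ? defaults.name : p.name,
  score: p.score === null ? defaults.score : p.score,
}));

// Pass 2: replace undefined only. After both passes no NA values remain,
// and the inferred types are plain string and number.
const cleanedPeople = noNulls.map((p) => ({
  ...p,
  name: p.name === undefined ? defaults.name : p.name,
  score: p.score === undefined ? defaults.score : p.score,
}));

console.log(cleanedPeople);
// [
//   { id: 1, name: 'Alice', score: 85 },
//   { id: 2, name: 'Unknown', score: -1 },
//   { id: 3, name: 'Unknown', score: -1 }
// ]
```

In plain TypeScript the `??` operator would replace both null and undefined in one step; the two explicit passes are kept here only to mirror the separate replaceNull and replaceUndefined calls.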
Drop Rows with Missing Values (Type-Safe)
Use removeNull and removeUndefined to drop rows and narrow types

removeNull and removeUndefined drop rows where the given fields are null or undefined and automatically narrow the TypeScript type. filter() alone cannot narrow types, so use these methods when downstream code (e.g. stats) needs the narrowed type, such as number instead of number | null.

import { createDataFrame, stats as s } from "@tidy-ts/dataframe";

// Rows with null/undefined in "score" — use removeNull/removeUndefined for type inference
const data = createDataFrame([
  { id: 1, name: "Alice", score: 85 },
  { id: 2, name: "Bob", score: null },
  { id: 3, name: "Carol", score: undefined },
  { id: 4, name: "Dave", score: 92 },
]);

// removeNull/removeUndefined narrow types so TypeScript knows score is number
const complete = data.removeNull("score").removeUndefined("score");
const total = s.sum(complete.score); // type: number (not number | null)
console.log("Sum of complete scores:", total);
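
For readers less familiar with type narrowing, the same idea can be shown on a plain TypeScript array, where an explicit type predicate plays a role analogous to removeNull and removeUndefined. This is only an analogy and says nothing about how tidy-ts implements narrowing; `ScoreRow`, `CompleteScoreRow`, and `hasScore` are hypothetical names.

```typescript
// Plain-TypeScript analogy for type-safe row dropping; not part of tidy-ts.
type ScoreRow = { id: number; name: string; score: number | null | undefined };

// The same row shape, but with score guaranteed to be a number.
type CompleteScoreRow = ScoreRow & { score: number };

const scoreRows: ScoreRow[] = [
  { id: 1, name: "Alice", score: 85 },
  { id: 2, name: "Bob", score: null },
  { id: 3, name: "Carol", score: undefined },
  { id: 4, name: "Dave", score: 92 },
];

// The "row is CompleteScoreRow" return type is a type predicate: it tells the
// compiler that rows passing this check have a numeric score.
function hasScore(row: ScoreRow): row is CompleteScoreRow {
  return row.score !== null && row.score !== undefined;
}

// filter() with a type predicate returns the narrowed element type.
const completeRows: CompleteScoreRow[] = scoreRows.filter(hasScore);

// row.score is typed as number here, so no null checks are needed.
const scoreTotal = completeRows.reduce((sum, row) => sum + row.score, 0);

console.log(completeRows.map((row) => row.id)); // [ 1, 4 ]
console.log(scoreTotal); // 177
```

The benefit is the same in both settings: once the missing rows are gone, the type system knows it, and downstream code does not need defensive checks.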