Data Processing
Clean, transform, and prepare data for analysis.
Purpose
Use this module for feature engineering, missing data handling, entity matching, and data quality checks.
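One common data quality check confirms a dataset's granularity. That means checking that a chosen set of key columns identifies each row exactly once. The sketch below shows the idea in plain Python and assumes rows are dictionaries. `verify_granularity` is a hypothetical helper for illustration. It is not the signature or behavior of this module's VerifyGranularity.

```python
from collections import Counter

def verify_granularity(rows, key_columns):
    """Return (is_unique, duplicate_keys) for the given key columns.

    A dataset is at the granularity of `key_columns` when every combination
    of those columns appears in exactly one row. (Hypothetical helper.)
    """
    # Count how many rows share each combination of key values.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    # Keep only the key combinations that occur more than once.
    duplicates = {key: n for key, n in counts.items() if n > 1}
    return len(duplicates) == 0, duplicates

rows = [
    {"store": "A", "date": "2024-01-01", "sales": 10},
    {"store": "A", "date": "2024-01-02", "sales": 12},
    {"store": "B", "date": "2024-01-01", "sales": 7},
    {"store": "B", "date": "2024-01-01", "sales": 9},  # repeated store/date pair
]

print(verify_granularity(rows, ["store", "date"]))           # (False, {('B', '2024-01-01'): 2})
print(verify_granularity(rows, ["store", "date", "sales"]))  # (True, {})
```

Returning the offending keys is more useful than returning a bare True/False. The duplicates usually point straight at the join or load step that introduced them.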
Key functions
AddDateNumberColumns: Add year, month, quarter, week, and day columns from dates
AddLeadingZeros: Add leading zeros to numeric columns
AddRowCountColumn: Add row numbers within groups
AddTPeriodColumn: Create time period columns for time series analysis
AddTukeyOutlierColumn: Add an outlier flag column using Tukey’s method
CleanTextColumns: Remove leading/trailing spaces from text columns
ConductAnomalyDetection: Detect anomalies using a z-score method
ConductEntityMatching: Fuzzy matching between datasets using various algorithms
ConvertOddsToProbability: Convert odds to probabilities
CountMissingDataByGroup: Count missing values grouped by categories
CreateBinnedColumn: Bin continuous variables into discrete categories
CreateDataOverview: Dataset summary with missing data visualization
CreateRandomSampleGroups: Create random sample groups for validation
CreateRareCategoryColumn: Identify and flag rare categories
CreateStratifiedRandomSampleGroups: Stratified random sampling
ImputeMissingValuesUsingNearestNeighbors: Impute missing values using KNN
VerifyGranularity: Check dataset granularity based on key columns
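The two outlier rules named above measure spread in different ways. Tukey's method flags values that lie more than k interquartile ranges beyond the quartiles. A z-score rule flags values that lie more than a chosen number of standard deviations from the mean. A minimal standard-library sketch of both follows. The function names, the inclusive quartile method, and the default cutoffs (k = 1.5, threshold = 3) are assumptions for illustration. They are not taken from AddTukeyOutlierColumn or ConductAnomalyDetection.

```python
import statistics

def tukey_outlier_flags(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

def zscore_anomaly_flags(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [abs(v - mean) / sd > threshold for v in values]

data = [10, 12, 11, 13, 12, 11, 95]
print(tukey_outlier_flags(data))                  # only 95 is flagged
print(zscore_anomaly_flags(data, threshold=2.0))  # only 95 is flagged
print(zscore_anomaly_flags(data, threshold=3.0))  # nothing is flagged
```

The third call shows why the quartile-based rule is often preferred on small samples. The outlier inflates the mean and standard deviation it is measured against. With n points, no sample z-score can exceed (n - 1) / sqrt(n). For n = 7 that limit is about 2.27, so a cutoff of 3 can never fire here.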
Common use cases
Data cleaning and feature engineering
Missing data handling
Data quality assessment
Sampling and validation splits
Entity resolution (fuzzy matching)
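For entity resolution, fuzzy matching pairs records whose identifying strings differ only in case, punctuation, or spacing. The minimal sketch below uses Python's standard-library `difflib.SequenceMatcher` as the similarity measure. `best_match` and the 0.8 cutoff are illustrative assumptions. ConductEntityMatching may use different algorithms and defaults.

```python
from difflib import SequenceMatcher

def normalize(s):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(s.lower().split())

def best_match(name, candidates, min_score=0.8):
    """Return (candidate, score) for the most similar candidate, or None.

    Score is SequenceMatcher.ratio() on normalized strings: 2*M / T, where
    M is the number of matched characters and T the combined length.
    (Hypothetical helper, not this module's API.)
    """
    scored = [
        (cand, SequenceMatcher(None, normalize(name), normalize(cand)).ratio())
        for cand in candidates
    ]
    cand, score = max(scored, key=lambda pair: pair[1])
    return (cand, score) if score >= min_score else None

right = ["ACME Corp", "Globex Inc.", "Umbrella LLC"]
for name in ["Acme Corp.", "Globex  Inc", "Initech"]:
    # The first two find their reformatted counterparts; "Initech" maps to None.
    print(name, "->", best_match(name, right))
```

Comparing every left-hand name with every candidate takes n * m comparisons. For large tables, the usual way to cut that down is blocking, which means comparing only records that share a cheap key such as a first letter or postal code.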