Function
Clean Your Data: Remove Duplicates Instantly
Smartly detect and clean duplicates from your dataset (CSV or Excel).
This function scans your data to find:
- 🔁 Exact duplicates — identical rows or repeated entries.
- 🤖 Fuzzy duplicates — similar rows with small differences
(typos, spacing, casing, or minor text variations).
It automatically keeps the first valid occurrence of each duplicate
and exports everything neatly organized in a single downloadable ZIP.
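The keep-first behaviour described above can be illustrated with a minimal sketch. This is not the function's actual implementation: it assumes `difflib.SequenceMatcher` from the Python standard library as the similarity measure, assumes that fuzzy matches are dropped as well as reported, and uses hypothetical helper names (`normalize`, `similarity`, `dedupe`).

```python
from difflib import SequenceMatcher


def normalize(row):
    # Collapse whitespace and casing so "Alice " and "alice" compare equal.
    return tuple(" ".join(str(v).split()).lower() for v in row)


def similarity(a, b):
    # Score from 0 to 100, on the same scale as similarity_threshold.
    return 100 * SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()


def dedupe(rows, similarity_threshold=90):
    """Return (kept, removed, fuzzy_pairs), keeping the first occurrence."""
    kept, removed, fuzzy_pairs = [], [], []
    seen = set()
    for row in rows:
        row = tuple(row)
        if row in seen:
            # Exact duplicate of an earlier row: drop it.
            removed.append(row)
            continue
        seen.add(row)
        key = normalize(row)
        # Assumption: compare against every row kept so far (quadratic cost).
        match = next(
            (k for k in kept
             if similarity(normalize(k), key) >= similarity_threshold),
            None,
        )
        if match is not None:
            # Fuzzy duplicate: record the pair and drop the later row.
            fuzzy_pairs.append((match, row))
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed, fuzzy_pairs


rows = [("Alice", "NY"), ("Alice", "NY"), ("alice ", "NY"), ("Bob", "LA")]
kept, removed, fuzzy_pairs = dedupe(rows)
```

With this input, the second row is an exact duplicate and the third differs only in casing and spacing, so both are dropped and only the first "Alice" row and the "Bob" row are kept.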
📦 Inside the ZIP you’ll get:
- deduplicated_<name>.csv: your cleaned dataset (duplicates removed)
- duplicates_removed_<name>.csv: all duplicate rows that were dropped
- fuzzy_pairs_<name>.csv: pairs of rows that look alike (based on similarity)
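As a rough sketch of how such an archive could be assembled, the example below writes the three files into an in-memory ZIP using only the standard library. The file names follow the listing above; the helper name `build_zip` and the column layout of the fuzzy-pairs file (`<column>_a` / `<column>_b`) are assumptions made for illustration, not the function's documented output format.

```python
import csv
import io
import zipfile


def build_zip(name, header, kept, removed, fuzzy_pairs):
    """Return the bytes of a ZIP holding the three report files."""

    def to_csv(head, rows):
        # Serialize a header plus rows to CSV text.
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(head)
        writer.writerows(rows)
        return buf.getvalue()

    # Hypothetical layout: both rows of a pair side by side on one line.
    pair_header = [f"{c}_a" for c in header] + [f"{c}_b" for c in header]
    pair_rows = [list(a) + list(b) for a, b in fuzzy_pairs]

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"deduplicated_{name}.csv", to_csv(header, kept))
        zf.writestr(f"duplicates_removed_{name}.csv", to_csv(header, removed))
        zf.writestr(f"fuzzy_pairs_{name}.csv", to_csv(pair_header, pair_rows))
    return archive.getvalue()


data = build_zip(
    "customers",
    ["name", "city"],
    kept=[("Alice", "NY"), ("Bob", "LA")],
    removed=[("Alice", "NY"), ("alice ", "NY")],
    fuzzy_pairs=[(("Alice", "NY"), ("alice ", "NY"))],
)
```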
Args:
file (FilePath): The uploaded CSV or Excel file to analyze.
subset (str): Optional — comma-separated list of column names to check.
If left empty, all columns are analyzed.
similarity_threshold (int): Optional — how strict fuzzy matching should be (0–100).
Higher = only very similar values are flagged.
Default = 90 (good balance).
Returns:
str: Generated ZIP archive containing the cleaned dataset
and detailed duplicate reports.
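The `subset` and `similarity_threshold` arguments could be interpreted along these lines. This is a sketch under assumptions: the helper names `parse_subset` and `check_threshold` are hypothetical, and how the real function reacts to unknown column names or out-of-range thresholds is not stated in the description.

```python
def parse_subset(subset, columns):
    """Turn a comma-separated string into a list of column names."""
    # An empty value means "analyze all columns".
    if not subset or not subset.strip():
        return list(columns)
    wanted = [c.strip() for c in subset.split(",") if c.strip()]
    missing = [c for c in wanted if c not in columns]
    if missing:
        # Assumption: unknown columns are reported as an error.
        raise ValueError(f"Unknown column(s): {', '.join(missing)}")
    return wanted


def check_threshold(similarity_threshold=90):
    """Validate that the threshold lies in the documented 0-100 range."""
    value = int(similarity_threshold)
    if not 0 <= value <= 100:
        raise ValueError("similarity_threshold must be between 0 and 100")
    return value


columns = ["name", "city", "age"]
selected = parse_subset("name, city", columns)
everything = parse_subset("", columns)
threshold = check_threshold()
```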
Run Cost
$0.01 + $0.001/s
