# data-deduplication

Here are 15 public repositories matching this topic...

bevry / fellow

Fellow is a package for creating people that can be unified by their shared values via a singleton list on the class

nodejs model data-deduplication client-side

Updated Jun 16, 2024
TypeScript

Anveshika06 / VIT-VTAS-TY-2022

data-deduplication hashing-algorithm

Updated Jan 7, 2023
Python

baraverkstad / mixtape

Practical backups. The Unix toolkit way.

linux shell bash unix backup command-line data-deduplication

Updated Jan 14, 2018
Shell

bmiller1009 / deduper

General deduping engine for JDBC sources with output to JDBC/csv targets

data-deduplication deduplication deduplicate deduplicate-data

Updated Dec 21, 2020
Kotlin

KeerthanaPalanikumar / Data-Cleaning-on-SQL

This repository contains SQL scripts and documentation for cleaning and standardizing data in the NashvilleHousing table within the sqlproject2 database. The project aims to prepare the dataset for analysis by addressing inconsistencies, filling missing values, standardizing formats, and removing duplicates.

data-deduplication database-management mssql data-manipulation data-cleaning ssms data-standardization

Updated Jun 17, 2024

Jim-JMCD / Data_storage_network_deduplication_calculator

A calculator for the storage and transmission of deduplicated data, with results presented in charts and tables

data-deduplication deduplication deduplication-calculator storage-deduplication-calculator network-deduplication-calculator

Updated Sep 26, 2023

imehar / data-deduplication

A data deduplication implementation based on a client-server architecture

hashing data-deduplication server-client

Updated May 14, 2019
C++

david-siqi-liu / sparklyclean

Optimal distributed data deduplication and supervised learning pipeline using Apache Spark

distributed-systems data-science spark hadoop data-deduplication data-engineering data-cleaning deduplication

Updated Aug 19, 2020
Scala

gagan3012 / PolyDeDupe

PolyDeDupe: Multi-Lingual Data Deduplication

multilingual nlp data-deduplication

Updated Sep 16, 2024
Python

shubham-thakare / data-deduplication

A Java project that splits data using hashing techniques and removes duplicate blocks to save cloud storage. This project also uses the CloudSim framework for cloud storage simulation.

java cloud-storage data-deduplication cloudsim cloudsim-framework

Updated Jan 6, 2021
Java

fabriziosalmi / text-boundaries

A Python-based tool for preprocessing, cleaning, and analyzing text datasets, designed to filter, deduplicate, sort data, and generate statistical insights.

machine-learning natural-language-processing data-validation data-deduplication data-preprocessing data-sorting data-automation dataset-cleaning text-data-analysis dataset-boundaries data-statistics-generation

Updated Sep 16, 2024
Python

jchristn / WatsonDedupe

Self-contained C# library for data deduplication using SQLite

compression storage nuget dedupe sqlite-database data-deduplication chunk compress deduplication chunk-data duplicate-data chunk-key

Updated Apr 7, 2023
C#

Zabuzard / FastCDC4J

Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library.

java library data-deduplication chunking cdc fastcdc content-defined-chunking

Updated Sep 21, 2023
Java

sail-sg / sailcraft

🚢 Data Toolkit for Sailor Language Models

data-deduplication data-cleaning

Updated Jul 11, 2024
Python

dpc / rdedup

Data deduplication engine, supporting optional compression and public key encryption.

backup encryption data-deduplication deduplication

Updated Aug 25, 2022
Rust
