Entity Resolution

Tutorial code and data for the entity resolution workshops.

Matching is the name of the game

Entity resolution (ER) is the task of disambiguating manifestations of real-world entities through linking and grouping, and it is often an essential part of the data wrangling process. It involves three primary tasks: deduplication, record linkage, and canonicalization. Together these improve data quality by reducing irrelevant or repeated data, joining information from disparate records, and providing a single source of information for analytics. However, because of data quality issues (misspellings or incorrect data), schema variations across sources, or simply different representations of the same entity, entity resolution is not a straightforward process, and most ER techniques rely on machine learning and other stochastic approaches.
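To make the three tasks concrete, here is a minimal sketch in Python using only the standard library. It is not the workshop's code: the function names, the sample records, and the 0.85 threshold are all hypothetical, and plain string similarity from difflib stands in for the machine learning approaches mentioned above.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def deduplicate(records, threshold=0.85):
    """Deduplication: group records whose name resembles a group's first member."""
    groups = []
    for record in records:
        for group in groups:
            if similarity(record["name"], group[0]["name"]) >= threshold:
                group.append(record)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([record])
    return groups


def link(left, right, threshold=0.85):
    """Record linkage: pair records from two sources whose names are similar."""
    return [
        (l, r)
        for l in left
        for r in right
        if similarity(l["name"], r["name"]) >= threshold
    ]


def canonicalize(group):
    """Canonicalization: keep the record with the most non-empty fields."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v))


# Hypothetical sample data: the first two records describe the same person.
records = [
    {"name": "Jonathan Smith", "email": "jsmith@example.com"},
    {"name": "Jonathon Smith", "email": ""},
    {"name": "Maria Garcia", "email": "mgarcia@example.com"},
]
other_source = [{"name": "maria garcia", "phone": "555-0100"}]

groups = deduplicate(records)
canonical = [canonicalize(g) for g in groups]
links = link(canonical, other_source)
```

Here the two spellings of "Smith" land in one group, the canonical record is the one that carries an email address, and the "Maria Garcia" record is linked to its counterpart in the second source. Real data usually calls for comparing several fields and for a threshold tuned on labeled examples rather than a fixed guess.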

About

The code in this repository accompanies a workshop presentation that introduces entity resolution; it is not meant to be used in a production environment.

The image used in this README, "Matches" by MiloszB, is licensed under CC BY 2.0.