
I have read a little about record linkage, but it seems to me that the methods require that every field in both sources can be compared. For example, with sources A and B, the assumption is that we can define a 'comparator' measure on all of their fields. Both A and B might include first and last name, location, etc., and we would compare them with a string distance measure, a geographical distance, and so on.
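As I understand it, the comparison step looks roughly like the sketch below. It is a minimal illustration, not any particular package's method. It assumes each record is a Python dict with hypothetical field names (`first_name`, `last_name`, `lat_lon`, `household_expenditure`, `n_children`). Only the shared fields get a comparator: Levenshtein edit distance for names and haversine great-circle distance (in km) for location. The non-shared fields have no counterpart in the other source, so they never enter the comparison vector.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def comparison_vector(rec_a: dict, rec_b: dict) -> dict:
    """Compare only the fields both sources share (hypothetical field names)."""
    return {
        "first_name": levenshtein(rec_a["first_name"].lower(), rec_b["first_name"].lower()),
        "last_name": levenshtein(rec_a["last_name"].lower(), rec_b["last_name"].lower()),
        "location_km": haversine_km(*rec_a["lat_lon"], *rec_b["lat_lon"]),
    }


# Source A has household_expenditure, source B has n_children: neither is compared.
rec_a = {"first_name": "Jon", "last_name": "Smith", "lat_lon": (52.37, 4.90),
         "household_expenditure": 2100}
rec_b = {"first_name": "John", "last_name": "Smith", "lat_lon": (52.37, 4.90),
         "n_children": 2}
print(comparison_vector(rec_a, rec_b))
# → {'first_name': 1, 'last_name': 0, 'location_km': 0.0}
```

The resulting vector of distances is what a matching model (e.g. a Fellegi-Sunter-style classifier) would score, which is why the framework seems to need a comparator for every field used.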

But what if that is not the case? That is, what if A and B share some, but not all, fields, and the non-shared fields still carry information about whether two records are a match? For example, source A may include 'household expenditure', while B may include 'number of children' or 'civil status'. Is it necessary to construct a distance measure to compare these? Why shouldn't they be used separately?

kjetil b halvorsen
mkln
