What do the Social Security Numbers in the United States, Resident Identity Card numbers in China, Aadhar Card numbers in India, and National Health Service numbers in the United Kingdom have in common?
They’re all unique identifiers (UIDs): numbers that each identify exactly one person.
UIDs are crucial not just for tracking citizens in a country but also for tracking and identifying people in a survey.
A crucial step in survey design is making sure that every unit in the survey — person, household, program, location, etc. — has a unique identifier. This helps you keep track of which response belongs to which unit. Without the ability to distinguish between units, you run the risk of including duplicates in your analysis.
How to Build a Unique Identifier into a Survey
There are three basic ways to build a unique identifier into a survey: use an existing UID, automatically generate new UIDs, or manually create your own UIDs.
Use Pre-Existing UIDs
The simplest way to include a UID in your survey is to collect a pre-existing UID from every person, for example an Aadhar number in India.
If everyone in your population has such a UID and is willing to share it, this is usually the easiest, most reliable way to add a UID to your survey.
This method has its limitations. In the above example, it will only work if everyone in your population has an Aadhar card. Availability is a concern for Aadhar cards, which aren’t mandatory and have only been taken up by some of India’s citizens. It is less of a concern for Social Security Numbers, which are typically issued at birth in the United States.
Another issue to consider is whether your population will be willing to share their UID number. For example, many people in the United States are reluctant to share their Social Security Number with anyone other than the government, their bank, and their employer, for security reasons.
Automatically Generate New UIDs
Survey tools like Atlan’s Collect have made it simple to automatically assign unique IDs to units while conducting a survey.
Collect automatically creates a unique ID for each survey response. These responses can be used to collect and monitor data about a unit. For example, if you want to survey a set of households, you can collect data on each household (such as its location and members) and Collect will generate a UID for each household.
Then, with Collect’s monitoring feature, you can search the available households, select the correct one, and add data about that household, all without having to remember its UID.
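To make the idea of automatic assignment concrete, here is a minimal Python sketch that attaches a randomly generated ID to each response as it is recorded. It is not how Collect works internally, which this article does not describe. The `new_response_id` helper and the sample household fields are hypothetical, and the sketch assumes a random UUID is an acceptable ID format for your survey.

```python
import uuid


def new_response_id() -> str:
    """Return a random 128-bit identifier as a 32-character hex string."""
    return uuid.uuid4().hex


# Hypothetical responses: two households that gave identical answers.
households = [
    {"village": "A", "members": 4},
    {"village": "A", "members": 4},
]

# Assign an ID at the moment each response is recorded, so that
# identical answers still belong to distinguishable units.
for household in households:
    household["uid"] = new_response_id()
```

Because the ID is generated independently of the answers, two households with the same answers still receive different UIDs.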
Manually Create UIDs
Building a unique ID into a paper-based survey can be more challenging, since it requires creating an ID that is both intuitive and unique.
For example, a unique ID may combine digits for each level of geographic information, so that the ID itself encodes where a unit is located. At the top of each survey page (or booklet, in the case of longer surveys), there should be boxes which enumerators can fill out with the unique ID.
The following is an example of a unique ID format that may be used in a household survey:
In this example, the unique ID is made up of 14 digits which represent the geographic location of the household at the state, district, and village level. This information is easily available to the enumerator and also ensures that each household has a unique ID. Moreover, this format lets you cluster responses at each level of geography.
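A geographic ID of this kind can be sketched in Python as follows. The article only states that the ID has 14 digits covering state, district, and village, so the field widths below (2 + 3 + 5, plus 4 digits for a household serial number) are an assumption made for illustration, as are the names `WIDTHS`, `build_uid`, and `parse_uid`.

```python
# Hypothetical field widths that add up to 14 digits.
WIDTHS = {"state": 2, "district": 3, "village": 5, "household": 4}


def build_uid(state: int, district: int, village: int, household: int) -> str:
    """Zero-pad each code to its fixed width and join them into one ID."""
    values = {"state": state, "district": district,
              "village": village, "household": household}
    pieces = []
    for name, width in WIDTHS.items():
        value = values[name]
        if value < 0 or value >= 10 ** width:
            raise ValueError(f"{name} code {value} does not fit in {width} digits")
        pieces.append(str(value).zfill(width))
    return "".join(pieces)


def parse_uid(uid: str) -> dict:
    """Split a 14-digit ID back into its component codes."""
    if len(uid) != sum(WIDTHS.values()) or not uid.isdigit():
        raise ValueError(f"not a valid {sum(WIDTHS.values())}-digit ID: {uid!r}")
    parts, pos = {}, 0
    for name, width in WIDTHS.items():
        parts[name] = int(uid[pos:pos + width])
        pos += width
    return parts


uid = build_uid(state=9, district=41, village=207, household=15)
# uid == "09041002070015"
```

Fixed widths are what make clustering possible: under these assumed widths, every household in the same district shares the first five digits of its ID.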
What If Your Survey Doesn’t Have a UID?
What can you do if you have already collected your data, started cleaning it, and only then realized that you don’t have a UID? While this situation is never ideal, you can form a unique ID once the data is collected, based on a combination of variables that can uniquely identify every unit within the survey sample.
This method involves concatenating (or combining) two or more fields until every ID in the data set is unique. In other words, you can combine responses from several questions into one string of letters and numbers in a way that the combination is unique for each unit of analysis.
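The concatenation approach might look like the following Python sketch. The field names (`village`, `head_name`, `birth_year`), the sample records, and the helper functions are hypothetical. The sketch assumes that trimming whitespace and upper-casing is enough normalization for your data, which will not always be true.

```python
from collections import Counter


def make_uid(record: dict, fields: list) -> str:
    """Join the normalized values of the chosen fields into one ID string."""
    values = []
    for field in fields:
        value = record.get(field)
        # A missing value would silently weaken the ID, so fail loudly.
        if value is None or str(value).strip() == "":
            raise ValueError(f"missing value for field {field!r}")
        values.append(str(value).strip().upper())
    return "|".join(values)


def duplicated_uids(records: list, fields: list) -> list:
    """Return the IDs that occur more than once for this choice of fields."""
    counts = Counter(make_uid(record, fields) for record in records)
    return sorted(uid for uid, n in counts.items() if n > 1)


records = [
    {"village": "Rampur", "head_name": "Asha", "birth_year": 1980},
    {"village": "Rampur", "head_name": "Asha", "birth_year": 1975},
    {"village": "Sitapur", "head_name": "Asha", "birth_year": 1980},
]

# Two fields are not enough: two records collide.
two_fields = duplicated_uids(records, ["village", "head_name"])
# two_fields == ["RAMPUR|ASHA"]

# Adding a third field makes every ID unique.
three_fields = duplicated_uids(records, ["village", "head_name", "birth_year"])
# three_fields == []
```

In practice you would keep adding fields until `duplicated_uids` returns an empty list, then store the result of `make_uid` as the ID for each record.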
The post-enumeration method of assigning UIDs is imperfect:
- It assumes that there can be a combination of responses that uniquely distinguishes every unit.
- Because of data cleanliness issues, it may be difficult to find fields that have consistent, reliable responses to include in the concatenated unique ID. If missing or incorrect values in a field are used to create the unique ID, the validity of the ID could be jeopardized.
For these reasons, it’s always better to have your UIDs in place before the survey starts than to create your UIDs after data collection is done.