Carnegie Mellon University

Data Privacy Center

Data Privacy Course


Lab 7b: Hiding Identity




Objective

The objective of this lab is for you to learn how to make small, seemingly innocent changes in personal information to thwart direct and probabilistic linkage. You will be asked to write a program that accomplishes an attack of your own design. Then, you will be asked to write a program that defeats your attack.


Overview of activities

The activities in today's lab involve identity avoidance or fraud. We will see how seemingly innocent modifications to names, SSNs, dates of birth, or addresses can make direct and probabilistic linkages fail.

We will be motivated by the following example. Police often collect data on individuals and end up with several records on the same person, which they would like to link together. Often the individuals are criminals who attempt to disguise their identities with minor misspellings, permuted digits, and even alternative names. In other cases, data entry at different facilities leads to irregularities or differences in the entered values. The task is to find all records belonging to the same person, using name, address, city, DOB, etc.

The Cambridge Voter list has been loaded into Dataville and made available for your use.


Part I. Hiding Names

Select ten names from the Cambridge voter list (above) and develop an algorithm that makes seemingly innocent errors in the names, so that they still look realistic but direct linkages to the correct names fail. Ideas include spelling changes (insertions, deletions, transpositions), use of nicknames, and names that sound similar. Below are lists of common names that may spawn other ideas for you.

Write your idea as to how you would distort these names to prevent linkage. Remember the replacement name should seem realistic to a person who would be accepting the name. Further, if the problem is detected, the misspelling should appear innocent. Lastly, the change should be effective in that the linkage should not work. For each of the 10 sampled names drawn from the Cambridge Voter list, demonstrate your idea(s) by expressing why the resulting names would be realistic and appear as innocent mistakes.
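
To make the kinds of changes listed above concrete, here is a minimal sketch, not a required or recommended solution. The class name NameDistorter, its method names, and the three-entry nickname table are all hypothetical and chosen only for illustration. It tries a nickname first, then drops one letter of a doubled pair, and otherwise transposes two letters near the middle of the name.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: names and the nickname table are assumptions.
class NameDistorter {
    // Tiny made-up nickname table; a realistic one would be far larger.
    static final Map<String, String> NICKNAMES = new HashMap<>();
    static {
        NICKNAMES.put("WILLIAM", "BILL");
        NICKNAMES.put("ELIZABETH", "BETH");
        NICKNAMES.put("ROBERT", "BOB");
    }

    // Swap the characters at positions i and i+1 (a common typing slip).
    // Returns the name unchanged if i is out of range.
    static String transpose(String name, int i) {
        if (i < 0 || i + 1 >= name.length()) {
            return name;
        }
        char[] c = name.toCharArray();
        char t = c[i];
        c[i] = c[i + 1];
        c[i + 1] = t;
        return new String(c);
    }

    // Remove one letter of the first doubled pair, e.g. "LL" becomes "L".
    static String dropDoubledLetter(String name) {
        for (int i = 0; i + 1 < name.length(); i++) {
            if (name.charAt(i) == name.charAt(i + 1)) {
                return name.substring(0, i) + name.substring(i + 1);
            }
        }
        return name;
    }

    // Apply the first rule that changes the (upper-cased) name.
    static String distort(String name) {
        String upper = name.toUpperCase();
        String nick = NICKNAMES.get(upper);
        if (nick != null) {
            return nick;
        }
        String dropped = dropDoubledLetter(upper);
        if (!dropped.equals(upper)) {
            return dropped;
        }
        return transpose(upper, upper.length() / 2 - 1);
    }
}
```

With these rules, distort("Allison") yields "ALISON" and distort("Sandra") yields "SADNRA". Whether such results look realistic and innocent is exactly what you are asked to argue for your own ten names.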

Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 hiding names". The body of the message should contain your answers.


Part II. Finding the hidden names

At first glance, simply changing the first name may appear effective, even given the technique you created. However, we have already learned in this course that other values can combine to uniquely identify you. For example, if the name was provided along with {date of birth, ZIP}, it may still be easy to link on these other fields to the voter list, and correctly re-identify the person despite your changes. Try it out with the 10 names you provided. How many can you re-identify?
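
The linkage on {date of birth, ZIP} described above could be sketched as follows. This is an assumption-laden illustration: the Person class, its three fields, and the date format are hypothetical and do not reflect the actual column layout of the Cambridge voter list. The sketch indexes the voter list by the pair (DOB, ZIP), ignores the name entirely, and reports a re-identification only when exactly one voter shares the pair.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: the record layout is an assumption.
class QuasiIdLinker {
    static class Person {
        final String name;
        final String dob;
        final String zip;

        Person(String name, String dob, String zip) {
            this.name = name;
            this.dob = dob;
            this.zip = zip;
        }
    }

    // Build the lookup key from the two fields used for linking.
    static String key(Person p) {
        return p.dob + "|" + p.zip;
    }

    // Group voter-list records by their (DOB, ZIP) pair.
    static Map<String, List<Person>> indexByDobZip(List<Person> voters) {
        Map<String, List<Person>> index = new HashMap<>();
        for (Person v : voters) {
            index.computeIfAbsent(key(v), k -> new ArrayList<>()).add(v);
        }
        return index;
    }

    // Return the single voter sharing DOB and ZIP with the disguised record,
    // or null when no voter or more than one voter shares them.
    static Person reidentify(Person disguised, Map<String, List<Person>> index) {
        List<Person> hits = index.getOrDefault(key(disguised), Collections.<Person>emptyList());
        return hits.size() == 1 ? hits.get(0) : null;
    }
}
```

Counting how many of your 10 disguised records come back non-null gives a direct answer to the question of how many you can re-identify.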

Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 find bad names". The body of the message should contain your answers.


Part III. Hiding identity

Think beyond merely changing a first name. Consider a more comprehensive approach involving more than one field of information. All changes must still seem realistic and innocent. Consider the ideas of wrong map, null map and k-map introduced in the last lecture as ways to improve your results.

Develop an algorithm that will make seemingly innocent errors in some combination of fields so that direct linkages to the correct person fail. You may use any of the fields included in the Cambridge Voter list. Below are lists that may spawn other ideas for you.

Write your idea as to how you would distort these identities to prevent linkage. Remember the replacement identity should seem realistic to a person who would be accepting the information. Further, if a problem is detected, any misinformation should appear innocent. Lastly, the change should be effective in that the linkage should not work back to the Cambridge Voter list. For each of the 10 sampled identities on which you will demonstrate your idea, express why the resulting identities would be realistic and appear as an innocent mistake. Demonstrate how linkages to the Cambridge voter list would support your solution.
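
One way to demonstrate how linkages support your solution is to classify what each disguised record links to. The sketch below assumes, as a reading of the lecture's terms, that a null map means no voter matches, a wrong map means exactly one voter matches and it is not the true person, and a k-map means k >= 2 voters match. The Voter class, its fields, and the choice of (DOB, ZIP) as the linking fields are hypothetical.

```java
import java.util.List;

// Illustrative sketch only: field choice and outcome labels are assumptions.
class MapOutcome {
    static class Voter {
        final int id;
        final String dob;
        final String zip;

        Voter(int id, String dob, String zip) {
            this.id = id;
            this.dob = dob;
            this.zip = zip;
        }
    }

    // Classify the result of linking a disguised (dob, zip) pair to the list.
    // trueId is the id of the person the disguised record really belongs to.
    static String classify(String dob, String zip, int trueId, List<Voter> voters) {
        int matches = 0;
        boolean trueMatched = false;
        for (Voter v : voters) {
            if (v.dob.equals(dob) && v.zip.equals(zip)) {
                matches++;
                if (v.id == trueId) {
                    trueMatched = true;
                }
            }
        }
        if (matches == 0) {
            return "null map";
        }
        if (matches == 1) {
            return trueMatched ? "re-identified" : "wrong map";
        }
        // Two or more candidates: ambiguous whether or not the true person is among them.
        return matches + "-map";
    }
}
```

Running such a check before and after your changes shows whether a record moved from "re-identified" to one of the other outcomes.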

Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 hiding identity". The body of the message should contain your answers.


Part IV. Challenge a Classmate

Talk to other classmates to see what kinds of attacks they have launched. Find a classmate whose attack you believe you could defeat with a program of your own.

Then, for the next lab, due in two weeks, you will write two programs. The first program will implement your attack idea described above. The second program will attempt to defeat someone else's attack program.
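
As a possible starting point for the second program, a defense against spelling-based attacks might compare names by edit distance rather than exact equality. The sketch below uses the standard Levenshtein distance (insertions, deletions, substitutions). The class name FuzzyNameMatcher and the threshold parameter maxEdits are hypothetical, and this is only one of many techniques you might choose.

```java
// Illustrative sketch only: class name and threshold are assumptions.
class FuzzyNameMatcher {
    // Levenshtein distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Treat two names as a likely match if they are within maxEdits edits,
    // ignoring letter case.
    static boolean likelySame(String a, String b, int maxEdits) {
        return editDistance(a.toUpperCase(), b.toUpperCase()) <= maxEdits;
    }
}
```

Note that plain Levenshtein distance counts a transposition of adjacent letters as two edits, and that a nickname such as "Bill" for "William" is several edits away, so a defense against nickname attacks would need something more, such as a lookup table.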

Send an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 challenge". The body of the message should contain the name(s) of the student(s) whose attack your program will attempt to thwart. Include a description of what you understand the student's attack to be and a description of how you intend to defeat it.

BATTLING PROGRAMS:

Prior to arriving at the next lab (in 2 weeks), submit your Java, C or C++ programs as an email attachment to padlab@privacy.cs.cmu.edu. Your programs must compile on the machines in the lab. With your email submission, include a description of your programs and some results demonstrated on the Cambridge voter list. We will spend some time in the upcoming lab with live demonstrations of the battling programs. We will try to arrange it as a contest!


Fall 2004 Privacy and Anonymity in Data
Professor: Latanya Sweeney, Ph.D. [latanya@privacy.cs.cmu.edu]