The activities in today's lab involve identity avoidance or fraud. We will see how seemingly innocent modifications to names, SSNs, dates of birth or addresses can make direct and probabilistic linkages fail.
We will be motivated by the following example. Police often collect data on individuals and end up with several records on the same person, which they would like to link together. Often the individuals are criminals who attempt to disguise their identities with minor misspellings, permuted digits and even alternative names. In other cases, data entry at different facilities leads to irregularities or differences in the entered values. So the task is to find all records belonging to the same person, using name, address, city, DOB, etc.
The Cambridge Voter list has been loaded into Dataville and made available for your use.
Select ten names from the Cambridge voter list (above) and develop an algorithm that makes seemingly innocent errors in the names, so that the results look realistic but direct linkages to the correct names fail. Ideas include spelling changes (insertions, additions, deletions, transpositions), use of nicknames, and names that sound similar. Below are lists of common names that may spawn other ideas for you.
Write your idea as to how you would distort these names to prevent linkage. Remember the replacement name should seem realistic to a person who would be accepting the name. Further, if the problem is detected, the misspelling should appear innocent. Lastly, the change should be effective in that the linkage should not work. For each of the 10 sampled names drawn from the Cambridge Voter list, demonstrate your idea(s) by expressing why the resulting names would be realistic and appear as innocent mistakes.
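One way to start on such an algorithm is sketched below in Python. It is only an illustration under stated assumptions, not a prescribed solution: the nickname and sound-alike tables are small hypothetical examples invented here, and the function names are our own. Each helper applies one kind of innocent-looking change, and `distort` picks among the changes that actually alter the name, so that an exact (direct) comparison against the original fails.

```python
import random

# Hypothetical lookup tables, invented for illustration; a real attempt
# would draw on much larger lists of nicknames and spelling variants.
NICKNAMES = {"william": "bill", "robert": "bob", "katherine": "kathy"}
SOUND_ALIKES = [("ph", "f"), ("ck", "k"), ("ie", "y")]

def nickname(name):
    """Replace a name with a common nickname when one is known."""
    key = name.lower()
    return NICKNAMES[key].capitalize() if key in NICKNAMES else name

def sound_alike(name):
    """Replace the first letter group that has a similar-sounding spelling."""
    lower = name.lower()
    for old, new in SOUND_ALIKES:
        if old in lower:
            return lower.replace(old, new, 1).capitalize()
    return name

def drop_double(name):
    """Collapse the first doubled letter, e.g. 'Phillip' -> 'Philip'."""
    for i in range(len(name) - 1):
        if name[i].lower() == name[i + 1].lower():
            return name[:i] + name[i + 1:]
    return name

def transpose(name, rng):
    """Swap two adjacent interior letters, keeping the first and last letter."""
    spots = [i for i in range(1, len(name) - 2) if name[i] != name[i + 1]]
    if not spots:
        return name
    i = rng.choice(spots)
    return name[:i] + name[i + 1] + name[i] + name[i + 2:]

def distort(name, rng):
    """Return one distorted variant, or the name itself if nothing applies."""
    candidates = [nickname(name), sound_alike(name),
                  drop_double(name), transpose(name, rng)]
    changed = [c for c in candidates if c.lower() != name.lower()]
    return rng.choice(changed) if changed else name

def direct_link(a, b):
    """Direct linkage: the names must agree exactly, ignoring case."""
    return a.strip().lower() == b.strip().lower()
```

A seeded generator such as `random.Random(7)` makes a run repeatable, which helps when you explain in your write-up why each resulting name looks like an innocent mistake. The sketch says nothing about whether a given variant is realistic; that judgment is still yours to argue for each of the ten names.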
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 hiding names". The body of the message should contain your answers.
At first glance, simply changing the first name may appear effective, even given the technique you created. However, we have already learned in this course that other values can combine to uniquely identify you. For example, if the name was provided along with {date of birth, ZIP}, it may still be easy to link on these other fields to the voter list, and correctly re-identify the person despite your changes. Try it out with the 10 names you provided. How many can you re-identify?
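The re-identification check described above can be sketched as follows. This is a minimal Python illustration, assuming each record is a dictionary with `name`, `dob` and `zip` fields; those field names and the toy records are invented here and are not the actual layout or content of the Cambridge Voter list. The function ignores the name entirely and links on {date of birth, ZIP} alone.

```python
from collections import defaultdict

def reidentify(distorted_records, voter_list, keys=("dob", "zip")):
    """Link each distorted record to the voter list on `keys` only.

    Returns one entry per distorted record: the matching voter when exactly
    one voter shares the key values, otherwise None.
    """
    index = defaultdict(list)
    for voter in voter_list:
        index[tuple(voter[k] for k in keys)].append(voter)

    linked = []
    for record in distorted_records:
        candidates = index.get(tuple(record[k] for k in keys), [])
        linked.append(candidates[0] if len(candidates) == 1 else None)
    return linked

# Invented toy records, not real voter data.
voters = [
    {"name": "Katherine Example", "dob": "1961-03-12", "zip": "02138"},
    {"name": "Robert Sample", "dob": "1975-07-04", "zip": "02139"},
    {"name": "William Sample", "dob": "1975-07-04", "zip": "02139"},
]
distorted = [
    {"name": "Kathy Exampel", "dob": "1961-03-12", "zip": "02138"},
    {"name": "Bob Sampel", "dob": "1975-07-04", "zip": "02139"},
]

linked = reidentify(distorted, voters)
reidentified = sum(match is not None for match in linked)
```

In this toy data the first distorted record is still uniquely linked despite the changed name, while the second is not, because two voters share its date of birth and ZIP.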
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 find bad names". The body of the message should contain your answers.
Think beyond merely changing a first name. Consider a more comprehensive approach involving more than one field of information. All changes must still seem realistic and innocent. Consider the ideas of wrong map, null map and k-map introduced in the last lecture as ways to improve your results.
Develop an algorithm that will make seemingly innocent errors in some combination of fields so that direct linkages to the correct person fail. You may use any of the fields included in the Cambridge Voter list. Below are lists that may spawn other ideas for you.
Write your idea as to how you would distort these identities to prevent linkage. Remember the replacement identity should seem realistic to a person who would be accepting the information. Further, if a problem is detected, any misinformation should appear innocent. Lastly, the change should be effective in that the linkage should not work back to the Cambridge Voter list. For each of the 10 sampled identities on which you will demonstrate your idea, express why the resulting identities would be realistic and appear as an innocent mistake. Demonstrate how linkages to the Cambridge voter list would support your solution.
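To demonstrate how linkages support a solution, it can help to label what each distorted record links to. The Python sketch below is one possible way to do that, under assumptions that should be checked against the lecture notes: it reads a null map as a record that links to no one, a wrong map as a record that links only to other people, and a k-map as a record that links to at least k people. The `id`, `dob` and `zip` field names, the day/month swap and the toy records are all invented for illustration.

```python
from datetime import date

def swap_day_month(dob):
    """Swap day and month of an ISO date (YYYY-MM-DD) when that changes it.

    Dates with a day above 12, or with equal day and month, are returned
    unchanged because no valid, different date would result.
    """
    year, month, day = (int(part) for part in dob.split("-"))
    if day > 12 or day == month:
        return dob
    return date(year, day, month).isoformat()

def classify_linkage(record, true_id, voter_list, keys=("dob", "zip"), k=2):
    """Label the result of linking `record` to the voter list on `keys`."""
    matches = [v["id"] for v in voter_list
               if all(v[f] == record[f] for f in keys)]
    if not matches:
        return "null map"
    if true_id not in matches:
        return "wrong map"
    if len(matches) >= k:
        return "k-map"
    return "re-identified" if len(matches) == 1 else "below k"

# Invented toy records, not real voter data.
voters = [
    {"id": 1, "dob": "1961-03-12", "zip": "02138"},
    {"id": 2, "dob": "1961-12-03", "zip": "02138"},
    {"id": 3, "dob": "1975-07-04", "zip": "02139"},
    {"id": 4, "dob": "1975-07-04", "zip": "02139"},
]

# Distort voter 1 by swapping day and month in the date of birth.
distorted_1 = dict(voters[0], dob=swap_day_month(voters[0]["dob"]))
label_1 = classify_linkage(distorted_1, true_id=1, voter_list=voters)
```

Here the swapped date happens to belong to another voter in the same ZIP, so the distorted record lands on the wrong person. A day/month swap is only one example of a change that could pass as a data-entry slip; whether it looks innocent for a particular record is part of what you need to argue.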
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 hiding identity". The body of the message should contain your answers.
Talk to other classmates to see what kinds of attacks they have launched. Find a classmate whose attack you believe you could defeat with a program of your own.
Then, for the next lab, due in two weeks, you will write two programs. The first program will be an implementation of your attack idea described above. The second program will attempt to defeat someone else's attack program.
Send an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#7 challenge". The body of the message should contain the name(s) of the student(s) whose attack your program will attempt to thwart. Include a description of what you understand the student's attack to be and a description of how you intend to defeat it.
BATTLING PROGRAMS:
Part IV. Challenge a Classmate
Fall 2004
Privacy and Anonymity in Data
Professor: Latanya Sweeney, Ph.D.
[latanya@privacy.cs.cmu.edu]