The Cambridge Voter list has been loaded into Dataville and made available for your use.
A sample of 12 medical records has been provided. Each patient is a hypothetical subject whose demographics match a person in the Cambridge voter list. The following website shows what happens to the data under various release strategies.
First, identify each pair of records in the table whose demographics match. Then, write down how many possible people match each person in the original data by completing the following table.
| Row | Date of birth | Gender | ZIP | Matching record# |
|---|---|---|---|---|
| 1 | ||||
| 2 | ||||
| 3 | ||||
| 4 | ||||
| 5 | ||||
| 6 | ||||
| 7 | ||||
| 8 | ||||
| 9 | ||||
| 10 | ||||
| 11 | ||||
| 12 | ||||
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#11 pairs". The body of the message should contain your answers.
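One way to find records with matching demographics is to group them by their date of birth, gender, and ZIP values. The sketch below assumes that "matching" means all three fields are identical; the field names (`row`, `dob`, `gender`, `zip`) and the sample values are made up for illustration and are not taken from the lab data.

```python
from collections import defaultdict

def matching_groups(records):
    """Group row numbers by (dob, gender, zip); keep groups with 2+ rows."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["dob"], rec["gender"], rec["zip"])
        groups[key].append(rec["row"])
    return {key: rows for key, rows in groups.items() if len(rows) > 1}

# Hypothetical records, for illustration only.
sample = [
    {"row": 1, "dob": "1960-01-15", "gender": "F", "zip": "02138"},
    {"row": 2, "dob": "1971-07-04", "gender": "M", "zip": "02139"},
    {"row": 3, "dob": "1960-01-15", "gender": "F", "zip": "02138"},
]

for key, rows in matching_groups(sample).items():
    print(key, "->", rows)
```

Rows that share a key are candidates for the "Matching record#" column.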
For each of the results provided for the claims data, determine the naive identifiability of each person. As you may recall, to determine naive identifiability, you use the pigeonhole principle to estimate the number of possible candidates that could match those demographics. In this case, you will want to run queries against the Dataville database to determine the overall gross totals.
| Row | Date of birth | Gender | ZIP | Naive identifiability |
|---|---|---|---|---|
| 1 | ||||
| 2 | ||||
| 3 | ||||
| 4 | ||||
| 5 | ||||
| 6 | ||||
| 7 | ||||
| 8 | ||||
| 9 | ||||
| 10 | ||||
| 11 | ||||
| 12 | ||||
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#11 identifiability". The body of the message should contain your answers.
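The pigeonhole reasoning can be sketched as a small calculation: if a population is spread over a fixed number of possible demographic combinations, at least one combination must hold at least the population divided by the number of combinations, rounded up. The function name, the population figures, and the assumed count of combinations (365 days, an 80-year age range, 2 genders) are illustrative assumptions; use the gross totals you obtain from Dataville and the definition given in class.

```python
def naive_candidates(population: int, combinations: int) -> int:
    """Pigeonhole bound: ceil(population / combinations)."""
    if population < 0 or combinations <= 0:
        raise ValueError("population must be >= 0 and combinations > 0")
    # Integer ceiling division avoids floating-point rounding.
    return -(-population // combinations)

# Assumed number of (date of birth, gender) combinations within one ZIP.
combinations = 365 * 80 * 2

# Hypothetical ZIP populations, for illustration only.
print(naive_candidates(20_000, combinations))
print(naive_candidates(120_000, combinations))
```

A result of 1 suggests that a person with those demographics may be the only candidate; larger values mean more candidates share the same combination.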
Now, for each record in the claims data, determine the number of people who actually match the information based on the Cambridge Voter list. In this case, you will want to run queries against the Dataville database to determine the number of possible matches based on the given demographics.
| Row | Date of birth | Gender | ZIP | Number of matching people |
|---|---|---|---|---|
| 1 | ||||
| 2 | ||||
| 3 | ||||
| 4 | ||||
| 5 | ||||
| 6 | ||||
| 7 | ||||
| 8 | ||||
| 9 | ||||
| 10 | ||||
| 11 | ||||
| 12 | ||||
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#11 matches". The body of the message should contain your answers.
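Counting the actual matches amounts to one counting query per claims record. The sketch below uses an in-memory SQLite database as a stand-in; the table name `voters`, its columns, and the rows are hypothetical, since the actual Dataville schema is not described here.

```python
import sqlite3

# Stand-in for the voter list; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (dob TEXT, gender TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO voters VALUES (?, ?, ?)",
    [
        ("1960-01-15", "F", "02138"),
        ("1960-01-15", "F", "02138"),
        ("1960-01-15", "M", "02138"),
        ("1971-07-04", "M", "02139"),
    ],
)

def count_matches(conn, dob, gender, zip_code):
    """Return how many voters share the given date of birth, gender, and ZIP."""
    row = conn.execute(
        "SELECT COUNT(*) FROM voters WHERE dob = ? AND gender = ? AND zip = ?",
        (dob, gender, zip_code),
    ).fetchone()
    return row[0]

print(count_matches(conn, "1960-01-15", "F", "02138"))
```

Running the same query once per row of the claims data fills in the "Number of matching people" column.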
You have been asked to answer the following question statistically, based on the data in the problem lists.
"Are the problems associated with heart disease more prevalent in one race than another, in one ZIP code more than another, or within one gender more than the other?"
To answer this question, you will look at the original data and contrast it with the results you get from each of the anonymized tables for the problem list.
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#11 usefulness". The body of the message should contain your answers.
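The comparison can be sketched by computing the share of heart-disease cases per group, first on the original data and then on an anonymized table. Everything below is a toy illustration: the field names (`zip`, `gender`, `heart_disease`), the records, and the choice of ZIP generalization as the release strategy are assumptions, not the lab's data.

```python
from collections import Counter

def prevalence_by(records, field):
    """Fraction of records with heart disease, per value of `field`."""
    totals, cases = Counter(), Counter()
    for rec in records:
        totals[rec[field]] += 1
        if rec["heart_disease"]:
            cases[rec[field]] += 1
    return {group: cases[group] / totals[group] for group in totals}

# Made-up original records.
original = [
    {"zip": "02138", "gender": "F", "heart_disease": True},
    {"zip": "02138", "gender": "M", "heart_disease": False},
    {"zip": "02139", "gender": "F", "heart_disease": True},
    {"zip": "02139", "gender": "M", "heart_disease": True},
]

# Same records with the ZIP generalized to its first four digits.
anonymized = [{**rec, "zip": rec["zip"][:4] + "*"} for rec in original]

print(prevalence_by(original, "zip"))
print(prevalence_by(anonymized, "zip"))
print(prevalence_by(anonymized, "gender"))
```

In this toy case the ZIP-level difference visible in the original data disappears once the ZIP is generalized, while the gender comparison is unaffected.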