Considerations that should come to mind:
Suppose one of our friends, Alice, from XYZ University was reading the following news article about four children who had cancer. Alice is very curious about one of the children, Brandon Steele, and wants to know more specifics about his medical condition. Here is the article Alice read.
In the article, Alice learned that "Brandon Steele of Taylorville, Illinois was diagnosed with neuroblastoma in August 1991 and later died." Alice wants to know more about Brandon's medical history.
After the first labs in this course, some data sets come to mind immediately. Links are provided below. Take a few minutes and investigate Brandon using the links provided. Find "something" about his medical history that is not contained in the article. That should be easy, because not much more detail is provided about Brandon in the article.
When you find a fact or two about Brandon's medical record, send an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#8a Brandon". The body of the message should contain the fact or two you learned. Keep a copy of the information so we can share it in class.
Discussion thought: Should Alice be able to find out this kind of information so easily?
Later in this course we will study ways to obtain this kind of information automatically.
Consider the Social Security death indices that are on-line. Note: the on-line indices do not all cover exactly the same people. Nevertheless, using the on-line death indices and the hospital information below, tell me the names of some of these people and how they died.
Activity 2-1.
Start by making a list of the Social Security death indices you will use. Find the URL for each death index. The ones used in earlier labs were the following, but you may find others more useful: http://www.ancestry.com/search/rectype/vital/ssdi/main.htm and http://ssdi.genealogy.rootsweb.com/.
Activity 2-2.
Below is a sample drawn from some hospital discharge data. Each of these records describes a person who died in the hospital. For some of these records, report a description of the disease from which the person died, the person's name, and any other available information.
Send your answers as an email message to padlab@privacy.cs.cmu.edu. In the subject, write: "Lab#8a deaths described". The body of the message should contain your answers.
If there are multiple matches in the death data, each match will be a row in the
spreadsheet. If there are no matches in the death data, no rows will appear
for that search in the resulting spreadsheet.
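
The row-per-match layout described above can be sketched as follows. This is only an illustration: the column names (`health_row`, `name`, `death_date`), the function `match_rows`, and the placeholder values are hypothetical, and the actual spreadsheet columns are whatever the assignment's template defines.

```python
import csv
import io

def match_rows(search_results):
    """Flatten {health_row: [match, ...]} into one spreadsheet row per match.

    A health row with no matches contributes no rows at all.
    """
    rows = []
    for health_row, matches in sorted(search_results.items()):
        for match in matches:
            rows.append({"health_row": health_row, **match})
    return rows

# Hypothetical search results: row 2 has two candidate matches, row 3 has none.
results = {
    2: [{"name": "PERSON A", "death_date": "1997-01"},
        {"name": "PERSON B", "death_date": "1997-01"}],
    3: [],
    4: [{"name": "PERSON C", "death_date": "1997-03"}],
}

rows = match_rows(results)

# Write the rows as CSV text, which Excel can open directly.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["health_row", "name", "death_date"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Here health row 2 yields two spreadsheet rows, row 4 yields one, and row 3 does not appear at all.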
Submit your search results in the described spreadsheet by
9am Friday, 11/19/2004. Send your spreadsheet as an attachment
to padlab@privacy.cs.cmu.edu. We will combine your results with
those from your classmates to make a master death data list that you
can use for the remainder
of this assignment. Below are the records in the health data for which you must
provide matches. Redundancy has been purposely built in: more than one student
will provide an answer for each group of records. The record rows
correspond to rows in the Excel sheet at http://privacy.cs.cmu.edu/courses/pad1/assign/lab8a/part2/deaths.xls.
The y-axis is the number of records in the health data that have the same
bin size. As the bin size increases, the number of records having that bin size
is expected to decrease.
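
A minimal sketch of how the counts for such a diagram might be tallied, assuming that a health record's bin size is the number of death-data matches found for it (the text does not define the term explicitly). The function names and sample rows below are hypothetical.

```python
from collections import Counter

def bin_sizes(health_rows, match_rows):
    """Return {health_row: number of matching death records}.

    Health rows with no matches get a bin size of 0.
    """
    counts = Counter(row["health_row"] for row in match_rows)
    return {h: counts.get(h, 0) for h in health_rows}

def bin_size_histogram(sizes):
    """Return {bin size: number of health records with that bin size}."""
    return dict(sorted(Counter(sizes.values()).items()))

# Hypothetical data: four health rows and one spreadsheet row per match.
health_rows = [2, 3, 4, 5]
matches = [{"health_row": 2}, {"health_row": 2},
           {"health_row": 4}, {"health_row": 5}]

sizes = bin_sizes(health_rows, matches)
histogram = bin_size_histogram(sizes)

# Text rendering: one line per bin size, one '#' per health record.
for size, n in histogram.items():
    print(f"bin size {size}: {n} record(s) " + "#" * n)
```

The keys of `histogram` would go on the x-axis and the values on the y-axis.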
Consider your findings. Write a 3-page report on the experiment you just conducted.
Your write-up should include the traditional sections: Abstract, Introduction (explain why this experiment is important),
Background (describe sharing practices), Methods (describe your experiment, be precise), Results (include your diagram and report), and Discussion (explain what was important about what you demonstrated).
Submit your report by email to padlab@privacy.cs.cmu.edu. A copy will be placed
on-line for review. Also include an Excel spreadsheet showing the bin sizes you found.
Part III. Assignment (Due Monday 11/22/2004 9am)
Health record rows    Student
2-21                  Born
2-21                  Chang
22-41                 Forges
22-41                 Gaustad
42-61                 Goodman
42-61                 Gwynn
62-81                 Hum
62-81                 Imrhan
82-101                Johnson
82-101                Kannan
102-121               Kim
102-121               Lemmon
122-141               Lim
122-141               Liu
142-161               Lynn
142-161               Mirochnik
162-181               Nussdorfer
162-181               Pawson
182-201               Pennock
182-201               Pickett
Student Solutions
Fall 2004
Privacy and Anonymity in Data
Professor: Latanya Sweeney, Ph.D.
[latanya@privacy.cs.cmu.edu]