The following assignment is due next week, by Friday 1/27/2006 at 9am. No extensions!
As a student in the course, select any one of the following activities and complete it.
Notify the TAs (dp1staff@privacy.cs.cmu.edu)
as soon as possible of which activity you intend
to do. All students must report which activity they have selected
by Tuesday 1/24/2006.
Select an activity
Some activities are more involved than others. Some require programming
and others do not. You can select any one of these activities to complete.
- Identifying "real" resumes. Write a Java program that attempts
to identify real resumes from documents that are retrieved
from a Google search "resume vitae." [Hard] Additional resource:
Filtered Search using Google API
- Harvest email address from resumes. Given a resume text file,
write a Java program that harvests the email address of the subject
of the resume, if the email address is present.
The goal is to identify the email address of the subject of the resume.
So, if there are multiple email addresses present, the program should
return the email address it considers the subject's. You may also want your
program to have an option to return all email addresses found.
[Easy] Additional resource:
databases of resumes
- Harvest Social Security number from resumes. Given a resume text file,
write a Java program that harvests the Social Security number of the subject
of the resume, if present.
Your program should not return values that are not Social Security numbers.
[Medium] Additional resource:
databases of resumes
- Harvest date of birth from resumes. Given a resume text file,
write a Java program that harvests the date of birth of the subject
of the resume, if present.
The goal is to identify the date of birth of the subject of the resume.
So, if there are multiple dates present, the program should
return only the date of birth of the subject.
[Medium]
Additional resource:
databases of resumes
- Estimate the number of on-line resumes. Review literature and/or conduct
your own experiment to determine the number of on-line resumes. Be careful to not
just cite a source. Instead, you will want to find means to verify claims made
by others, especially job banks. Alternatively, you may come up with your own method to estimate the number. [Easy]
- Job bank review. Survey on-line job banks.
Decide on a set of questions to ask about a job bank
and then get answers from each job bank (most likely
using their on-line materials). Sample questions:
How many job banks are there?
How many resumes do they contain? How is access to resumes controlled?
How many of the resumes contain Social Security numbers? [Medium]
- Investigate credit card application fraud.
The nature of the investigation is up to you.
It should be insightful. Some ideas: identify some criminal cases and report on them.
What are common ways of committing credit card fraud? What, if anything, have credit card companies
attempted to do to combat the problem? Brainstorm on ways credit card companies might combat
the problem. [Medium]
- Convert PDF into text. Write a Java program that, given a resume
in PDF format, produces a text file containing the content. You may
write this from scratch, pipe through Google's server, or use any other
means. The format does not have to be preserved. [Medium]
- Convert HTML into text. Write a Java program that, given a resume
in HTML format, produces a text file containing the content (no HTML tags).
The format does not have to be preserved. [Easy]
- Send email messages. Write a Java program that sends an email message.
[Easy]
See Architecture and Format
for more details.
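
As a starting point for the harvesting activities above (email address and Social Security number), here is a minimal sketch of a regular-expression scan. It is not a required design. The class name ResumeHarvester, its method names, and the "first address found belongs to the subject" rule are all assumptions made for illustration; a real solution will likely need better heuristics. The email pattern is deliberately simplified, and the Social Security number pattern only checks the hyphenated AAA-GG-SSSS shape and rejects values that are never issued (area 000, 666, or 900-999; group 00; serial 0000).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: names and heuristics are illustrative assumptions.
class ResumeHarvester {

    // Simplified email pattern; it does not accept every address the RFCs allow.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Hyphenated SSN shape AAA-GG-SSSS, not embedded in a longer digit run.
    // Rejects area 000, 666, 900-999, group 00, and serial 0000.
    private static final Pattern SSN = Pattern.compile(
            "(?<!\\d)(?!000|666|9\\d\\d)\\d{3}-(?!00)\\d{2}-(?!0000)\\d{4}(?!\\d)");

    // Collects every match of the pattern, in order of appearance.
    private static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Option to return all email addresses found in the resume text.
    static List<String> findEmails(String text) {
        return findAll(EMAIL, text);
    }

    // Returns values that pass the structural SSN checks. Passing these checks
    // does not prove the number was ever issued or belongs to the subject.
    static List<String> findSsns(String text) {
        return findAll(SSN, text);
    }

    // Assumed heuristic: the subject's address is the first one in the file,
    // since contact details usually sit at the top of a resume.
    static Optional<String> subjectEmail(String text) {
        List<String> all = findEmails(text);
        return all.isEmpty() ? Optional.empty() : Optional.of(all.get(0));
    }

    public static void main(String[] args) {
        // Made-up sample text, not taken from any real resume.
        String resume = "Jane Doe\njane.doe@example.org\nSSN: 123-45-6789\n"
                + "References: bob@example.com. Phone: 412-268-2000";
        System.out.println("Subject email: " + subjectEmail(resume).orElse("none"));
        System.out.println("All emails:    " + findEmails(resume));
        System.out.println("SSN candidates: " + findSsns(resume));
    }
}
```

The phone number in the sample is not reported as a Social Security number because its middle group has three digits. Numbers written without hyphens are not handled by this sketch.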
What to submit