Is using Personally Identifiable Information (PII) as foreign keys discouraged in database design

database, database-design

While clensing PII from test data I have been stuck with a challenging scenario: cascading the changes through the foreign key relationships in the data. Given the focus on privacy and regulations should this practice be discouraged? If the PII itself were not used in any key fashion a neat trick would be to just shuffle the columns.

There are some commercial tools available to address this problem but none of them seem to handle a large variety of databases well.

Best Solution

HIPAA has a concept called the "Unique Patient Identifier" which can be used as we describe to link data:

Unique Patient Identifier eliminates the need for the repetitive use and disclosure of an individual's personal identification information (i.e. name, age, sex, race, marital status, place of residence, etc.) for routine internal and external communications (e.g. orders, results, medication, consultation, etc.) and protects the privacy of the individual. It helps preserve the patient anonymity while facilitating communication and information sharing. Healthcare is fundamentally a multi-disciplinary process. A Unique Patient Identifier enables the integration and the availability of critically needed information from multi-disciplinary sources and multiple care settings. Therefore, the integrity and security of the patient information depend on the use of a reliable Unique Patient Identifier.

The privacy issue hinges not so much on the identifier itself, but on the security and privacy of the data that the identifier is used to access, and how that access is controlled. My understanding is that typically this means that a system querying for information via a patient identifier should only get back information that can not be pieced together to reveal private information.

Essentially you would generate an artificial key for each person. Even though it is unique to the person, it is not personally identifying, unless you also were to release personally identifiable information along with it. For example, if you let people see only first names with a particular query, but also returned the artificial key, then they now know that artificial key 00003 is associated with first name Bob. now if you allow them to somehow go back and query with 00003 as criteria, and allow them access to the lastname, you can see how they can start to accumulate information. It is important that there be no way for an unauthorized user to get the artifical key and PII returned in the same query, since that would then make the artifical key itself PII. that's my interpretation at least.