Database Meets Revision Control

Any developer who has worked with HIPAA compliancy knows that the law is murky at best and the fed doesn’t publish a programmers’ guide to make your life any easier. However, one of the cardinal rules is the requirement to keep track of who sees HIPAA data, who modifies it and when this was done. Another is that if you delete/update patient data you need to log what was deleted/updated in order to provide an audit trail, if only for the lawyers. Failure to do so can subject a company to some pretty draconian penalties.

This creates a challenge on the database side because SQL UPDATE obliterates a record’s history. There are a few potential solutions, such as maintaining a changelog which such updates are written based on table triggers. I’ve done this but the log of atomic changes can grow immense. It’s also difficult to reconstruct a large record based on potentially dozens or even hundreds of changes to records which must be retained for up to six years. That’s how a traditional RCS would handle rollbacks but it’s not practical inside the confines of a database.

Nevertheless, a resource control system (RCS) approach is what’s needed, where a SQL UPDATE would maintain a copy of the pre-updated record and freeze it from further changes. RCS does its work by storing just the changes, or diffs, made to a document. While it would be technically possible to do this with a database record — for instance, using a BLOB in a sibling table — there’s a simpler and more practical method that also maintains relational integrity.

What I arrived at was adding an enum, or integer, column to each table named entity_status, where entity_status = 1 is the current live record and entity_status = 0 is an archived row that should never show up in queries for live data, which can be accomplished either with SQL filtering or ideally table partitioning. I added a few more statuses, such as “locked” which also freezes a live record from concurrent editing, “merged” and “incomplete” which is a live record that hasn’t been completed or approved , but those are application-specific and not important here.

The issue is that if you have a normalized table you can’t have multiple records sharing the same primary key. In other words, a clinic encounter might identify the patient internally as patient_id=100 with a foreign key to the patient table. What happens in this system if you “update” Patient 100’s data? You would have to insert a new record with a new primary key, Now the encounter record is pointing at an inactive patient record.

The solution was creating companion _detail tables for HIPAA tables. The _detail table contains all the volatile protected data that might be updated. Here’s an example:

CREATE TABLE patient (
    patient_id SERIAL NOT NULL UNIQUE,
    created_by VARCHAR NOT NULL,
    update_time TIMESTAMP WITH TIMEZONE,
    PRIMARY KEY (patient_id));

CREATE TABLE patient_detail (
    patient_detail_id SERIAL NOT NULL UNIQUE,
    patient_id INTEGER NOT NULL,
    first_name VARCHAR,
    last_name VARCHAR,
    dob DATE NOT NULL,
    -- add more HIPAA columns here
    created_by VARCHAR NOT NULL,
    create_date TIMESTAMP WITH TIMEZONE,
    entity_status SMALLINT DEFAULT 1 CHECK(entity_status in (0, 1)),
    PRIMARY KEY (person_detail_id,person_id));

ALTER TABLE patient_detail ADD FOREIGN KEY (person_id) REFERENCES patient (patient_id) ON UPDATE RESTRICT ON DELETE RESTRICT;

With the patient table holding the permanent patient_id and exporting it as a foreign key to the patient_detail table, you always have a consistent patient_id. You just have to join any queries on the patient table with its companion patient_detail table, filtering out any records which don’t have an entity_status of active, i.e. the value of 1.

Ideally, you would access these two tables together via a SQL view to keep that logic encapsulated from parent queries:

SELECT P.*, PD.* FROM patient P LEFT JOIN patient_detail AS PD ON (PD.patient_id = P.patient_id AND PD.entity_status = ENTITY_ACTIVE);

Because, logically, every record update on a _detail table is an insert we can use a single stored procedure to handle both cases. In either case, I check for any existing _detail record that has an entity_status=1 and update that to entity_status=0, which hides it from the queries. Then I insert the new record, which defaults to entity_status=1.

It’s a very simple system which has been working flawlessly on a couple of busy health records systems for the past several years. Periodically, you may want to spool very old inactive records to offline storage and delete them from the live database. That will maintain your HIPAA compliance while freeing up disk and increasing query speed.