Anonymization and Data Masking with PostgreSQL Friday 09:20 Hotel
Twitter: @daamien Blog: blog.taadeem.net LinkedIn: @damienclochard Company website: dalibo.com Other: postgresql_anonymizer
My name is Damien Clochard. I have worked for Dalibo, a worker-owned cooperative dedicated to PostgreSQL in France, since 2005. I’ve held several positions in the company over the years; at the moment I am a DBA doing mostly Postgres integration, support and training...
I’m involved in the PostgreSQL community at various levels: I’m part of the admin team behind the www.postgresql.fr platform. A few years ago, I launched a publication called “PostgreSQL Magazine”. I’m also president of the French-speaking PostgreSQL users association and one of the organizers of the PG Day France conference.
FOSDEM is probably my favorite open source event! I really like seeing all these different free software communities gathering in one place for a few days. All these passionate people from around the world sharing more than source code… It makes me feel small and important at the same time. When I came here for the first time 12 years ago, I was amazed by the vibe of the event. Even now, it never ceases to impress me.
I’m going to talk about anonymization and data masking. Over the last 2 years, we’ve seen growing concerns about the protection of personal data… There’s the GDPR of course, but it goes beyond legal obligations. I think free software communities must lead the way in building a future where privacy and anonymity are available to everyone. And of course PostgreSQL has an important role to play in this domain because it’s by far the world’s most dynamic and innovative database system.
Personally I think that PostgreSQL must evolve from being a simple data storage engine to a data protection platform with an emphasis on security features like encryption or row level security policies… In that regard, data anonymization is an old topic but it’s also a rather unexplored area.
Last year I started a project called “PostgreSQL Anonymizer”, which is basically a PoC to show why we should write anonymization and masking rules directly using the SQL syntax...
In most organizations, the anonymization of sensitive information is a task assigned to database administrators (DBAs). So my talk is aimed at every DBA who has ever tried to remove personal info from a dataset...
However, I’m convinced that the anonymization policy of a dataset should be defined at the early development stages of every application. It is a design task, just like choosing data types, adding indexes, defining integrity constraints, creating foreign keys, etc. So my talk is also aimed at developers, to convince them that they have the responsibility to describe how their datasets must be anonymized…
PostgreSQL 11 introduced built-in binary string functions such as sha256(), sha512(), etc. It’s not the most spectacular feature because they were already available in the pgcrypto extension. But implementing them directly inside the Postgres core and making them accessible to all users is a great move!
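As a small illustration (not taken from the talk), the built-in function can be called without installing any extension. The sample email address is made up; sha256() takes and returns bytea, so the text value is converted first and the digest is encoded as hex:

```sql
-- Requires PostgreSQL 11 or later; no extension needed.
-- The input value is a made-up example.
SELECT encode(
         sha256(convert_to('alice@example.com', 'UTF8')),  -- text -> bytea -> digest
         'hex'                                             -- bytea -> readable hex string
       ) AS email_digest;
```

Note that an unsalted hash of a guessable value (an email, a phone number) can be reversed with a dictionary attack, so this is a pseudonymization sketch rather than a complete anonymization technique.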
I’m on a mission to convince people that we need to extend the SQL syntax in order to define anonymization policies directly with the DDL language. Something like:
ALTER TABLE users ALTER COLUMN birth MASK WITH FUNCTION random_date();
It’s easier said than done, obviously. The road is long but my talk is a small step toward this goal!
I usually spend a lot of time in the Open Source Design devroom (link below), which is always fun and insightful… And of course you’ll find me around the PostgreSQL devroom!
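Until such a syntax exists, a similar effect can be approximated with standard PostgreSQL features. The following is only a sketch under stated assumptions: the table layout and the view name users_masked are hypothetical, and the random date expression stands in for the random_date() function of the example above.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE users (
    id    integer PRIMARY KEY,
    name  text,
    birth date
);

-- Masked view: the name is left out and the real birth date is replaced
-- by a random date between 1970-01-01 and 1999-12-31
-- (10957 is the number of days between 1970-01-01 and 2000-01-01).
CREATE VIEW users_masked AS
SELECT id,
       date '1970-01-01' + floor(random() * 10957)::int AS birth
FROM users;

-- Readers of the view never see the original column values.
SELECT * FROM users_masked;
```

The drawback of this approach is that the masking rule lives in a separate object that must be kept in sync with the table by hand, which is precisely what declaring the rule in the DDL would avoid.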