Sharing Sensitive Data Without Showing It
The border between what is public and what is private has become hard to patrol. Social media’s siren song plays out billions of times a day, connecting us far and wide, while sometimes hyper-sharing the most private facts of our lives.
The tension between the need to share and the need to rope off critical information constrains a wide range of enterprises too. Banks, for example, can’t collaborate on concerns of common interest without compromising customer confidentiality.
Raluca Ada Popa, assistant professor of computer science, designs computer systems that protect confidentiality by computing over encrypted data while still allowing joint access to the results of data analysis. With the support of the Bakar Fellows Program, her lab plans to build and test a new encryption system.
She describes the new systems and the encryption that underlies them, and she discusses her work with industry and health care systems eager to collaborate without compromising precious data.
Q. How does the push-pull of data sharing and confidentiality affect business operations?
A. Banking is a good example. Banks need to keep their transactions private or they risk losing clients to competitors. But if they try to stay totally sequestered from those competitors, they lose the chance to tackle problems best solved together.
We’re involved in a collaboration now with five banks in Canada. We’re helping them address a big problem: Criminals launder money across different banks, but the banks can’t share data to track the laundering because they don’t want to expose their clients’ data to each other.
We are developing encryption algorithms and a system that allow each bank to supply information toward a shared goal: a model to detect money laundering, built without disclosing any bank's client data. Each bank's data can be computed on, but it can't be read by the other banks.
Q. How can a bank share data without the others seeing it?
A. That’s the magic of the encryption technology. Only the model is accessible to all the banks. It’s as if you supply your data to a blindfolded machine learning system that can compute on the data but cannot see it. You end up with a useful model without compromising your confidential data.
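To give a flavor of how parties can jointly compute on data no one can read, here is a minimal sketch of additive secret sharing, one classic building block of secure multi-party computation. This is an illustration of the general idea, not the actual system described in the interview; the values, party count, and modulus are invented for the example.

```python
import random

# Illustrative sketch: additive secret sharing. Each "bank" splits its
# private value into random shares that sum to the value modulo a large
# prime; any single share looks like random noise and reveals nothing.

MODULUS = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split `value` into n_parties random additive shares mod MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recover the shared value by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS

# Three hypothetical banks, each holding a private number.
private_values = [120, 340, 95]
n = len(private_values)

# Each bank distributes one share of its value to every party.
all_shares = [share(v, n) for v in private_values]

# Party j locally adds up the j-th share from every bank. It only ever
# sees random-looking numbers, never the underlying private values.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                for j in range(n)]

# Combining the partial sums reveals only the aggregate, nothing else.
total = reconstruct(partial_sums)
print(total)  # 555, the sum of the private values
```

The same principle extends from sums to the arithmetic inside machine learning training: each step operates on shares, so the parties obtain the final model without anyone seeing another party's raw data.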
Q. You are working with medical centers to tackle similar kinds of problems, right?
A. Yes, we have proposed a solution to a problem faced by a major Bay Area health care provider. Researchers want to develop a good flu predictor to improve their vaccination program. They need patient data from hospitals over a wide geographic area, but they are blocked by each hospital’s obligation to protect patient confidentiality. Just as with the banks, they need a way to analyze data to develop a useful model without divulging sensitive information.
The encryption algorithm we are developing will allow them to supply enough data to allow a machine learning program to develop the model, while still “masking” the patient records. Researchers can access only the data needed to develop the model.
Q. This seems like such a pervasive problem. Aren’t there already algorithms and other strategies to allow this now?
A. Our peers in the theoretical cryptography community have developed a number of general-purpose solutions, but these are orders of magnitude too slow for many problems. For example, our system called Helen is about 1,000 times faster than current technology for the same level of security.
Q. How would you get new clients to adopt the system of submitting their data to the machine learning algorithm when that data is at the very heart of what makes them competitive?
A. Yes, clients indeed need to gain confidence in the power and privacy of the encryption algorithm. We already have a formal mathematical proof of the algorithm's security, but the community also needs to understand, analyze, and build trust in it. Our system is going through extensive security and code reviews and tests by experts. And we are also setting up "hackathons" for hackers to try to attack the system.
My senior graduate student, Rishabh Poddar, is a Bakar Innovation Fellow this year and leads this effort on the student side. In addition to driving adoption of his research on this project, he is considering launching a startup company to offer this technology. Organizations have a huge need to work together while protecting their sensitive data, and many industry sectors need such a technology.
The Bakar Fellows Program supports innovative research by early career faculty at UC Berkeley with a special focus on projects that hold commercial promise. For more information, see http://bakarfellows.berkeley.edu.