The approach allowed researchers to use machine learning on encrypted data without first decrypting it.
Banco Bradesco, S.A., a prominent Brazilian financial institution, has for the past year been working with IBM Research to apply a technique called homomorphic encryption to banking data. The pilot showed it was possible to apply machine learning algorithms to encrypted data without decrypting it, creating a new level of privacy that could be applied to other industries.
Machine learning is often used in banking and finance to predict scenarios like transaction fraud or investment outcomes. This typically involves vast stores of data, much of which is sensitive but must be decrypted before processing, which exposes it to exfiltration and leaks.
The idea behind homomorphic encryption (HE), now emerging in real-life applications like this one, is to keep data encrypted while it’s being processed. This type of cryptography was first proposed in the 1970s, but it wasn’t until 2009 that IBM scientist Craig Gentry created the first fully homomorphic encryption system. HE is based on the mathematics of lattices and, researchers say, protects the confidentiality of data from complex attacks – even by quantum computers.
“In the past, we’ve used encryption for transmitting data,” says Flavio Bergamaschi, IBM researcher and lead author of this project. When you shop online and enter your credit card number, it’s encrypted in transit but must be decrypted before anything can be done with it. Likewise, the number is encrypted when stored on a disk, but it must be decrypted to act on it.
Bergamaschi says HE protects information from what he calls the “honest but curious” threat model. An entity performing computation may be legitimate but at the same time curious about your information: When you ask a cloud service how long it takes to get to work, or where the nearest coffee shop is, you reveal factors like where you are and where you’re going. The machine collecting this data can then create a graph of everyone whose data it holds.
With HE, these machines can perform computations while the data remains encrypted. As a result, the entity can act on data without gathering or storing any sensitive information. HE won’t prevent data breaches but will prevent data thieves from grabbing usable information. The technology has now reached an “inflection point” at which it’s ready for practical use.
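To make the idea of computing on encrypted data concrete, here is a minimal Python sketch. It is not the scheme used in the pilot: it swaps in the much simpler Paillier cryptosystem, which supports only addition on ciphertexts and is not lattice-based, and it uses toy key sizes that offer no real security. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) and the sample amounts are hypothetical and chosen purely for illustration.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic). Illustration only:
# the primes below are far too small for real-world security, and this is
# NOT the lattice-based scheme described in the article.

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Return (public modulus n, private key (lam, mu)) from two known primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n using only the public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext; requires the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the plaintext sum."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
c_a = encrypt(n, 1200)   # hypothetical transaction amount
c_b = encrypt(n, 350)    # hypothetical transaction amount
c_sum = add_encrypted(n, c_a, c_b)  # computed without the private key
total = decrypt(n, priv, c_sum)
print(total)  # 1550
```

The party calling `add_encrypted` never sees 1200, 350, or 1550; only the holder of the private key can read the result. Fully homomorphic schemes extend this idea to support both addition and multiplication on ciphertexts.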
During their pilot project with Banco Bradesco, the scientists’ goal was to look at an account holder’s banking activity over a window of time and, using machine learning, predict with good accuracy whether that account holder would need a loan within the following three months.
The first step was to use HE to encrypt transaction data, as well as the machine learning-based prediction model. Financial analysts usually pinpoint factors in someone’s financial history to make these types of predictions, IBM explains in a blog post. Scientists showed they could make predictions using encrypted data with the same accuracy as with unencrypted data.
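As a rough illustration of scoring encrypted data, the sketch below evaluates a linear model over encrypted features. It makes several assumptions that differ from the pilot: it uses the toy Paillier scheme (additive only, not lattice-based, insecure key size), the model weights stay in plaintext on the server rather than being encrypted, and the feature values, weights, and function names are all hypothetical.

```python
import math
import secrets

# Toy Paillier parameters (illustration only; not secure, not the pilot's scheme).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # private: inverse of LAM mod N (Python 3.8+)

def encrypt(m):
    """Client side: encrypt a non-negative integer feature value."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Client side: decrypt with the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def encrypted_score(enc_features, weights):
    """Server side: compute Enc(sum(w_i * x_i)) without decrypting anything.

    Raising a ciphertext to the power w multiplies its plaintext by w;
    multiplying ciphertexts adds their plaintexts.
    """
    acc = 1  # 1 is a trivial (unrandomized) encryption of 0
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, w, N2)) % N2
    return acc

features = [4, 2, 7]           # hypothetical account-activity counts
weights = [3, 5, 1]            # hypothetical non-negative integer weights
enc_features = [encrypt(x) for x in features]
enc_result = encrypted_score(enc_features, weights)
score = decrypt(enc_result)
print(score)  # 29
```

Here the server only ever handles ciphertexts, and the decrypted score matches the plaintext dot product 4·3 + 2·5 + 7·1 = 29. Encrypting the model as well, as the pilot did, requires a scheme that can multiply two ciphertexts, which this toy example cannot do.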
“Once we proved we could achieve the same level of accuracy, we looked at, ‘Can we now train or retrain the model using new transaction data that remains encrypted?’” says Bergamaschi of the process. “In doing so, we limited the chance of data exfiltration.” The team was able to train the model using encrypted data, demonstrating that HE can maintain data privacy and confidentiality while algorithms run on that data.
The pilot, which ran from January through July 2019, taught a few key lessons. “It’s been very educational in the sense that we had to work with many groups that have different levels of understanding of the privacy, security, and mathematics behind everything,” Bergamaschi says. “Being able to interact with all of them, and trying to make all the mathematics and cryptography consumable, was interesting.”
Scientists also had to consider every aspect of their workflow and how to protect data in different scenarios. Being able to manage encryption keys was one; another was ensuring secure environments when the researchers had results and wanted to decrypt them.
Banking isn’t the only industry where HE can be applied. “There are a plethora of use cases that we are just scratching the surface of,” Bergamaschi adds. Industries like government and healthcare, where data privacy is a top priority, could benefit from the use of HE. IBM Research will continue working with Banco Bradesco to apply HE on financial data, he says.
We may not yet know the full extent of where and how HE can be used. “Imagine what you could do that you don’t do today, if you could do the computation on encrypted data,” Bergamaschi adds. Many business activities require information sharing, but information is shared only on a need-to-know basis. “There are many things we don’t do because we are not prepared to share the information in its raw format,” he says.
Kelly Sheridan is the Staff Editor at Dark Reading, where she focuses on cybersecurity news and analysis. She is a business technology journalist who previously reported for InformationWeek, where she covered Microsoft, and Insurance & Technology, where she covered financial …