Machine learning requires massive amounts of data to train a model. But we often upload that data to cloud machine learning services run by the likes of Amazon and Google, where it might be exposed to malicious actors. Can we use machine-learning-as-a-service and still protect privacy?
Machine learning is one of the hottest topics in computer science today. So hot, in fact, that cloud providers are doing a good and rapidly growing business in machine-learning-as-a-service (MLaaS).
But these services come with a caveat: all the training data must be revealed to the service operator. Even if the service operator does not intentionally access the data, someone with nefarious motives may. Or there may be legal reasons to preserve privacy, such as with health data.
In a recent paper, "Chiron: Privacy-preserving Machine Learning as a Service," Tyler Hunt of the University of Texas and others present a system that preserves privacy while enabling the use of cloud MLaaS.
PRIVACY CUTS BOTH WAYS
While users may not wish to reveal their training data, the service providers have privacy concerns of their own. They typically do not allow customers to see the algorithms under their MLaaS technology.
To that end,
. . . Chiron conceals the training data from the service operator. [And] in keeping with how many existing ML-as-a-service platforms work, Chiron reveals neither the training algorithm nor the model structure to the user, providing only black-box access to the trained model.
Chiron uses Intel's Software Guard Extensions (SGX) secure enclaves, an architecture designed to increase the security of application code. But SGX alone isn't enough. Chiron also uses Ryoan, a distributed, protected sandbox built on the SGX platform that secures untrusted user code from malicious infrastructure, such as you might find in the cloud.
Chiron's goal is to protect the user's training data, as well as trained model queries and outputs, while in the cloud.
To that end:
We assume that the entire platform is untrusted, including the . . . operating system and hypervisor. The attacker could be the machine's owner and operator, a curious or even malicious administrator, or an invader who has taken control of the OS and/or hypervisor. The attacker . . . could even be a malicious OS developer and add functionality that directly records user input.
Since trained models can leak training data through certain queries, Chiron ensures that only the entity that supplied the training data can query the resulting model. Even an attacker with complete control of the infrastructure could not query the model to access training data. It seems comprehensive enough, but there are issues with the underlying hardware.
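The owner-only query rule described above can be pictured with a small Python sketch. This is a toy illustration under loud assumptions, not Chiron's actual mechanism: the class name `OwnerOnlyModel`, the use of a shared secret key to identify the data owner, and the stand-in "model" are all hypothetical, and the real system enforces its policy inside SGX enclaves rather than in ordinary application code.

```python
import hashlib
import hmac
import secrets


class OwnerOnlyModel:
    """Toy wrapper: only the holder of the training-time key may query.

    Hypothetical illustration; Chiron's real enforcement happens in enclaves.
    """

    def __init__(self, model, owner_key: bytes):
        self._model = model
        # Keep only a digest of the owner's key, never the key itself.
        self._owner_digest = hashlib.sha256(owner_key).digest()

    def query(self, caller_key: bytes, x):
        caller_digest = hashlib.sha256(caller_key).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(caller_digest, self._owner_digest):
            raise PermissionError("only the training-data owner may query")
        return self._model(x)


# Stand-in for a trained model: any callable will do for the sketch.
owner_key = secrets.token_bytes(32)
attacker_key = secrets.token_bytes(32)
gated = OwnerOnlyModel(lambda x: 2 * x, owner_key)
```

Calling `gated.query(owner_key, 21)` returns the model's answer, while `gated.query(attacker_key, 21)` raises `PermissionError`, so a party that never supplied the training data gets no black-box access to probe the model.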
SGX itself is not bulletproof. In particular, Intel's Performance Monitoring Unit (PMU) enables an untrusted platform to peer deeply into what the system is doing.
The current specification for SGX allows privileged software to manipulate the page tables of an enclave to observe its code and data trace at page-level granularity. This can lead to devastating attacks. . . .
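To see why a page-level trace matters, here is a toy Python model of the leak. It touches no real page tables or SGX hardware; the function names, the table layout, and the 4 KB page size are assumptions made for illustration. The idea is simply that if a lookup address depends on a secret, an observer who sees only which page was touched can already narrow down the secret.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def victim_lookup(table_base: int, entry_size: int, secret_index: int) -> int:
    """Address touched by a secret-dependent table lookup (toy model)."""
    return table_base + secret_index * entry_size


def observed_page(address: int) -> int:
    """All a page-level observer learns: the page number, not the offset."""
    return address // PAGE_SIZE


def candidates_from_page(table_base: int, entry_size: int,
                         n_entries: int, page: int) -> list[int]:
    """Secret indices consistent with the page the observer saw."""
    return [
        i for i in range(n_entries)
        if observed_page(table_base + i * entry_size) == page
    ]


# Hypothetical layout: 16 entries of 1 KB each, starting at 0x10000.
TABLE_BASE, ENTRY_SIZE, N_ENTRIES = 0x10000, 1024, 16
secret = 9
page = observed_page(victim_lookup(TABLE_BASE, ENTRY_SIZE, secret))
suspects = candidates_from_page(TABLE_BASE, ENTRY_SIZE, N_ENTRIES, page)
```

In this layout the observer cuts 16 possible secrets down to the four indices sharing a page with the real one (`suspects` is `[8, 9, 10, 11]`). If each entry filled a whole page, the page number alone would reveal the secret exactly.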
Because Chiron relies on Intel's SGX, it can't be used with GPUs, which lack an SGX-like facility. So the current implementation is far from ideal until the GPU vendors also start taking security seriously.
Despite the limitations, Hunt et al. tested Chiron and found that its performance was competitive with standard, unprotected infrastructures.
THE STORAGE BITS TAKE
The little Dutch boy had it easy: he could plug a hole in the dike with one finger. In our modern, massive data world, there are millions of holes, exploitable in thousands of ways.
Perfect security doesn't seem likely, but we can certainly do better than we have been; right, Facebook? If we can make it harder, we'll knock out the cyber street criminals - muggers - and leave the field to big, well-financed players, against which we can field big, well-financed tools, such as Chiron.