
Proper key management in the cloud with a Cloud Secure Module

Introduction

Cryptography-dependent applications, such as payment applications, depend on HSMs (Hardware Security Modules) to securely store cryptographic keys and perform cryptographic operations. The security of these HSMs depends mainly on the key management process: that is the process where our key custodians update the set of keys in the HSMs. If these applications are being moved to the cloud, it is not sufficient to just move the HSMs to the cloud as well, because the key custodian can no longer perform key management. The key management process of the cloud provider is insufficient to guarantee the security of the keys. In this article, I explain why this is the case, and I propose an approach to solve this problem.

Before explaining what problems are solved, and how these are solved, let’s have a good look at what HSMs do.

What is an HSM?

An HSM (Hardware Security Module) is a “computer in a vault”. The cryptographic keys are stored in internal tamper protected memory and the computations with these keys are performed inside, by the HSM itself, so that keys never have to leave the device.

Performing cryptographic operations is not the main function of an HSM. Most modern microprocessors are quite good at doing these calculations themselves and even have specific instructions for doing fast encryptions. Instead, the main function of the HSM is to keep the keys secret. Keeping keys secret can be divided into:

  • Physical protections, such as a strong construction and tamper detection mechanisms that destroy the keys if you try to break it.
  • Logical protections. There is no command to extract the HSMs keys. No combination of commands can be used to get information about the keys. (This is not as easy as it sounds. For example, the often-used PKCS#11 standard for communication with HSMs is notorious for its security flaws.)

The picture below is an “Adyton” HSM that is used in many places in our company.

[Figure: Adyton HSM]

Key management

The security of HSMs does not only depend on their physical protection, but also on the surrounding key management procedures that maintain the secrecy of the cryptographic keys in the device. These key management procedures are executed by specialized key custodians who are responsible for maintaining availability and secrecy of cryptographic keys. They are the people who allow our company to accept liability for the security of our keys, and therefore our payment data.

Developers of secure applications do not have to worry about key management themselves, which has three advantages:

  • The possibility of security mistakes is reduced.
  • There is more security, because the keys cannot be accessed.
  • The programmer does not have to take the responsibility for cryptographic security.

This is the reason that compliance frameworks and many policies (including those of Worldline) forbid the use of keys in applications.

In summary:

The main function of an HSM is separation between key management and using the keys.

History of HSMs in the payment industry

To see why HSMs are such an important part of application security, let’s start with a bit of history. About half a century ago the first digital payment applications started. It began with the installation of “Automated Teller Machines” (ATMs) that allowed you to get cash without needing to go to a bank. Not much later, “Point of Sale” (PoS) terminals allowed you to pay for your purchase with your card and PIN.

The computers in those days did not have much memory (we expressed memory sizes in MB instead of GB). To reduce the number of cryptographic keys that needed to be stored, most cryptographic keys were calculated from other keys. These “derived keys” made it possible to perform all encryption operations, such as transaction verification and PIN verification, with only a handful of keys.

For example, in the Netherlands, there was a single Master PIN key for PIN transportation between Interpay and the banks. This key was used to derive different keys for each bank. This way, all parties needed a single key for secure PIN transport: each bank used their derived key, while Interpay only needed to store the Master PIN key. Other operations, such as generation of PINs, also used keys that were derived from other keys.
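The idea of deriving one key per bank from a single master key can be sketched as follows. This is only an illustration: the derivation function used in the historical system is not described here, so HMAC-SHA-256 is used as a modern stand-in, and the names `derive_bank_key`, `BANK-A` and `BANK-B` as well as the key value are hypothetical.

```python
import hashlib
import hmac


def derive_bank_key(master_key: bytes, bank_id: str) -> bytes:
    """Derive a per-bank key from the master key and a bank identifier.

    Assumption: HMAC-SHA-256 stands in for whatever derivation
    function the original system used.
    """
    return hmac.new(master_key, bank_id.encode("utf-8"), hashlib.sha256).digest()


# Illustrative value only; a real master key never appears in source code.
master_pin_key = bytes.fromhex("00112233445566778899aabbccddeeff")

# The central party stores only the master key and recomputes bank keys on demand.
key_bank_a = derive_bank_key(master_pin_key, "BANK-A")
key_bank_b = derive_bank_key(master_pin_key, "BANK-B")
```

Each bank only needs to store its own derived key, while the holder of the master key can recompute any of them. This also shows the disadvantage discussed next: whoever knows the master key knows every derived key.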

This construction using master keys had one big disadvantage. The security of the master keys was the basis of the security of the entire system. Knowledge of such a key was enough to do a lot of damage. The designers of these systems realized that and decided to require that these keys were protected by HSMs.

The original list of security requirements from that era has (unfortunately!) been lost in time, but here are a few that I reconstructed from context:

  • keys are always stored in an HSM (except for frequently changed session keys)
  • HSMs are controlled by key custodians (both Interpay and the banks)
  • all key operations are under dual control and use split knowledge (that is, no key custodian ever sees the plaintext of a key)
  • HSMs are connected directly to the server that handles the corresponding payment applications
  • there shall be no command to export master keys from the HSM
  • payment applications do not contain any key material
  • the server is situated in our own buildings (often on a separate floor within the office building)

This was the application for which HSMs were designed. The design requirements for HSMs were documented and standardized at that moment. The current ISO standard for HSM security (ISO 13491) is still based on these assumptions, even though it is updated regularly (in fact, I am one of the editors of this document).

Reality check

In the last few decades, the payment infrastructure has been updated significantly, and many details have changed along the way. As a result, the original security assumptions are no longer true, and many requirements in the list above make no sense anymore.

Here’s a list of some changes in the way HSMs are used in the payment systems I have seen in my career at Worldline, in more or less chronological order.

| Change | Reason | Effect on security |
|---|---|---|
| HSMs are connected in a local network | Redundancy | Harder to detect wiretapping because there are so many wires |
| Servers moved to data centers | Efficiency | Less oversight on servers |
| HSMs moved with the servers | Efficiency | HSM-server cabling is no longer visible |
| HSMs connected via patch bay in data center | Data center | Connections can be changed without being noticed |
| Key management via secure link to HSM | Cost savings | Secure link becomes new attack target |
| Servers in virtual machines | Efficiency | Connections are determined by configuration files and can be modified without physical access |
| Key stored in encrypted database | Allow for more keys | Old keys can be used again by copying the database entries |
| Moving HSMs to the cloud | Efficiency | Key management no longer performed by key custodians |

For every change, the same reasoning was used:

  • It is a small change, the security design is basically the same as it was.
  • It is only a negligible amount of additional risk.
  • We don’t want to redesign everything because that would be too costly.

After many changes, it becomes like the telephone game: many small changes can completely change the original intent.

[Figure: Telephone game]

The final step in the table, where the HSMs are moved into the cloud, even defeats the original purpose of HSMs: key management is no longer explicitly separated from use of the keys. This is because the cloud provider does not support separation of key management from key use by default. In fact, they advertise how easy it is to have “automatic key management” for applications, allowing the application developers to set up the keys themselves.

The goal of this article is to make sure the next change will not add another reduction of security to this table.

HSMs in the cloud

Cloud providers offer a feature called “Bring Your Own Key” (BYOK), which allows customers to store their existing encryption keys in the cloud provider’s HSMs. The obvious advantage of this is that new users of the cloud can start working without changing the keys. However convenient this sounds, there are two potential security risks here. First, the actual process of moving keys to the cloud is complicated and error prone. Second, the key is now stored in more locations than before, increasing the potential attack surface.

I have even heard cloud providers advertise this by claiming that under BYOK “keys are under your control”. This is misleading, as the opposite is true: once your keys are uploaded to the cloud, you can never claim exclusive control over them anymore. For companies like Worldline this is especially important, as we handle many keys owned by our clients.

The cloud provider may want to be able to move the keys from one HSM to another in order to be able to move their services to another location (since that ability is one of the cornerstones of the cloud provider’s services). This means that their way of storing HSM keys is explicitly designed to make moving keys easier, which works against the design criteria of an HSM. Furthermore, cloud providers do not want to take the liability for the key and associated data.

Current implementations of “HSM in the cloud”

Current applications using HSMs cannot be moved to the cloud just like that. When these applications are moved to the cloud, the way they work with HSMs has to be changed. Worldline made an overview of the different ways to use HSMs with cloud applications. Unfortunately, each of them has significant disadvantages:

  • The standard functionality of the cloud provider’s HSMs can be used. This consists often of not much more than a way to provide storage keys that you can use to encrypt data yourself; this is just good enough to prevent theft of the database contents if the attacker cannot access this HSM.
  • Some cloud providers have specific payment HSMs with specific functionality. To use these, you may have to provide keys to the cloud provider. To be honest, I do not understand how PCI compliance requirements can be met if you don’t manage your own keys, but apparently PCI does allow this. Also, many of the payment protocols in use by Worldline are not supported.
  • We could rent a rack in the cloud provider’s data center and put our own HSMs in there. This would give us all the functionality we need, including proper key management with our own methods. Unfortunately, this is extremely expensive, and furthermore, it means that we need multiple racks in different data centers if we want to meet our requirements for redundancy.
  • Finally, and this is the solution that is now often used in our company, we can just leave the HSMs where they are in our data center, and have the cloud applications “dial into” our data centers using a secure connection to get the necessary functionality. This is sometimes called the “hybrid” solution. It fulfills the necessary security and key management requirements. The main disadvantages of this method are that the connection is relatively slow, and that we still have to maintain hardware, defeating the purpose of using the cloud in the first place.

Towards security in the cloud

The main idea of this article is investigating this question from scratch, without getting distracted by the current solutions:

how do we do key management in the cloud?

As is clear from the discussion above, using HSMs in the cloud is not enough. As terrifying as it is, I can only conclude that the solution to our problem is to make something in software that should satisfy the same objectives as originally stated for HSMs, but adjusted for the world we currently live in.

I suggest the name “Cloud Security Module”, or CSM. This is not to be confused with a “virtual HSM”, where a physical HSM has different presences for different applications.

High level requirements

Let’s start with a description of what a CSM should do, at the highest level possible. From the idea “what an HSM does, but then in the cloud”, we get the following four basic requirements:

  1. Prevent keys from being read accidentally or on purpose, both by internal and external parties.
  2. Provide a means to separate the key management activities from cryptographic operations.
  3. Support key management procedures used by key custodians, allowing the custodians to take their responsibilities. The procedures in question must be comparable to current procedures.
  4. Operate within a cloud environment, performing cryptographic operations and protecting the keys.

You just created another threat to key security!

Before we go further, let’s specifically address the elephant in the room. The change from an HSM to a software CSM can be seen as just another change in the way we handle keys, one that could be added to the list of changes in the table above. And that is a fair point: the security design is changed again. I am not going to deny that.

Storing keys in the cloud sounds like a great idea. The HSM provides physical protection against key manipulation and theft. The cloud provider’s data centers provide a very secure environment for the HSM. The cloud provider will have to protect this since their entire company depends on this.

But it is not all about physical security. Storing keys in the cloud directly means that you trust the cloud provider with the keys. Keeping keys secure is not only about the way you store them but also about the key management and the procedures around it: the logical security.

In order to get a balanced view on the security of this alternate solution, it is important to understand the risks of both solutions and compare them in a fair way.

Fair comparison between HSM and CSM

Let’s compare protection of keys against leakage between HSMs and software.

| Risk | HSM | Software |
|---|---|---|
| Verification | Hard to check hardware | Source could be audited¹ |
| Dependency² | Manufacturer | Cloud provider and operating system |
| Security design | Often proprietary | Visible in source code |
| Location of keys | Inside enclosure | Confidential computing³ |
| Government influence⁴ | Can be invisible | Hard to hide |
| Security audits | Not adjusted for cloud | Not yet developed |

Clarification of the entries in the table:

  1. I will assume in this that the source code of the software is visible to the user of the CSM. I don’t think that a CSM based on secret source code is to be recommended.

  2. “Dependency” means which party you are implicitly trusting with your keys. As long as you don’t build your own hardware, there is always somebody you’re depending on.

  3. Confidential computing is a technology that encrypts the communication channel between a processor and memory. This technology is specifically created for running secure code in a cloud environment. It is elaborated further below.

  4. I acknowledge here that many governments cannot resist the temptation to try to get access to cryptographic key material.

    The “legal access” methods attempt to maintain general security while giving governments a way to get access; this tends not to be secure, because such mechanisms backfire when attackers abuse them to eavesdrop. This has happened in the past, for example in the 2022 Greek surveillance scandal. Even worse, governments actively try to reduce security for “export” products. Cees Jansen writes about this on his home page, describing an old encryption device that he helped develop when he worked for Philips. Translated from there:

    In order to be able to generate revenue abroad, which was a must for a commercial company like Philips, the strength of the cryptography had to be reduced. Contrary to everything that has been said and written about this in the past, this degradation was implemented in consultation with the government service at that time (the then National Bureau for Communications Security - NBV).

[Figure: Aroflex encryption device]

The question is: how sure are we that the HSMs we use today are completely free from such government influence?


Implementing the requirements

Now that we have a high-level idea of what our CSM has to do, we can work out these requirements in more detail.

First requirement: preventing keys from being read

As you know, it is not a matter of if someone breaks into your infrastructure, but when. The first thing an attacker does is check whether there are files on your system with keys, passwords or access tokens in them.

This includes your code repositories and databases! Therefore, any software that uses keys (and our CSM is certainly one of these) needs to store keys in a secure way, not accessible to regular users. Encrypting your secret data does not really solve the problem of preventing exposure of the secrets: it merely moves the problem of protecting your data into the problem of protecting your key.

The way to securely have secrets in your applications is called secrets management, and there are standard solutions for that from companies like HashiCorp, Akeyless and CyberArk.

CSM has its own form of secrets management, which could be compatible with one of the commercial solutions but doesn’t need to be. This still needs to be worked out, so I will describe it in a generic way. When the CSM starts up, it reads the keys once and stores them in process memory. The application uses Shamir secret sharing to read the keys from several (at least two) trusted places. Each of these trusted places contains a “key share database” that holds a key share for every key that is needed. For each key, the shares are combined to form the actual key. In principle, you could store the entire key database in encrypted form with a single master key, and store that master key using key sharing. I chose not to use this solution in order to make key management easier; see below for details.

Each of the parties that provides such a database will perform independent authentication of the CSM application before providing access to the shares. In that way, the security of the keys is dependent on the party with the strongest authentication.
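The way shares from several key share databases combine into a key can be illustrated with a minimal sketch of Shamir secret sharing. This is not the CSM implementation: the field modulus, the function names `split_secret` and `combine_shares`, and the 2-out-of-3 setup are assumptions chosen for illustration.

```python
import secrets

# Prime modulus of the finite field. 2**521 - 1 is a Mersenne prime,
# large enough to hold a 256-bit key as a single field element.
PRIME = 2**521 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # The secret is the constant term; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical setup: three key share databases, any two are sufficient.
key = secrets.randbits(256)
shares = split_secret(key, threshold=2, num_shares=3)
recovered = combine_shares(shares[:2])
```

In this sketch a single share reveals nothing about the key, and any two of the three databases are enough to recover it. The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.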

[Figure: CSM Application]

Second requirement: Separation of key use and key management

The second requirement is about separating key use from key management. It would be hard to make a single application that handles both aspects securely, but it is possible to split them into two different applications:

  • One that performs the cryptographic operations with secret keys.
  • One that performs key management operations. I will show below how this can be designed in such a way that it doesn’t have to deal with the values of the cryptographic keys, relaxing the security requirements for this application.

The main CSM application using the keys

The main application must not be able to modify keys. We choose a separate application that performs the cryptographic computations. Keys are protected because this application simply doesn’t have any key management functionality. This matches how HSMs are operated: calls to the HSM cannot influence its state. Key management uses a different interface.
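A minimal sketch of such a use-only interface is shown below. The class name `CryptoService`, the key identifier and the choice of HMAC-SHA-256 as the example operation are all hypothetical. The point is only that the interface offers cryptographic operations and nothing that returns or replaces a key. Python's name mangling is a convention, not a security boundary; real protection has to come from process isolation.

```python
import hashlib
import hmac


class CryptoService:
    """Use-only key holder: keys are loaded once and never exposed or changed."""

    def __init__(self, keys):
        # Private copy of the key set; no method returns or replaces it.
        self.__keys = dict(keys)

    def mac(self, key_id, message):
        """Compute a MAC over `message` with the key named `key_id`."""
        return hmac.new(self.__keys[key_id], message, hashlib.sha256).digest()

    def verify_mac(self, key_id, message, tag):
        """Check a MAC in constant time."""
        return hmac.compare_digest(self.mac(key_id, message), tag)


# Illustrative key value only; in the CSM the keys would come from the share databases.
service = CryptoService({"txn-mac": b"\x01" * 32})
tag = service.mac("txn-mac", b"pay 10 EUR")
ok = service.verify_mac("txn-mac", b"pay 10 EUR", tag)
```

Changing the key set means starting a new instance with new keys, which mirrors the idea that key management goes through a different interface.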

Another measure that protects the keys against modification is protecting access to the keys. When the main application is running, the keys are in the RAM of the process, which is harder to access than the file system. Accessing that RAM from another process is not impossible: there are attacks on the RAM contents directly (such as the Rowhammer attack), in which a process is able to influence or learn about the memory contents of another process that shares the same physical memory. Since the memory of a virtual machine is in a file, its contents can also be accessed directly. Confidential computing is designed to protect the RAM of a process that runs in the cloud against this kind of access.

The key management application

Key management is a set of procedures that makes sure that keys are distributed in a secure, auditable way that allows recovery from disasters and human error. Proper key management procedures not only protect the keys, but also the key custodians. The main threat for key custodians is duress, where they are somehow forced to do things that can compromise security. The key management part of the CSM implements the technical part of the key management procedure. It consists of a key management interface program that is run by the key custodians. Each key custodian runs a separate copy of this program.

A key management operation updates the databases that contain key shares. When the databases are updated, a new copy of the main application is started that works with the new values. This allows for a smooth transition from one version of the key set to a new one.

The key shares are not simply stored in the databases by the key custodians. Note that the number of key custodians and the number of key share databases are not necessarily the same. There is a simple mathematical trick that allows the key custodians as a group to update a key without ever having to see the key in the process. This trick is based on the mathematical properties of Shamir secret sharing, and allows redundancy in the sense that key management works even if not all key custodians are available, and the main application works even if not all key databases are available.

[Figure: Key Sharing]

When operating on a key, the key management interface program splits the share of the corresponding key custodian into subshares, one for each key share database, and never operates on actual key values, which reduces the security needs of this application. Together, the key custodians create new key shares in the databases by each providing a piece of the final key share.

How this works is illustrated in the picture above. The key custodians each have a key share, shown as half a key. Each custodian splits the share into three subshares shown with different colors as shown for one of the key custodians. They send the subshares to the respective key databases. Each key database combines the subshares into a full share. The three shares stored in the databases together form the key, which is the same as the combination of the key custodian’s shares.

The actual key will only start to exist when a new copy of the CSM application is started. This way, the key is protected during key management.
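The subshare construction described above can be sketched with simple XOR-based sharing, where all pieces are needed to reconstruct a value. This is a simplification I chose for readability: the article's scheme builds on Shamir secret sharing, which has the same linearity property but also allows thresholds. The helper names and the two-custodian, three-database setup are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_all(parts):
    """XOR a non-empty list of byte strings together."""
    return reduce(xor_bytes, parts)


def split_into_subshares(share, count):
    """Split one custodian share into `count` subshares that XOR back to it."""
    randoms = [secrets.token_bytes(len(share)) for _ in range(count - 1)]
    last = xor_all([share] + randoms)
    return randoms + [last]


# Two custodians each hold one share; the key itself is never assembled here.
custodian_shares = [secrets.token_bytes(16), secrets.token_bytes(16)]

# Each custodian independently splits their own share, one subshare per database.
subshares = [split_into_subshares(share, 3) for share in custodian_shares]

# Database j combines the j-th subshare received from every custodian.
database_shares = [xor_all([subs[j] for subs in subshares]) for j in range(3)]
```

The XOR of the three database shares equals the XOR of the two custodian shares, so the databases jointly hold the same key as the custodians, yet no custodian and no single database ever sees it. Unlike Shamir sharing, this XOR variant offers no redundancy: every database share is required.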

From a security point of view, the secrecy of the key is protected as long as any of the subshares stays secret, and access is protected as long as any database account of any key custodian is secure. This is as secure as it can be, since this is by definition the only way to access a cloud resource.

Real key security

Third requirement: support key management operations

In the previous paragraph I showed that it is possible to make key management of CSM technically secure. Of course, the application needs to support key management procedures, preferably in a way that is compatible with current key management processes. This part of CSM is not completely technical. I am in discussion with key custodians to find out how their requirements can be translated into practical procedures that balance security with flexibility. These requirements include, but are not limited to:

  • Duress protection that prevents key custodians from being forced into actions. This could for example include a video link to the other custodians during key management actions. For the highest level of protection, there could be a way to enforce that actions can only be performed if all custodians are in a specific room at the same time.
  • A way to provide an audit record of key management actions.
  • A way to produce verifiable reports of actions.
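One common way to make an audit record tamper-evident is a hash chain, where every entry includes the hash of the previous one. The sketch below only illustrates that general technique; it is not a design decision of the CSM, and the field names and example actions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Hash the canonical JSON form of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, custodian, action):
    """Append an entry whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"custodian": custodian, "action": action, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify_log(log):
    """Return True if the chain of hashes is intact."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("custodian", "action", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "custodian-1", "submit subshares for key 42")
append_entry(audit_log, "custodian-2", "submit subshares for key 42")
intact = verify_log(audit_log)
```

Altering or reordering earlier entries breaks the chain. Detecting removal of the most recent entries would additionally require publishing or countersigning the latest hash, which this sketch does not cover.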

Fourth requirement: secure operations in the cloud

To satisfy the fourth high-level requirement, we have to make sure our software is correct (which is known to be as good as impossible). This is probably the hardest requirement to satisfy in practice. To do this, I propose to use formal verification. This technology is used more and more for programs that need to perform reliably. It is one of the reasons that operating systems are so much more reliable than they were ten years ago. In a nutshell, it is a mathematical proof of the theorem “this program does what it is designed to do, and not more than that”. The computer can help with the tedious steps of filling in the details of this proof.

Scalability

A good cloud application is designed to be scalable. Since the CSM application has no state, it lends itself to simple scaling, so this shouldn’t be a problem.

The CSM application should also be able to securely authenticate the processes that are allowed to use its functions, to prevent abuse. (The way an HSM or CSM can be abused through its API depends on the cryptographic commands that are allowed to be used: this is a whole different problem that is worth investigation on its own.)

Another desirable property is that it should be easy to run multiple copies of the application with different key sets and different authorizations. That means that there can be multiple key databases, separating keys per application domain or authorisation level. Of course, not all applications need to have the same set of cloud providers for their keys. I can imagine that for certain high-security applications, a larger number of key databases is used. This matches the way HSMs are used in practice, where different HSMs have different key sets and are used by different applications or users. The key custodians should be in control of the assignment of key sets.


Conclusion

This article shows how I believe it is possible to create secure key management within a cloud environment. The next step would be to precisely write down the requirements. The full set of requirements can be separated into four groups:

  • Technical requirements about implementation in the cloud. A pilot of the CSM would be useful to find out if all requirements are satisfied.
  • Procedural requirements regarding workability. For this, a pilot of the key management interface would provide necessary insight into the necessary details.
  • Requirements about the legal framework. To be honest, I could use some help here, since I do not have the necessary knowledge.
  • Adoption of the solution into compliance and standardization frameworks. This requires quite a bit of work and may need to be delayed until some experience has been collected with the steps above.

Relevant standards

If you want to find more information about the standards and compliance frameworks that are relevant for CSM, see:

A suggestion for cloud providers

Cloud providers use a set of permissions that allow specific tasks within the cloud environment. This permission system can be very fine-grained, so in principle it is possible to make specific permissions on key management and use of keys within an application. My recommendation to cloud providers is to have a good look at these requirements and see if these can be set up in a way that approximates the separation that HSMs can provide. One thing that I have not seen in current cloud offerings is what the key custodians need:

  • a good way to implement split knowledge (where no individual ever has access to a key) and
  • dual control (where no individual ever can execute certain actions alone).

These things may be hard to implement, but they do address security risks that are more relevant than the marketing terms “Bring Your Own Key” (BYOK) and “Automatic key rotation”.
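To make the dual control requirement concrete, here is a minimal sketch of an action that only becomes authorized after approval by two distinct custodians. The class name `DualControlAction` and the custodian identifiers are hypothetical, and the sketch assumes that custodians have already been authenticated elsewhere; it says nothing about how a cloud provider's permission system would enforce this.

```python
class DualControlAction:
    """An action that may run only after enough distinct custodians approve it."""

    def __init__(self, description, required_approvals=2):
        self.description = description
        self.required_approvals = required_approvals
        self._approvers = set()

    def approve(self, custodian_id):
        # A set ignores repeated approvals by the same custodian.
        self._approvers.add(custodian_id)

    def is_authorized(self):
        return len(self._approvers) >= self.required_approvals


action = DualControlAction("activate new key set")
action.approve("custodian-1")
action.approve("custodian-1")  # same person again: still only one approver
authorized_after_one = action.is_authorized()
action.approve("custodian-2")
authorized_after_two = action.is_authorized()
```

Split knowledge is a separate property: it concerns who can see key material, and is what the secret sharing construction earlier in this article addresses.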