API Security

Four Dilemmas of Keeping Secrets

Tom McNamara

July 8, 2023


“Secrets,” in the context of cybersecurity, is a term for all kinds of credentials used to prove identity in a digital environment. Think of passwords or PINs for humans. In the cloud, where there are a vast number of non-human entities such as Docker containers and Kubernetes clusters (we'll call these "machines"), secrets include things like keys and tokens. Of course, we're supposed to protect these and keep them secret. But we all know how difficult that is, and we also know that secrets are an attractive and easy target for thieves. They’re attractive because a secret is one of the two credentials (a username and a secret) commonly needed to prove an identity through the process we know as “authentication,” and they're an easy target because they can be easily discovered and exploited, especially when they are passed around. In this article, I am going to discuss some of the nuanced implications of managing secrets and authentication in the cloud that create four dilemmas for security and risk managers in a digital enterprise. But don't lose hope: I am also going to describe a way to solve all four of the dilemmas.

Dilemma #1 - Secrets Chaining

Secrets Chaining is a dilemma that occurs when secrets are centrally stored in a tool such as a secrets manager or password manager. These tools require authentication of the legitimate user, and this creates another secret or password (the one needed to access the tool). What do you do with that secret? I think you can see the dilemma: if you protect it somewhere else, then that other location must be secured too, and a chain is created. It becomes impractical to just keep growing the chain. In practice, the chain doesn't usually go very far, and it almost always ends with a password owned by a system administrator. And we all know from experience that this password is the weak link in the security chain.

Dilemma #2 - Secrets Leakage 

But secrets used for authentication have another problem, too: leakage. It seems that they just can’t hold their secrecy for very long. One reason for this is that the process of authentication requires that secrets get passed around, and they’re not always passed, used, or stored securely. One solution to leakage is to assume the secret will leak and to change it frequently, so that if it is revealed or stolen without your knowledge, the window of vulnerability is limited (hopefully to a short period of time). Another solution is multi-factor authentication (MFA). You have probably experienced this as a text message with a code arriving on your mobile phone at the moment you are logging into an app. In essence, the secret is so distrusted as to make it useless as a security credential, and the one-time MFA code becomes the reliable form of identity proof.
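
To make the idea of a one-time code concrete, here is a minimal sketch of a time-based one-time password in the style of RFC 6238 (TOTP), the scheme used by many authenticator apps. It is offered as an illustration only: codes sent by text message are typically generated by the service and delivered to the phone rather than computed on it, and the function name `totp` and its parameter names are my own choices for this example.

```python
import hashlib
import hmac
import struct


def totp(shared_secret: bytes, now: float, step: int = 30, digits: int = 6) -> str:
    """Return a one-time code for the time step that contains `now`."""
    counter = int(now // step)  # both sides count elapsed time steps
    digest = hmac.new(shared_secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

Because the code depends on the current time step, a code that leaks is only useful to an attacker until the step ends, which is the same reasoning behind frequent rotation.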

Dilemma #3 - Machine Secrets 

With so many apps and services moving to the cloud, a new secrets dilemma has emerged, and it is a dilemma of scale, reach, and speed. What works for humans does not necessarily work well for machines in the cloud. First, there are a vast number of machines in the cloud; there are many more non-human entities than there are humans. This creates a significant problem of scale. Second, many machines are ephemeral rather than persistent entities like humans. They appear and disappear randomly or periodically. For example, apps on mobile devices may only appear when they are activated or when the mobile device is connected to a wireless network. In other cases, virtual machines and Kubernetes clusters may appear to handle additional user demand at certain times of the day or when certain events occur, such as a weather alert that causes many users to check a weather app at the same time. Nearly all of these machines are interconnected to share data through application programming interfaces (APIs), and many of these APIs require authentication before data is shared. This requires the handling and storage of a vast number of static API keys. The dilemma for machine secrets, such as API keys, is a by-product of the solutions to the secrets leakage dilemma: how does an enterprise frequently and instantaneously change short-lived keys across machines in all possible clouds during real-time operations?

Dilemma #4 - Secrets Injection 

Secrets Managers exist to help DevOps engineers and their security teams remove hard-coded static secrets from applications and lock them up in secure storage (a vault). Secrets Managers have their own APIs that allow external apps to programmatically retrieve secrets (API keys) from the secure vault when they are needed to make an API call to another app. (Here is another instance of the Secrets Chaining dilemma, because the vault API needs to receive a key and authenticate the requesting app, too!)

To overcome the secrets leakage dilemma, some Secrets Managers can rotate (regenerate) secrets that are inside the vault using a time-dependent “lease” approach. When the lease expires, the secret is revoked and a new one is generated to replace it. But this creates another problem. When a secret is regenerated inside a vault, a copy must be injected into the server app (the API host) where authentication of the client app's request occurs. This requires special permissions and credentials for the Secrets Manager to access the server app (you guessed it, more secrets are needed for this, too) and replace the 'authentic' secret that is stored within it. If the copy is not replaced, authentication fails. In practice, key rotation works well with cloud service providers such as Azure or AWS when the secrets manager and the server app offering the API endpoint are in the same vendor cloud. For example, AWS Secrets Manager has an easy job of rotating secrets for an RDS database because all of the trust and permissions are managed by an AWS IAM configuration within AWS.
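
The lease-and-inject sequence described above can be sketched as a toy model. This is not the API of AWS Secrets Manager or any other real product; the classes `Vault` and `ServerApp` and the method `inject` are hypothetical names invented for this illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
import hmac
import secrets


class Vault:
    """Toy vault that regenerates its secret whenever the lease has expired."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self._value = None
        self._expires_at = 0.0

    def get(self, now: float) -> str:
        if self._value is None or now >= self._expires_at:
            self._value = secrets.token_hex(16)  # rotate: the old secret is discarded
            self._expires_at = now + self.lease_seconds
        return self._value


class ServerApp:
    """Toy API host that authenticates callers against its own copy of the key."""

    def __init__(self):
        self._expected = None

    def inject(self, key: str) -> None:
        # In a real deployment this step needs its own permissions and credentials.
        self._expected = key

    def authenticate(self, presented: str) -> bool:
        return self._expected is not None and hmac.compare_digest(presented, self._expected)
```

In this model, a client that fetches a freshly rotated key from the vault is rejected by the server until `inject` has been called with the new key, which is exactly the gap the injection dilemma describes.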

But what if the API endpoint belongs to a server app that resides in another cloud or inside an enterprise data center? AWS Secrets Manager won’t have the access permissions to inject secrets there. And the secrets injection dilemma becomes a lot harder when machines interoperate autonomously in a "lights out" scenario (without human interaction or triggering) or must scale rapidly within the cloud. Frequent secrets rotation may be essential for high-security applications such as financial transactions, yet many important use cases cannot tolerate the service interruption that secrets injection requires. Stopping machines to regenerate secrets would affect the uptime of running services, and waiting for a maintenance window would make rotation too infrequent to achieve the desired security benefit.

These four are truly dilemmas for security and risk professionals responsible for protecting machines and data in the cloud, because every choice available to solve one of them involves undesirable alternatives. For example, solving the secrets leakage dilemma leads to the secrets injection dilemma.

Ideally, we should solve all four dilemmas in a single solution without creating another new dilemma in the process. It's a complex problem, but let’s consider a possible solution.

Solving the Four Secrets Dilemmas

This problem reminded me of a former manager who told me, “Don’t bring me a problem without also bringing me a recommended solution.” I'm going to apply that advice here and describe three principles that would solve all four of the secrets dilemmas mentioned above without creating a new dilemma. These three principles are core to Hopr’s solution for API threat protection, but it's not my intent to make this a marketing story. So I’ll describe each principle briefly, for the purpose of educating at a high level. Feel free to contact me if you want to explore any of these in more detail.

Principle 1: Build Secrets Where They Are Needed

The idea is to equip each machine with its own key generator so it can build the secrets it needs, such as the key to make an API call, rather than having the key generated inside a vault where a Secrets Injection dilemma is created. The generator is a block of code that is pre-configured with an algorithm. When the generator runs, the algorithm produces a key. Nothing special so far. But here’s the novel part: when two machines (say, a client and a trusted third party) share the same generator/algorithm and run it at nearly the same time, they will build identical keys. This overcomes a longstanding problem of exchanging or sharing a key between endpoints, a problem that has contributed to the dilemmas above. With this approach, there is no need for a machine to retrieve a secret from the vault or to have it injected into a remote app endpoint. Just as the registration of an app with an API service provides the client app and the API service with an identical static API key, the registration of the client machine with a third party, such as Hopr, provides the app with a key generator/algorithm that is an exact duplicate of the one the third party created. Building secrets where they are needed solves the dilemmas associated with secrets leakage and secrets injection.
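
As a rough illustration of how two machines could build identical keys without exchanging them, the sketch below derives a key from a shared seed and the current time window using HMAC-SHA256. This is not Hopr's actual algorithm, which the article does not specify; the names `derive_key`, `shared_seed`, and `WINDOW_SECONDS`, and the 30-second window, are assumptions made purely for illustration.

```python
import hashlib
import hmac
import struct

WINDOW_SECONDS = 30  # assumption: "nearly the same time" means the same 30 s window


def derive_key(shared_seed: bytes, now: float) -> bytes:
    """Derive a 32-byte key from a seed and the time window that contains `now`."""
    counter = int(now // WINDOW_SECONDS)
    return hmac.new(shared_seed, struct.pack(">Q", counter), hashlib.sha256).digest()


# Hypothetical seed that both generators were configured with at registration.
shared_seed = bytes.fromhex("00112233445566778899aabbccddeeff" * 2)

client_key = derive_key(shared_seed, now=1_000.0)       # the client runs its generator
third_party_key = derive_key(shared_seed, now=1_010.0)  # the third party runs 10 s later
later_key = derive_key(shared_seed, now=1_030.0)        # a run in a later window
```

Here `client_key` and `third_party_key` match because both timestamps fall in the same window, while `later_key` differs. A real protocol would also have to handle two runs that straddle a window boundary.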

Principle 2: Build Secrets Only When They Are Needed

Hopr’s key generator, like all key generators, produces different keys when it runs at different times. The method (protocol) for producing identical keys at two machines is to run the two key generators at nearly the same time. This means that each time a series of client API transactions (an API session) is requested, the key generator of the client app and the key generator held by a trusted third party (in this case, that’s Hopr) produce identical keys, which only they could possibly create (Principle 3 explains how the keys are used). The addition of a server API creates a three-party security pattern, a well-proven design pattern familiar from Kerberos, in which the trusted third party verifies trust in each machine first and then provides a session token to enable their direct API exchange. The keys and token exist only as long as the session is active and vanish when the session closes. Building secrets when they are needed at each session solves the dilemmas associated with secrets leakage, secrets chaining, and secrets injection.
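
The session lifetime described above might look something like the following sketch, in which a key is built when a session opens and discarded when it closes. The class `ApiSession`, the helper `derive_key`, the seed value, and the 30-second window are all hypothetical; this shows the general shape of the idea and not Hopr's implementation.

```python
import hashlib
import hmac
import struct

WINDOW_SECONDS = 30  # assumed window size


def derive_key(seed: bytes, now: float) -> bytes:
    """Assumed time-windowed key derivation (HMAC-SHA256 over a window counter)."""
    counter = int(now // WINDOW_SECONDS)
    return hmac.new(seed, struct.pack(">Q", counter), hashlib.sha256).digest()


class ApiSession:
    """Holds a key only between session open and session close."""

    def __init__(self, seed: bytes, now: float):
        self._seed = seed
        self._now = now
        self.key = None

    def __enter__(self):
        self.key = derive_key(self._seed, self._now)  # built when the session opens
        return self

    def __exit__(self, exc_type, exc, tb):
        self.key = None  # reference dropped when the session closes
        return False


seed = b"hypothetical-registration-seed"

with ApiSession(seed, now=1_000.0) as first:
    first_key = first.key    # usable only inside the session

with ApiSession(seed, now=5_000.0) as second:
    second_key = second.key  # a later session builds a different key
```

Note that dropping the reference in Python does not guarantee the bytes are wiped from memory; a production design would need to treat key erasure more carefully.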

Principle 3: Keep Secrets Where They Are Built

Based on the first two principles, we have two ephemeral keys at two locations at the same time. In a classic authentication process, the client app would make its API call to the trusted third party and present its key to authenticate itself. But this approach passes the key across the network, where it is potentially exposed to leakage and theft. A better solution is to keep both keys where they were built and to use them to encrypt and decrypt the API messages exchanged between the two parties. Since both the client and the trusted third party have identical keys, no key exchange is needed; successful decryption of an encrypted message verifies that the requesting app is authentic and trusted. This provides end-to-end encryption of API messages and makes them tamper-proof regardless of the route they may travel. And along with the first two principles, the keys are nearly impossible for an attacker to find and exfiltrate before they vanish. Keeping secrets where they are built solves the secrets leakage and machine secrets dilemmas.
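
The point that a key can prove identity without ever being transmitted can be sketched with an HMAC tag. This is a stand-in, not the encryption the paragraph describes: Python's standard library has no authenticated cipher, so the example shows only the verification half (a real system would use an authenticated encryption scheme such as AES-GCM). The names `window_key`, `seal`, and `open_sealed`, the seed, and the one-window clock tolerance are all assumptions.

```python
import hashlib
import hmac
import struct

WINDOW_SECONDS = 30  # assumed window size
TAG_LENGTH = 32      # length of an HMAC-SHA256 tag in bytes


def window_key(seed: bytes, counter: int) -> bytes:
    """Assumed derivation of the key for one time window."""
    return hmac.new(seed, struct.pack(">Q", counter), hashlib.sha256).digest()


def seal(key: bytes, message: bytes) -> bytes:
    """Client side: attach a tag computed with the locally built key."""
    return hmac.new(key, message, hashlib.sha256).digest() + message


def open_sealed(seed: bytes, now: float, sealed: bytes, skew_windows: int = 1):
    """Third-party side: rebuild candidate keys locally and check the tag.

    Returns the message if a key from a nearby window verifies it, else None.
    """
    tag, message = sealed[:TAG_LENGTH], sealed[TAG_LENGTH:]
    counter = int(now // WINDOW_SECONDS)
    for c in range(max(0, counter - skew_windows), counter + skew_windows + 1):
        expected = hmac.new(window_key(seed, c), message, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            return message
    return None
```

Only the tag and the message travel between the machines. The receiver accepts the message only if it can rebuild a matching key itself, so a wrong seed, a stale key, or an altered message all fail verification.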

Conclusion

As more businesses embrace digital transformation and seek benefits by exploiting the new technologies and architectures the cloud offers, we’re seeing that existing approaches that worked for humans or private enterprise data centers aren’t effective in the cloud. What is needed are innovative, “cloud native” security solutions that can scale with demand, in real time, at the speed of the cloud and across all clouds.