Secrets management at scale with Hashicorp Vault and the hashi-tools library

Posted by Dani Shemesh on October 20, 2019

Hashicorp Vault is a “secrets” management system, and one of the various Hashicorp open-source tools that the Fyber DevOps team uses.

This post describes the usage of Vault in Fyber as well as our newly open-sourced “hashi-tools” library.

The Security challenge

About a year ago, we deployed Vault to our production environment.

We embraced Vault to get rid of some bad practices that are quite common in the industry: storing credentials statically in a configuration file, or even hard-coding them in the project’s repository.

Aside from the obvious security issue, this practice usually leads to multiple services sharing the same credentials, which means those credentials need access to many resources. This violates the principle of least privilege and makes revoking the keys, should they be compromised, very challenging.

For this reason (and others), we wanted to change the way we work with secrets. We figured that it would be better to have:

  • Secrets for applications and systems in one secure, centralized place.

  • Dynamic and programmatic application access; each service can get temporary, unique credentials (issued under a lease) on demand, at runtime.

  • A solution for the “first introduction” problem, where a service needs to hold a secret to be able to access other systems, whether directly or through a third party.

  • Easy key revocation.

  • Secure auditing.

Vault seemed like a natural fit, as it technically provides all of the above. It supports dynamic secrets for many popular systems and tools (cloud providers, databases, SSH, etc.). It provides a central, secure, configurable place to store secrets. It offers a variety of authentication methods, which authenticate users and services and assign policies to them. And it has automatic revocation and auditing.

If you are not familiar with Vault, take a few minutes to review the documentation here.


The plan

In Vault, we use policies to restrict access, enforcing the “need to know” principle and implementing role-based access control by specifying access privileges.

For each backend (AWS, MySQL, etc.), we define roles, each granted a different set of permissions.

We want each service to be able to get credentials on demand, for each system, generated by the Vault server with the appropriate permissions. When a service is replaced, rebooted, or dies, its credentials should be revoked.

We would also like to use Vault as, well, a “secret vault” for static sensitive information, much as one would use LastPass.
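As a rough illustration of the revocation requirement, Vault's HTTP API exposes a `sys/leases/revoke` endpoint that takes a lease ID. The sketch below only builds the request and does not send it. The server address, token, and lease ID are placeholders, and `build_revoke_request` is a hypothetical helper, not part of any library.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.com:8200"  # placeholder address (assumption)


def build_revoke_request(vault_token: str, lease_id: str) -> urllib.request.Request:
    """Build (but do not send) a request that revokes a single lease."""
    body = json.dumps({"lease_id": lease_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{VAULT_ADDR}/v1/sys/leases/revoke",
        data=body,
        method="PUT",
        headers={
            "X-Vault-Token": vault_token,  # token of a caller allowed to revoke
            "Content-Type": "application/json",
        },
    )
```

Passing the resulting object to `urllib.request.urlopen` would perform the call against a real server.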

The flow should then be something like this:

The implementation challenge

So we set up a Vault cluster. The rest of the workflow above has to be implemented on the developer’s side. I feel safe in claiming that most developers give little thought to handling secrets: credentials with more than enough permissions either somehow find their way into a configuration file on the node, or everyone has access to them and can put them wherever they like in the Git repo.

With Vault, the developer has to implement an authentication mechanism in each project to be able to communicate with the Vault server. There are two ways to do it:

  1. Store a pre-defined Vault token. Most of our services are dynamic in the sense that they scale up and down, so creating a mechanism where we generate and place a Vault token in the service config file is not ideal, and anyway we prefer not to keep any secret physically inside a node.

  2. Use a Vault authentication method. Vault has pluggable authentication methods, making it easy to authenticate with Vault using whatever form works best for you. For example, you can authenticate using your personal GitHub access token. However, as in this example, there is a need to store a secret physically, and the implementation is not always trivial. In both cases, the temporary token needs to be periodically renewed by the application.
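To make the second option more concrete, here is a minimal sketch of the GitHub login and the subsequent token renewal against Vault's HTTP API. It only builds requests and parses a response body, so nothing is sent over the network. It assumes the GitHub method is mounted at its default path (`github`). The server address and all helper names (`VaultToken`, `build_github_login_request`, and so on) are illustrative and not taken from any library.

```python
import json
import urllib.request
from dataclasses import dataclass

VAULT_ADDR = "https://vault.example.com:8200"  # placeholder address (assumption)


@dataclass
class VaultToken:
    client_token: str
    lease_duration: int  # token TTL in seconds
    renewable: bool


def build_github_login_request(github_token: str) -> urllib.request.Request:
    """Log in with a personal GitHub access token (default mount path assumed)."""
    body = json.dumps({"token": github_token}).encode("utf-8")
    return urllib.request.Request(
        url=f"{VAULT_ADDR}/v1/auth/github/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_login_response(raw: bytes) -> VaultToken:
    """Extract the Vault token from the 'auth' section of a login response."""
    auth = json.loads(raw)["auth"]
    return VaultToken(
        client_token=auth["client_token"],
        lease_duration=auth["lease_duration"],
        renewable=auth["renewable"],
    )


def build_token_renew_request(token: VaultToken) -> urllib.request.Request:
    """Ask Vault to extend the TTL of the token that makes the call."""
    return urllib.request.Request(
        url=f"{VAULT_ADDR}/v1/auth/token/renew-self",
        data=b"{}",
        method="POST",
        headers={
            "X-Vault-Token": token.client_token,
            "Content-Type": "application/json",
        },
    )
```

Note that the GitHub token passed to the login call is itself a secret the service has to hold, which is the drawback described in the second option.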

After logging in, the application obtains credentials from the Vault server by making an HTTP call with the appropriate properties, parsing the response, and periodically renewing the lease, along with the usual chores such as failure handling and monitoring.
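The steps just described can be sketched as follows for the AWS secrets engine. The code builds the credentials request, parses a response body, and builds the lease renewal request, without sending anything. It assumes the engine is mounted at its default path (`aws`). The role name, the server address, and the helper names are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass

VAULT_ADDR = "https://vault.example.com:8200"  # placeholder address (assumption)


@dataclass
class LeasedSecret:
    lease_id: str
    lease_duration: int  # seconds until the credentials expire
    renewable: bool
    data: dict  # the generated credentials themselves


def build_creds_request(vault_token: str, role: str, mount: str = "aws") -> urllib.request.Request:
    """Request dynamic credentials for a role defined on the secrets engine."""
    return urllib.request.Request(
        url=f"{VAULT_ADDR}/v1/{mount}/creds/{role}",
        method="GET",
        headers={"X-Vault-Token": vault_token},
    )


def parse_creds_response(raw: bytes) -> LeasedSecret:
    """Pull the lease metadata and the credentials out of the response body."""
    payload = json.loads(raw)
    return LeasedSecret(
        lease_id=payload["lease_id"],
        lease_duration=payload["lease_duration"],
        renewable=payload["renewable"],
        data=payload["data"],
    )


def build_lease_renew_request(
    vault_token: str, secret: LeasedSecret, increment_seconds: int
) -> urllib.request.Request:
    """Ask Vault to extend the lease by the requested number of seconds."""
    body = json.dumps(
        {"lease_id": secret.lease_id, "increment": increment_seconds}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{VAULT_ADDR}/v1/sys/leases/renew",
        data=body,
        method="PUT",
        headers={"X-Vault-Token": vault_token, "Content-Type": "application/json"},
    )
```

A real client would also need retries, error handling, and metrics around each of these calls, which is exactly the repeated per-project work this section is about.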

The ‘hashi-tools’ library

To ease the migration of the secrets management to Vault, I wrote an SDK library. Most of our services are written in Scala, and so is the client.

The library contains two clients: ‘Secrets’ for Vault and ‘Discovery’ for Consul. The library is used by all of our backend applications that have been written in the past year.

So then let’s update the workflow:

Main features

The main features for ‘Secrets’ are:

  • Scala-based.

  • Friendly DSL (usable from Java applications as well).

  • Implementation of the AWS AMI method.

  • Renews its Vault token automatically if needed.

  • Creates leases for Secret Engines and renews them automatically.

  • Exposes operational metrics.

  • Can be easily configured with the ‘application.conf’ file or with a Typesafe Config object.

  • Async functionality.
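To give a feel for what automatic renewal involves, here is a small, generic sketch of a renewal loop. It is not the library's implementation. Renewing halfway through the TTL and treating a returned TTL of 0 as "cannot be extended" are assumptions of this example, and `renewal_delay` and `keep_renewed` are hypothetical names.

```python
import time
from typing import Callable, Optional


def renewal_delay(ttl_seconds: int, fraction: float = 0.5) -> float:
    """How long to wait before renewing: a fixed fraction of the TTL (assumption)."""
    return max(1.0, ttl_seconds * fraction)


def keep_renewed(
    initial_ttl: int,
    renew: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> int:
    """Repeatedly wait, then call `renew`, which returns the new TTL in seconds.

    Stops when `renew` returns 0 or when `max_attempts` is reached.
    Returns the number of renewal attempts that were made.
    """
    ttl = initial_ttl
    attempts = 0
    while ttl > 0 and (max_attempts is None or attempts < max_attempts):
        sleep(renewal_delay(ttl))
        ttl = renew()
        attempts += 1
    return attempts
```

Injecting `sleep` and `renew` keeps the loop testable without a Vault server; in a service it would typically run on a background thread, with `renew` wrapping the actual HTTP call.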

The full documentation can be found in the project’s repository.

Usage examples

[Image: usage examples]

You can contribute!

There is still a long way to go to make the SDK more complete. The main tasks are here.