» Private Terraform Enterprise Azure Reference Architecture

» Introduction

This document provides recommended practices and a reference architecture for HashiCorp Private Terraform Enterprise (PTFE) implementations on Azure.

» Required Reading

Prior to making hardware sizing and architectural decisions, read through the installation information available for PTFE to familiarise yourself with the application components and architecture. Further, read the reliability and availability guidance as a primer to understanding the recommendations in this reference architecture.

» Infrastructure Requirements

Depending on the chosen operational mode, the infrastructure requirements for PTFE range from a single Azure VM instance for demo or proof of concept installations, to multiple instances connected to Azure Database for PostgreSQL, Azure Blob Storage, and an external Vault cluster for a stateless production installation.

The following table provides high-level server recommendations and is meant as a guideline. Of particular note is the strong recommendation to avoid non-fixed performance CPUs, or “Burstable CPU” in Azure terms, such as B-series instances.

» PTFE Servers (Azure VMs)

Type        | CPU      | Memory       | Disk  | Azure VM Sizes
Minimum     | 2 core   | 8 GB RAM     | 50 GB | Standard_D2_v3
Recommended | 4-8 core | 16-32 GB RAM | 50 GB | Standard_D4_v3, Standard_D8_v3

» Hardware Sizing Considerations

  • The minimum size would be appropriate for most initial production deployments or for development/testing environments.

  • The recommended size is for production environments where there is a consistently high workload in the form of concurrent Terraform runs.

  • The default osDisk size for most Linux images on Azure is 30GB. When increasing the size of the osDisk partition, there may be additional steps required to fully utilize the disk space, such as using a tool like fdisk. This process is documented in the Azure knowledge base article "How to: Resize Linux osDisk partition on Azure".
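The sizing guidance above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of the PTFE tooling: the `VmSpec` type and `sizing_problems` function are hypothetical names, the thresholds are the "Minimum" row of the table, and it assumes that burstable sizes can be recognised by a `Standard_B` name prefix followed by a digit.

```python
import re
from dataclasses import dataclass
from typing import List

# Minimum figures taken from the "Minimum" row of the PTFE server table.
MIN_CORES = 2
MIN_MEMORY_GB = 8
MIN_DISK_GB = 50

# Assumption: burstable sizes are identified by the "Standard_B" prefix
# followed by a digit (for example Standard_B2ms).
_BURSTABLE = re.compile(r"^Standard_B\d", re.IGNORECASE)


@dataclass(frozen=True)
class VmSpec:
    """Hypothetical description of a candidate PTFE server."""
    size_name: str
    cores: int
    memory_gb: int
    disk_gb: int


def sizing_problems(spec: VmSpec) -> List[str]:
    """Return human-readable problems; an empty list means the spec passes."""
    problems = []
    if _BURSTABLE.match(spec.size_name):
        problems.append(f"{spec.size_name} is a burstable (B-series) size")
    if spec.cores < MIN_CORES:
        problems.append(f"{spec.cores} cores is below the minimum of {MIN_CORES}")
    if spec.memory_gb < MIN_MEMORY_GB:
        problems.append(f"{spec.memory_gb} GB RAM is below the minimum of {MIN_MEMORY_GB} GB")
    if spec.disk_gb < MIN_DISK_GB:
        problems.append(f"{spec.disk_gb} GB disk is below the minimum of {MIN_DISK_GB} GB")
    return problems


if __name__ == "__main__":
    # A B-series VM left on the default 30 GB osDisk fails two checks.
    for problem in sizing_problems(VmSpec("Standard_B2ms", 2, 8, 30)):
        print(problem)
```

A check like this only flags an undersized disk; it does not perform the partition resize described in the last bullet.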

» PostgreSQL Database (Azure Database for PostgreSQL)

Type        | CPU      | Memory      | Storage | Azure DB Sizes
Minimum     | 2 core   | 4 GB RAM    | 50 GB   | General Purpose 2 vCores
Recommended | 4-8 core | 8-16 GB RAM | 50 GB   | General Purpose 4 vCores, General Purpose 8 vCores

» Hardware Sizing Considerations

  • The minimum size would be appropriate for most initial production deployments, or for development/testing environments.

  • The recommended size is for production environments where there is a consistently high workload in the form of concurrent Terraform runs.

» Object Storage (Azure Blob Storage)

An Azure Blob Storage container must be specified during the PTFE installation so that application data is stored securely and redundantly, away from the Azure VMs running the PTFE application. This container must be in the same region as the VMs and the Azure Database for PostgreSQL instance. We recommend configuring the virtual network containing the PTFE servers with a Virtual Network (VNet) service endpoint for Azure Storage. Vault is used to encrypt all application data stored in the container; server-side encryption by Azure Blob Storage can be added on top of this if required by your security policy.

» Vault Cluster

In order to provide a fully stateless application deployment, PTFE must be configured to speak with an external Vault cluster. This reference architecture assumes that a highly available Vault cluster is accessible at an endpoint the PTFE servers can reach.

» Other Considerations

» Additional Azure Resources

In order to successfully provision this reference architecture you must also be permitted to create the following Azure resources:

» Network

To deploy PTFE in Azure, you will need to create new networking infrastructure or use existing infrastructure. The infrastructure diagram highlights some of the key components. These elements are likely to be unique to your environment and are not something this reference architecture can specify in detail.

» DNS

DNS can be configured outside of Azure or using Azure DNS. The fully qualified domain name should resolve to the Load Balancer. Creating the required DNS entry is outside the scope of this guide.

» SSL/TLS

An SSL/TLS certificate is required for secure communication between clients and the PTFE application server. The certificate can be specified during the UI-based installation or, for an unattended installation, by codifying the path to the certificate.

» Infrastructure Diagram

[Figure: Azure infrastructure diagram]

The above diagram shows the infrastructure components at a high level.

» Application Layer

The Application Layer is composed of two PTFE servers (Azure VMs) running in different subnets and operating in an active/standby configuration. Traffic is routed to the active PTFE server via the Load Balancer rules and health checks. If the active PTFE server becomes unavailable, traffic is routed to the standby PTFE server, which becomes the new active server. Routing changes can also be made manually by changing the Load Balancer configuration to switch between the PTFE servers.
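The active/standby routing decision described above can be modelled in a few lines. This is a simplified sketch of the decision logic only, not a description of how Azure Load Balancer is implemented or configured; the `choose_active` function and the server names `ptfe1` and `ptfe2` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def choose_active(
    current_active: Optional[str],
    health: Dict[str, bool],
    preferred_order: Sequence[str],
) -> Optional[str]:
    """Pick the server that should receive traffic.

    The current active server keeps the role while it is healthy, so a
    recovered former active server does not cause traffic to flap back.
    Otherwise the first healthy server in preferred_order takes over.
    Returns None when no server passes its health check.
    """
    if current_active is not None and health.get(current_active, False):
        return current_active
    for name in preferred_order:
        if health.get(name, False):
            return name
    return None


if __name__ == "__main__":
    servers = ["ptfe1", "ptfe2"]
    # ptfe1 fails its health check, so ptfe2 becomes the active server.
    print(choose_active("ptfe1", {"ptfe1": False, "ptfe2": True}, servers))
```

Keeping the current active server while it stays healthy is a design choice in this sketch; it mirrors the text above, where the standby becomes "the new active server" rather than a temporary stand-in.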

» Storage Layer

The Storage Layer is composed of multiple service endpoints (Azure Database for PostgreSQL, Azure Blob Storage, Vault) all configured with or benefitting from inherent resiliency provided by Azure (in the case of Azure Database for PostgreSQL and Azure Blob Storage) or assumed resiliency provided by a well-architected deployment (in the case of Vault).

» Additional Information

» Infrastructure Provisioning

The recommended way to deploy PTFE is through use of a Terraform configuration that defines the required resources, their references to other resources, and associated dependencies.

» Normal Operation

» Component Interaction

The Load Balancer routes all traffic to the active PTFE instance, which handles all requests to the PTFE application.

The PTFE application is connected to the PostgreSQL database via the Azure provided database server name endpoint. All database requests are routed to the highly available infrastructure supporting Azure Database for PostgreSQL.

The PTFE application is connected to object storage via the Azure Blob Storage endpoint for the defined container. All object storage requests are routed to the highly available infrastructure supporting Azure Storage.

The PTFE application is connected to the Vault cluster via the Vault cluster endpoint URL.
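To make the three connections concrete, the sketch below derives the endpoints a PTFE server would need to reach from a handful of names. It assumes the Azure public cloud DNS suffixes (`postgres.database.azure.com` and `blob.core.windows.net`); other Azure clouds use different suffixes. The `ExternalServices` type and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalServices:
    """Hypothetical bundle of the external services used by PTFE."""
    postgres_server: str   # Azure Database for PostgreSQL server name
    storage_account: str   # Azure Storage account name
    container: str         # Blob container holding PTFE application data
    vault_url: str         # Endpoint URL of the external Vault cluster

    def postgres_host(self) -> str:
        # Assumption: Azure public cloud suffix.
        return f"{self.postgres_server}.postgres.database.azure.com"

    def blob_container_url(self) -> str:
        # Assumption: Azure public cloud suffix.
        return (
            f"https://{self.storage_account}.blob.core.windows.net/"
            f"{self.container}"
        )


if __name__ == "__main__":
    services = ExternalServices(
        postgres_server="example-ptfe-db",
        storage_account="exampleptfestorage",
        container="ptfe-data",
        vault_url="https://vault.example.com:8200",
    )
    print(services.postgres_host())
    print(services.blob_container_url())
    print(services.vault_url)
```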

» Monitoring

While there is currently no monitoring guide for PTFE, information on logging, diagnostics, and reliability and availability can be found on our website.

» Upgrades

See the Upgrades section of the documentation.

» High Availability

» Failure Scenarios

The PTFE Reference Architecture is designed to handle different failure scenarios that have different probabilities. The ability to provide better service continuity will improve as the architecture evolves.

» Component Failure

» Single VM Failure

In the event of the active instance failing, the Load Balancer should be reconfigured (manually or automatically) to route all traffic to the standby instance.

When using the Production - External Services deployment model (PostgreSQL Database, Object Storage, Vault), there is still some application configuration data present on the PTFE server such as installation type, database connection settings, and hostname; however, this data rarely changes. If the application configuration has not changed since installation, both PTFE1 and PTFE2 will use the same configuration and no action is required.

If the configuration on the active instance changes, you should create a snapshot via the UI or CLI and restore it to the standby instance so that both instances use the same configuration.
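One way to decide whether the standby needs a fresh snapshot is to compare a fingerprint of the configuration on each instance. The following is an illustrative sketch only: PTFE does not expose its settings in this form, and the function names and setting keys are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def config_fingerprint(settings: Dict[str, Any]) -> str:
    """Return a stable SHA-256 fingerprint of a settings mapping.

    Keys are sorted so that two mappings with the same content always
    produce the same fingerprint, regardless of insertion order.
    """
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def standby_needs_restore(active: Dict[str, Any], standby: Dict[str, Any]) -> bool:
    """True when the standby configuration has drifted from the active one."""
    return config_fingerprint(active) != config_fingerprint(standby)


if __name__ == "__main__":
    # Hypothetical settings; the key names are for illustration only.
    active = {"hostname": "ptfe.example.com", "installation_type": "production"}
    standby = {"installation_type": "production", "hostname": "ptfe.example.com"}
    print(standby_needs_restore(active, standby))
```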

» PostgreSQL Database

The Azure Database for PostgreSQL service provides a guaranteed high level of availability. The financially backed service level agreement (SLA) is 99.99% upon general availability, so there is virtually no application downtime when using this service. More information on Azure Database for PostgreSQL service redundancy is available in the Azure documentation.
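As a rough illustration of what a 99.99% SLA means in practice, the arithmetic below converts an availability percentage into a downtime budget. The function name is hypothetical, and the calculation is plain arithmetic rather than a statement of how Azure measures or credits its SLA.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted by an SLA over the given period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    print(f"30-day month: {allowed_downtime_minutes(99.99, 30):.2f} minutes")
    print(f"365-day year: {allowed_downtime_minutes(99.99, 365):.2f} minutes")
```

At 99.99%, the budget works out to about 4.3 minutes in a 30-day month and about 52.6 minutes in a 365-day year.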

» Object Storage

Using Azure Blob Storage as an external object store leverages the highly available infrastructure provided by Azure. More information on Azure Storage redundancy is available in the Azure documentation.

» Vault Servers

For the purposes of this guide, the external Vault cluster is expected to be deployed and configured in line with the HashiCorp Vault Enterprise Reference Architecture. This would provide high availability and disaster recovery support, minimising downtime in the event of an outage.

» Disaster Recovery

» Failure Scenarios

The PTFE Reference Architecture is designed to handle different failure scenarios that have different probabilities. The ability to provide better service continuity will improve as the architecture evolves.

» Region Failure

PTFE is currently architected to provide high availability within a single Azure Region. Using multiple Azure Regions will give you greater control over your recovery time in the event of a hard dependency failure on a regional Azure service. In this section, we’ll discuss various implementation patterns and their typical availability.

Identical infrastructure should be provisioned in a secondary Azure Region. If the primary Azure Region hosting the PTFE application fails, the secondary Azure Region, along with some global services such as DNS, will require some configuration before traffic can be directed to it.

» Data Corruption

The PTFE application architecture relies on multiple service endpoints (Azure DB, Azure Storage, Vault), each providing its own backup and recovery functionality to support a low mean time to recovery (MTTR) in the event of data corruption.

» PTFE Servers

When using the Production - External Services deployment model (PostgreSQL Database, Object Storage, Vault), there is still some application configuration data present on the PTFE server such as installation type, database connection settings, and hostname; however, this data rarely changes. We recommend configuring automated snapshots for this installation data so it can be recovered in the event of data corruption.

» PostgreSQL Database

Backup and recovery of PostgreSQL is managed by Azure and configured through the Azure portal or CLI. More details of Azure Database for PostgreSQL features are available in the Azure documentation and are summarised below:

  • Automated backups: Azure Database for PostgreSQL automatically creates server backups and stores them in user-configured locally redundant or geo-redundant storage.

  • Backup redundancy: Azure Database for PostgreSQL provides the flexibility to choose between locally redundant and geo-redundant backup storage.

» Object Storage

There is no automatic backup/snapshot of Azure Blob Storage by Azure, so it is recommended to script a container copy process from the container used by the PTFE application to a “backup container” in Azure Blob Storage that runs at regular intervals. It is important the copy process is not so frequent that data corruption in the source content is copied to the backup before it is identified.
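A scripted copy process needs a naming scheme for backup containers and a rule for how many generations to keep, so that an older, uncorrupted copy is still available when corruption is discovered late. The sketch below covers only that bookkeeping; the copy itself (for example with a tool such as AzCopy) is deliberately left out. The function names, the `-backup-` naming scheme, and the retention count are all assumptions for illustration.

```python
from datetime import datetime
from typing import List

# Azure container names are limited to 63 characters.
MAX_CONTAINER_NAME_LENGTH = 63


def backup_container_name(source: str, when: datetime) -> str:
    """Build a timestamped backup container name for the source container.

    The timestamp format sorts lexicographically in chronological order.
    """
    name = f"{source}-backup-{when:%Y%m%d-%H%M}"
    if len(name) > MAX_CONTAINER_NAME_LENGTH:
        raise ValueError(f"container name too long: {name}")
    return name


def containers_to_prune(backup_names: List[str], keep: int) -> List[str]:
    """Return the oldest backup containers beyond the newest `keep`.

    Assumes every name was produced by backup_container_name for the
    same source container, so sorting by name sorts by time.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(backup_names)
    return ordered[:-keep]


if __name__ == "__main__":
    names = [
        backup_container_name("ptfe-data", datetime(2024, 1, day, 3, 0))
        for day in (1, 2, 3)
    ]
    # With two generations retained, only the oldest backup is pruned.
    print(containers_to_prune(names, keep=2))
```

Retaining several generations, rather than overwriting a single backup container, is what guards against corrupted source content silently replacing the only good copy.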

» Vault Cluster

The recommended Vault Reference Architecture uses Consul for storage. Consul provides the underlying snapshot functionality to support Vault backup and recovery. See the Vault backup/restore documentation for details.