• Overview
    • Enforce Policy as Code
    • Infrastructure as Code
    • Inject Secrets into Terraform
    • Integrate with Existing Workflows
    • Manage Kubernetes
    • Manage Virtual Machine Images
    • Multi-Cloud Deployment
    • Network Infrastructure Automation
    • Terraform CLI
    • Terraform Cloud
    • Terraform Enterprise
  • Registry
  • Tutorials
    • About the Docs
    • Intro to Terraform
    • Configuration Language
    • Terraform CLI
    • Terraform Cloud
    • Terraform Enterprise
    • Provider Use
    • Plugin Development
    • Registry Publishing
    • Integration Program
    • Terraform Tools
    • CDK for Terraform
    • Glossary
  • Community
GitHubTerraform Cloud
Download

    Terraform Enterprise Admin

  • Overview
    • Credentials
    • Hardware
      • Supported OS
      • RedHat Linux
      • CentOS Linux
      • Operational Mode
      • PostgreSQL
      • Minio Setup Guide
      • External Vault
    • Network
    • Docker Engine
  • Operational Modes
    • Overview
    • AWS Reference Architecture
    • Azure Reference Architecture
    • GCP Reference Architecture
    • VMware Reference Architecture
    • Pre-Install Checklist
      • 1. Run Installer
      • 2. Configure in Browser
      • Automated Installation
      • Active/Active
      • Initial User Automation
      • Encryption Password
    • Uninstall
    • Configuration
    • Team Membership
    • Attributes
    • Login
      • Sample Auth Request
      • ADFS
      • Azure Active Directory
      • Okta
      • OneLogin
    • Troubleshooting
    • Overview
      • Automated Recovery
      • Upgrades
      • Log Forwarding
      • Monitoring
      • Backups and Restores
      • Admin CLI Commands
      • Terraform Cloud Agents on TFE
      • Demo to Mounted Disk Migration
    • Terraform Cloud Agents on TFE
      • Accessing the Admin Interface
      • General Settings
      • Customization
      • Integration Settings
      • Managing Accounts & Resources
      • Module Sharing
      • Admin API
      • Updating Terraform Enterprise License
    • Terraform Enterprise Logs
    • Overview
    • Architecture Summary
    • Reliability & Availability
    • Capacity & Performance
    • Security Model
    • Overview
      • Overview
      • v202206-1
      • v202205-1
      • v202204-2
      • v202204-1
      • v202203-1
      • v202202-1
      • v202201-2
      • v202201-1
      • Overview
      • v202112-2
      • v202112-1
      • v202111-1
      • v202110-1
      • v202109-2
      • v202109-1
      • v202108-1
      • v202107-1
      • v202106-1
      • v202105-1
      • v202104-1
      • v202103-3
      • v202103-2
      • v202103-1
      • v202102-2
      • v202102-1
      • v202101-1
      • Overview
      • Overview
      • Overview
  • Support
  • Application Usage

  • Overview
  • Plans and Features
  • Getting Started
    • API Docs template
    • Overview
    • Account
    • Agent Pools
    • Agent Tokens
    • Applies
    • Audit Trails
    • Comments
    • Configuration Versions
    • Cost Estimates
    • Feature Sets
    • Invoices
    • IP Ranges
    • Notification Configurations
    • OAuth Clients
    • OAuth Tokens
    • Organizations
    • Organization Memberships
    • Organization Tags
    • Organization Tokens
    • Plan Exports
    • Plans
    • Policies
    • Policy Checks
    • Policy Sets
    • Policy Set Parameters
      • Modules
      • Providers
      • Private Provider Versions and Platforms
      • GPG Keys
    • Runs
      • Run Tasks
      • Stages and Results
      • Custom Integration
    • Run Triggers
    • SSH Keys
    • State Versions
    • State Version Outputs
    • Subscriptions
    • Team Access
    • Team Membership
    • Team Tokens
    • Teams
    • User Tokens
    • Users
    • Variables
    • VCS Events
    • Workspaces
    • Workspace-Specific Variables
    • Workspace Resources
    • Variable Sets
      • Overview
      • Module Sharing
      • Organizations
      • Runs
      • Settings
      • Terraform Versions
      • Users
      • Workspaces
    • Changelog
    • Stability Policy
    • Overview
    • Creating Workspaces
    • Naming
    • Terraform Configurations
      • Overview
      • Managing Variables
      • Overview
      • VCS Connections
      • Access
      • Drift Detection
      • Notifications
      • SSH Keys for Modules
      • Run Triggers
      • Run Tasks
    • Terraform State
    • JSON Filtering
    • Remote Operations
    • Viewing and Managing Runs
    • Run States and Stages
    • Run Modes and Options
    • UI/VCS-driven Runs
    • API-driven Runs
    • CLI-driven Runs
    • The Run Environment
    • Installing Software
    • Users
    • Teams
    • Organizations
    • Permissions
    • Two-factor Authentication
    • API Tokens
      • Overview
      • Microsoft Azure AD
      • Okta
      • SAML
      • Linking a User Account
      • Testing
    • Overview
    • GitHub.com
    • GitHub.com (OAuth)
    • GitHub Enterprise
    • GitLab.com
    • GitLab EE and CE
    • Bitbucket Cloud
    • Bitbucket Server and Data Center
    • Azure DevOps Services
    • Azure DevOps Server
    • Troubleshooting
    • Overview
    • Adding Public Providers and Modules
    • Publishing Private Providers
    • Publishing Private Modules
    • Using Providers and Modules
    • Configuration Designer
  • Migrating to Terraform Cloud
    • Overview
    • Using Sentinel with Terraform 0.12
    • Manage Policies
    • Enforce and Override Policies
    • Mocking Terraform Sentinel Data
    • Working With JSON Result Data
      • Overview
      • tfconfig
      • tfconfig/v2
      • tfplan
      • tfplan/v2
      • tfstate
      • tfstate/v2
      • tfrun
    • Example Policies
    • Overview
    • AWS
    • GCP
    • Azure
      • Overview
      • Service Catalog
      • Admin Guide
      • Developer Reference
      • Example Customizations
      • V1 Setup Instructions
    • Splunk Integration
    • Kubernetes Integration
    • Run Tasks Integration
    • Overview
    • IP Ranges
    • Data Security
    • Security Model
    • Overview
    • Part 1: Overview of Our Recommended Workflow
    • Part 2: Evaluating Your Current Provisioning Practices
    • Part 3: How to Evolve Your Provisioning Practices
    • Part 3.1: From Manual Changes to Semi-Automation
    • Part 3.2: From Semi-Automation to Infrastructure as Code
    • Part 3.3: From Infrastructure as Code to Collaborative Infrastructure as Code
    • Part 3.4: Advanced Workflow Improvements

  • Terraform Cloud Agents

  • Other Docs

  • Intro to Terraform
  • Configuration Language
  • Terraform CLI
  • Terraform Cloud
  • Terraform Enterprise
  • Provider Use
  • Plugin Development
  • Registry Publishing
  • Integration Program
  • Terraform Tools
  • CDK for Terraform
  • Glossary
Type '/' to Search

»Terraform Enterprise Automated Recovery

This guide explains how to configure automated recovery for a Terraform Enterprise installation. The goal is to provide a short Mean-Time-To-Recovery (MTTR) in the event of an outage. There are two steps in the automated recovery process:

  1. Configure snapshots on your current install to backup data
  2. Provision a new Terraform Enterprise instance using the latest snapshot

This guide will walk through both of these steps.

»Configure snapshots

Snapshots are taken on the Terraform Enterprise instance. That instance can have two types of data on it:

  • Terraform Enterprise application data: The core product data such as run history, configuration history, state history. This data changes frequently.
  • Terraform Enterprise installer data: The data used to configure Terraform Enterprise itself such as installation type, database connection settings, hostname. This data rarely changes.

In the Mounted Disk and External Services operational modes, only installer data is stored on the instance. Application data is stored in the mounted disk or in an external PostgreSQL instance.

Automated snapshots are more effective when using Mounted Disk or External Services as the amount of backed up data is smaller and less risky.

You can configure snapshots in the dashboard under Console Settings, in the Snapshot & Restore section. We recommend selecting Enable Automatic Scheduled Snapshots. The interval depends on the operational mode, but we recommend Daily for Mounted Disk or External Services because the snapshots contain only configuration data, not application data.

»Restore a snapshot in a new Terraform Enterprise instance

»Version Checking

Note: replicatedctl is located at /usr/local/bin/replicatedctl. On some operating systems, /usr/local/bin is not in the user's $PATH. In these cases, either add /usr/local/bin to the path or refer to replicatedctl with the full path.

Using the restore mechanism requires Replicated version 2.17.0 or greater. You can check the version using replicatedctl version.

»Script

Below are examples of restore scripts that are run on machine boot. There are many mechanisms that can run a script on boot (cloud-init, systemd, /etc/init.d). Which one is used is up to the user.

This script is presented as an example. Anyone using it needs to understand what it's doing and is free to modify it to meet any additional needs they have.

»S3

This example uses S3 as the mechanism to store snapshots on a debian-based system.

#!/bin/bash

set -e -u -o pipefail

bucket=your_bucket_to_store_snapshots
region=region_of_the_bucket
access_key=aws_access_key_id
secret_key=aws_secret_access_key

access="--store s3 --s3-bucket $bucket --s3-region $region --s3-key-id $access_key --s3-secret-key $secret_key"

# jq is used by this script, so install it. For other Linux distros, either preinstall jq
# and remove these lines, or change to the mechanism your distro uses to install jq.

apt-get update
apt-get install -y jq

# Run the installer.

curl https://install.terraform.io/ptfe/stable | bash -s fast-timeouts

# Wait for replicated to start before proceeding
until replicatedctl system status --template '{{and (eq .Replicated "ready") (eq .Retraced "ready")}}' | grep -q true; do
  sleep 1
  echo "Replicated is not yet ready."
done
echo "Replicated is ready."

# This retrieves a list of all the snapshots currently available.
replicatedctl snapshot ls $access -o json > /tmp/snapshots.json

# Pull just the snapshot id out of the list of snapshots
id=$(jq -r 'sort_by(.finished) | .[-1].id // ""' /tmp/snapshots.json)

# If there are no snapshots available, exit out
if test "$id" = ""; then
  echo "No snapshots found"
  exit 1
fi

echo "Restoring snapshot: $id"

# Restore the detected snapshot. This ignores preflight checks to be sure the application
# is booted.
replicatedctl snapshot restore $access --dismiss-preflight-checks "$id"

# Wait until the application reports itself as running. This step can be removed if
# something upstream is prepared to wait for the application to finish booting.
until curl -f -s --connect-timeout 1 http://localhost/_health_check; do
  sleep 1
done

echo
echo "Application booted!"
#!/bin/bash

set -e -u -o pipefail

bucket=your_bucket_to_store_snapshots
region=region_of_the_bucket
access_key=aws_access_key_id
secret_key=aws_secret_access_key

access="--store s3 --s3-bucket $bucket --s3-region $region --s3-key-id $access_key --s3-secret-key $secret_key"

# jq is used by this script, so install it. For other Linux distros, either preinstall jq
# and remove these lines, or change to the mechanism your distro uses to install jq.

apt-get update
apt-get install -y jq

# Run the installer.

curl https://install.terraform.io/ptfe/stable | bash -s fast-timeouts

# Wait for replicated to start before proceeding
until replicatedctl system status --template '{{and (eq .Replicated "ready") (eq .Retraced "ready")}}' | grep -q true; do
  sleep 1
  echo "Replicated is not yet ready."
done
echo "Replicated is ready."

# This retrieves a list of all the snapshots currently available.
replicatedctl snapshot ls $access -o json > /tmp/snapshots.json

# Pull just the snapshot id out of the list of snapshots
id=$(jq -r 'sort_by(.finished) | .[-1].id // ""' /tmp/snapshots.json)

# If there are no snapshots available, exit out
if test "$id" = ""; then
  echo "No snapshots found"
  exit 1
fi

echo "Restoring snapshot: $id"

# Restore the detected snapshot. This ignores preflight checks to be sure the application
# is booted.
replicatedctl snapshot restore $access --dismiss-preflight-checks "$id"

# Wait until the application reports itself as running. This step can be removed if
# something upstream is prepared to wait for the application to finish booting.
until curl -f -s --connect-timeout 1 http://localhost/_health_check; do
  sleep 1
done

echo
echo "Application booted!"

»SFTP

This example uses sftp to store the snapshots.

#!/bin/bash

set -e -u -o pipefail

key_file=path_to_your_ssh_key
key="$(base64 -w0 "$key_file")"
host=sftp_server_hostname_or_ip
user=user_to_sftp_on_the_remote_server

access="--store sftp --sftp-host $host --sftp-user $user --sftp-key $key"

# jq is used by this script, so install it. For other Linux distros, either preinstall jq
# and remove these lines, or change to the mechanism your distro uses to install jq.

apt-get update
apt-get install -y jq

# Run the installer.

curl https://install.terraform.io/ptfe/stable | bash -s fast-timeouts

# Wait for replicated to start before proceeding
until replicatedctl system status --template '{{and (eq .Replicated "ready") (eq .Retraced "ready")}}' | grep -q true; do
  sleep 1
  echo "Replicated is not yet ready."
done
echo "Replicated is ready."

# This retrieves a list of all the snapshots currently available.
replicatedctl snapshot ls $access -o json > /tmp/snapshots.json

# Pull just the snapshot id out of the list of snapshots
id=$(jq -r 'sort_by(.finished) | .[-1].id // ""' /tmp/snapshots.json)

# If there are no snapshots available, exit out
if test "$id" = ""; then
  echo "No snapshots found"
  exit 1
fi

echo "Restoring snapshot: $id"

# Restore the detected snapshot. This ignores preflight checks to be sure the application
# is booted.
replicatedctl snapshot restore $access --dismiss-preflight-checks "$id"

# Wait until the application reports itself as running. This step can be removed if
# something upstream is prepared to wait for the application to finish booting.
until curl -f -s --connect-timeout 1 http://localhost/_health_check; do
  sleep 1
done

echo
echo "Application booted!"
#!/bin/bash

set -e -u -o pipefail

key_file=path_to_your_ssh_key
key="$(base64 -w0 "$key_file")"
host=sftp_server_hostname_or_ip
user=user_to_sftp_on_the_remote_server

access="--store sftp --sftp-host $host --sftp-user $user --sftp-key $key"

# jq is used by this script, so install it. For other Linux distros, either preinstall jq
# and remove these lines, or change to the mechanism your distro uses to install jq.

apt-get update
apt-get install -y jq

# Run the installer.

curl https://install.terraform.io/ptfe/stable | bash -s fast-timeouts

# Wait for replicated to start before proceeding
until replicatedctl system status --template '{{and (eq .Replicated "ready") (eq .Retraced "ready")}}' | grep -q true; do
  sleep 1
  echo "Replicated is not yet ready."
done
echo "Replicated is ready."

# This retrieves a list of all the snapshots currently available.
replicatedctl snapshot ls $access -o json > /tmp/snapshots.json

# Pull just the snapshot id out of the list of snapshots
id=$(jq -r 'sort_by(.finished) | .[-1].id // ""' /tmp/snapshots.json)

# If there are no snapshots available, exit out
if test "$id" = ""; then
  echo "No snapshots found"
  exit 1
fi

echo "Restoring snapshot: $id"

# Restore the detected snapshot. This ignores preflight checks to be sure the application
# is booted.
replicatedctl snapshot restore $access --dismiss-preflight-checks "$id"

# Wait until the application reports itself as running. This step can be removed if
# something upstream is prepared to wait for the application to finish booting.
until curl -f -s --connect-timeout 1 http://localhost/_health_check; do
  sleep 1
done

echo
echo "Application booted!"

»Local directory

This example uses a local directory to store the snapshots. If this is intended to be run on a brand new system, the local directory would be either mounted block device or a network filesystem like NFS or CIFS.

#!/bin/bash

set -e -u -o pipefail

path=absolute_path_to_directory_of_snapshots

access="--store local --path $path"

# jq is used by this script, so install it. For other Linux distros, either preinstall jq
# and remove these lines, or change to the mechanism your distro uses to install jq.

apt-get update
apt-get install -y jq

# Run the installer.

curl https://install.terraform.io/ptfe/stable | bash -s fast-timeouts

# Wait for replicated to start before proceeding
until replicatedctl system status --template '{{and (eq .Replicated "ready") (eq .Retraced "ready")}}' | grep -q true; do
  sleep 1
  echo "Replicated is not yet ready."
done
echo "Replicated is ready."

# This retrieves a list of all the snapshots currently available.
replicatedctl snapshot ls $access -o json > /tmp/snapshots.json

# Pull just the snapshot id out of the list of snapshots
id=$(jq -r 'sort_by(.finished) | .[-1].id // ""' /tmp/snapshots.json)

# If there are no snapshots available, exit out
if test "$id" = ""; then
  echo "No snapshots found"
  exit 1
fi

echo "Restoring snapshot: $id"

# Restore the detected snapshot. This ignores preflight checks to be sure the application
# is booted.
replicatedctl snapshot restore $access --dismiss-preflight-checks "$id"

# Wait until the application reports itself as running. This step can be removed if
# something upstream is prepared to wait for the application to finish booting.
until curl -f -s --connect-timeout 1 http://localhost/_health_check; do
  sleep 1
done

echo
echo "Application booted!"
#!/bin/bash

set -e -u -o pipefail

path=absolute_path_to_directory_of_snapshots

access="--store local --path $path"

# jq is used by this script, so install it. For other Linux distros, either preinstall jq
# and remove these lines, or change to the mechanism your distro uses to install jq.

apt-get update
apt-get install -y jq

# Run the installer.

curl https://install.terraform.io/ptfe/stable | bash -s fast-timeouts

# Wait for replicated to start before proceeding
until replicatedctl system status --template '{{and (eq .Replicated "ready") (eq .Retraced "ready")}}' | grep -q true; do
  sleep 1
  echo "Replicated is not yet ready."
done
echo "Replicated is ready."

# This retrieves a list of all the snapshots currently available.
replicatedctl snapshot ls $access -o json > /tmp/snapshots.json

# Pull just the snapshot id out of the list of snapshots
id=$(jq -r 'sort_by(.finished) | .[-1].id // ""' /tmp/snapshots.json)

# If there are no snapshots available, exit out
if test "$id" = ""; then
  echo "No snapshots found"
  exit 1
fi

echo "Restoring snapshot: $id"

# Restore the detected snapshot. This ignores preflight checks to be sure the application
# is booted.
replicatedctl snapshot restore $access --dismiss-preflight-checks "$id"

# Wait until the application reports itself as running. This step can be removed if
# something upstream is prepared to wait for the application to finish booting.
until curl -f -s --connect-timeout 1 http://localhost/_health_check; do
  sleep 1
done

echo
echo "Application booted!"

»Airgap recovery considerations

The instructions above are tailored for the online install method. When restoring on an airgap instance, there are several additional considerations:

  1. The minimum version of Replicated is 2.31.0, rather than 2.17.0.
  2. The license file and airgap package must be in place on the new instance prior to restore. The restore process expects to find them in the same locations as they were on the original instance.
  3. The snapshot being used must also be from an airgap instance.
  4. The install.sh script and method used must be from the Replicated airgap installer boostrapper, using the process described for airgap installation.
  • Overview
  • Docs
  • Extend
  • Privacy
  • Security
  • Press Kit
  • Consent Manager