Deploying an EDB Postgres Distributed example cluster on AWS v5

This quick start sets up EDB Postgres Distributed with an Always On Single Location architecture using Amazon EC2.

Introducing TPA and PGD

We created TPA to make installing and managing various Postgres configurations easily repeatable. TPA orchestrates creating and deploying Postgres. In this quick start, you install TPA first. If you already have TPA installed, you can skip those steps. You can use TPA to deploy various configurations of Postgres clusters.

PGD is a multi-master replicating implementation of Postgres designed for high performance and availability. The installation of PGD is orchestrated by TPA. You will use TPA to generate a configuration file for a PGD demonstration cluster. This cluster uses Amazon EC2 instances configures your cluster with three data nodes, cohosting three PGD Proxy servers, along with a Barman node for backup. You can then use TPA to provision and deploy the required configuration and software to each node.

Preparation

Note

This set of steps is specifically for Ubuntu 22.04 LTS on Intel/AMD processors.

EDB account

To install both TPA and PGD, you need an EDB account.

Sign up for a free EDB account if you don't already have one. Signing up gives you a trial subscription to EDB's software repositories.

After you are registered, go to the EDB Repos 2.0 page, where you can obtain your repo token.

On your first visit to this page, select Request Access to generate your repo token. Copy the token using the Copy Token icon, and store it safely.

Install curl

You use the curl command to retrieve installation scripts from repositories. On Ubuntu, curl isn't installed by default. To see if it's present, run curl in the terminal:

$ curl
Command 'curl' not found, but can be installed with:
sudo apt install curl

If not found, run:

sudo apt -y install curl

Setting environment variables

First, set the EDB_SUBSCRIPTION_TOKEN environment variable to the value of your EDB repo token, obtained in the EDB account step.

export EDB_SUBSCRIPTION_TOKEN=<your-repo-token>

You can add this to your .bashrc script or similar shell profile to ensure it's always set.

Configure the repository

All the software needed for this example is available from the EDB Postgres Distributed package repository. The following command downloads and runs a script to configure the EDB Postgres Distributed repository. This repository also contains the TPA packages.

curl -1sLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/postgres_distributed/setup.deb.sh" | sudo -E bash
Troubleshooting repo access

The script should produce output starting with:

Executing the  setup script for the 'enterprisedb/postgres_distributed' repository ...

If it produces no output or an error, double-check that you entered your token correctly. If the problem persists, contact Support for assistance.

Installing Trusted Postgres Architect (TPA)

You'll use TPA to provision and deploy PGD. If you previously installed TPA, you can move on to the next step. You'll find full instructions for installing TPA in the Trusted Postgres Architect documentation, which we've also included here.

Linux environment

TPA supports several distributions of Linux as a host platform. These examples are written for Ubuntu 22.04, but steps are similar for other supported platforms.

Install the TPA package

sudo apt install tpaexec

Configuring TPA

You need to configure TPA, which configures TPA's Python environment. Call tpaexec with the command setup:

sudo /opt/EDB/TPA/bin/tpaexec setup
export PATH=$PATH:/opt/EDB/TPA/bin

You can add the export command to your shell's profile.

Testing the TPA installation

You can verify TPA is correctly installed by running selftest:

tpaexec selftest

TPA is now installed.

AWS Credentials

TPA uses your AWS credentials to perform the deployment onto AWS. Unless you have a corporate-managed account, you will need to get your credentials from AWS. Corporate-managed accounts will have their own process for obtaining credentials.

Your credentials will consist of an AWS Access Key ID and a Secret Access Key. You will also need to select an AWS default region for your work.

Set the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION to the values of your AWS credentials. You can add these to your .bashrc or similar shell profile to ensure they're always set.

$ export AWS_ACCESS_KEY_ID=THISISJUSTANEXAMPLE
$ export AWS_SECRET_ACCESS_KEY=d0ntU5E/Th1SAs1ts/jUs7anEXAMPLEKEY
$ export AWS_DEFAULT_REGION=us-west-2

Your account will need the necessary permissions to create and manage the resources that TPA uses. The TPA AWS platform page details the permissions that you'll need. Consult your AWS administrator if you need help with this.

Installing PGD using TPA

Generating a configuration file

Run the tpaexec configure command to generate a configuration folder:

tpaexec configure democluster \
  --architecture PGD-Always-ON \
  --platform aws \
  --region eu-west-1 \
  --edb-postgres-advanced 16 \
  --redwood \
  --location-names dc1 \
  --pgd-proxy-routing local \
  --no-git \
  --hostnames-unsorted

You specify the PGD-Always-ON architecture (--architecture PGD-Always-ON), which sets up the configuration for PGD 5's Always On architectures. As part of the default architecture, this configures your cluster with three data nodes, cohosting three PGD Proxy servers, along with a Barman node for backup.

Specify that you're using AWS (--platform aws) and eu-west-1 as the region (--region eu-west-1).

TPA defaults to t3.micro instances on AWS. This is enough for this demonstration and also suitable for use with an AWS free tier account.

AWS free tier limitations

AWS free tier limitations for EC2 are based on hours of instance usage. Depending on how much time you spend testing, you might exceed these limits and incur charges.

By default, TPA configures Debian as the default OS for all nodes on AWS.

Deployment platforms

Other Linux platforms are supported as deployment targets for PGD. See the EDB Postgres Distributed compatibility table for details.

Observe that you don't have to deploy PGD to the same platform you're using to run TPA!

Specify that the data nodes will be running EDB Postgres Advanced Server v16 (--edb-postgres-advanced 16) with Oracle compatibility (--redwood).

You set the notional location of the nodes to dc1 using --location-names. You then set --pgd-proxy-routing to local so that proxy routing can route traffic to all nodes within each location.

By default, TPA commits configuration changes to a Git repository. For this example, you don't need to do that, so you pass the --no-git flag.

Finally, you ask TPA to generate repeatable hostnames for the nodes by passing --hostnames-unsorted. Otherwise, it selects hostnames at random from a predefined list of suitable words.

This command creates a subdirectory in the current working directory called democluster. It contains the config.yml configuration file TPA uses to create the cluster. You can view it using:

less democluster/config.yml
Further reading
  • View the full set of available options by running:
    tpaexec configure --architecture PGD-Always-ON --help
  • More details on PGD-Always-ON configuration options in Deploying with TPA
  • PGD-Always-ON in the Trusted Postgres Architect documentation
  • tpaexec configure in the Trusted Postgres Architect documentation
  • AWS platform in the Trusted Postgres Architect documentation

Provisioning the cluster:

Next, allocate the resources needed to run the configuration you just created using the tpaexec provision command:

tpaexec provision democluster

Since you specified AWS as the platform (the default platform), TPA provisions EC2 instances, VPCs, subnets, routing tables, internet gateways, security groups, EBS volumes, elastic IPs, and so on.

Because you didn't specify an existing one when configuring, TPA also prompts you to confirm the creation of an S3 bucket.

Remember to remove the bucket when you're done testing!

TPA doesn't remove the bucket that it creates in this step when you later deprovision the cluster. Take note of the name now, so that you can be sure to remove it later.

Further reading

Deploying the cluster

With configuration in place and infrastructure provisioned, you can now deploy the distributed cluster:

tpaexec deploy democluster

TPA applies the configuration, installing the needed packages and setting up the actual EDB Postgres Distributed cluster.

Further reading

Connecting to the cluster

You're now ready to log into one of the nodes of the cluster with SSH and then connect to the database. Part of the configuration process set up SSH logins for all the nodes, complete with keys. To use the SSH configuration, you need to be in the democluster directory created by the tpaexec configure command earlier:

cd democluster

From there, you can run ssh -F ssh_config <hostname> to establish an SSH connection. You will connect to kaboom, the first database node in the cluster:

ssh -F ssh_config kaboom
Output
[admin@kaboom ~]# 

Notice that you're logged in as admin on kaboom.

You now need to adopt the identity of the enterprisedb user. This user is preconfigured and authorized to connect to the cluster's nodes.

sudo -iu enterprisedb
Output
enterprisedb@kaboom:~ $

You can now run the psql command to access the bdrdb database:

psql bdrdb
Output
psql (16.2.0, server 16.2.0)
Type "help" for help.

bdrdb=#

You're directly connected to the Postgres database running on the kaboom node and can start issuing SQL commands.

To leave the SQL client, enter exit.

Using PGD CLI

The pgd utility, also known as the PGD CLI, lets you control and manage your EDB Postgres Distributed cluster. It's already installed on the node.

You can use it to check the cluster's health by running pgd check-health:

pgd check-health
Output
Check      Status Message
-----      ------ -------
ClockSkew  Ok     All BDR node pairs have clockskew within permissible limit
Connection Ok     All BDR nodes are accessible
Raft       Ok     Raft Consensus is working correctly
Replslots  Ok     All BDR replication slots are working correctly
Version    Ok     All nodes are running same BDR versions
enterprisedb@kaboom:~ $

Or, you can use pgd show-nodes to ask PGD to show you the data-bearing nodes in the cluster:

pgd show-nodes
Output
Node   Node ID    Group        Type Current State Target State Status Seq ID
----   -------    -----        ---- ------------- ------------ ------ ------
kaboom 2710197610 dc1_subgroup data ACTIVE        ACTIVE       Up     1
kaftan 3490219809 dc1_subgroup data ACTIVE        ACTIVE       Up     3
kaolin 2111777360 dc1_subgroup data ACTIVE        ACTIVE       Up     2
enterprisedb@kaboom:~ $

Similarly, use pgd show-proxies to display the proxy connection nodes:

pgd show-proxies
Output
Proxy  Group        Listen Addresses Listen Port
-----  -----        ---------------- -----------
kaboom dc1_subgroup [0.0.0.0]        6432
kaftan dc1_subgroup [0.0.0.0]        6432
kaolin dc1_subgroup [0.0.0.0]        6432

The proxies provide high-availability connections to the cluster of data nodes for applications. You can connect to the proxies and, in turn, to the database with the command psql -h kaboom,kaftan,kaolin -p 6432 bdrdb:

psql -h kaboom,kaftan,kaolin -p 6432 bdrdb
Output
psql (16.2.0, server 16.2.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

bdrdb=#

Explore your cluster