Syncing Elasticsearch and RDS with abc on AWS

Elasticsearch is great for creating a very important customer experience: relevant search results. Reactivesearch is an open source library that allows you to import React search components powered by Elasticsearch. Quick and relevant search UI is great but, you may want to leverage the advantages of a database and the search-ability of Elasticsearch. How do you get the best of both worlds? Appbase understood the need to have an Elasticsearch index in sync with a database so they created abc import.

Avi Aryan’ s article “CLI for Indexing data from Postgres to Elasticsearch”, shows how to use abc import locally, but we will adapt it for deploying in AWS. Please read Avi Aryan’ s article first and get everything running locally!

We will learn how to launch an EC2 instance that keeps an Elasticsearch index hosted in Appbase in sync with a RDS PostgreSQL table. RDS will be the source for the Elasticsearch index.

Required services & software:

  • An appbase account*, which is where we will host our Elasticsearch index.
  • An AWS account*, which is where we will host our PostgreSQL instance and our EC2 instance.

*Please pay attention to free tier usage and costs of these services.

Perquisite knowledge:

Note bolded terms are for the most part searchable to help you Ctrl + F your way through a page.

Overview

Amazon RDS Postgres Setup

  • Creating the RDS DB
  • Connecting to RDS
  • Changing and applying the Parameter group
  • Creating Dummy Data
  • Creating the replication slot

Creating an Elasticsearch Index on Appbase

Launching an EC2 Instance with abc

  • Download abc
  • Allow inbound traffic for RDS Security Group
  • Launch EC2 and install abc
  • Run abc import
  • Review of Architecture
  • Transforming Data before Indexing into Elasticsearch

Amazon RDS Postgres Setup

Creating the RDS DB
  • Navigate to the Amazon RDS service.
  • Select create a database and choose PostgreSQL for your engine option. Additionally if you want to manage costs check only enable options eligible for RDS Free Usage Tier.
  • Select the most recent DB engine version. We will use 10.6-R1.
  • Create a DB instance identifier, a master username, and a master password.
  • Under network and security select Public accessibility, which will allow us to connect via psql from our desktop.

Finally create a Database name and then select Create database at the bottom.

All other settings where kept as the default settings. Allowing Public accessibility is the only required setting change to follow along.

RDS docs

Connecting to RDS

We will use the psql utility to connect to our DB. Assuming you have psql installed, enter the following in your terminal:

-- CODE language-markup --
psql --host=[endpoint] --port=5432 --username=[username] --password --dbname=[db name]

The [endpoint] is found in the Connectivity & security tab under Endpoint & port. The [username] and [db name] are the ones you established when creating the database.

Next insert your password into the prompt and you should have access to the RDS instance you just created.

psql docs

AWS Connecting to a DB Instance Running the PostgreSQL Database Engine

Changing and Applying the Parameter group

If you read CLI for Indexing data from Postgres to Elasticsearch you will notice that you had to change the following in the postgresql.conf file.

-- CODE language-markup --
wal_level=logical
max_replication_slots=1

In the psql shell try to enter the following:

-- CODE language-markup --select * from pg_create_logical_replication_slot('users_replication_slot', 'test_decoding');

You should get an error message that says

-- CODE language-markup --ERROR:  logical decoding requires wal_level >= logical

To look what your current wal_level setting is you can enter the following into the psql shell:

-- CODE language-markup --SHOW wal_level;

which will should return:

-- CODE language-markup --
wal_level
-----------
replica
(1 row)

You have to create a new parameter group with the wal_level=logical. In the RDS dashboard select Parameter groups and then press Create parameter group. Select the Parameter group family that matches the DB engine and version you selected. We are using postgres10. Create the parameter group and then select it.

Under parameters search for rds.logical_replication and edit the parameter so that its value is 1 and not 0. Changing rds.logical_replication is the same as changing the wal_level.

While you are in the parameters group change the wal_keep_segments to 1 from 32. How logical replication works is out of the scope of this article, but this prevents your database from running out of disk space quickly.

Creating a new parameter group creates a new postgres.conf file, however we need to reset our DB instance to use this file. Navigate to Databases in Amazon RDS and select your instance. There is a Modify button — select it and search for DB parameter group under Database options. Under the dropdown select the parameter group you just created then scroll to the bottom and select Continue, check Apply Immediately, and then select Modify DB instance.

You will need to reboot the instance after you modify it!

In our psql shell renter

-- CODE language-markup --SHOW wal_level;

and it should show

-- CODE language-markup --
wal_level
-----------
logical
(1 row)

Additionally if you want to look at the replication slots — enter the following into the psql shell:

-- CODE language-markup --
SHOW max_replication_slots;

which will return 10 replication slots — so our instance has available replication slots.

Creating the Replication Slot

Now you can enter the following in the psql shell and expect no errors and view the replication slot you created.

-- CODE language-markup --
select * from pg_create_logical_replication_slot('users_replication_slot', 'test_decoding');
SELECT * FROM pg_replication_slots;

Creating Dummy Data

While we are in the psql lets create some dummy data. We will follow along with CLI for Indexing data from Postgres to Elasticsearch, however there will be some changes.

We already have a database name that we specified earlier but we don’t have a table, so lets create one. Enter the following in the psql shell:

--- CODE language-markup --
CREATE TABLE users (
   _id varchar(500) NOT NULL,
   name varchar(100) NOT NULL,
   bio text,
   PRIMARY KEY(_id)
);

Take special note of _id. We want our primary key to be _id. When abc indexes the data from RDS it will create an “ _id” column. We want our database and Elasticsearch index _id key to be the same or we will run into errors later.

Additionally we will add some sample data to it.

-- CODE language-markup --
insert into users values ('foo@bar.com', 'foo', 'bar');
insert into users values ('zak@barker.com', 'zak', 'ceo');

Creating an Elasticsearch index on Appbase

Visit Appbase.io and sign up for an account. This is where we will be hosting our Elasticsearch index. Once you’ve created an account, create an app.


Launching an EC2 instance with abc

Download abc

Download the latest release of abc. We used abc-linux-0.10.2.zip.

Allow inbound traffic for RDS Security Group

Navigate to the RDS dashboard and find your db instance you created. Under Connectivity and security tab select the security group under VPC security groups. You will be taken to the security group settings. In the Inbound tab press the edit button and then Add Rule. Add a rule that allows inbound traffic from the same security group. The second entry in the photo below is the additional rule. By doing this you are allowing inbound traffic from any instance that has the same security group.

AWS Security Groups for Your VPC docs

VPC Design and New Capabilities for Amazon VPC video

Launch EC2 and install abc

Navigate to EC2 Dashboard in AWS and select Launch Instance. We will use Amazon Linux 2 AMI (HVM), SSD Volume Type with t2.micro. Create a new key pair and save the .pem file when launching, this will allow us to ssh into the machine.

One the instance has launched we need to configure it with the RDS security group. In the EC2 dashboard select the instance you created. Press the Actions button to see a dropdown and then select Networking > Change Security Groups. Find the RDS security group you previously modified and select Assign Security Groups. Next reboot the instance and it should be configured with the new security group.

In the EC2 dashboard select the instance and then press the connect button. Follow the instructions to connect to the machine. Once in create an abc folder. In your desktop terminal navigate to the folder where you saved abc and transfer it to your EC2 machine.

-- CODE language-markup --
scp -i [your pem key] [your file] [your instance endpoint]:~/abc/

scp docs

Confirm it is in your machine and then login

-- CODE language-markup --
./abc-0.10.2 login google

abc docs

Then you can see a list of the apps you created — you should see the one you created earlier when you run

-- CODE language-markup --
./abc-0.10.2 apps

Run abc import

In your EC2 run the following

-- CODE language-markup --
./abc-0.10.2 import --tail --src_type=postgres --src_uri="postgresql://[user]:[password]@[rds endpoint]/[db name]" --src_filter=.[table name] --replication_slot="users_replication_slot" [appbase app]

you should then see a response like below

-- CODE language-markup --
2 total data record(s) indexed

In your psql shell confirm that inserting and deleting is reflected in your Appbase Elasticsearch index.

Review of Architecture

Now we have gotten to a point where our RDS instance is the source of truth for our Appbase Elasticsearch index!

Transforming Data before Indexing into Elasticsearch

If you set up the users table as shown you shouldn’t have any problem syncing your RDS and Elasticsearch index. That is because you created the _id column. In order to tail the changes in RDS we need to explicitly provide _id for the table that gets indexed to Elasticsearch, otherwise an _id will be autogenerated in the Elasticsearch index. If we didn’t explicitly provide the _id , say we used id instead, the Elasticsearch index wouldn’t match up with what was deleted in RDS and abc will throw errors.

As shown in CLI for Indexing data from Postgres to Elasticsearch under Transforming Data before Indexing into Elasticsearch you can use transform_file.js to control how information is transformed from the the source (RDS) to the sink (Elasticsearch index). That way if you wanted the users table to use id instead of _id you could use the rename function to change id into_id.

abc Transform file docs

Summary

We walked through step by step instructions on how to set up abc in AWS. First we setup RDS with Postgres. Then we created an Elasticsearch Index on Appbase. Finally we launched an EC2 Instance with abc. Please contact me if anything needs any clarification or changes. Than