This is part of the Beyond The Shiny New Toys series where I write about AWS reInvent 2019 announcements
Amazon Redshift has been going through a series of major changes which tremendously simplifies schema design and overall management of workloads. Here are some of the new features that were announced around the re:Invent 2019 timeframe that I specifically think a lot of customers (based on my earlier interaction with them) would look to put in use
Materialized Views
This has been one of the most wanted asks from many customers who migrate from other DW systems into Redshift. Materialized Views (MV) have a significant improvement on query performance for repeated workloads such as Dashboarding, queries from BI tools or certain predictable steps in ETL pipelines.
Till now, Redshift lacked support for MV and the recommendation has been to either modify your workloads or implement architectural changes such as performing a query rewrite using pg_bouncer
You can now use the native MV (available in preview) capability to address such needs. There are some current limitations though. For example, you need to manually refresh the MV whenever your base tables undergo changes. Over time, I am sure AWS folks would address these limitations based on customer feedback. You can find the complete set current limitations here: https://docs.aws.amazon.com/redshift/latest/dg/mv-usage-notes.html
Automated Table Sort
One of the most important best practices when it comes to Redshift is to keep the data Sorted. This would directly improve query performance as Redshift can read specific blocks of data (when your query has a filter) and also apply compression better. If your data is NOT sorted well enough, Redshift may read unwanted blocks and then later skip them in the memory. So, on incremental data loads, you had to earlier run “VACUUM SORT” command to make sure the data blocks are sorted.
With this new feature, Redshift automatically performs the Sorting activity in the background without any interruption to query processing. However, if you do have large data loads, you may still want to run “VACUUM SORT” manually (as Automatic Sorting may take a while to fully Sort in the background).
You can also monitor the “vacuum_sort_benefit” and “unsorted” columns in the SVV_TABLE_INFO table. Together, these columns tell you the following:
- What percentage of a particular table is “unsorted”
- How much percentage benefit would you derive by running “VACUUM SORT” against the table
Check the following documentation for more details: https://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html#automatic-table-sort
ALTER SORT KEY Dynamically
When you start using Redshift, you pick Distribution and Sort Keys for your tables. However, over time, as your workload evolves there may be a need to modify the Sort Keys that you originally picked. Previously, this meant, recreating your table with the new set of Sort Keys and loading all the data into that newly created table. This was required because, Redshift physically sorts the data in the underlying disks. Changing your Sort Keys meant re-sorting your data.
With this new feature, you can now dynamically change the Sort Keys of your existing table. Redshift, behind the scenes will re-sort the data while your table continues to be available for querying. This provides more flexibility when it comes to schema design.
Cross Instance Restore
https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-redshift-launches-cross-instance-restore/
This is another important feature and one that has been long requested by customers. You may want to restore a snapshot of production DC2.8XL cluster into a smaller DC2.Large cluster for your test/dev purposes. Or you may have a DC2.Large cluster with many number of nodes. You have a snapshot of that cluster and wish to launch a cluster with smaller number of DC2.8XL cluster. This wasn’t possible until this capability was introduced.
One of the important aspects that you want to consider when doing this exercise is to undersatnd how would your “target” cluster’s storage utilization on each node would look like. The following command in the AWS CLI would throw you some options to consider:
aws redshift describe-node-configuration-options --snapshot-identifier <mycluster-snapshot> --region eu-west-1 -—action-type restore-cluster
Automatic Workload Management
This isn’t a re:Invent timeframe announcement as such. This was announced in September. But I am including it here because this is a big one and simplifies day to day operations of a Redshift cluster for an administrator.
Even some of the large Redshift customers find it cumbersome to perform Workload Management (WLM) on Redshift. WLM on itself is a pretty deep topic and is something that you cannot avoid once your workloads start scaling on Redshift.
WLM provides many controls for a Redshift administrator to manage different workloads and give better experience for all types of users of the system. Over the years, WLM has evolved from a static configuration to a dynamic configuration (of queues and memory) with Queue Priorities, Query Monitoring Rules, Queue Hopping, Short Query Acceleration and Concurrency Scaling.
However all of these require someone to continuously observe the workloads on the cluster and keep tweaking these configurations. With Automatic WLM, Redshift removes much of these overheads from the administrator.
With Automatic WLM, you still define Queues, Queue Priorities, User/Query Groups and configure Concurrency Scaling (for required Queues). Automatic WLM will then dynamically manage memory allocation and concurrency amongst these queues based on the workload. Automatic WLM also works with Short Query Acceleration allowing short running queries to complete.
If you are managing WLM manually today, it might be worthwhile taking a look at this feature. You can read more about how Automatic WLM works here: https://docs.aws.amazon.com/redshift/latest/dg/automatic-wlm.html
A few more noteworthy ones
These are few more features that got added over the couse of 2019 – just ICYMI
- Stored Procedure Support. A BIG BIG ask from many customers. More here: https://docs.aws.amazon.com/redshift/latest/dg/stored-procedure-create.html
- Auto VACUUM DELETE: https://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html#automatic-table-delete
- Auto ANALYZE: https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html#t_Analyzing_tables-auto-analyze
- AUTO Distribution Style: https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
I believe with all these new capabilities, Redshift has now automated a whole lot of operations making administrators’ life simpler. Put it in typical Amazon way, Redshift now takes care of most of the “undifferentiated heavy lifting” 🙂
Did I miss any new major announcement? What do you think about these features? Do let me know your thoughts in the comments section below.
