Migrating Data from LightDB-A 4.3 or 5 to LightDB-A 6
You can migrate data from LightDB-A Database 4.3 or 5 to LightDB-A 6 using the standard backup and restore procedures, gpbackup and gprestore, or by using gpcopy for VMware LightDB-A.
Note Open source LightDB-A Database is available only for LightDB-A Database 5 and later.
Note You can upgrade a LightDB-A Database 5.28 system directly to LightDB-A 6.9 or later using gpupgrade. You cannot upgrade a LightDB-A Database 4.3 system directly to LightDB-A 6.
This topic identifies known issues you may encounter when moving data from LightDB-A 4.3 or 5 to LightDB-A 6. You can work around these problems by making needed changes to your LightDB-A 4.3 or 5 databases so that you can create backups that can be restored successfully to LightDB-A 6.
- Preparing the LightDB-A 6 Cluster
- Preparing LightDB-A 4.3 and 5 Databases for Backup
- Backing Up and Restoring a Database
- Completing the Migration
Parent topic: Upgrading to LightDB-A 6
Preparing the LightDB-A 6 Cluster
Install and initialize a new LightDB-A Database 6 cluster using the version 6 gpinitsystem utility.

Note gprestore only supports restoring data to a cluster that has an identical number of hosts and an identical number of segments per host, with each segment having the same content_id as the segment in the original cluster. Use gpcopy (VMware LightDB-A) if you need to migrate data to a different-sized LightDB-A 6 cluster.

Note Set the LightDB-A Database 6 timezone to a value that is compatible with your host systems. Setting the LightDB-A Database timezone prevents LightDB-A Database from selecting a timezone each time the cluster is restarted. See Configuring Timezone and Localization Settings for more information.
Install the latest release of the LightDB-A Backup and Restore utilities, available to download from VMware Tanzu Network or GitHub.
If you intend to install LightDB-A Database 6 on the same hardware as your 4.3 system, you will need enough disk space to accommodate over five times the original data set (two full copies of the primary and mirror data sets, plus the original backup data in ASCII format) in order to migrate data with gpbackup and gprestore. Keep in mind that the ASCII backup data will require more disk space than the original data, which may be stored in compressed binary format. Offline backup solutions such as Dell EMC Data Domain can reduce the required disk space on each host.

If you want to migrate your data on the same hardware but do not have enough free disk space on your host systems, gpcopy for VMware LightDB-A provides the --truncate-source-after option to truncate each source table after copying the table to the destination cluster and validating that the copy succeeded. This reduces the amount of free space needed to migrate clusters that reside on the same hardware. See Migrating Data with gpcopy for more information.

Install any external modules used in your LightDB-A 4.3 system in the LightDB-A 6 system before you restore the backup, for example MADlib or PostGIS. If versions of the external modules are not compatible, you may need to exclude tables that reference them when restoring the LightDB-A 4.3 backup to LightDB-A 6.
The LightDB-A 4.3 Oracle Compatibility Functions module is not compatible with LightDB-A 6. Uninstall the module from LightDB-A 4.3 by running the uninstall_orafunc.sql script before you back up your databases:

$ psql -d <dbname> -f $GPHOME/share/postgresql/contrib/uninstall_orafunc.sql

You will also need to drop any dependent database objects that reference compatibility functions.

Install the Oracle Compatibility Functions in LightDB-A 6 by creating the orafce module in each database where you require the functions:

$ psql -d <dbname> -c 'CREATE EXTENSION orafce'
See Installing Additional Supplied Modules for information about installing orafce and other modules.

When restoring language-based user-defined functions, the shared object file must be in the location specified in the CREATE FUNCTION SQL command and must have been recompiled on the LightDB-A 6 system. This applies to user-defined functions, user-defined types, and any other objects that use custom functions, such as aggregates created with the CREATE AGGREGATE command.

LightDB-A 6 provides resource groups, an alternative to managing resources using resource queues. Setting the gp_resource_manager server configuration parameter to queue or group selects the resource management scheme the LightDB-A Database system will use. The default is queue, so no action is required when you move from LightDB-A version 4.3 to version 6. To more easily transition from resource queues to resource groups, you can set resource groups to allocate and manage memory in a way that is similar to resource queue memory management. To select this feature, set the MEMORY_LIMIT and MEMORY_SPILL_RATIO attributes of your resource groups to 0 (see the example below). See Using Resource Groups for information about enabling and configuring resource groups.

Filespaces are removed in LightDB-A 6. When creating a tablespace, note that different tablespace locations for a primary-mirror pair are no longer supported in LightDB-A 6. See Creating and Managing Tablespaces for information about creating and configuring tablespaces.
Preparing LightDB-A 4.3 and 5 Databases for Backup
Note A LightDB-A 4 system must be at least version 4.3.22 to use the gpbackup and gprestore utilities. A LightDB-A 5 system must be at least version 5.5. Be sure to use the latest release of the backup and restore utilities, available for download from VMware Tanzu Network or GitHub.

Important Make sure that you have a complete backup of all data in the LightDB-A Database 4.3 or 5 cluster, and that you can successfully restore the LightDB-A Database cluster if necessary.
Following are some issues that are known to cause errors when restoring a LightDB-A 4.3 or 5 backup to LightDB-A 6. Keep a list of any changes you make to the LightDB-A 4.3 or 5 database to enable migration so that you can fix them in LightDB-A 6 after restoring the backup.
- References to catalog tables or their attributes can cause a restore to fail due to catalog changes from LightDB-A 4.3 or 5 to LightDB-A 6. Here are some catalog issues to be aware of when migrating to LightDB-A 6:
- In the pg_class system table, the reltoastidxid column has been removed.
- In the pg_stat_replication system table, the procpid column is renamed to pid.
- In the pg_stat_activity system table, the procpid column is renamed to pid. The current_query column is replaced by two columns: state (the state of the backend), and query (the last run query, or currently running query if state is active). See the example query after this list.
- In the gp_distribution_policy system table, the attrnums column is renamed to distkey and its data type is changed to int2vector. A backend function pg_get_table_distributedby() was added to get the distribution policy for a table as a string.
- The __gp_localid and __gp_masterid columns are removed from the session_level_memory_consumption system view in LightDB-A 6. The underlying external tables and functions are removed from the gp_toolkit schema.
- Filespaces are removed in LightDB-A 6. The pg_filespace and pg_filespace_entry system tables are removed. Any reference to pg_filespace or pg_filespace_entry will fail in LightDB-A 6.
- Restoring a LightDB-A 4 backup can fail due to lack of dependency checking in LightDB-A 4 catalog tables. For example, restoring a UDF can fail if it references a custom data type that is created later in the backup file.
- The INTO error_table clause of the CREATE EXTERNAL TABLE and COPY commands was deprecated in LightDB-A 4.3 and is unsupported in LightDB-A 5 and 6. Remove this clause from any external table definitions before you create a backup of your LightDB-A 4.3 system. The ERROR_TABLE parameter of the gpload utility load control YAML file must also be removed from any gpload YAML files before you run gpload.
- The int4_avg_accum() function signature changed in LightDB-A 6 from int4_avg_accum(bytea, integer) to int4_avg_accum(bigint[], integer). This function is the state transition function (sfunc) called when calculating the average of a series of 4-byte integers. If you have created a custom aggregate in a previous LightDB-A release that called the built-in int4_avg_accum() function, you will need to revise your aggregate for the new signature.
- The string_agg(expression) function has been removed from LightDB-A 6. The function concatenates text values into a string. You can replace the single argument function with the function string_agg(expression, delimiter) and specify an empty string as the delimiter, for example string_agg(txt_col1, '').
- The offset argument of the lag(expr, offset[, default]) window function has changed from int8 in LightDB-A 4.3 to int4 in LightDB-A 5 and 6.
- gpbackup saves the distribution policy and distribution key for each table in the backup so that data can be restored to the same segment. If a table’s distribution key in the LightDB-A 4.3 or 5 database is incompatible with LightDB-A 6, gprestore cannot restore the table to the correct segment in the LightDB-A 6 database. This can happen if the distribution key in the older LightDB-A release has columns with data types not allowed in LightDB-A 6 distribution keys, or if the data representation for data types has changed or is insufficient for LightDB-A 6 to generate the same hash value for a distribution key. You should correct these kinds of problems by altering distribution keys in the tables before you back up the LightDB-A database.
- LightDB-A 6 requires primary keys and unique index keys to match a table’s distribution key. The leaf partitions of partitioned tables must have the same distribution policy as the root partition. These known issues should be corrected in the source LightDB-A database before you back up the database:
- If the primary key is different than the distribution key for a table, alter the table to either remove the primary key or change the primary key to match the distribution key.
- If the key columns for a unique index are not a subset of the distribution key columns, before you back up the source database, drop the index and, optionally, recreate it with a compatible key.
- If a partitioned table in the source database has a DISTRIBUTED BY distribution policy, but has leaf partitions that are DISTRIBUTED RANDOMLY, alter the leaf tables to match the root table distribution policy before you back up the source database.
- In LightDB-A 4.3, the name provided for a constraint in a CREATE TABLE command was the name of the index created to enforce the constraint, which could lead to indexes having the same name. In LightDB-A 6, duplicate index names are not allowed; restoring from a LightDB-A 4.3 backup that has duplicate index names will generate errors.
- Columns of type abstime, reltime, tinterval, money, or anyarray are not supported as distribution keys in LightDB-A 6. If you have tables distributed on columns of type abstime, reltime, tinterval, money, or anyarray, use the ALTER TABLE command to set the distribution to RANDOM before you back up the database (see the example after this list). After the data is restored, you can set a new distribution policy.
- In LightDB-A 4.3 and 5, it was possible to ALTER a table that has a primary key or unique index to be DISTRIBUTED RANDOMLY. LightDB-A 6 does not permit tables DISTRIBUTED RANDOMLY to have primary keys or unique indexes. Restoring such a table from a LightDB-A 4.3 or 5 backup will cause an error.
- LightDB-A 6 no longer automatically converts from the deprecated timestamp format YYYYMMDDHH24MISS. The format could not be parsed unambiguously in previous LightDB-A Database releases. You can still specify the YYYYMMDDHH24MISS format in conversion functions such as to_timestamp and to_char for compatibility with other database systems. You can use input formats for converting text to date or timestamp values to avoid unexpected results or query execution failures. For example, this SELECT command returns a timestamp in LightDB-A Database 5 and fails in 6.
SELECT to_timestamp('20190905140000');
To convert the string to a timestamp in LightDB-A Database 6, you must use a valid format. Both of these commands return a timestamp in LightDB-A Database 6. The first example explicitly specifies a timestamp format. The second example uses the string in a format that LightDB-A Database recognizes.
SELECT to_timestamp('20190905140000','YYYYMMDDHH24MISS');
SELECT to_timestamp('20190905 140000');
The timestamp issue also applies when you use the :: syntax. In LightDB-A Database 6, the first command returns an error. The second command returns a timestamp.
SELECT '20190905140000'::timestamp;
SELECT '20190905 140000'::timestamp;
- Creating a table using the CREATE TABLE AS command in LightDB-A 4.3 or 5 could create a table with a duplicate distribution key. The gpbackup utility saves the table to the backup using a CREATE TABLE command that lists the duplicate keys in the DISTRIBUTED BY clause. Restoring this backup will cause a duplicate distribution key error. The CREATE TABLE AS command was fixed in LightDB-A 5.10 to disallow duplicate distribution keys.
- LightDB-A 4.3 supports foreign key constraints on columns of different types, for example, numeric and bigint, with implicit type conversion. LightDB-A 5 and 6 do not support implicit type conversion. Restoring a table with a foreign key on columns with different data types causes an error.
- Only Boolean operators can use Boolean negators. In LightDB-A Database 4.3 and 5 it was possible to create a non-Boolean operator that specifies a Boolean negator function. For example, this CREATE OPERATOR command creates an integer @@ operator with a Boolean negator:
CREATE OPERATOR public.@@ (
    PROCEDURE = int4pl,
    LEFTARG = integer,
    RIGHTARG = integer,
    NEGATOR = OPERATOR(public.!!)
);
If you restore a backup containing an operator like this to a LightDB-A 6 system, gprestore produces an error: ERROR: only boolean operators can have negators (SQLSTATE 42P13).
- In LightDB-A Database 4.3 and 5, the undocumented server configuration parameter allow_system_table_mods could have a value of none, ddl, dml, or all. In LightDB-A 6, this parameter has changed to a Boolean value, with a default value of false. If there are any references to this parameter in the source database, remove them to prevent errors during the restore.
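The following SQL sketch illustrates two of the pre-backup adjustments referenced in the list above: querying pg_stat_activity with the renamed columns, and changing an unsupported distribution key to random distribution. The table name sales_money and its money-typed distribution column are illustrative assumptions.

-- LightDB-A 6 catalog change: pg_stat_activity uses pid, state, and query
-- rather than the old procpid and current_query columns.
SELECT pid, usename, state, query
FROM pg_stat_activity
WHERE state = 'active';

-- Before backing up, change a table distributed on an unsupported key type
-- (money in this assumed example) to random distribution.
ALTER TABLE sales_money SET DISTRIBUTED RANDOMLY;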
Backing Up and Restoring a Database
First use gpbackup to create a --metadata-only backup from the source LightDB-A database and restore it to the LightDB-A 6 system. This helps find any additional problems that are not identified in Preparing LightDB-A 4.3 and 5 Databases for Backup. Refer to the LightDB-A Backup and Restore documentation for syntax and examples for the gpbackup and gprestore utilities.
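A minimal sketch of the metadata-only round trip, assuming a source database named mydb and a writable /backups directory; substitute the timestamp that gpbackup reports for the placeholder.

$ gpbackup --dbname mydb --metadata-only --backup-dir /backups
$ gprestore --timestamp <backup-timestamp> --backup-dir /backups --create-db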
Review the gprestore log file for error messages and correct any remaining problems in the source LightDB-A database.
When you are able to restore a metadata backup successfully, create the full backup and then restore it to the LightDB-A 6 system, or use gpcopy (VMware LightDB-A) to transfer the data. If needed, use the gpbackup or gprestore filter options to omit schemas or tables that cannot be restored without error.
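For example, a sketch of a full backup and restore that filters out a problem table; the schema-qualified name legacy.error_log is an assumption for illustration. gprestore accepts similar --include-table and --exclude-table filters if you prefer to filter at restore time.

$ gpbackup --dbname mydb --backup-dir /backups --exclude-table legacy.error_log
$ gprestore --timestamp <backup-timestamp> --backup-dir /backups --create-db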
If you use gpcopy to migrate VMware LightDB-A data, initiate the gpcopy operation from the LightDB-A 4.3.26 (or later) or the 5.9 (or later) cluster. See Migrating Data with gpcopy for more information.
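A sketch of a gpcopy invocation, run from the source (4.3.26 or later, or 5.9 or later) master host. The host names, port, and database name are assumptions, and the exact flags available depend on your gpcopy release; consult the gpcopy reference for your version.

$ gpcopy --source-host <source-master-host> --source-port 5432 \
      --dest-host <dest-master-host> --dest-port 5432 \
      --dbname mydb

Add --truncate-source-after only if you are migrating on the same hardware with limited free disk space, as described earlier in this topic.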
Important When you restore a backup taken from a LightDB-A Database 4.3 or 5 system, gprestore warns that the restore will use legacy hash operators when loading the data. This is because LightDB-A 6 has new hash algorithms that map distribution keys to segments, but the data in the backup set must be restored to the same segments as the cluster from which the backup was taken. The gprestore utility sets the gp_use_legacy_hashops server configuration parameter to on when restoring to LightDB-A 6 from an earlier version so that the restored tables are created using the legacy operator classes instead of the new default operator classes.
After restoring, you can redistribute these tables with the gp_use_legacy_hashops parameter set to off so that the tables use the new LightDB-A 6 hash operators. See Working With Hash Operator Classes in LightDB-A 6 for more information and examples.
Completing the Migration
Migrate any tables you skipped during the restore using other methods, for example using the COPY TO command to create an external file and then loading the data from the external file into LightDB-A 6 with the COPY FROM command.
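A minimal sketch of that approach, assuming a skipped table named legacy.error_log and a file path that is writable on the master host.

-- On the LightDB-A 4.3 or 5 system:
COPY legacy.error_log TO '/data/migrate/error_log.csv' CSV HEADER;

-- On the LightDB-A 6 system, after recreating the table:
COPY legacy.error_log FROM '/data/migrate/error_log.csv' CSV HEADER;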
Recreate any objects you dropped in the LightDB-A 4.3 or 5 database to enable migration, such as external tables, indexes, user-defined functions, or user-defined aggregates.
Here are some additional items to consider to complete your migration to LightDB-A 6.
- LightDB-A Database 5 and 6 remove automatic casts between the text type and other data types. After you migrate from LightDB-A Database version 4.3 to version 6, this changed behavior may impact existing applications and queries. Refer to About Implicit Text Casting in LightDB-A Database for information, including a discussion about VMware supported and unsupported workarounds.
- After migrating data you may need to modify SQL scripts, administration scripts, and user-defined functions as necessary to account for changes in LightDB-A Database version 6. Review the VMware LightDB-A 6.0.0 Release Notes for features and changes that may necessitate post-migration tasks.
- To use the new LightDB-A 6 default hash operator classes, use the command ALTER TABLE <table> SET DISTRIBUTED BY (<key>) to redistribute tables restored from LightDB-A 4.3 or 5 backups. The gp_use_legacy_hashops parameter must be set to off when you run the command. See Working With Hash Operator Classes in LightDB-A 6 for more information about hash operator classes.
Working With Hash Operator Classes in LightDB-A 6
LightDB-A 6 has new jump consistent hash operators that map distribution keys for distributed tables to the segments. The new hash operators enable faster database expansion because they don’t require redistributing rows unless they map to a different segment. The hash operators used in LightDB-A 4.3 and 5 are present in LightDB-A 6 as non-default legacy hash operator classes. For example, for integer columns, the new hash operator class is named int4_ops and the legacy operator class is named cdbhash_int4_ops.
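To see which legacy hash operator classes are available in your cluster, a catalog query along these lines should work; this is a sketch that simply filters pg_opclass on the cdbhash_ naming prefix used by the legacy classes.

SELECT opcname, opcintype::regtype
FROM pg_opclass
WHERE opcname LIKE 'cdbhash%'
ORDER BY opcname;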
This example creates a table using the legacy hash operator class cdbhash_int4_ops.
test=# SET gp_use_legacy_hashops=on;
SET
test=# CREATE TABLE t1 (
c1 integer,
c2 integer,
p integer
) DISTRIBUTED BY (c1);
CREATE TABLE
test=# \d+ t1
Table "public.t1"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
c1 | integer | | plain | |
c2 | integer | | plain | |
p | integer | | plain | |
Distributed by: (c1)
Notice that the distribution key is c1. If the gp_use_legacy_hashops parameter is on and the operator class is a legacy operator class, the operator class name is not shown. However, if gp_use_legacy_hashops is off, the legacy operator class name is reported with the distribution key.
test=# SET gp_use_legacy_hashops=off;
SET
test=# \d+ t1
Table "public.t1"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
c1 | integer | | plain | |
c2 | integer | | plain | |
p | integer | | plain | |
Distributed by: (c1 cdbhash_int4_ops)
The operator class name is reported only when it does not match the setting of the gp_use_legacy_hashops parameter.
To change the table to use the new jump consistent operator class, use the ALTER TABLE command to redistribute the table with the gp_use_legacy_hashops parameter set to off.
Note Redistributing tables with a large amount of data can take a long time.
test=# SHOW gp_use_legacy_hashops;
gp_use_legacy_hashops
-----------------------
off
(1 row)
test=# ALTER TABLE t1 SET DISTRIBUTED BY (c1);
ALTER TABLE
test=# \d+ t1
Table "public.t1"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
c1 | integer | | plain | |
c2 | integer | | plain | |
p | integer | | plain | |
Distributed by: (c1)
To verify the default jump consistent operator class has been used, set gp_use_legacy_hashops to on before you show the table definition.
test=# SET gp_use_legacy_hashops=on;
SET
test=# \d+ t1
Table "public.t1"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
c1 | integer | | plain | |
c2 | integer | | plain | |
p | integer | | plain | |
Distributed by: (c1 int4_ops)