Loading Data with gpload
The LightDB-A gpload
utility loads data using readable external tables and the LightDB-A parallel file server (gpfdist or gpfdists
). It handles parallel file-based external table setup and allows users to configure their data format, external table definition, and gpfdist or gpfdists
setup in a single configuration file.
Note
gpfdist
andgpload
are compatible only with the LightDB-A Database major version in which they are shipped. For example, agpfdist
utility that is installed with LightDB-A Database 4.x cannot be used with LightDB-A Database 5.x or 6.x.Note
MERGE
andUPDATE
operations are not supported if the target table column name is a reserved keyword, has capital letters, or includes any character that requires quotes (“ ”) to identify the column.
To use gpload
Ensure that your environment is set up to run
gpload
. Some dependent files from your LightDB-A Database installation are required, such as gpfdist and Python, as well as network access to the LightDB-A segment hosts.See the LightDB-A Database Reference Guide for details.
Create your load control file. This is a YAML-formatted file that specifies the LightDB-A Database connection information, gpfdist configuration information, external table options, and data format.
See the LightDB-A Database Reference Guide for details.
For example:
--- VERSION: 1.0.0.1 DATABASE: ops USER: gpadmin HOST: cdw-1 PORT: 5432 GPLOAD: INPUT: - SOURCE: LOCAL_HOSTNAME: - etl1-1 - etl1-2 - etl1-3 - etl1-4 PORT: 8081 FILE: - /var/load/data/* - COLUMNS: - name: text - amount: float4 - category: text - descr: text - date: date - FORMAT: text - DELIMITER: '|' - ERROR_LIMIT: 25 - LOG_ERRORS: true OUTPUT: - TABLE: payables.expenses - MODE: INSERT PRELOAD: - REUSE_TABLES: true SQL: - BEFORE: "INSERT INTO audit VALUES('start', current_timestamp)" - AFTER: "INSERT INTO audit VALUES('end', current_timestamp)"
Run
gpload
, passing in the load control file. For example:gpload -f my_load.yml
Parent topic: Loading and Unloading Data