gpmapreduce

Runs LightDB-A MapReduce jobs as defined in a YAML specification document.

Synopsis

gpmapreduce -f <config.yaml> [dbname [<username>]] 
     [-k <name=value> | --key <name=value>] 
     [-h <hostname> | --host <hostname>] [-p <port>| --port <port>] 
     [-U <username> | --username <username>] [-W] [-v]

gpmapreduce -x | --explain 

gpmapreduce -X | --explain-analyze

gpmapreduce -V | --version 

gpmapreduce -h | --help

Requirements

The following are required prior to running this program:

You must have your MapReduce job defined in a YAML file. See gpmapreduce.yaml for more information about the format of, and keywords supported in, the LightDB-A MapReduce YAML configuration file.
You must be a LightDB-A Database superuser to run MapReduce jobs written in untrusted Perl or Python.
You must be a LightDB-A Database superuser to run MapReduce jobs with EXEC and FILE inputs.
You must be a LightDB-A Database superuser to run MapReduce jobs with GPFDIST input unless the user has the appropriate rights granted.

Description

MapReduce is a programming model developed by Google for processing and generating large data sets on an array of commodity servers. LightDB-A MapReduce allows programmers who are familiar with the MapReduce paradigm to write map and reduce functions and submit them to the LightDB-A Database parallel engine for processing.

gpmapreduce is the LightDB-A MapReduce program. You configure a LightDB-A MapReduce job via a YAML-formatted configuration file that you pass to the program for execution by the LightDB-A Database parallel engine. The LightDB-A Database system distributes the input data, runs the program across a set of machines, handles machine failures, and manages the required inter-machine communication.

Options

-f config.yaml : Required. The YAML file that contains the LightDB-A MapReduce job definitions. Refer to gpmapreduce.yaml for the format and content of the parameters that you specify in this file.

-? | –help : Show help, then exit.

-V | –version : Show version information, then exit.

-v | –verbose : Show verbose output.

-x | –explain : Do not run MapReduce jobs, but produce explain plans.

-X | –explain-analyze : Run MapReduce jobs and produce explain-analyze plans.

-k | –keyname=value : Sets a YAML variable. A value is required. Defaults to “key” if no variable name is specified.

Connection Options

-h host | –host host : Specifies the host name of the machine on which the LightDB-A coordinator database server is running. If not specified, reads from the environment variable PGHOST or defaults to localhost.

-p port | –port port : Specifies the TCP port on which the LightDB-A coordinator database server is listening for connections. If not specified, reads from the environment variable PGPORT or defaults to 5432.

-U username | –username username : The database role name to connect as. If not specified, reads from the environment variable PGUSER or defaults to the current system user name.

-W | –password : Force a password prompt.

Examples

Run a MapReduce job as defined in my_mrjob.yaml and connect to the database mydatabase: