Canopy
11.1
Get Started
What is Canopy?
How Far Can Canopy Scale?
When to Use Canopy
Multi-Tenant Database
Real-Time Analytics
Considerations for Use
When Canopy is Inappropriate
Quick Tutorials
Multi-tenant Applications
Data model and sample data
Creating tables
Distributing tables and loading data
Running queries
Real-time Analytics
Data model and sample data
Creating tables
Distributing tables and loading data
Running queries
Install
Single-Node Canopy
CentOS or Red Hat
Multi-Node Canopy
CentOS or Red Hat
Steps to be executed on all nodes
Steps to be executed on the coordinator node
Use-Case Guides
Multi-tenant Applications
Let’s Make an App – Ad Analytics
Scaling the Relational Data Model
Preparing Tables and Ingesting Data
Try it Yourself
Integrating Applications
Sharing Data Between Tenants
Online Changes to the Schema
When Data Differs Across Tenants
Scaling Hardware Resources
Dealing with Big Tenants
Where to Go From Here
Real-Time Dashboards
Data Model
Rollups
Expiring Old Data
Approximate Distinct Counts
Unstructured Data with JSONB
Timeseries Data
Scaling Timeseries Data on Canopy
Automating Partition Creation
Architecture
Concepts
Nodes
Coordinator and Workers
Distributed Data
Table Types
Type 1: Distributed Tables
Type 2: Reference Tables
Type 3: Local Tables
Shards
Shard Placements
Co-Location
Parallelism
Query Execution
Develop
Determining Application Type
At a Glance
Examples and Characteristics
Choosing Distribution Column
Multi-Tenant Apps
Best Practices
Real-Time Apps
Best Practices
Timeseries Data
Best Practices
Table Co-Location
Data co-location in Canopy for hash-distributed tables
A practical example of co-location
Using Regular LightDB Tables
Distributing tables by ID
Distributing tables by tenant
Co-location means better feature support
Query Performance
Migrating an Existing App
Identify Distribution Strategy
Pick distribution key
Identify types of tables
Prepare Source Tables for Migration
Add distribution keys
Backfill newly created columns
Prepare Application for Canopy
Set up Development Canopy Cluster
Include distribution column in keys
Add distribution key to queries
Ruby on Rails
Django
ASP.NET
Other (SQL Principles)
Enable Secure Connections
Check for cross-node traffic
Migrate Production Data
Small Database Migration
SQL Reference
Creating and Modifying Distributed Objects (DDL)
Creating And Distributing Tables
Reference Tables
Distributing Coordinator Data
Co-Locating Tables
Dropping Tables
Modifying Tables
Adding/Modifying Columns
Adding/Removing Constraints
Using NOT VALID Constraints
Adding/Removing Indices
Types and Functions
Manual Modification
Ingesting, Modifying Data (DML)
Inserting Data
“From Select” Clause (Distributed Rollups)
COPY Command (Bulk load)
Caching Aggregations with Rollups
Updates and Deletion
Maximizing Write Performance
Querying Distributed Tables (SQL)
Aggregate Functions
Count (Distinct) Aggregates
Estimating Top N Items
Percentile Calculations
Limit Pushdown
Views on Distributed Tables
Joins
Co-located joins
Reference table joins
Repartition joins
Query Processing
Distributed Query Planner
Distributed Query Executor
Subquery/CTE Push-Pull Execution
LightDB planner and executor
Manual Query Propagation
Running on all Workers
Running on all Shards
Limitations
SQL Support and Workarounds
Workarounds
Work around limitations using CTEs
Temp Tables: the Workaround of Last Resort
Canopy API
Canopy Utility Functions
Table and Shard DDL
create_distributed_table
truncate_local_data_after_distributing_table
undistribute_table
alter_distributed_table
remove_local_tables_from_metadata
create_reference_table
canopy_add_local_table_to_metadata
mark_tables_colocated
update_distributed_table_colocation
create_distributed_function
create_time_partitions
drop_old_time_partitions
Metadata / Configuration Information
canopy_add_node
canopy_update_node
canopy_set_node_property
canopy_add_inactive_node
canopy_activate_node
canopy_disable_node
canopy_add_secondary_node
canopy_remove_node
canopy_get_active_worker_nodes
canopy_backend_gpid
canopy_check_cluster_node_health
canopy_set_coordinator_host
master_get_table_metadata
get_shard_id_for_distribution_column
column_to_column_name
canopy_relation_size
canopy_table_size
canopy_total_relation_size
canopy_stat_statements_reset
Cluster Management And Repair Functions
canopy_move_shard_placement
rebalance_table_shards
get_rebalance_table_shards_plan
get_rebalance_progress
canopy_add_rebalance_strategy
canopy_set_default_rebalance_strategy
canopy_remote_connection_stats
canopy_drain_node
isolate_tenant_to_new_shard
canopy_create_restore_point
Canopy Tables and Views
Coordinator Metadata
Partition table
Shard table
Shard information view
Shard placement table
Worker node table
Distributed object table
Canopy tables view
Time partitions view
Co-location group table
Rebalancer strategy table
Query statistics table
Distributed Query Activity
Tables on all Nodes
Connection Credentials Table
Connection Pooling Credentials
Configuration Reference
General configuration
canopy.max_worker_nodes_tracked (integer)
canopy.use_secondary_nodes (enum)
canopy.cluster_name (text)
canopy.enable_version_checks (boolean)
canopy.log_distributed_deadlock_detection (boolean)
canopy.distributed_deadlock_detection_factor (floating point)
canopy.node_connection_timeout (integer)
canopy.node_conninfo (text)
canopy.local_hostname (text)
canopy.show_shards_for_app_name_prefixes (text)
Query Statistics
canopy.stat_statements_purge_interval (integer)
canopy.stat_statements_max (integer)
canopy.stat_statements_track (enum)
Data Loading
canopy.multi_shard_commit_protocol (enum)
canopy.shard_count (integer)
canopy.shard_max_size (integer)
canopy.replicate_reference_tables_on_activate (boolean)
Planner Configuration
canopy.local_table_join_policy (enum)
canopy.limit_clause_row_fetch_count (integer)
canopy.count_distinct_error_rate (floating point)
canopy.task_assignment_policy (enum)
Intermediate Data Transfer
canopy.binary_worker_copy_format (boolean)
canopy.max_intermediate_result_size (integer)
DDL
canopy.enable_ddl_propagation (boolean)
canopy.enable_local_reference_table_foreign_keys (boolean)
Executor Configuration
General
Explain output
External Integrations
Ingesting Data from Kafka
Caveats
Ingesting Data from Spark
Business Intelligence with Tableau
Administer
Cluster Management
Choosing Cluster Size
Shard Count
Multi-Tenant SaaS Use-Case
Real-Time Analytics Use-Case
Initial Hardware Size
Multi-Tenant SaaS Use-Case
Real-Time Analytics Use-Case
Scaling the cluster
Add a worker
Rebalance Shards without Downtime
How it Works
Adding a coordinator
Dealing With Node Failures
Worker Node Failures
Coordinator Node Failures
Tenant Isolation
Viewing Query Statistics
Statistics Expiration
Resource Conservation
Limiting Long-Running Queries
Security
Connection Management
Setup Certificate Authority signed certificates
Increasing Worker Security
Row-Level Security
LightDB extensions
Creating a New Database
Table Management
Determining Table and Relation Size
Vacuuming Distributed Tables
Analyzing Distributed Tables
Troubleshoot
Query Performance Tuning
Table Distribution and Shards
LightDB tuning
Scaling Out Performance
Distributed Query Performance Tuning
General
Subquery/CTE Network Overhead
Advanced
Connection Management
Task Assignment Policy
Intermediate Data Transfer Format
Binary protocol
Scaling Out Data Ingestion
Real-time Insert and Updates
Insert Throughput
Update Throughput
Insert and Update: Throughput Checklist
Insert and Update: Latency
Staging Data Temporarily
Bulk Copy (250K - 2M/s)
Useful Diagnostic Queries
Finding which shard contains data for a specific tenant
Finding the distribution column for a table
Detecting locks
Querying the size of your shards
Querying the size of all distributed tables
Identifying unused indices
Monitoring client connection count
Viewing system queries
Active queries
Why are queries waiting
Index hit rate
Cache hit rate
Common Error Messages
Could not receive query results
Resolution
Canceling the transaction since it was involved in a distributed deadlock
Resolution
Could not connect to server: Cannot assign requested address
Resolution
SSL error: certificate verify failed
Resolution
Could not connect to any active placements
Resolution
Remaining connection slots are reserved for non-replication superuser connections
Resolution
PgBouncer cannot connect to server
Resolution
Relation
foo
is not distributed
Resolution
Unsupported clause type
Resolution
Cannot open new connections after the first modification command within a transaction
Resolution
Cannot create uniqueness constraint
Resolution
Function create_distributed_table does not exist
Resolution
STABLE functions used in UPDATE queries cannot be called with column references
Resolution
FAQ
Frequently Asked Questions
Can I create primary keys on distributed tables?
How do I add nodes to an existing Canopy cluster?
How does Canopy handle failure of a worker node?
How does Canopy handle failover of the coordinator node?
Are there any LightDB features not supported by Canopy?
How do I choose the shard count when I hash-partition my data?
How do I change the shard count for a hash partitioned table?
How does canopy support count(distinct) queries?
In which situations are uniqueness constraints supported on distributed tables?
How do I create database roles, functions, extensions etc in a Canopy cluster?
What if a worker node’s address changes?
Which shard contains data for a particular tenant?
I forgot the distribution column of a table, how do I find it?
Can I distribute a table by multiple keys?
Why does pg_relation_size report zero bytes for a distributed table?
Why am I seeing an error about max_intermediate_result_size?
Can I shard by schema on Canopy for multi-tenant applications?
Articles
Related Articles
LightDB Parallel Indexing in Canopy
Real-time Event Aggregation at Scale Using LightDB with Canopy
How Distributed Outer Joins on LightDB with Canopy Work
Distributed Outer Joins with Canopy
Designing your SaaS Database for Scale with LightDB
Sharding a Multi-Tenant App with LightDB
Tenancy
Multi-tenancy and co-location, a perfect pair
In conclusion
Sharding LightDB with Semi-Structured Data and Its Performance Implications
One large table, without joins
Enter Canopy
The query workload
Every distribution has its thorns
Scalable Real-time Product Search using LightDB with Canopy
Canopy
»
索引
索引