
EDB WHITE PAPER

Multitenancy Options in Postgres

EnterpriseDB | www.enterprisedb.com

Functionality, Isolation and Limitations

By: Matthew Lewandowski Senior Sales Engineer

CONTENTS

Multitenancy in Databases
Postgres Architecture Overview
Postgres Multitenancy Options
Multiple Databases in a Single Cluster
Multiple Schemas in a Single Database
Single Schema in a Single Database
Other Options
Summary

EnterpriseDB, EDB and EDB Postgres are trademarks of EnterpriseDB Corporation. Other names may be trademarks of their respective owners. Copyright © 2019. All rights reserved. 20190903

Multitenancy in Databases

Typically, multitenancy refers to a software application that serves multiple distinct groups of users sharing a single instance of running software. From a database perspective, this means that a single instance of a database is used by multiple applications. If you have a use for such a multitenant database deployment, this document will show you the different ways that Postgres can help you achieve a multitenant database architecture and what the differences are in terms of functionality and the level of separation between different tenants. To be able to explain this topic fully, we’ll start with a description of the Postgres architecture and relevant components.

Postgres Architecture Overview

There are two main components that make up a Postgres installation: the actual database software and one or more database instances. Figure 1 below shows a Postgres installation.

Figure 1: Components of a Postgres Installation

The Postgres database software installation contains the binaries, or executables, and related software libraries for running Postgres database instances and interacting with them. In addition to the executable programs and related libraries, a Postgres installation includes header files, extensions, additional server plug-in modules, and sample scripts and files.

Postgres refers to database instances as database clusters because each cluster/instance can contain multiple, logically separate databases. At the physical layer, a Postgres database cluster consists of configuration files, data files, and log files. A running Postgres database cluster includes a single top-level “postmaster” process with several child processes that perform various functions related to running, maintaining, connecting to, and interacting with the instance. A running Postgres instance is configured to listen for and accept connections on a specific network port on one or more of the IP addresses configured on its host server.

At the top level of its functional hierarchy, a Postgres cluster contains a set of roles (users and groups), one or more databases, and one or more tablespaces. Roles are authorized (assigned functional rights) to the tablespaces, databases, and the objects within those databases. Tablespaces are used as an organizing construct for specifying where objects within the databases are stored on the file system.

Clients connect to specific databases within a Postgres cluster by specifying the host and port that the cluster is running on, the name of the database within the cluster, the user for the connection (along with any required user credentials), and optional parameters to be used for the connection as desired.
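To make the connection parameters above concrete, here is a minimal Python sketch that assembles them into a libpq-style keyword/value connection string. The helper names, host, database, and role are hypothetical and chosen only for illustration; credentials are deliberately left out, since they would normally come from a password file or the environment.

```python
def quote_conninfo_value(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string."""
    # Backslashes and single quotes inside a value are backslash-escaped.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    # Empty values and values containing whitespace must be single-quoted;
    # quoting anything that needed escaping is also safe.
    if value == "" or any(ch.isspace() for ch in value) or escaped != value:
        return f"'{escaped}'"
    return value


def build_conninfo(host: str, port: int, dbname: str, user: str, **options: str) -> str:
    """Assemble the parameters a client supplies to reach one database in a cluster."""
    params = {"host": host, "port": str(port), "dbname": dbname, "user": user}
    params.update(options)  # optional settings such as sslmode or application_name
    return " ".join(f"{key}={quote_conninfo_value(val)}" for key, val in params.items())


# Hypothetical host, database, and role names, used only for illustration.
print(build_conninfo("db.example.internal", 5432, "tenant_a", "app_user",
                     sslmode="require", application_name="billing service"))
```

The host and port select the cluster, while `dbname` selects one database inside it, which is why reaching a second database in the same cluster means building a second connection string.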

Each database within a Postgres cluster contains one or more schemas, as well as catalogs and any installed extensions. Schemas serve as namespaces under which the various database objects such as tables, views, sequences, and functions are contained. At a minimum, each database will have a public schema, but other schemas can be created as needed. Unlike some other relational database management systems (RDBMS), in Postgres schemas are independent of users. Although each schema is owned by a particular user, not every user will necessarily have their own schema. Users and groups are granted usage and other permissions for schemas and the objects within the schemas.

Users are typically granted access to specific schemas; however, that may not always be the case, depending on the privileges on the schemas and the objects within them that have been given to the connected user. In some cases, any of the objects within a particular database can be accessed via a single database connection. Access to information in other Postgres databases within the cluster requires a separate connection for each database. However, through the use of database links or foreign tables that establish a connection to another database “under the covers”, it is possible to access the information in multiple databases via a single user session.

Figure 2 shows the object hierarchy within a Postgres database.

Figure 3 depicts some example user access scenarios within a Postgres database cluster. The intent is not to show examples of all possible access privileges, but rather to reinforce the concept that users (and groups) are global objects within a Postgres database cluster that can be assigned access to information in one or more databases in the cluster. The figure also shows that users (and groups) can be assigned access to different levels of information within the object hierarchy of a given Postgres database.

Figure 3: Example User Access Scenarios for Postgres Database Cluster

In Figure 3, User 1, User 2, the Schema N Owner User, and the DB1 Owner User have only been given access to objects in Database 1, whereas User 5, User 6, and the DB2 Owner User only have access to objects in Database 2. Superuser 1 and User 3 have access to objects in both databases. Superuser 1, the DB1 Owner User, and the DB2 Owner User are users who have access to all objects within a database. User 1, User 2, User 3, User 6, and the Schema N Owner User are users who have access to all objects only within a specific schema. Finally, User 3 and User 5 are users who only have access to specific tables within a schema.

In addition to being able to control access to a particular schema and a particular set of tables and views within a schema, Postgres also provides the ability to restrict a user’s access to a specific set of information within the underlying tables using views, row-level security policies, or a combination of both.

Figure 4 provides a conceptual example of restricting information within a table for different users and applications using views and row-level security policies.

This overview of the Postgres architecture should make it easier to explain and understand the multitenancy options available in Postgres. From a database perspective, multitenancy means that a single instance of a database running on a server is used by multiple client applications.

Postgres Multitenancy Options

With Postgres there are three main approaches that support multitenancy using this definition: (1) multiple databases in a single Postgres cluster (i.e., instance), (2) multiple schemas in a single database within a cluster, and (3) different or shared tables in a single schema in a single database within a cluster.

Figure 5 shows these three options.

Each option has its own benefits and limitations. Organizations should choose an option based on the specific multitenancy requirements for their application or system. When deciding which approach is most appropriate for a given environment, note that it is also possible to combine features from the different options as part of a hybrid solution.

Let’s take a closer look at each of the three main multitenancy options in the following sections.

1. Multiple Databases in a Single Cluster

The first option in Postgres is to create a separate database for each application within a single cluster. This option provides a high level of data isolation between applications. Data stored in each of the databases cannot be accessed without establishing separate connections to each specific database. Postgres provides additional connection control features that allow administrators to further limit who can access a specific database, which network locations can access a specific database, and which authentication methods are allowed for a specific database. Finally, most Postgres system tables are private to a database (a few shared catalogs, such as the lists of roles and databases, are cluster-wide). For multitenancy, this means that one application would not be able to view information in another application’s database-level system tables.

Since Postgres extensions and plug-in modules are installed at the database level, each application can use a different set of extensions to suit its needs without impacting the other tenant applications. Also, since each application has its own database, the database schema (i.e., metadata) used by each tenant application can be easily customized or even be completely different. Using independent databases for each tenant also means that application developers do not need to worry about building features into the application to only retrieve data for a specific tenant and to protect one tenant’s data from leakage to another tenant, simplifying development.

Since the databases are contained in a single Postgres cluster, this option supports shared administration and resource usage across the different tenants, which is often a key motivator for deciding to use a multitenant architecture. At the same time, this option provides some flexibility to separately configure the different databases, as many standard Postgres configuration parameters can be set on a per-database basis. In addition, the EnterpriseDB Postgres Advanced Server (EPAS) edition includes a resource manager feature that can be used to define resource groups that can be assigned at the database level. Finally, by including the database name in all log entries, administrators and auditors can easily differentiate one tenant application’s specific database activity and errors from another. The EnterpriseDB audit feature available in EPAS provides the ability to configure different audit settings per database.
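As a rough sketch of the database-per-tenant setup described in this section, the following Python helper generates the SQL an administrator might run for one tenant: create the database, restrict who may connect, and apply per-database configuration overrides. The function, database, role, and parameter choices are hypothetical. Restrictions on network locations and authentication methods are configured separately in `pg_hba.conf` and are not shown here.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def tenant_database_sql(dbname: str, owner: str, app_role: str,
                        settings: dict[str, str]) -> list[str]:
    """Return the statements that provision one tenant database."""
    db = quote_ident(dbname)
    statements = [
        f"CREATE DATABASE {db} OWNER {quote_ident(owner)};",
        # By default every role may connect; remove that before granting access.
        f"REVOKE CONNECT ON DATABASE {db} FROM PUBLIC;",
        f"GRANT CONNECT ON DATABASE {db} TO {quote_ident(app_role)};",
    ]
    # Per-database overrides of standard configuration parameters.
    for param, value in settings.items():
        statements.append(f"ALTER DATABASE {db} SET {param} = {quote_literal(value)};")
    return statements


# Hypothetical tenant, roles, and setting, used only for illustration.
for stmt in tenant_database_sql("tenant_a", "tenant_a_owner", "tenant_a_app",
                                {"statement_timeout": "30s"}):
    print(stmt)
```

Generating the statements from one function keeps every tenant database provisioned the same way, which matters as the number of tenants grows.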

Although using a separate database for each application can be a good option for some multitenant use cases, it does present some challenges. One of the major drawbacks of this option is that it is more difficult to aggregate information from multiple tenants, should that need exist. To query the contents of each database, a separate connection to each of those databases is required. While it is possible to query the contents of one Postgres database from within another Postgres database by using database links or foreign tables, these are additional components that would need to be set up and managed. Another drawback is that from an operations and maintenance perspective it may not scale well for a large number of tenants. However, with the proper monitoring and management tools, like EnterpriseDB Postgres Enterprise Manager (PEM), this concern can be addressed.

Despite the drawbacks, using separate databases for each application is a good multitenant database approach to take for certain use cases. If security and data isolation are the most important factors in deciding which multitenancy approach to take, this is an ideal option to choose.

2. Multiple Schemas in a Single Database

Another multitenancy option available in Postgres is to create separate schemas for each application within a single database in a cluster. By using schema-level privileges to restrict the access of a tenant application to only the application’s specific schema, this option provides some level of data isolation between applications, though not to the same degree as using separate databases.

A major benefit of using separate schemas in a single database is that, since the schemas are in the same database, cross-schema queries are possible without the use of database links or foreign tables. This is useful if application schemas need to share a common set of non-application-specific information. To secure application-specific information, shared information is often placed in its own separate schema. One of the advantages of using a multi-schema approach for multitenancy with one or more common schemas is that it reduces duplicate information. Cross-schema queries make aggregating information from multiple applications easier for business intelligence and decision support purposes, effectively breaking down the walls between information silos.

Like the multi-database multitenancy approach, a multi-schema multitenancy approach makes it easier to customize the tables and other database objects to suit a particular application’s needs. Also, like the multi-database approach, using independent schemas for each tenant means that application developers do not need to worry about building special filtering and data leakage prevention features into their applications. In addition, if the database components of a tenant application need to be copied or moved, the Postgres dump and restore facilities provide the ability to do so on a per-schema basis. EDB Postgres provides a clone schema feature to support this use case as well. A multi-schema approach also provides the benefit of shared administration of and shared resource usage by tenants.
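The schema-level privileges and cross-schema queries discussed in this section can be sketched as follows. This is an illustrative Python helper that only generates SQL text; the schema, role, and table names are hypothetical, and a real deployment would likely need additional grants (for example on sequences and functions).

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def tenant_schema_sql(schema: str, tenant_role: str, shared_schema: str) -> list[str]:
    """Give one tenant its own schema plus read access to a shared schema."""
    s, r, shared = quote_ident(schema), quote_ident(tenant_role), quote_ident(shared_schema)
    return [
        # A new schema grants nothing to other roles until USAGE is granted,
        # so only its owner can reach the tenant's objects.
        f"CREATE SCHEMA {s} AUTHORIZATION {r};",
        f"GRANT USAGE ON SCHEMA {shared} TO {r};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {shared} TO {r};",
        # Resolve unqualified names in the tenant schema first, then the shared one.
        f"ALTER ROLE {r} SET search_path = {s}, {shared};",
    ]


def cross_schema_query(schemas: list[str], table: str) -> str:
    """Build a reporting query that combines the same table from several tenant schemas."""
    parts = [
        f"SELECT {quote_literal(s)} AS tenant, * FROM {quote_ident(s)}.{quote_ident(table)}"
        for s in schemas
    ]
    return "\nUNION ALL\n".join(parts) + ";"


# Hypothetical schema, role, and table names, used only for illustration.
print("\n".join(tenant_schema_sql("tenant_a", "tenant_a_app", "shared")))
print(cross_schema_query(["tenant_a", "tenant_b"], "orders"))
```

The reporting query runs over a single connection, which is the practical difference from the multi-database option, where the same aggregation would need database links or foreign tables.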

The multi-schema approach to multitenancy does have some weaknesses under certain use cases. For instance, it does not offer as high a level of data isolation and configuration control as using separate databases. Since all application schemas are in the same database, information in the system tables is not private to any one application. Also, in a schema-per-tenant model, if one tenant runs some resource-intensive process it could impact the other tenants. If there is a large number of tenant schemas, each with a large number of tables, there may be additional challenges in managing system performance, as Postgres vacuuming operations would need to search across and be performed against a higher number of relations (i.e., tables). Finally, from an application upgrade perspective, as the number of tenant schemas increases, additional table definitions and other metadata may need to be updated, which could affect the time required to roll out application changes.

Although there are situations where a schema-per-tenant approach is not ideal, it is a good option for many use cases as it provides a good mix of data isolation, cross-container query capability, application schema customizability, shared administration, and shared resource usage. This option is well-suited for a moderate number of tenants with a moderate set of tables in each application schema. Serving the middle ground, this option is often chosen as a starting point when requirements and long-term needs do not clearly point to using one of the other options.

3. Single Schema in a Single Database

The third available multitenancy option is to have all tenants share a single schema in a single database within a cluster. Using this approach, the tenants typically share a common set of tables, with a column in each table identifying the tenant that owns each row. Postgres features such as row-level security and security barrier views are then used to restrict an application’s access to only a specific tenant’s data at any given time. Note that it would also be possible for different tenant applications to use different sets of tables within a single schema. However, in most cases there would be little value in doing so over having each tenant use separate schemas, as this would present unnecessary object naming challenges.

A main benefit of a single-schema, shared-tables approach is that only a single set of tables needs to be maintained, thus enabling a high degree of shared administration and shared resource usage. This simplifies and often reduces the time required to roll out table and other database object definition updates that are part of an application upgrade. Also, since the data for all tenants is stored in a single set of tables, querying and aggregating data from multiple tenants is much simpler. Finally, this multitenancy approach also scales very well for applications that may end up having many thousands of tenants, especially if the core application uses a relatively small number of tables and each tenant stores a relatively small amount of data.
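As a sketch of the tenant-column approach described at the start of this section, the Python helper below generates SQL that enables row-level security on a shared table and limits each session to one tenant. The table name, the `tenant_id` column, and the `app.current_tenant` session setting are assumptions made for illustration; they are not prescribed by Postgres or by this paper.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def tenant_rls_sql(table: str, tenant_column: str = "tenant_id",
                   setting: str = "app.current_tenant") -> list[str]:
    """Return statements that restrict a shared table to the session's tenant."""
    t, col = quote_ident(table), quote_ident(tenant_column)
    # current_setting() raises an error when the setting is absent, so a session
    # that never declared its tenant sees an error rather than every row.
    predicate = f"{col} = current_setting({quote_literal(setting)})::integer"
    policy = quote_ident(f"{table}_tenant_isolation")
    return [
        f"ALTER TABLE {t} ENABLE ROW LEVEL SECURITY;",
        # Table owners bypass policies by default; FORCE applies them to the owner too.
        f"ALTER TABLE {t} FORCE ROW LEVEL SECURITY;",
        # USING filters rows that are read; WITH CHECK validates rows that are written.
        f"CREATE POLICY {policy} ON {t} USING ({predicate}) WITH CHECK ({predicate});",
    ]


def set_tenant_sql(tenant_id: int, setting: str = "app.current_tenant") -> str:
    """Statement an application would run after connecting, before tenant queries."""
    return f"SET {setting} = {quote_literal(str(int(tenant_id)))};"


# Hypothetical table and tenant id, used only for illustration.
print("\n".join(tenant_rls_sql("orders")))
print(set_tenant_sql(42))
```

Note that superusers and roles with the `BYPASSRLS` attribute are never subject to these policies, so the application should connect as an ordinary role.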

Using a shared set of tables presents some challenges that need to be understood and potentially overcome before deciding if it is the right option. First of all, since the data for all tenants is stored in a shared set of tables, if data isolation is required then extra steps need to be taken to ensure that one tenant’s data is not accessible to or leaked to another tenant, potentially leveraging features such as row-level security. Similarly, if the application needs to be able to copy or move a tenant to another server, features would need to be built into the application to support this. Next, per-tenant customizations may be difficult to implement and often result in unconventional or non-standard constructs, which may be difficult to maintain in the long run. Also, since each of the tenants shares a common set of table definitions, tenant-specific application updates may be impossible. Finally, storing a large amount of data from a large number of tenants in a single table may impact performance and require additional monitoring and tuning. Similarly, if a single tenant stores and updates a large amount of data in a shared table it could impact the performance of all the tenants: the “noisy neighbor” effect.

Under the right use cases, the single-schema, shared-tables multitenancy option can be a good choice, despite some challenges. This option is especially well-suited for many SaaS applications that may scale to many thousands of tenants. For these applications, shared administration and resource usage are often the most important factors. This option doesn’t allow the underlying data model to be customized without impacting other tenant applications; however, the built-in NoSQL capabilities of Postgres can be used to work around this limitation if needed.

Other Options

The three options discussed so far are the Postgres multitenancy options that most closely match the common understanding of multitenancy: that the application serves multiple tenants using a single instance of the software running on a single server. However, there are other options that do not exactly fit within this definition, but are nevertheless worth considering as they may meet other needs. For example, for some environments it might be appropriate to run different Postgres clusters on the same server for different applications. Deploying and running Postgres via Docker containers is another option that may be suitable for many modern applications. A brief description of both of these options follows.

Multiple Clusters on the Same Server

As discussed previously, it is possible to run multiple Postgres clusters on a single server. Each instance could then be used to support one or more applications. Since multiple instances of Postgres would be running, this option does not conform to the conventional definition of multitenancy. However, it does provide many of the same benefits as the single-instance, multiple-database approach as well as some additional ones. Figure 6 shows multitenancy using multiple Postgres clusters.

Using multiple clusters provides a greater level of data isolation and security than the other previously discussed options. One of the major benefits of having a dedicated Postgres cluster for an application is that point-in-time recovery (PITR) operations can be performed without impacting other applications. Also, like the multi-database and multi-schema approaches, a multi-cluster approach provides the ability for each application to easily tailor table definitions and other metadata to suit its specific needs.

Although one of the goals of multitenancy is to promote shared administration and shared resource usage, for some use cases this may not be as important, or even desired. Each cluster runs its own set of processes and has its own set of configuration parameters and database users. As such, in a multi-cluster environment most of the administrative activities will be cluster-specific. This can benefit organizations that have a need for isolated administration and isolated resource usage.
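To illustrate what “its own set of processes and configuration” means in practice, here is a small Python sketch that plans several clusters on one server, each with a distinct data directory and port, and prints the standard `initdb` and `pg_ctl` commands that would create and start them. The cluster names, paths, and ports are hypothetical, and the sketch only builds command strings; it does not run them.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    data_dir: str
    port: int


def check_unique(clusters: list[Cluster]) -> None:
    """Clusters on one server must not share a data directory or a listening port."""
    ports = [c.port for c in clusters]
    dirs = [c.data_dir for c in clusters]
    if len(set(ports)) != len(ports) or len(set(dirs)) != len(dirs):
        raise ValueError("each cluster needs its own data directory and port")


def cluster_commands(cluster: Cluster) -> list[str]:
    """Shell commands that initialize and start one cluster."""
    log_file = f"{cluster.data_dir}/server.log"
    return [
        shlex.join(["initdb", "-D", cluster.data_dir]),
        # -o passes options to the server process; here it overrides the port.
        shlex.join(["pg_ctl", "-D", cluster.data_dir, "-o", f"-p {cluster.port}",
                    "-l", log_file, "start"]),
    ]


# Hypothetical clusters, used only for illustration.
clusters = [
    Cluster("tenant_a", "/srv/pg/tenant_a", 5433),
    Cluster("tenant_b", "/srv/pg/tenant_b", 5434),
]
check_unique(clusters)
for cluster in clusters:
    print("\n".join(cluster_commands(cluster)))
```

Because every cluster has its own data directory, each one also carries its own configuration files and role definitions, which is the source of both the isolation and the extra administrative work described above.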

With a proper set of tools, the challenges of managing multiple Postgres clusters can be reduced. For example, EnterpriseDB has tools like Postgres Enterprise Manager (PEM), which provides a single “pane of glass” interface that can be used to monitor and manage multiple Postgres clusters across an organization’s enterprise. In addition, EnterpriseDB’s tools for managing backup and recovery, EDB Postgres Backup and Recovery Tool (BART), and high-availability configurations, EDB Postgres Enterprise Failover Manager (EFM), support multiple clusters.

Although each cluster would mostly be administered separately, the clusters would all be using the same Postgres installation. Therefore, activities such as applying patches or minor version updates would apply to every cluster.

Depending on requirements, running separate clusters on the same server for different applications might be the best choice. It is a good candidate for consideration if data, administration, and resource isolation is required or acceptable. Prior to choosing this option, be sure to understand the server resources required for each cluster to ensure that they are sufficient.

Deployment via Docker Containers

More and more organizations are beginning to use Docker containers for deploying their applications and databases. Containers offer increased portability, simple and fast deployment, enhanced productivity, and improved security. They are a key technology in modern microservices-based applications. Since containers have a lightweight footprint and minimal overhead, they make it possible to deploy multiple containers running Postgres on a single machine. It is worth considering their use in support of some multitenancy use cases. As part of a multitenancy solution, a separate Postgres container is often used for each application.

Each running Postgres container contains an installation of the Postgres software, at least one Postgres cluster, and the processes corresponding to the cluster running in the container. In production deployments, the cluster data directory is normally mapped to a storage volume attached to the host. In addition, an orchestration framework is usually used to run and manage containers in production environments. Kubernetes-based orchestration frameworks are the most common. Figure 7 shows a conceptual example of using containers as part of a multitenancy solution.

As part of a multitenancy strategy, the use of Postgres containers offers the same benefits as a traditional multi-cluster deployment, plus some additional ones. For one, they allow for an even higher degree of security, connection control, and data and process isolation, which is more along the lines of running Postgres clusters on separate machines. Running Postgres in containers also makes it easier to use different versions of Postgres for different applications. Due to the inherent nature of containers, it is also much easier and faster to spin up new database instances. Not only does this make adding new applications easier and faster, but it also makes scaling up (and down) database instances to support changing usage requirements easier and faster.

Organizations with a multitenancy need that have embraced the use of containers for their application deployments may want to consider the use of containers for their databases as well. Since there are some additional considerations when running databases in containers, most organizations would benefit from working with vendors such as EnterpriseDB, who have expertise in deploying Postgres in containers and related technologies such as Kubernetes. EnterpriseDB not only makes preconfigured containers with different versions of Postgres available, but also provides containers that support Postgres high-availability deployments, monitoring and management, backup and recovery, and load balancing. EnterpriseDB containers have been designed to run standalone or in Kubernetes-based environments such as Google’s GKE and Red Hat’s OpenShift. In addition, the EnterpriseDB Professional Services team can help an organization with their container-based deployments of Postgres.

Summary

A high-level overview of the Postgres architecture helps provide some contextual understanding of the multitenancy options available with Postgres. There are three main Postgres multitenancy options:

• Using multiple databases in a single Postgres cluster (i.e., instance)
• Using multiple schemas in a single Postgres database
• Using shared tables in a single schema in a single Postgres database

Each option has its own strengths and weaknesses. Other options worth considering are using multiple Postgres clusters and deploying Postgres via Docker containers.

When deciding among Postgres multitenancy options, you should consider system requirements and expected usage. Depending on application needs, you can also create a hybrid solution combining elements from the different options. EnterpriseDB, a company providing Postgres-related support, products, and services, has expertise in deploying and maintaining Postgres and can review an organization’s database needs and help with deciding the right database multitenancy strategy for an environment.

DESIGN YOUR DATABASE ARCHITECTURE WITH A TRUSTED PARTNER

About EnterpriseDB

EnterpriseDB (EDB), the Enterprise Postgres company, delivers an open source-based data management platform based on PostgreSQL, optimized for greater scalability, security, and reliability. EDB Postgres makes organizations smarter while reducing risk and complexity with enterprise-proven management tools, security enhancements and Oracle compatibility. Over 4,000 customers worldwide deploy diverse workloads including transaction processing, data warehousing, customer analytics and web-based applications, both on-premises and in the cloud.

EDB is an innovator and major contributor to the Postgres community, serving 20% of the Fortune 500 and 15% of Forbes Global 2000 companies worldwide.

EDB is based in Bedford, Massachusetts, with offices around the globe.