
Data Definition

Names can be assigned to table constraints in the same way as column constraints:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    CHECK (price > 0),
    discounted_price numeric,
    CHECK (discounted_price > 0),
    CONSTRAINT valid_discount CHECK (price > discounted_price)
);

It should be noted that a check constraint is satisfied if the check expression evaluates to true or the
null value. Since most expressions will evaluate to the null value if any operand is null, they will not
prevent null values in the constrained columns. To ensure that a column does not contain null values,
the not-null constraint described in the next section can be used.

Note
PostgreSQL does not support CHECK constraints that reference table data other than the new
or updated row being checked. While a CHECK constraint that violates this rule may appear
to work in simple tests, it cannot guarantee that the database will not reach a state in which
the constraint condition is false (due to subsequent changes of the other row(s) involved). This
would cause a database dump and restore to fail. The restore could fail even when the complete
database state is consistent with the constraint, due to rows not being loaded in an order that
will satisfy the constraint. If possible, use UNIQUE, EXCLUDE, or FOREIGN KEY constraints
to express cross-row and cross-table restrictions.

If what you desire is a one-time check against other rows at row insertion, rather than a continuously-maintained consistency guarantee, a custom trigger can be used to implement that.
(This approach avoids the dump/restore problem because pg_dump does not reinstall triggers
until after restoring data, so that the check will not be enforced during a dump/restore.)

Note
PostgreSQL assumes that CHECK constraints' conditions are immutable, that is, they will al-
ways give the same result for the same input row. This assumption is what justifies examin-
ing CHECK constraints only when rows are inserted or updated, and not at other times. (The
warning above about not referencing other table data is really a special case of this restriction.)

An example of a common way to break this assumption is to reference a user-defined function in a CHECK expression, and then change the behavior of that function. PostgreSQL does not disallow that, but it will not notice if there are rows in the table that now violate the CHECK constraint. That would cause a subsequent database dump and restore to fail. The recommended way to handle such a change is to drop the constraint (using ALTER TABLE), adjust the function definition, and re-add the constraint, thereby rechecking it against all table rows.

5.4.2. Not-Null Constraints


A not-null constraint simply specifies that a column must not assume the null value. A syntax example:

CREATE TABLE products (
    product_no integer NOT NULL,
    name text NOT NULL,
    price numeric
);

A not-null constraint is always written as a column constraint. A not-null constraint is functionally equivalent to creating a check constraint CHECK (column_name IS NOT NULL), but in PostgreSQL creating an explicit not-null constraint is more efficient. The drawback is that you cannot give explicit names to not-null constraints created this way.

Of course, a column can have more than one constraint. Just write the constraints one after another:

CREATE TABLE products (
    product_no integer NOT NULL,
    name text NOT NULL,
    price numeric NOT NULL CHECK (price > 0)
);

The order doesn't matter. It does not necessarily determine in which order the constraints are checked.

The NOT NULL constraint has an inverse: the NULL constraint. This does not mean that the column
must be null, which would surely be useless. Instead, this simply selects the default behavior that the
column might be null. The NULL constraint is not present in the SQL standard and should not be used
in portable applications. (It was only added to PostgreSQL to be compatible with some other database
systems.) Some users, however, like it because it makes it easy to toggle the constraint in a script file.
For example, you could start with:

CREATE TABLE products (
    product_no integer NULL,
    name text NULL,
    price numeric NULL
);

and then insert the NOT key word where desired.

Tip
In most database designs the majority of columns should be marked not null.

5.4.3. Unique Constraints


Unique constraints ensure that the data contained in a column, or a group of columns, is unique among
all the rows in the table. The syntax is:

CREATE TABLE products (
    product_no integer UNIQUE,
    name text,
    price numeric
);

when written as a column constraint, and:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    UNIQUE (product_no)
);

when written as a table constraint.

To define a unique constraint for a group of columns, write it as a table constraint with the column
names separated by commas:

CREATE TABLE example (
    a integer,
    b integer,
    c integer,
    UNIQUE (a, c)
);

This specifies that the combination of values in the indicated columns is unique across the whole table,
though any one of the columns need not be (and ordinarily isn't) unique.

You can assign your own name for a unique constraint, in the usual way:

CREATE TABLE products (
    product_no integer CONSTRAINT must_be_different UNIQUE,
    name text,
    price numeric
);

Adding a unique constraint will automatically create a unique B-tree index on the column or group of
columns listed in the constraint. A uniqueness restriction covering only some rows cannot be written
as a unique constraint, but it is possible to enforce such a restriction by creating a unique partial index.
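
For example, a rule such as “product numbers must be unique among active products” could be enforced this way (a sketch, assuming a hypothetical boolean column active on products):

CREATE UNIQUE INDEX products_active_product_no ON products (product_no)
    WHERE active;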

In general, a unique constraint is violated if there is more than one row in the table where the values of
all of the columns included in the constraint are equal. However, two null values are never considered
equal in this comparison. That means even in the presence of a unique constraint it is possible to
store duplicate rows that contain a null value in at least one of the constrained columns. This behavior
conforms to the SQL standard, but we have heard that other SQL databases might not follow this rule.
So be careful when developing applications that are intended to be portable.
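
To illustrate, with the unique constraint on product_no shown above, both of the following inserts are accepted, because the two null values are not treated as equal:

INSERT INTO products (product_no, name) VALUES (NULL, 'first');
INSERT INTO products (product_no, name) VALUES (NULL, 'second');  -- also accepted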

5.4.4. Primary Keys


A primary key constraint indicates that a column, or group of columns, can be used as a unique identifier for rows in the table. This requires that the values be both unique and not null. So, the following two table definitions accept the same data:

CREATE TABLE products (
    product_no integer UNIQUE NOT NULL,
    name text,
    price numeric
);

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

Primary keys can span more than one column; the syntax is similar to unique constraints:

CREATE TABLE example (
    a integer,
    b integer,
    c integer,
    PRIMARY KEY (a, c)
);

Adding a primary key will automatically create a unique B-tree index on the column or group of
columns listed in the primary key, and will force the column(s) to be marked NOT NULL.

A table can have at most one primary key. (There can be any number of unique and not-null constraints,
which are functionally almost the same thing, but only one can be identified as the primary key.)
Relational database theory dictates that every table must have a primary key. This rule is not enforced
by PostgreSQL, but it is usually best to follow it.

Primary keys are useful both for documentation purposes and for client applications. For example, a
GUI application that allows modifying row values probably needs to know the primary key of a table
to be able to identify rows uniquely. There are also various ways in which the database system makes
use of a primary key if one has been declared; for example, the primary key defines the default target
column(s) for foreign keys referencing its table.

5.4.5. Foreign Keys


A foreign key constraint specifies that the values in a column (or a group of columns) must match the
values appearing in some row of another table. We say this maintains the referential integrity between
two related tables.

Say you have the product table that we have used several times already:

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

Let's also assume you have a table storing orders of those products. We want to ensure that the orders
table only contains orders of products that actually exist. So we define a foreign key constraint in the
orders table that references the products table:

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products (product_no),
    quantity integer
);

Now it is impossible to create orders with non-NULL product_no entries that do not appear in
the products table.

We say that in this situation the orders table is the referencing table and the products table is the
referenced table. Similarly, there are referencing and referenced columns.

You can also shorten the above command to:

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products,
    quantity integer
);

because in absence of a column list the primary key of the referenced table is used as the referenced
column(s).

You can assign your own name for a foreign key constraint, in the usual way.
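
For example (a sketch; the constraint name fk_product_no is arbitrary):

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer CONSTRAINT fk_product_no REFERENCES products,
    quantity integer
);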

A foreign key can also constrain and reference a group of columns. As usual, it then needs to be written
in table constraint form. Here is a contrived syntax example:

CREATE TABLE t1 (
    a integer PRIMARY KEY,
    b integer,
    c integer,
    FOREIGN KEY (b, c) REFERENCES other_table (c1, c2)
);

Of course, the number and type of the constrained columns need to match the number and type of
the referenced columns.

Sometimes it is useful for the “other table” of a foreign key constraint to be the same table; this is
called a self-referential foreign key. For example, if you want rows of a table to represent nodes of
a tree structure, you could write

CREATE TABLE tree (
    node_id integer PRIMARY KEY,
    parent_id integer REFERENCES tree,
    name text,
    ...
);

A top-level node would have NULL parent_id, while non-NULL parent_id entries would be
constrained to reference valid rows of the table.

A table can have more than one foreign key constraint. This is used to implement many-to-many
relationships between tables. Say you have tables about products and orders, but now you want to
allow one order to contain possibly many products (which the structure above did not allow). You
could use this table structure:

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    shipping_address text,
    ...
);

CREATE TABLE order_items (
    product_no integer REFERENCES products,
    order_id integer REFERENCES orders,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);

Notice that the primary key overlaps with the foreign keys in the last table.

We know that the foreign keys disallow creation of orders that do not relate to any products. But what
if a product is removed after an order is created that references it? SQL allows you to handle that as
well. Intuitively, we have a few options:

• Disallow deleting a referenced product
• Delete the orders as well
• Something else?

To illustrate this, let's implement the following policy on the many-to-many relationship example above: when someone wants to remove a product that is still referenced by an order (via order_items), we disallow it. If someone removes an order, the order items are removed as well:

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    shipping_address text,
    ...
);

CREATE TABLE order_items (
    product_no integer REFERENCES products ON DELETE RESTRICT,
    order_id integer REFERENCES orders ON DELETE CASCADE,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);

Restricting and cascading deletes are the two most common options. RESTRICT prevents deletion of
a referenced row. NO ACTION means that if any referencing rows still exist when the constraint is
checked, an error is raised; this is the default behavior if you do not specify anything. (The essential
difference between these two choices is that NO ACTION allows the check to be deferred until later
in the transaction, whereas RESTRICT does not.) CASCADE specifies that when a referenced row is
deleted, row(s) referencing it should be automatically deleted as well. There are two other options:
SET NULL and SET DEFAULT. These cause the referencing column(s) in the referencing row(s) to
be set to nulls or their default values, respectively, when the referenced row is deleted. Note that these
do not excuse you from observing any constraints. For example, if an action specifies SET DEFAULT
but the default value would not satisfy the foreign key constraint, the operation will fail.
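
As an illustration (a sketch, not part of the running example), a variant of the orders table whose rows survive product deletion might look like this:

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products ON DELETE SET NULL,
    quantity integer
);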

Analogous to ON DELETE there is also ON UPDATE which is invoked when a referenced column is
changed (updated). The possible actions are the same. In this case, CASCADE means that the updated
values of the referenced column(s) should be copied into the referencing row(s).
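
For instance, a variant of order_items that also propagates product number changes could be written as (a sketch):

CREATE TABLE order_items (
    product_no integer REFERENCES products
        ON UPDATE CASCADE ON DELETE RESTRICT,
    order_id integer REFERENCES orders ON DELETE CASCADE,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);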

Normally, a referencing row need not satisfy the foreign key constraint if any of its referencing columns are null. If MATCH FULL is added to the foreign key declaration, a referencing row escapes satisfying the constraint only if all its referencing columns are null (so a mix of null and non-null values is guaranteed to fail a MATCH FULL constraint). If you don't want referencing rows to be able to avoid satisfying the foreign key constraint, declare the referencing column(s) as NOT NULL.
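
For example, the t1 definition above could be tightened with MATCH FULL like this (a sketch):

CREATE TABLE t1 (
    a integer PRIMARY KEY,
    b integer,
    c integer,
    FOREIGN KEY (b, c) REFERENCES other_table (c1, c2) MATCH FULL
);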

A foreign key must reference columns that either are a primary key or form a unique constraint. This
means that the referenced columns always have an index (the one underlying the primary key or unique
constraint); so checks on whether a referencing row has a match will be efficient. Since a DELETE
of a row from the referenced table or an UPDATE of a referenced column will require a scan of the
referencing table for rows matching the old value, it is often a good idea to index the referencing
columns too. Because this is not always needed, and there are many choices available on how to
index, declaration of a foreign key constraint does not automatically create an index on the referencing
columns.
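
Continuing the orders example, such an index could be created like this (a sketch; the index name is arbitrary):

CREATE INDEX orders_product_no_idx ON orders (product_no);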

More information about updating and deleting data is in Chapter 6. Also see the description of foreign
key constraint syntax in the reference documentation for CREATE TABLE.

5.4.6. Exclusion Constraints


Exclusion constraints ensure that if any two rows are compared on the specified columns or expressions
using the specified operators, at least one of these operator comparisons will return false or null. The
syntax is:

CREATE TABLE circles (
    c circle,
    EXCLUDE USING gist (c WITH &&)
);

See also CREATE TABLE ... CONSTRAINT ... EXCLUDE for details.

Adding an exclusion constraint will automatically create an index of the type specified in the constraint
declaration.
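
For example, with the circles table above, the second of these inserts fails because the two circles overlap (a sketch; circle literals are written as '<(x,y),r>'):

INSERT INTO circles VALUES ('<(0,0),5>');
INSERT INTO circles VALUES ('<(2,2),5>');  -- error: conflicts with the first row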

5.5. System Columns


Every table has several system columns that are implicitly defined by the system. Therefore, these
names cannot be used as names of user-defined columns. (Note that these restrictions are separate from
whether the name is a key word or not; quoting a name will not allow you to escape these restrictions.)
You do not really need to be concerned about these columns; just know they exist.

tableoid

The OID of the table containing this row. This column is particularly handy for queries that select from partitioned tables (see Section 5.11) or inheritance hierarchies (see Section 5.10), since without it, it's difficult to tell which individual table a row came from. The tableoid can be joined against the oid column of pg_class to obtain the table name.
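
For example (a sketch):

SELECT c.relname, p.*
    FROM products p
    JOIN pg_class c ON c.oid = p.tableoid;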

xmin

The identity (transaction ID) of the inserting transaction for this row version. (A row version is an
individual state of a row; each update of a row creates a new row version for the same logical row.)

cmin

The command identifier (starting at zero) within the inserting transaction.

xmax


The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It
is possible for this column to be nonzero in a visible row version. That usually indicates that the
deleting transaction hasn't committed yet, or that an attempted deletion was rolled back.

cmax

The command identifier within the deleting transaction, or zero.

ctid

The physical location of the row version within its table. Note that although the ctid can be used
to locate the row version very quickly, a row's ctid will change if it is updated or moved by
VACUUM FULL. Therefore ctid is useless as a long-term row identifier. A primary key should
be used to identify logical rows.

Transaction identifiers are also 32-bit quantities. In a long-lived database it is possible for transaction IDs to wrap around. This is not a fatal problem given appropriate maintenance procedures; see Chapter 25 for details. It is unwise, however, to depend on the uniqueness of transaction IDs over the long term (more than one billion transactions).

Command identifiers are also 32-bit quantities. This creates a hard limit of 2^32 (4 billion) SQL commands within a single transaction. In practice this limit is not a problem — note that the limit is on the number of SQL commands, not the number of rows processed. Also, only commands that actually modify the database contents will consume a command identifier.

5.6. Modifying Tables


When you create a table and you realize that you made a mistake, or the requirements of the application
change, you can drop the table and create it again. But this is not a convenient option if the table is
already filled with data, or if the table is referenced by other database objects (for instance a foreign key
constraint). Therefore PostgreSQL provides a family of commands to make modifications to existing
tables. Note that this is conceptually distinct from altering the data contained in the table: here we are
interested in altering the definition, or structure, of the table.

You can:

• Add columns
• Remove columns
• Add constraints
• Remove constraints
• Change default values
• Change column data types
• Rename columns
• Rename tables

All these actions are performed using the ALTER TABLE command, whose reference page contains
details beyond those given here.

5.6.1. Adding a Column


To add a column, use a command like:

ALTER TABLE products ADD COLUMN description text;

The new column is initially filled with whatever default value is given (null if you don't specify a
DEFAULT clause).


Tip
From PostgreSQL 11, adding a column with a constant default value no longer means that
each row of the table needs to be updated when the ALTER TABLE statement is executed.
Instead, the default value will be returned the next time the row is accessed, and applied when
the table is rewritten, making the ALTER TABLE very fast even on large tables.

However, if the default value is volatile (e.g., clock_timestamp()) each row will need
to be updated with the value calculated at the time ALTER TABLE is executed. To avoid a
potentially lengthy update operation, particularly if you intend to fill the column with mostly
nondefault values anyway, it may be preferable to add the column with no default, insert the
correct values using UPDATE, and then add any desired default as described below.
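
A sketch of that workflow, using a hypothetical last_update column:

ALTER TABLE products ADD COLUMN last_update timestamp with time zone;
UPDATE products SET last_update = clock_timestamp();
ALTER TABLE products ALTER COLUMN last_update SET DEFAULT clock_timestamp();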

You can also define constraints on the column at the same time, using the usual syntax:

ALTER TABLE products ADD COLUMN description text CHECK (description <> '');

In fact all the options that can be applied to a column description in CREATE TABLE can be used here.
Keep in mind however that the default value must satisfy the given constraints, or the ADD will fail.
Alternatively, you can add constraints later (see below) after you've filled in the new column correctly.

5.6.2. Removing a Column


To remove a column, use a command like:

ALTER TABLE products DROP COLUMN description;

Whatever data was in the column disappears. Table constraints involving the column are dropped, too.
However, if the column is referenced by a foreign key constraint of another table, PostgreSQL will
not silently drop that constraint. You can authorize dropping everything that depends on the column
by adding CASCADE:

ALTER TABLE products DROP COLUMN description CASCADE;

See Section 5.14 for a description of the general mechanism behind this.

5.6.3. Adding a Constraint


To add a constraint, the table constraint syntax is used. For example:

ALTER TABLE products ADD CHECK (name <> '');
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;

To add a not-null constraint, which cannot be written as a table constraint, use this syntax:

ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;

The constraint will be checked immediately, so the table data must satisfy the constraint before it can
be added.


5.6.4. Removing a Constraint


To remove a constraint you need to know its name. If you gave it a name then that's easy. Otherwise the system assigned a generated name, which you need to find out. The psql command \d tablename can be helpful here; other interfaces might also provide a way to inspect table details. Then the command is:

ALTER TABLE products DROP CONSTRAINT some_name;

(If you are dealing with a generated constraint name like $2, don't forget that you'll need to double-quote it to make it a valid identifier.)
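
Alternatively, the system catalogs can be queried directly to list a table's constraint names (a sketch):

SELECT conname FROM pg_constraint WHERE conrelid = 'products'::regclass;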

As with dropping a column, you need to add CASCADE if you want to drop a constraint that something
else depends on. An example is that a foreign key constraint depends on a unique or primary key
constraint on the referenced column(s).

This works the same for all constraint types except not-null constraints. To drop a not null constraint
use:

ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL;

(Recall that not-null constraints do not have names.)

5.6.5. Changing a Column's Default Value


To set a new default for a column, use a command like:

ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;

Note that this doesn't affect any existing rows in the table, it just changes the default for future INSERT
commands.

To remove any default value, use:

ALTER TABLE products ALTER COLUMN price DROP DEFAULT;

This is effectively the same as setting the default to null. As a consequence, it is not an error to drop
a default where one hadn't been defined, because the default is implicitly the null value.

5.6.6. Changing a Column's Data Type


To convert a column to a different data type, use a command like:

ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);

This will succeed only if each existing entry in the column can be converted to the new type by an
implicit cast. If a more complex conversion is needed, you can add a USING clause that specifies how
to compute the new values from the old.
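
For example, converting price to an integer number of cents might be written as (a sketch):

ALTER TABLE products ALTER COLUMN price TYPE integer USING round(price * 100)::integer;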

PostgreSQL will attempt to convert the column's default value (if any) to the new type, as well as
any constraints that involve the column. But these conversions might fail, or might produce surprising
results. It's often best to drop any constraints on the column before altering its type, and then add back
suitably modified constraints afterwards.


5.6.7. Renaming a Column


To rename a column:

ALTER TABLE products RENAME COLUMN product_no TO product_number;

5.6.8. Renaming a Table


To rename a table:

ALTER TABLE products RENAME TO items;

5.7. Privileges
When an object is created, it is assigned an owner. The owner is normally the role that executed the
creation statement. For most kinds of objects, the initial state is that only the owner (or a superuser)
can do anything with the object. To allow other roles to use it, privileges must be granted.

There are different kinds of privileges: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, EXECUTE, and USAGE. The privileges applicable to a particular object vary depending on the object's type (table, function, etc). More detail about the meanings of these privileges appears below. The following sections and chapters will also show you how these privileges are used.

The right to modify or destroy an object is inherent in being the object's owner, and cannot be granted
or revoked in itself. (However, like all privileges, that right can be inherited by members of the owning
role; see Section 22.3.)

An object can be assigned to a new owner with an ALTER command of the appropriate kind for the
object, for example

ALTER TABLE table_name OWNER TO new_owner;

Superusers can always do this; ordinary roles can only do it if they are both the current owner of the
object (or a member of the owning role) and a member of the new owning role.

To assign privileges, the GRANT command is used. For example, if joe is an existing role, and
accounts is an existing table, the privilege to update the table can be granted with:

GRANT UPDATE ON accounts TO joe;

Writing ALL in place of a specific privilege grants all privileges that are relevant for the object type.

The special “role” name PUBLIC can be used to grant a privilege to every role on the system. Also,
“group” roles can be set up to help manage privileges when there are many users of a database —
for details see Chapter 22.

To revoke a previously-granted privilege, use the fittingly named REVOKE command:

REVOKE ALL ON accounts FROM PUBLIC;

Ordinarily, only the object's owner (or a superuser) can grant or revoke privileges on an object. However, it is possible to grant a privilege “with grant option”, which gives the recipient the right to grant it in turn to others. If the grant option is subsequently revoked then all who received the privilege from that recipient (directly or through a chain of grants) will lose the privilege. For details see the GRANT and REVOKE reference pages.
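
For example, reusing the accounts table and the role joe from above (a sketch):

GRANT UPDATE ON accounts TO joe WITH GRANT OPTION;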

An object's owner can choose to revoke their own ordinary privileges, for example to make a table
read-only for themselves as well as others. But owners are always treated as holding all grant options,
so they can always re-grant their own privileges.

The available privileges are:

SELECT

Allows SELECT from any column, or specific column(s), of a table, view, materialized view, or
other table-like object. Also allows use of COPY TO. This privilege is also needed to reference
existing column values in UPDATE or DELETE. For sequences, this privilege also allows use of
the currval function. For large objects, this privilege allows the object to be read.

INSERT

Allows INSERT of a new row into a table, view, etc. Can be granted on specific column(s), in
which case only those columns may be assigned to in the INSERT command (other columns will
therefore receive default values). Also allows use of COPY FROM.

UPDATE

Allows UPDATE of any column, or specific column(s), of a table, view, etc. (In practice, any
nontrivial UPDATE command will require SELECT privilege as well, since it must reference
table columns to determine which rows to update, and/or to compute new values for columns.)
SELECT ... FOR UPDATE and SELECT ... FOR SHARE also require this privilege on
at least one column, in addition to the SELECT privilege. For sequences, this privilege allows
use of the nextval and setval functions. For large objects, this privilege allows writing or
truncating the object.

DELETE

Allows DELETE of a row from a table, view, etc. (In practice, any nontrivial DELETE command
will require SELECT privilege as well, since it must reference table columns to determine which
rows to delete.)

TRUNCATE

Allows TRUNCATE on a table.

REFERENCES

Allows creation of a foreign key constraint referencing a table, or specific column(s) of a table.

TRIGGER

Allows creation of a trigger on a table, view, etc.

CREATE

For databases, allows new schemas and publications to be created within the database, and allows
trusted extensions to be installed within the database.

For schemas, allows new objects to be created within the schema. To rename an existing object,
you must own the object and have this privilege for the containing schema.

For tablespaces, allows tables, indexes, and temporary files to be created within the tablespace,
and allows databases to be created that have the tablespace as their default tablespace.

Note that revoking this privilege will not alter the existence or location of existing objects.


CONNECT

Allows the grantee to connect to the database. This privilege is checked at connection startup (in
addition to checking any restrictions imposed by pg_hba.conf).

TEMPORARY

Allows temporary tables to be created while using the database.

EXECUTE

Allows calling a function or procedure, including use of any operators that are implemented on
top of the function. This is the only type of privilege that is applicable to functions and procedures.

USAGE

For procedural languages, allows use of the language for the creation of functions in that language.
This is the only type of privilege that is applicable to procedural languages.

For schemas, allows access to objects contained in the schema (assuming that the objects' own
privilege requirements are also met). Essentially this allows the grantee to “look up” objects within
the schema. Without this permission, it is still possible to see the object names, e.g., by querying
system catalogs. Also, after revoking this permission, existing sessions might have statements
that have previously performed this lookup, so this is not a completely secure way to prevent
object access.

For sequences, allows use of the currval and nextval functions.

For types and domains, allows use of the type or domain in the creation of tables, functions, and
other schema objects. (Note that this privilege does not control all “usage” of the type, such as
values of the type appearing in queries. It only prevents objects from being created that depend on
the type. The main purpose of this privilege is controlling which users can create dependencies
on a type, which could prevent the owner from changing the type later.)

For foreign-data wrappers, allows creation of new servers using the foreign-data wrapper.

For foreign servers, allows creation of foreign tables using the server. Grantees may also create,
alter, or drop their own user mappings associated with that server.

The privileges required by other commands are listed on the reference page of the respective command.

PostgreSQL grants privileges on some types of objects to PUBLIC by default when the objects are created. No privileges are granted to PUBLIC by default on tables, table columns, sequences, foreign data wrappers, foreign servers, large objects, schemas, or tablespaces. For other types of objects, the default privileges granted to PUBLIC are as follows: CONNECT and TEMPORARY (create temporary tables) privileges for databases; EXECUTE privilege for functions and procedures; and USAGE privilege for languages and data types (including domains). The object owner can, of course, REVOKE both default and expressly granted privileges. (For maximum security, issue the REVOKE in the same transaction that creates the object; then there is no window in which another user can use the object.) Also, these default privilege settings can be overridden using the ALTER DEFAULT PRIVILEGES command.
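
For example, the following makes future tables created in a schema readable by a given role (a sketch; myschema and joe are assumptions):

ALTER DEFAULT PRIVILEGES IN SCHEMA myschema
    GRANT SELECT ON TABLES TO joe;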

Table 5.1 shows the one-letter abbreviations that are used for these privilege types in ACL (Access
Control List) values. You will see these letters in the output of the psql commands listed below, or
when looking at ACL columns of system catalogs.

Table 5.1. ACL Privilege Abbreviations

Privilege    Abbreviation   Applicable Object Types
SELECT       r (“read”)     LARGE OBJECT, SEQUENCE, TABLE (and table-like objects), table column
INSERT       a (“append”)   TABLE, table column
UPDATE       w (“write”)    LARGE OBJECT, SEQUENCE, TABLE, table column
DELETE       d              TABLE
TRUNCATE     D              TABLE
REFERENCES   x              TABLE, table column
TRIGGER      t              TABLE
CREATE       C              DATABASE, SCHEMA, TABLESPACE
CONNECT      c              DATABASE
TEMPORARY    T              DATABASE
EXECUTE      X              FUNCTION, PROCEDURE
USAGE        U              DOMAIN, FOREIGN DATA WRAPPER, FOREIGN SERVER, LANGUAGE, SCHEMA, SEQUENCE, TYPE

Table 5.2 summarizes the privileges available for each type of SQL object, using the abbreviations
shown above. It also shows the psql command that can be used to examine privilege settings for each
object type.

Table 5.2. Summary of Access Privileges

Object Type                      All Privileges   Default PUBLIC Privileges   psql Command
DATABASE                         CTc              Tc                          \l
DOMAIN                           U                U                           \dD+
FUNCTION or PROCEDURE            X                X                           \df+
FOREIGN DATA WRAPPER             U                none                        \dew+
FOREIGN SERVER                   U                none                        \des+
LANGUAGE                         U                U                           \dL+
LARGE OBJECT                     rw               none
SCHEMA                           UC               none                        \dn+
SEQUENCE                         rwU              none                        \dp
TABLE (and table-like objects)   arwdDxt          none                        \dp
Table column                     arwx             none                        \dp
TABLESPACE                       C                none                        \db+
TYPE                             U                U                           \dT+

The privileges that have been granted for a particular object are displayed as a list of aclitem
entries, where each aclitem describes the permissions of one grantee that have been granted by
a particular grantor. For example, calvin=r*w/hobbes specifies that the role calvin has the
privilege SELECT (r) with grant option (*) as well as the non-grantable privilege UPDATE (w), both
granted by the role hobbes. If calvin also has some privileges on the same object granted by a
different grantor, those would appear as a separate aclitem entry. An empty grantee field in an
aclitem stands for PUBLIC.

As an example, suppose that user miriam creates table mytable and does:


GRANT SELECT ON mytable TO PUBLIC;
GRANT SELECT, UPDATE, INSERT ON mytable TO admin;
GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;

Then psql's \dp command would show:

=> \dp mytable
                                  Access privileges
 Schema |  Name   | Type  |   Access privileges   |   Column privileges   | Policies
--------+---------+-------+-----------------------+-----------------------+----------
 public | mytable | table | miriam=arwdDxt/miriam+| col1:                +|
        |         |       | =r/miriam            +|   miriam_rw=rw/miriam |
        |         |       | admin=arw/miriam      |                       |
(1 row)

If the “Access privileges” column is empty for a given object, it means the object has default privileges (that is, its privileges entry in the relevant system catalog is null). Default privileges always include all privileges for the owner, and can include some privileges for PUBLIC depending on the object type, as explained above. The first GRANT or REVOKE on an object will instantiate the default privileges (producing, for example, miriam=arwdDxt/miriam) and then modify them per the specified request. Similarly, entries are shown in “Column privileges” only for columns with nondefault privileges. (Note: for this purpose, “default privileges” always means the built-in default privileges for the object's type. An object whose privileges have been affected by an ALTER DEFAULT PRIVILEGES command will always be shown with an explicit privilege entry that includes the effects of the ALTER.)

Notice that the owner's implicit grant options are not marked in the access privileges display. A * will
appear only when grant options have been explicitly granted to someone.

5.8. Row Security Policies


In addition to the SQL-standard privilege system available through GRANT, tables can have row
security policies that restrict, on a per-user basis, which rows can be returned by normal queries or
inserted, updated, or deleted by data modification commands. This feature is also known as Row-Level
Security. By default, tables do not have any policies, so that if a user has access privileges to a table
according to the SQL privilege system, all rows within it are equally available for querying or updating.

When row security is enabled on a table (with ALTER TABLE ... ENABLE ROW LEVEL SECURITY), all normal access to the table for selecting rows or modifying rows must be allowed by a row security policy. (However, the table's owner is typically not subject to row security policies.) If no policy exists for the table, a default-deny policy is used, meaning that no rows are visible or can be modified. Operations that apply to the whole table, such as TRUNCATE and REFERENCES, are not subject to row security.

Row security policies can be specific to commands, or to roles, or to both. A policy can be specified
to apply to ALL commands, or to SELECT, INSERT, UPDATE, or DELETE. Multiple roles can be
assigned to a given policy, and normal role membership and inheritance rules apply.

To specify which rows are visible or modifiable according to a policy, an expression is required that
returns a Boolean result. This expression will be evaluated for each row prior to any conditions or
functions coming from the user's query. (The only exceptions to this rule are leakproof functions,
which are guaranteed to not leak information; the optimizer may choose to apply such functions ahead
of the row-security check.) Rows for which the expression does not return true will not be processed.
Separate expressions may be specified to provide independent control over the rows which are visible and the rows which are allowed to be modified. Policy expressions are run as part of the query and with the privileges of the user running the query, although security-definer functions can be used to access data not available to the calling user.

Superusers and roles with the BYPASSRLS attribute always bypass the row security system when
accessing a table. Table owners normally bypass row security as well, though a table owner can choose
to be subject to row security with ALTER TABLE ... FORCE ROW LEVEL SECURITY.

Enabling and disabling row security, as well as adding policies to a table, is always the privilege of
the table owner only.

Policies are created using the CREATE POLICY command, altered using the ALTER POLICY command, and dropped using the DROP POLICY command. To enable and disable row security for a given table, use the ALTER TABLE command.

Each policy has a name and multiple policies can be defined for a table. As policies are table-specific,
each policy for a table must have a unique name. Different tables may have policies with the same
name.

When multiple policies apply to a given query, they are combined using either OR (for permissive
policies, which are the default) or using AND (for restrictive policies). This is similar to the rule that a
given role has the privileges of all roles that they are a member of. Permissive vs. restrictive policies
are discussed further below.

As a simple example, here is how to create a policy on the accounts relation to allow only members of the managers role to access rows, and only rows of their accounts:

CREATE TABLE accounts (manager text, company text, contact_email text);

ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

CREATE POLICY account_managers ON accounts TO managers
    USING (manager = current_user);

The policy above implicitly provides a WITH CHECK clause identical to its USING clause, so that
the constraint applies both to rows selected by a command (so a manager cannot SELECT, UPDATE,
or DELETE existing rows belonging to a different manager) and to rows modified by a command (so
rows belonging to a different manager cannot be created via INSERT or UPDATE).

If no role is specified, or the special user name PUBLIC is used, then the policy applies to all users
on the system. To allow all users to access only their own row in a users table, a simple policy
can be used:

CREATE POLICY user_policy ON users
    USING (user_name = current_user);

This works similarly to the previous example.

To use a different policy for rows that are being added to the table compared to those rows that are
visible, multiple policies can be combined. This pair of policies would allow all users to view all rows
in the users table, but only modify their own:

CREATE POLICY user_sel_policy ON users
    FOR SELECT
    USING (true);
CREATE POLICY user_mod_policy ON users
    USING (user_name = current_user);

In a SELECT command, these two policies are combined using OR, with the net effect being that all
rows can be selected. In other command types, only the second policy applies, so that the effects are
the same as before.

Row security can also be disabled with the ALTER TABLE command. Disabling row security does
not remove any policies that are defined on the table; they are simply ignored. Then all rows in the
table are visible and modifiable, subject to the standard SQL privileges system.

Below is a larger example of how this feature can be used in production environments. The table
passwd emulates a Unix password file:

-- Simple passwd-file based example
CREATE TABLE passwd (
    user_name text UNIQUE NOT NULL,
    pwhash text,
    uid int PRIMARY KEY,
    gid int NOT NULL,
    real_name text NOT NULL,
    home_phone text,
    extra_info text,
    home_dir text NOT NULL,
    shell text NOT NULL
);

CREATE ROLE admin;  -- Administrator
CREATE ROLE bob;    -- Normal user
CREATE ROLE alice;  -- Normal user

-- Populate the table
INSERT INTO passwd VALUES
    ('admin','xxx',0,0,'Admin','111-222-3333',null,'/root','/bin/dash');
INSERT INTO passwd VALUES
    ('bob','xxx',1,1,'Bob','123-456-7890',null,'/home/bob','/bin/zsh');
INSERT INTO passwd VALUES
    ('alice','xxx',2,1,'Alice','098-765-4321',null,'/home/alice','/bin/zsh');

-- Be sure to enable row-level security on the table
ALTER TABLE passwd ENABLE ROW LEVEL SECURITY;

-- Create policies
-- Administrator can see all rows and add any rows
CREATE POLICY admin_all ON passwd TO admin USING (true) WITH CHECK (true);
-- Normal users can view all rows
CREATE POLICY all_view ON passwd FOR SELECT USING (true);
-- Normal users can update their own records, but
-- limit which shells a normal user is allowed to set
CREATE POLICY user_mod ON passwd FOR UPDATE
    USING (current_user = user_name)
    WITH CHECK (
        current_user = user_name AND
        shell IN ('/bin/bash','/bin/sh','/bin/dash','/bin/zsh','/bin/tcsh')
    );

-- Allow admin all normal rights
GRANT SELECT, INSERT, UPDATE, DELETE ON passwd TO admin;
-- Users only get select access on public columns
GRANT SELECT
    (user_name, uid, gid, real_name, home_phone, extra_info, home_dir, shell)
    ON passwd TO public;
-- Allow users to update certain columns
GRANT UPDATE
    (pwhash, real_name, home_phone, extra_info, shell)
    ON passwd TO public;

As with any security settings, it's important to test and ensure that the system is behaving as expected.
Using the example above, this demonstrates that the permission system is working properly.

-- admin can view all rows and fields
postgres=> set role admin;
SET
postgres=> table passwd;
 user_name | pwhash | uid | gid | real_name |  home_phone  | extra_info |  home_dir   |   shell
-----------+--------+-----+-----+-----------+--------------+------------+-------------+-----------
 admin     | xxx    |   0 |   0 | Admin     | 111-222-3333 |            | /root       | /bin/dash
 bob       | xxx    |   1 |   1 | Bob       | 123-456-7890 |            | /home/bob   | /bin/zsh
 alice     | xxx    |   2 |   1 | Alice     | 098-765-4321 |            | /home/alice | /bin/zsh
(3 rows)

-- Test what Alice is able to do
postgres=> set role alice;
SET
postgres=> table passwd;
ERROR:  permission denied for table passwd
postgres=> select user_name,real_name,home_phone,extra_info,home_dir,shell from passwd;
 user_name | real_name |  home_phone  | extra_info |  home_dir   |   shell
-----------+-----------+--------------+------------+-------------+-----------
 admin     | Admin     | 111-222-3333 |            | /root       | /bin/dash
 bob       | Bob       | 123-456-7890 |            | /home/bob   | /bin/zsh
 alice     | Alice     | 098-765-4321 |            | /home/alice | /bin/zsh
(3 rows)

postgres=> update passwd set user_name = 'joe';
ERROR:  permission denied for table passwd
-- Alice is allowed to change her own real_name, but no others
postgres=> update passwd set real_name = 'Alice Doe';
UPDATE 1
postgres=> update passwd set real_name = 'John Doe' where user_name = 'admin';
UPDATE 0
postgres=> update passwd set shell = '/bin/xx';
ERROR:  new row violates WITH CHECK OPTION for "passwd"
postgres=> delete from passwd;
ERROR:  permission denied for table passwd
postgres=> insert into passwd (user_name) values ('xxx');
ERROR:  permission denied for table passwd
-- Alice can change her own password; RLS silently prevents updating other rows
postgres=> update passwd set pwhash = 'abc';
UPDATE 1

All of the policies constructed thus far have been permissive policies, meaning that when multiple
policies are applied they are combined using the “OR” Boolean operator. While permissive policies
can be constructed to only allow access to rows in the intended cases, it can be simpler to combine
permissive policies with restrictive policies (which the records must pass and which are combined
using the “AND” Boolean operator). Building on the example above, we add a restrictive policy to
require the administrator to be connected over a local Unix socket to access the records of the passwd
table:

CREATE POLICY admin_local_only ON passwd AS RESTRICTIVE TO admin
    USING (pg_catalog.inet_client_addr() IS NULL);

We can then see that an administrator connecting over a network will not see any records, due to the
restrictive policy:

=> SELECT current_user;
 current_user
--------------
 admin
(1 row)

=> select inet_client_addr();
 inet_client_addr
------------------
 127.0.0.1
(1 row)

=> TABLE passwd;
 user_name | pwhash | uid | gid | real_name | home_phone | extra_info | home_dir | shell
-----------+--------+-----+-----+-----------+------------+------------+----------+-------
(0 rows)

=> UPDATE passwd set pwhash = NULL;
UPDATE 0

Referential integrity checks, such as unique or primary key constraints and foreign key references, always bypass row security to ensure that data integrity is maintained. Care must be taken when developing schemas and row level policies to avoid “covert channel” leaks of information through such referential integrity checks.

In some contexts it is important to be sure that row security is not being applied. For example, when taking a backup, it could be disastrous if row security silently caused some rows to be omitted from the backup. In such a situation, you can set the row_security configuration parameter to off. This does not in itself bypass row security; what it does is throw an error if any query's results would get filtered by a policy. The reason for the error can then be investigated and fixed.
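
For example (a sketch of setting the parameter for the current session):

SET row_security = off;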

In the examples above, the policy expressions consider only the current values in the row to be accessed or updated. This is the simplest and best-performing case; when possible, it's best to design row security applications to work this way. If it is necessary to consult other rows or other tables to make a policy decision, that can be accomplished using sub-SELECTs, or functions that contain SELECTs, in the policy expressions. Be aware however that such accesses can create race conditions that could allow information leakage if care is not taken. As an example, consider the following table design:

-- definition of privilege groups
CREATE TABLE groups (group_id int PRIMARY KEY,
                     group_name text NOT NULL);

INSERT INTO groups VALUES
    (1, 'low'),
    (2, 'medium'),
    (5, 'high');

GRANT ALL ON groups TO alice;  -- alice is the administrator
GRANT SELECT ON groups TO public;

-- definition of users' privilege levels
CREATE TABLE users (user_name text PRIMARY KEY,
                    group_id int NOT NULL REFERENCES groups);

INSERT INTO users VALUES
    ('alice', 5),
    ('bob', 2),
    ('mallory', 2);

GRANT ALL ON users TO alice;
GRANT SELECT ON users TO public;

-- table holding the information to be protected
CREATE TABLE information (info text,
                          group_id int NOT NULL REFERENCES groups);

INSERT INTO information VALUES
    ('barely secret', 1),
    ('slightly secret', 2),
    ('very secret', 5);

ALTER TABLE information ENABLE ROW LEVEL SECURITY;

-- a row should be visible to/updatable by users whose security group_id is
-- greater than or equal to the row's group_id
CREATE POLICY fp_s ON information FOR SELECT
    USING (group_id <= (SELECT group_id FROM users WHERE user_name = current_user));
CREATE POLICY fp_u ON information FOR UPDATE
    USING (group_id <= (SELECT group_id FROM users WHERE user_name = current_user));

-- we rely only on RLS to protect the information table
GRANT ALL ON information TO public;

Now suppose that alice wishes to change the “slightly secret” information, but decides that mallory should not be trusted with the new content of that row, so she does:

BEGIN;
UPDATE users SET group_id = 1 WHERE user_name = 'mallory';
UPDATE information SET info = 'secret from mallory' WHERE group_id = 2;
COMMIT;

That looks safe; there is no window wherein mallory should be able to see the “secret from mallory”
string. However, there is a race condition here. If mallory is concurrently doing, say,

SELECT * FROM information WHERE group_id = 2 FOR UPDATE;

and her transaction is in READ COMMITTED mode, it is possible for her to see “secret from mallory”. That happens if her transaction reaches the information row just after alice's does. It blocks waiting for alice's transaction to commit, then fetches the updated row contents thanks to the FOR UPDATE clause. However, it does not fetch an updated row for the implicit SELECT from users, because that sub-SELECT did not have FOR UPDATE; instead the users row is read with the snapshot taken at the start of the query. Therefore, the policy expression tests the old value of mallory's privilege level and allows her to see the updated row.

There are several ways around this problem. One simple answer is to use SELECT ... FOR SHARE in sub-SELECTs in row security policies. However, that requires granting UPDATE privilege on the referenced table (here users) to the affected users, which might be undesirable. (But another row security policy could be applied to prevent them from actually exercising that privilege; or the sub-SELECT could be embedded into a security definer function.) Also, heavy concurrent use of row share locks on the referenced table could pose a performance problem, especially if updates of it are frequent. Another solution, practical if updates of the referenced table are infrequent, is to take an ACCESS EXCLUSIVE lock on the referenced table when updating it, so that no concurrent transactions could be examining old row values. Or one could just wait for all concurrent transactions to end after committing an update of the referenced table and before making changes that rely on the new security situation.
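
A sketch of the first approach, recreating the fp_s policy above with FOR SHARE in its sub-SELECT (fp_u would need the same change):

DROP POLICY fp_s ON information;
CREATE POLICY fp_s ON information FOR SELECT
    USING (group_id <= (SELECT group_id FROM users
                        WHERE user_name = current_user FOR SHARE));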

For additional details see CREATE POLICY and ALTER TABLE.

5.9. Schemas
A PostgreSQL database cluster contains one or more named databases. Roles and a few other object
types are shared across the entire cluster. A client connection to the server can only access data in a
single database, the one specified in the connection request.

Note
Users of a cluster do not necessarily have the privilege to access every database in the cluster.
Sharing of role names means that there cannot be different roles named, say, joe in two
databases in the same cluster; but the system can be configured to allow joe access to only
some of the databases.

A database contains one or more named schemas, which in turn contain tables. Schemas also contain
other kinds of named objects, including data types, functions, and operators. The same object name
can be used in different schemas without conflict; for example, both schema1 and myschema can contain tables named mytable. Unlike databases, schemas are not rigidly separated: a user can access
objects in any of the schemas in the database they are connected to, if they have privileges to do so.

There are several reasons why one might want to use schemas:

• To allow many users to use one database without interfering with each other.

• To organize database objects into logical groups to make them more manageable.

• Third-party applications can be put into separate schemas so they do not collide with the names
of other objects.

Schemas are analogous to directories at the operating system level, except that schemas cannot be
nested.

5.9.1. Creating a Schema


To create a schema, use the CREATE SCHEMA command. Give the schema a name of your choice.
For example:

CREATE SCHEMA myschema;

To create or access objects in a schema, write a qualified name consisting of the schema name and
table name separated by a dot:

schema.table

This works anywhere a table name is expected, including the table modification commands and the
data access commands discussed in the following chapters. (For brevity we will speak of tables only,
but the same ideas apply to other kinds of named objects, such as types and functions.)

Actually, the even more general syntax

database.schema.table

can be used too, but at present this is just for pro forma compliance with the SQL standard. If you
write a database name, it must be the same as the database you are connected to.

So to create a table in the new schema, use:

CREATE TABLE myschema.mytable (
    ...
);

To drop a schema if it's empty (all objects in it have been dropped), use:

DROP SCHEMA myschema;

To drop a schema including all contained objects, use:

DROP SCHEMA myschema CASCADE;

See Section 5.14 for a description of the general mechanism behind this.


Often you will want to create a schema owned by someone else (since this is one of the ways to restrict
the activities of your users to well-defined namespaces). The syntax for that is:

CREATE SCHEMA schema_name AUTHORIZATION user_name;

You can even omit the schema name, in which case the schema name will be the same as the user
name. See Section 5.9.6 for how this can be useful.

Schema names beginning with pg_ are reserved for system purposes and cannot be created by users.

5.9.2. The Public Schema


In the previous sections we created tables without specifying any schema names. By default such
tables (and other objects) are automatically put into a schema named “public”. Every new database
contains such a schema. Thus, the following are equivalent:

CREATE TABLE products ( ... );

and:

CREATE TABLE public.products ( ... );

5.9.3. The Schema Search Path


Qualified names are tedious to write, and it's often best not to wire a particular schema name into
applications anyway. Therefore tables are often referred to by unqualified names, which consist of
just the table name. The system determines which table is meant by following a search path, which is
a list of schemas to look in. The first matching table in the search path is taken to be the one wanted.
If there is no match in the search path, an error is reported, even if matching table names exist in other
schemas in the database.

The ability to create like-named objects in different schemas complicates writing a query that references precisely the same objects every time. It also opens up the potential for users to change the behavior of other users' queries, maliciously or accidentally. Due to the prevalence of unqualified names in queries and their use in PostgreSQL internals, adding a schema to search_path effectively trusts all users having CREATE privilege on that schema. When you run an ordinary query, a malicious user able to create objects in a schema of your search path can take control and execute arbitrary SQL functions as though you executed them.

The first schema named in the search path is called the current schema. Aside from being the first
schema searched, it is also the schema in which new tables will be created if the CREATE TABLE
command does not specify a schema name.

To show the current search path, use the following command:

SHOW search_path;

In the default setup this returns:

   search_path
-----------------
 "$user", public


The first element specifies that a schema with the same name as the current user is to be searched.
If no such schema exists, the entry is ignored. The second element refers to the public schema that
we have seen already.

The first schema in the search path that exists is the default location for creating new objects. That
is the reason that by default objects are created in the public schema. When objects are referenced
in any other context without schema qualification (table modification, data modification, or query
commands) the search path is traversed until a matching object is found. Therefore, in the default
configuration, any unqualified access again can only refer to the public schema.

To put our new schema in the path, we use:

SET search_path TO myschema,public;

(We omit the $user here because we have no immediate need for it.) And then we can access the
table without schema qualification:

DROP TABLE mytable;

Also, since myschema is the first element in the path, new objects would by default be created in it.

We could also have written:

SET search_path TO myschema;

Then we no longer have access to the public schema without explicit qualification. There is nothing
special about the public schema except that it exists by default. It can be dropped, too.

See also Section 9.26 for other ways to manipulate the schema search path.
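
For instance, the current_schemas function described there shows the effective search path,
including implicitly-searched system schemas (the exact output depends on your configuration):

SELECT current_schemas(true);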

The search path works in the same way for data type names, function names, and operator names as it
does for table names. Data type and function names can be qualified in exactly the same way as table
names. If you need to write a qualified operator name in an expression, there is a special provision:
you must write

OPERATOR(schema.operator)

This is needed to avoid syntactic ambiguity. An example is:

SELECT 3 OPERATOR(pg_catalog.+) 4;

In practice one usually relies on the search path for operators, so as not to have to write anything so
ugly as that.

5.9.4. Schemas and Privileges


By default, users cannot access any objects in schemas they do not own. To allow that, the owner of
the schema must grant the USAGE privilege on the schema. To allow users to make use of the objects
in the schema, additional privileges might need to be granted, as appropriate for the object.
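
For example, a schema owner might grant access like this (the role name webuser is illustrative;
the schema and table are from the earlier examples):

GRANT USAGE ON SCHEMA myschema TO webuser;
GRANT SELECT ON myschema.mytable TO webuser;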

A user can also be allowed to create objects in someone else's schema. To allow that, the CREATE
privilege on the schema needs to be granted. Note that by default, everyone has CREATE and USAGE
privileges on the schema public. This allows all users that are able to connect to a given database
to create objects in its public schema. Some usage patterns call for revoking that privilege:


REVOKE CREATE ON SCHEMA public FROM PUBLIC;

(The first “public” is the schema, the second “public” means “every user”. In the first sense it is an
identifier, in the second sense it is a key word, hence the different capitalization; recall the guidelines
from Section 4.1.1.)

5.9.5. The System Catalog Schema


In addition to public and user-created schemas, each database contains a pg_catalog schema,
which contains the system tables and all the built-in data types, functions, and operators. pg_cat-
alog is always effectively part of the search path. If it is not named explicitly in the path then it is
implicitly searched before searching the path's schemas. This ensures that built-in names will always
be findable. However, you can explicitly place pg_catalog at the end of your search path if you
prefer to have user-defined names override built-in names.
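
For example, to have user-defined names in myschema and public take precedence over built-in
names, one might write:

SET search_path TO myschema, public, pg_catalog;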

Since system table names begin with pg_, it is best to avoid such names to ensure that you won't suffer
a conflict if some future version defines a system table named the same as your table. (With the default
search path, an unqualified reference to your table name would then be resolved as the system table
instead.) System tables will continue to follow the convention of having names beginning with pg_,
so that they will not conflict with unqualified user-table names so long as users avoid the pg_ prefix.

5.9.6. Usage Patterns


Schemas can be used to organize your data in many ways. A secure schema usage pattern prevents un-
trusted users from changing the behavior of other users' queries. When a database does not use a secure
schema usage pattern, users wishing to securely query that database would take protective action at the
beginning of each session. Specifically, they would begin each session by setting search_path to
the empty string or otherwise removing non-superuser-writable schemas from search_path. There
are a few usage patterns easily supported by the default configuration:

• Constrain ordinary users to user-private schemas. To implement this, issue REVOKE CREATE ON
SCHEMA public FROM PUBLIC, and create a schema for each user with the same name as
that user (a sketch of these commands appears after this list). Recall that the default search path
starts with $user, which resolves to the user name.
Therefore, if each user has a separate schema, they access their own schemas by default. After
adopting this pattern in a database where untrusted users had already logged in, consider auditing
the public schema for objects named like objects in schema pg_catalog. This pattern is a secure
schema usage pattern unless an untrusted user is the database owner or holds the CREATEROLE
privilege, in which case no secure schema usage pattern exists.

• Remove the public schema from the default search path, by modifying postgresql.conf or
by issuing ALTER ROLE ALL SET search_path = "$user". Everyone retains the
ability to create objects in the public schema, but only qualified names will choose those objects.
While qualified table references are fine, calls to functions in the public schema will be unsafe or
unreliable. If you create functions or extensions in the public schema, use the first pattern instead.
Otherwise, like the first pattern, this is secure unless an untrusted user is the database owner or
holds the CREATEROLE privilege.

• Keep the default. All users access the public schema implicitly. This simulates the situation where
schemas are not available at all, giving a smooth transition from the non-schema-aware world.
However, this is never a secure pattern. It is acceptable only when the database has a single user
or a few mutually-trusting users.
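
As a sketch of the first pattern, the setup for a single user might look like this (the role name
alice is illustrative and is assumed to exist already):

REVOKE CREATE ON SCHEMA public FROM PUBLIC;
CREATE SCHEMA alice AUTHORIZATION alice;
-- alice now creates objects in her own schema by default, since
-- the default search path starts with $user.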

For any pattern, to install shared applications (tables to be used by everyone, additional functions pro-
vided by third parties, etc.), put them into separate schemas. Remember to grant appropriate privileges
to allow the other users to access them. Users can then refer to these additional objects by qualifying
the names with a schema name, or they can put the additional schemas into their search path, as they
choose.


5.9.7. Portability
In the SQL standard, the notion of objects in the same schema being owned by different users does
not exist. Moreover, some implementations do not allow you to create schemas that have a different
name than their owner. In fact, the concepts of schema and user are nearly equivalent in a database
system that implements only the basic schema support specified in the standard. Therefore, many users
consider qualified names to really consist of user_name.table_name. This is how PostgreSQL
will effectively behave if you create a per-user schema for every user.

Also, there is no concept of a public schema in the SQL standard. For maximum conformance to
the standard, you should not use the public schema.

Of course, some SQL database systems might not implement schemas at all, or provide namespace
support by allowing (possibly limited) cross-database access. If you need to work with those systems,
then maximum portability would be achieved by not using schemas at all.

5.10. Inheritance
PostgreSQL implements table inheritance, which can be a useful tool for database designers.
(SQL:1999 and later define a type inheritance feature, which differs in many respects from the features
described here.)

Let's start with an example: suppose we are trying to build a data model for cities. Each state has many
cities, but only one capital. We want to be able to quickly retrieve the capital city for any particular
state. This can be done by creating two tables, one for state capitals and one for cities that are not
capitals. However, what happens when we want to ask for data about a city, regardless of whether it is
a capital or not? The inheritance feature can help to resolve this problem. We define the capitals
table so that it inherits from cities:

CREATE TABLE cities (
    name        text,
    population  float,
    elevation   int     -- in feet
);

CREATE TABLE capitals (
    state       char(2)
) INHERITS (cities);

In this case, the capitals table inherits all the columns of its parent table, cities. State capitals
also have an extra column, state, that shows their state.

In PostgreSQL, a table can inherit from zero or more other tables, and a query can reference either
all rows of a table or all rows of a table plus all of its descendant tables. The latter behavior is the
default. For example, the following query finds the names of all cities, including state capitals, that
are located at an elevation over 500 feet:

SELECT name, elevation
    FROM cities
    WHERE elevation > 500;

Given the sample data from the PostgreSQL tutorial (see Section 2.1), this returns:

   name    | elevation
-----------+-----------
 Las Vegas |      2174
 Mariposa  |      1953
 Madison   |       845

On the other hand, the following query finds all the cities that are not state capitals and are situated
at an elevation over 500 feet:

SELECT name, elevation
    FROM ONLY cities
    WHERE elevation > 500;

   name    | elevation
-----------+-----------
 Las Vegas |      2174
 Mariposa  |      1953

Here the ONLY keyword indicates that the query should apply only to cities, and not any tables
below cities in the inheritance hierarchy. Many of the commands that we have already discussed
— SELECT, UPDATE and DELETE — support the ONLY keyword.
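
For example, an UPDATE restricted to the parent table alone might look like this (the new elevation
value is arbitrary, for illustration only):

UPDATE ONLY cities SET elevation = elevation + 1
    WHERE name = 'Mariposa';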

You can also write the table name with a trailing * to explicitly specify that descendant tables are
included:

SELECT name, elevation
    FROM cities*
    WHERE elevation > 500;

Writing * is not necessary, since this behavior is always the default. However, this syntax is still
supported for compatibility with older releases where the default could be changed.

In some cases you might wish to know which table a particular row originated from. There is a system
column called tableoid in each table which can tell you the originating table:

SELECT c.tableoid, c.name, c.elevation
    FROM cities c
    WHERE c.elevation > 500;

which returns:

 tableoid |   name    | elevation
----------+-----------+-----------
   139793 | Las Vegas |      2174
   139793 | Mariposa  |      1953
   139798 | Madison   |       845

(If you try to reproduce this example, you will probably get different numeric OIDs.) By doing a join
with pg_class you can see the actual table names:

SELECT p.relname, c.name, c.elevation
    FROM cities c, pg_class p
    WHERE c.elevation > 500 AND c.tableoid = p.oid;

which returns:

 relname  |   name    | elevation
----------+-----------+-----------
 cities   | Las Vegas |      2174
 cities   | Mariposa  |      1953
 capitals | Madison   |       845

Another way to get the same effect is to use the regclass alias type, which will print the table OID
symbolically:

SELECT c.tableoid::regclass, c.name, c.elevation
    FROM cities c
    WHERE c.elevation > 500;

Inheritance does not automatically propagate data from INSERT or COPY commands to other tables
in the inheritance hierarchy. In our example, the following INSERT statement will fail:

INSERT INTO cities (name, population, elevation, state)
    VALUES ('Albany', NULL, NULL, 'NY');

We might hope that the data would somehow be routed to the capitals table, but this does not
happen: INSERT always inserts into exactly the table specified. In some cases it is possible to redirect
the insertion using a rule (see Chapter 41). However that does not help for the above case because
the cities table does not contain the column state, and so the command will be rejected before
the rule can be applied.

All check constraints and not-null constraints on a parent table are automatically inherited by its chil-
dren, unless explicitly specified otherwise with NO INHERIT clauses. Other types of constraints
(unique, primary key, and foreign key constraints) are not inherited.
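
For example, a check constraint intended to apply only to the parent could be declared with a NO
INHERIT clause (the constraint name and condition here are illustrative only):

ALTER TABLE cities ADD CONSTRAINT positive_population
    CHECK (population >= 0) NO INHERIT;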

A table can inherit from more than one parent table, in which case it has the union of the columns
defined by the parent tables. Any columns declared in the child table's definition are added to these.
If the same column name appears in multiple parent tables, or in both a parent table and the child's
definition, then these columns are “merged” so that there is only one such column in the child table. To
be merged, columns must have the same data types, else an error is raised. Inheritable check constraints
and not-null constraints are merged in a similar fashion. Thus, for example, a merged column will be
marked not-null if any one of the column definitions it came from is marked not-null. Check constraints
are merged if they have the same name, and the merge will fail if their conditions are different.

Table inheritance is typically established when the child table is created, using the INHERITS clause
of the CREATE TABLE statement. Alternatively, a table which is already defined in a compatible way
can have a new parent relationship added, using the INHERIT variant of ALTER TABLE. To do this
the new child table must already include columns with the same names and types as the columns of the
parent. It must also include check constraints with the same names and check expressions as those of
the parent. Similarly an inheritance link can be removed from a child using the NO INHERIT variant
of ALTER TABLE. Dynamically adding and removing inheritance links like this can be useful when
the inheritance relationship is being used for table partitioning (see Section 5.11).
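
For example, using the tables above, an inheritance link could be removed and later re-established:

ALTER TABLE capitals NO INHERIT cities;
ALTER TABLE capitals INHERIT cities;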

One convenient way to create a compatible table that will later be made a new child is to use the
LIKE clause in CREATE TABLE. This creates a new table with the same columns as the source table.
If there are any CHECK constraints defined on the source table, the INCLUDING CONSTRAINTS
option to LIKE should be specified, as the new child must have constraints matching the parent to
be considered compatible.
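
A sketch of this approach, based on the earlier example (the name cities_copy is illustrative only):

CREATE TABLE cities_copy (LIKE cities INCLUDING CONSTRAINTS);
ALTER TABLE cities_copy INHERIT cities;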

A parent table cannot be dropped while any of its children remain. Neither can columns or check
constraints of child tables be dropped or altered if they are inherited from any parent tables. If you
wish to remove a table and all of its descendants, one easy way is to drop the parent table with the
CASCADE option (see Section 5.14).

ALTER TABLE will propagate any changes in column data definitions and check constraints down
the inheritance hierarchy. Again, dropping columns that are depended on by other tables is only
possible when using the CASCADE option. ALTER TABLE follows the same rules for duplicate column
merging and rejection that apply during CREATE TABLE.

Inherited queries perform access permission checks on the parent table only. Thus, for example, grant-
ing UPDATE permission on the cities table implies permission to update rows in the capitals
table as well, when they are accessed through cities. This preserves the appearance that the data
is (also) in the parent table. But the capitals table could not be updated directly without an addi-
tional grant. In a similar way, the parent table's row security policies (see Section 5.8) are applied to
rows coming from child tables during an inherited query. A child table's policies, if any, are applied
only when it is the table explicitly named in the query; and in that case, any policies attached to its
parent(s) are ignored.

Foreign tables (see Section 5.12) can also be part of inheritance hierarchies, either as parent or child
tables, just as regular tables can be. If a foreign table is part of an inheritance hierarchy then any
operations not supported by the foreign table are not supported on the whole hierarchy either.

5.10.1. Caveats
Note that not all SQL commands are able to work on inheritance hierarchies. Commands that are used
for data querying, data modification, or schema modification (e.g., SELECT, UPDATE, DELETE, most
variants of ALTER TABLE, but not INSERT or ALTER TABLE ... RENAME) typically default
to including child tables and support the ONLY notation to exclude them. Commands that do database
maintenance and tuning (e.g., REINDEX, VACUUM) typically only work on individual, physical tables
and do not support recursing over inheritance hierarchies. The respective behavior of each individual
command is documented in its reference page (SQL Commands).

A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign
key constraints only apply to single tables, not to their inheritance children. This is true on both the
referencing and referenced sides of a foreign key constraint. Thus, in the terms of the above example:

• If we declared cities.name to be UNIQUE or a PRIMARY KEY, this would not stop the cap-
itals table from having rows with names duplicating rows in cities. And those duplicate rows
would by default show up in queries from cities. In fact, by default capitals would have no
unique constraint at all, and so could contain multiple rows with the same name. You could add a
unique constraint to capitals, but this would not prevent duplication compared to cities.

• Similarly, if we were to specify that cities.name REFERENCES some other table, this constraint
would not automatically propagate to capitals. In this case you could work around it by manually
adding the same REFERENCES constraint to capitals.

• Specifying that another table's column REFERENCES cities(name) would allow the other
table to contain city names, but not capital names. There is no good workaround for this case.

Some functionality not implemented for inheritance hierarchies is implemented for declarative parti-
tioning. Considerable care is needed in deciding whether partitioning with legacy inheritance is useful
for your application.

5.11. Table Partitioning


PostgreSQL supports basic table partitioning. This section describes why and how to implement par-
titioning as part of your database design.

5.11.1. Overview
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning
can provide several benefits:

• Query performance can be improved dramatically in certain situations, particularly when most of
the heavily accessed rows of the table are in a single partition or a small number of partitions.
Partitioning effectively substitutes for the upper tree levels of indexes, making it more likely that
the heavily-used parts of the indexes fit in memory.

• When queries or updates access a large percentage of a single partition, performance can be im-
proved by using a sequential scan of that partition instead of using an index, which would require
random-access reads scattered across the whole table.

• Bulk loads and deletes can be accomplished by adding or removing partitions, if the usage pattern is
accounted for in the partitioning design. Dropping an individual partition using DROP TABLE, or
doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands
also entirely avoid the VACUUM overhead caused by a bulk DELETE.

• Seldom-used data can be migrated to cheaper and slower storage media.

These benefits will normally be worthwhile only when a table would otherwise be very large. The
exact point at which a table will benefit from partitioning depends on the application, although a rule
of thumb is that the size of the table should exceed the physical memory of the database server.

PostgreSQL offers built-in support for the following forms of partitioning:

Range Partitioning

The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap
between the ranges of values assigned to different partitions. For example, one might partition by
date ranges, or by ranges of identifiers for particular business objects. Each range's bounds are
understood as being inclusive at the lower end and exclusive at the upper end. For example, if
one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10
belongs to the second partition, not the first.

List Partitioning

The table is partitioned by explicitly listing which key value(s) appear in each partition.

Hash Partitioning

The table is partitioned by specifying a modulus and a remainder for each partition. Each partition
will hold the rows for which the hash value of the partition key divided by the specified modulus
will produce the specified remainder.
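
For illustration, the list and hash forms might be declared as follows (range syntax appears in the
examples later in this section; the table and column names here are hypothetical):

CREATE TABLE customers (id int, region text) PARTITION BY LIST (region);
CREATE TABLE customers_emea PARTITION OF customers
    FOR VALUES IN ('EU', 'UK');

CREATE TABLE orders (id int) PARTITION BY HASH (id);
CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);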

If your application needs to use other forms of partitioning not listed above, alternative methods such
as inheritance and UNION ALL views can be used instead. Such methods offer flexibility but do not
have some of the performance benefits of built-in declarative partitioning.

5.11.2. Declarative Partitioning


PostgreSQL allows you to declare that a table is divided into partitions. The table that is divided is
referred to as a partitioned table. The declaration includes the partitioning method as described above,
plus a list of columns or expressions to be used as the partition key.

The partitioned table itself is a “virtual” table having no storage of its own. Instead, the storage belongs
to partitions, which are otherwise-ordinary tables associated with the partitioned table. Each partition
stores a subset of the data as defined by its partition bounds. All rows inserted into a partitioned table
will be routed to the appropriate one of the partitions based on the values of the partition key column(s).
Updating the partition key of a row will cause it to be moved into a different partition if it no longer
satisfies the partition bounds of its original partition.

Partitions may themselves be defined as partitioned tables, resulting in sub-partitioning. Although
all partitions must have the same columns as their partitioned parent, partitions may have their own
indexes, constraints and default values, distinct from those of other partitions. See CREATE TABLE
for more details on creating partitioned tables and partitions.


It is not possible to turn a regular table into a partitioned table or vice versa. However, it is possible to
add an existing regular or partitioned table as a partition of a partitioned table, or remove a partition
from a partitioned table turning it into a standalone table; this can simplify and speed up many main-
tenance processes. See ALTER TABLE to learn more about the ATTACH PARTITION and DETACH
PARTITION sub-commands.

Partitions can also be foreign tables, although considerable care is needed because it is then the user's
responsibility that the contents of the foreign table satisfy the partitioning rule. There are some other
restrictions as well. See CREATE FOREIGN TABLE for more information.

5.11.2.1. Example
Suppose we are constructing a database for a large ice cream company. The company measures peak
temperatures every day as well as ice cream sales in each region. Conceptually, we want a table like:

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);

We know that most queries will access just the last week's, month's or quarter's data, since the main
use of this table will be to prepare online reports for management. To reduce the amount of old data
that needs to be stored, we decide to keep only the most recent 3 years worth of data. At the beginning
of each month we will remove the oldest month's data. In this situation we can use partitioning to help
us meet all of our different requirements for the measurement table.

To use declarative partitioning in this case, use the following steps:

1. Create the measurement table as a partitioned table by specifying the PARTITION BY clause,
which includes the partitioning method (RANGE in this case) and the list of column(s) to use as
the partition key.

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
2. Create partitions. Each partition's definition must specify bounds that correspond to the partitioning
method and partition key of the parent. Note that specifying bounds such that the new partition's
values would overlap with those in one or more existing partitions will cause an error.

Partitions thus created are in every way normal PostgreSQL tables (or, possibly, foreign tables).
It is possible to specify a tablespace and storage parameters for each partition separately.

For our example, each partition should hold one month's worth of data, to match the requirement
of deleting one month's data at a time. So the commands might look like:

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
    FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

CREATE TABLE measurement_y2007m12 PARTITION OF measurement
    FOR VALUES FROM ('2007-12-01') TO ('2008-01-01')
    TABLESPACE fasttablespace;

CREATE TABLE measurement_y2008m01 PARTITION OF measurement
    FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
    WITH (parallel_workers = 4)
    TABLESPACE fasttablespace;

(Recall that adjacent partitions can share a bound value, since range upper bounds are treated as
exclusive bounds.)

If you wish to implement sub-partitioning, again specify the PARTITION BY clause in the com-
mands used to create individual partitions, for example:

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
    PARTITION BY RANGE (peaktemp);

After creating partitions of measurement_y2006m02, any data inserted into measurement
that is mapped to measurement_y2006m02 (or data that is directly inserted into measure-
ment_y2006m02, which is allowed provided its partition constraint is satisfied) will be further
redirected to one of its partitions based on the peaktemp column. The partition key specified may
overlap with the parent's partition key, although care should be taken when specifying the bounds
of a sub-partition such that the set of data it accepts constitutes a subset of what the partition's own
bounds allow; the system does not try to check whether that's really the case.

Inserting data into the parent table that does not map to one of the existing partitions will cause an
error; an appropriate partition must be added manually.

It is not necessary to manually create table constraints describing the partition boundary conditions
for partitions. Such constraints will be created automatically.
3. Create an index on the key column(s), as well as any other indexes you might want, on the par-
titioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) This
automatically creates a matching index on each partition, and any partitions you create or attach
later will also have such an index. An index or unique constraint declared on a partitioned table
is “virtual” in the same way that the partitioned table is: the actual data is in child indexes on the
individual partition tables.

CREATE INDEX ON measurement (logdate);


4. Ensure that the enable_partition_pruning configuration parameter is not disabled in post-
gresql.conf. If it is, queries will not be optimized as desired.

In the above example we would be creating a new partition each month, so it might be wise to write
a script that generates the required DDL automatically.

5.11.2.2. Partition Maintenance


Normally the set of partitions established when initially defining the table is not intended to remain
static. It is common to want to remove partitions holding old data and periodically add new partitions
for new data. One of the most important advantages of partitioning is precisely that it allows this
otherwise painful task to be executed nearly instantaneously by manipulating the partition structure,
rather than physically moving large amounts of data around.


The simplest option for removing old data is to drop the partition that is no longer necessary:

DROP TABLE measurement_y2006m02;

This can very quickly delete millions of records because it doesn't have to individually delete every
record. Note however that the above command requires taking an ACCESS EXCLUSIVE lock on
the parent table.

Another option that is often preferable is to remove the partition from the partitioned table but retain
access to it as a table in its own right. This has two forms:

ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;

ALTER TABLE measurement DETACH PARTITION measurement_y2006m02
    CONCURRENTLY;

These allow further operations to be performed on the data before it is dropped. For example, this is
often a useful time to back up the data using COPY, pg_dump, or similar tools. It might also be a useful
time to aggregate data into smaller formats, perform other data manipulations, or run reports. The
first form of the command requires an ACCESS EXCLUSIVE lock on the parent table. Adding the
CONCURRENTLY qualifier as in the second form allows the detach operation to require only SHARE
UPDATE EXCLUSIVE lock on the parent table, but see ALTER TABLE ... DETACH PARTITION
for details on the restrictions.

Similarly we can add a new partition to handle new data. We can create an empty partition in the
partitioned table just as the original partitions were created above:

CREATE TABLE measurement_y2008m02 PARTITION OF measurement
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01')
    TABLESPACE fasttablespace;

As an alternative, it is sometimes more convenient to create the new table outside the partition struc-
ture, and attach it as a partition later. This allows new data to be loaded, checked, and transformed
prior to it appearing in the partitioned table. Moreover, the ATTACH PARTITION operation requires
only SHARE UPDATE EXCLUSIVE lock on the partitioned table, as opposed to the ACCESS EX-
CLUSIVE lock that is required by CREATE TABLE ... PARTITION OF, so it is more friendly
to concurrent operations on the partitioned table. The CREATE TABLE ... LIKE option is helpful
to avoid tediously repeating the parent table's definition:

CREATE TABLE measurement_y2008m02
    (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS)
    TABLESPACE fasttablespace;

ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
    CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );

\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work

ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');

Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint
on the table to be attached that matches the expected partition constraint, as illustrated above. That
way, the system will be able to skip the scan which is otherwise needed to validate the implicit partition
constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint
while holding an ACCESS EXCLUSIVE lock on that partition. It is recommended to drop the now-
redundant CHECK constraint after the ATTACH PARTITION is complete. If the table being attached
is itself a partitioned table, then each of its sub-partitions will be recursively locked and scanned until
either a suitable CHECK constraint is encountered or the leaf partitions are reached.

Similarly, if the partitioned table has a DEFAULT partition, it is recommended to create a CHECK con-
straint which excludes the to-be-attached partition's constraint. If this is not done then the DEFAULT
partition will be scanned to verify that it contains no records which should be located in the partition
being attached. This operation will be performed whilst holding an ACCESS EXCLUSIVE lock on the
DEFAULT partition. If the DEFAULT partition is itself a partitioned table, then each of its partitions
will be recursively checked in the same way as the table being attached, as mentioned above.
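
A sketch of such an excluding constraint, assuming a DEFAULT partition named measurement_default
(not created in the examples above; one could be created with CREATE TABLE measurement_default
PARTITION OF measurement DEFAULT):

ALTER TABLE measurement_default ADD CONSTRAINT excl_y2008m02
    CHECK ( NOT (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01') );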

As explained above, it is possible to create indexes on partitioned tables so that they are applied au-
tomatically to the entire hierarchy. This is very convenient, as not only will the existing partitions
become indexed, but also any partitions that are created in the future will. One limitation is that it's
not possible to use the CONCURRENTLY qualifier when creating such a partitioned index. To avoid
long lock times, it is possible to use CREATE INDEX ON ONLY the partitioned table; such an index
is marked invalid, and the partitions do not get the index applied automatically. The indexes on parti-
tions can be created individually using CONCURRENTLY, and then attached to the index on the parent
using ALTER INDEX .. ATTACH PARTITION. Once indexes for all partitions are attached to
the parent index, the parent index is marked valid automatically. Example:

CREATE INDEX measurement_usls_idx ON ONLY measurement (unitsales);

CREATE INDEX measurement_usls_200602_idx
    ON measurement_y2006m02 (unitsales);
ALTER INDEX measurement_usls_idx
    ATTACH PARTITION measurement_usls_200602_idx;
...

This technique can be used with UNIQUE and PRIMARY KEY constraints too; the indexes are created
implicitly when the constraint is created. Example:

ALTER TABLE ONLY measurement ADD UNIQUE (city_id, logdate);

ALTER TABLE measurement_y2006m02 ADD UNIQUE (city_id, logdate);

ALTER INDEX measurement_city_id_logdate_key
    ATTACH PARTITION measurement_y2006m02_city_id_logdate_key;
...

5.11.2.3. Limitations
The following limitations apply to partitioned tables:

• To create a unique or primary key constraint on a partitioned table, the partition keys must not in-
clude any expressions or function calls and the constraint's columns must include all of the partition
key columns. This limitation exists because the individual indexes making up the constraint can
only directly enforce uniqueness within their own partitions; therefore, the partition structure itself
must guarantee that there are not duplicates in different partitions.

• There is no way to create an exclusion constraint spanning the whole partitioned table. It is only
possible to put such a constraint on each leaf partition individually. Again, this limitation stems
from not being able to enforce cross-partition restrictions.

• BEFORE ROW triggers on INSERT cannot change which partition is the final destination for a
new row.


• Mixing temporary and permanent relations in the same partition tree is not allowed. Hence, if the
partitioned table is permanent, so must be its partitions and likewise if the partitioned table is tem-
porary. When using temporary relations, all members of the partition tree have to be from the same
session.

Individual partitions are linked to their partitioned table using inheritance behind-the-scenes. However,
it is not possible to use all of the generic features of inheritance with declaratively partitioned tables
or their partitions, as discussed below. Notably, a partition cannot have any parents other than the
partitioned table it is a partition of, nor can a table inherit from both a partitioned table and a regular
table. That means partitioned tables and their partitions never share an inheritance hierarchy with
regular tables.

Since a partition hierarchy consisting of the partitioned table and its partitions is still an inheritance
hierarchy, tableoid and all the normal rules of inheritance apply as described in Section 5.10, with
a few exceptions:

• Partitions cannot have columns that are not present in the parent. It is not possible to specify columns
when creating partitions with CREATE TABLE, nor is it possible to add columns to partitions
after-the-fact using ALTER TABLE. Tables may be added as a partition with ALTER TABLE ...
ATTACH PARTITION only if their columns exactly match the parent.

• Both CHECK and NOT NULL constraints of a partitioned table are always inherited by all its parti-
tions. CHECK constraints that are marked NO INHERIT are not allowed to be created on partitioned
tables. You cannot drop a NOT NULL constraint on a partition's column if the same constraint is
present in the parent table.

• Using ONLY to add or drop a constraint on only the partitioned table is supported as long as there
are no partitions. Once partitions exist, using ONLY will result in an error. Instead, constraints on
the partitions themselves can be added and (if they are not present in the parent table) dropped.

• As a partitioned table does not have any data itself, attempts to use TRUNCATE ONLY on a parti-
tioned table will always return an error.

5.11.3. Partitioning Using Inheritance


While the built-in declarative partitioning is suitable for most common use cases, there are some
circumstances where a more flexible approach may be useful. Partitioning can be implemented using
table inheritance, which allows for several features not supported by declarative partitioning, such as:

• For declarative partitioning, partitions must have exactly the same set of columns as the partitioned
table, whereas with table inheritance, child tables may have extra columns not present in the parent.

• Table inheritance allows for multiple inheritance.

• Declarative partitioning only supports range, list and hash partitioning, whereas table inheritance
allows data to be divided in a manner of the user's choosing. (Note, however, that if constraint
exclusion is unable to prune child tables effectively, query performance might be poor.)

5.11.3.1. Example
This example builds a partitioning structure equivalent to the declarative partitioning example above.
Use the following steps:

1. Create the “root” table, from which all of the “child” tables will inherit. This table will contain
no data. Do not define any check constraints on this table, unless you intend them to be applied
equally to all child tables. There is no point in defining any indexes or unique constraints on it,
either. For our example, the root table is the measurement table as originally defined:

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);
2. Create several “child” tables that each inherit from the root table. Normally, these tables will not
add any columns to the set inherited from the root. Just as with declarative partitioning, these tables
are in every way normal PostgreSQL tables (or foreign tables).

CREATE TABLE measurement_y2006m02 () INHERITS (measurement);
CREATE TABLE measurement_y2006m03 () INHERITS (measurement);
...
CREATE TABLE measurement_y2007m11 () INHERITS (measurement);
CREATE TABLE measurement_y2007m12 () INHERITS (measurement);
CREATE TABLE measurement_y2008m01 () INHERITS (measurement);
3. Add non-overlapping table constraints to the child tables to define the allowed key values in each.

Typical examples would be:

CHECK ( x = 1 )
CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire',
'Warwickshire' ))
CHECK ( outletID >= 100 AND outletID < 200 )

Ensure that the constraints guarantee that there is no overlap between the key values permitted in
different child tables. A common mistake is to set up range constraints like:

CHECK ( outletID BETWEEN 100 AND 200 )
CHECK ( outletID BETWEEN 200 AND 300 )

This is wrong since it is not clear which child table the key value 200 belongs in. Instead, ranges
should be defined in this style:

CREATE TABLE measurement_y2006m02 (
    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2006m03 (
    CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);

...
CREATE TABLE measurement_y2007m11 (
    CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2007m12 (
    CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2008m01 (
    CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
) INHERITS (measurement);
4. For each child table, create an index on the key column(s), as well as any other indexes you might
want.

CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);
CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);
CREATE INDEX measurement_y2007m11_logdate ON measurement_y2007m11 (logdate);
CREATE INDEX measurement_y2007m12_logdate ON measurement_y2007m12 (logdate);
CREATE INDEX measurement_y2008m01_logdate ON measurement_y2008m01 (logdate);
5. We want our application to be able to say INSERT INTO measurement ... and have the
data be redirected into the appropriate child table. We can arrange that by attaching a suitable
trigger function to the root table. If data will be added only to the latest child, we can use a very
simple trigger function:

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO measurement_y2008m01 VALUES (NEW.*);
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

After creating the function, we create a trigger which calls the trigger function:

CREATE TRIGGER insert_measurement_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE FUNCTION measurement_insert_trigger();

We must redefine the trigger function each month so that it always inserts into the current child
table. The trigger definition does not need to be updated, however.

We might want to insert data and have the server automatically locate the child table into which
the row should be added. We could do this with a more complex trigger function, for example:

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.logdate >= DATE '2006-02-01' AND
         NEW.logdate < DATE '2006-03-01' ) THEN
        INSERT INTO measurement_y2006m02 VALUES (NEW.*);
    ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
            NEW.logdate < DATE '2006-04-01' ) THEN
        INSERT INTO measurement_y2006m03 VALUES (NEW.*);
    ...
    ELSIF ( NEW.logdate >= DATE '2008-01-01' AND
            NEW.logdate < DATE '2008-02-01' ) THEN
        INSERT INTO measurement_y2008m01 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

The trigger definition is the same as before. Note that each IF test must exactly match the CHECK
constraint for its child table.

While this function is more complex than the single-month case, it doesn't need to be updated as
often, since branches can be added in advance of being needed.

Note
In practice, it might be best to check the newest child first, if most inserts go into that
child. For simplicity, we have shown the trigger's tests in the same order as in other parts
of this example.

A different approach to redirecting inserts into the appropriate child table is to set up rules, instead
of a trigger, on the root table. For example:

CREATE RULE measurement_insert_y2006m02 AS
ON INSERT TO measurement WHERE
    ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
DO INSTEAD
    INSERT INTO measurement_y2006m02 VALUES (NEW.*);
...
CREATE RULE measurement_insert_y2008m01 AS
ON INSERT TO measurement WHERE
    ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
DO INSTEAD
    INSERT INTO measurement_y2008m01 VALUES (NEW.*);

A rule has significantly more overhead than a trigger, but the overhead is paid once per query
rather than once per row, so this method might be advantageous for bulk-insert situations. In most
cases, however, the trigger method will offer better performance.

Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into
the correct child table rather than directly into the root. COPY does fire triggers, so you can use
it normally if you use the trigger approach.

Another disadvantage of the rule approach is that there is no simple way to force an error if the set
of rules doesn't cover the insertion date; the data will silently go into the root table instead.
6. Ensure that the constraint_exclusion configuration parameter is not disabled in post-
gresql.conf; otherwise child tables may be accessed unnecessarily.

As we can see, a complex table hierarchy could require a substantial amount of DDL. In the above
example we would be creating a new child table each month, so it might be wise to write a script that
generates the required DDL automatically.


5.11.3.2. Maintenance for Inheritance Partitioning


To remove old data quickly, simply drop the child table that is no longer necessary:

DROP TABLE measurement_y2006m02;

To remove the child table from the inheritance hierarchy but retain access to it as a table in its
own right:

ALTER TABLE measurement_y2006m02 NO INHERIT measurement;

To add a new child table to handle new data, create an empty child table just as the original children
were created above:

CREATE TABLE measurement_y2008m02 (
    CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' )
) INHERITS (measurement);

Alternatively, one may want to create and populate the new child table before adding it to the table
hierarchy. This could allow data to be loaded, checked, and transformed before being made visible
to queries on the parent table.

CREATE TABLE measurement_y2008m02
    (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
    CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );
\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work
ALTER TABLE measurement_y2008m02 INHERIT measurement;

5.11.3.3. Caveats
The following caveats apply to partitioning implemented using inheritance:

• There is no automatic way to verify that all of the CHECK constraints are mutually exclusive. It is
safer to create code that generates child tables and creates and/or modifies associated objects than
to write each by hand.

• Indexes and foreign key constraints apply to single tables and not to their inheritance children, hence
they have some caveats to be aware of.

• The schemes shown here assume that the values of a row's key column(s) never change, or at least do
not change enough to require it to move to another partition. An UPDATE that attempts to do that will
fail because of the CHECK constraints. If you need to handle such cases, you can put suitable update
triggers on the child tables, but it makes management of the structure much more complicated.

• If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them
on each child table individually. A command like:

ANALYZE measurement;

will only process the root table.


• INSERT statements with ON CONFLICT clauses are unlikely to work as expected, as the ON
CONFLICT action is only taken in case of unique violations on the specified target relation, not
its child relations.

• Triggers or rules will be needed to route rows to the desired child table, unless the application is
explicitly aware of the partitioning scheme. Triggers may be complicated to write, and will be much
slower than the tuple routing performed internally by declarative partitioning.

5.11.4. Partition Pruning


Partition pruning is a query optimization technique that improves performance for declaratively par-
titioned tables. As an example:

SET enable_partition_pruning = on;            -- the default
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';

Without partition pruning, the above query would scan each of the partitions of the measurement
table. With partition pruning enabled, the planner will examine the definition of each partition and
prove that the partition need not be scanned because it could not contain any rows meeting the query's
WHERE clause. When the planner can prove this, it excludes (prunes) the partition from the query plan.

By using the EXPLAIN command and the enable_partition_pruning configuration parameter, it's pos-
sible to show the difference between a plan for which partitions have been pruned and one for which
they have not. A typical unoptimized plan for this type of table setup is:

SET enable_partition_pruning = off;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Aggregate  (cost=188.76..188.77 rows=1 width=8)
   ->  Append  (cost=0.00..181.05 rows=3085 width=0)
         ->  Seq Scan on measurement_y2006m02  (cost=0.00..33.12 rows=617 width=0)
               Filter: (logdate >= '2008-01-01'::date)
         ->  Seq Scan on measurement_y2006m03  (cost=0.00..33.12 rows=617 width=0)
               Filter: (logdate >= '2008-01-01'::date)
...
         ->  Seq Scan on measurement_y2007m11  (cost=0.00..33.12 rows=617 width=0)
               Filter: (logdate >= '2008-01-01'::date)
         ->  Seq Scan on measurement_y2007m12  (cost=0.00..33.12 rows=617 width=0)
               Filter: (logdate >= '2008-01-01'::date)
         ->  Seq Scan on measurement_y2008m01  (cost=0.00..33.12 rows=617 width=0)
               Filter: (logdate >= '2008-01-01'::date)

Some or all of the partitions might use index scans instead of full-table sequential scans, but the point
here is that there is no need to scan the older partitions at all to answer this query. When we enable
partition pruning, we get a significantly cheaper plan that will deliver the same answer:

SET enable_partition_pruning = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Aggregate  (cost=37.75..37.76 rows=1 width=8)
   ->  Seq Scan on measurement_y2008m01  (cost=0.00..33.12 rows=617 width=0)
         Filter: (logdate >= '2008-01-01'::date)

Note that partition pruning is driven only by the constraints defined implicitly by the partition keys,
not by the presence of indexes. Therefore it isn't necessary to define indexes on the key columns.
Whether an index needs to be created for a given partition depends on whether you expect that queries
that scan the partition will generally scan a large part of the partition or just a small part. An index
will be helpful in the latter case but not the former.

Partition pruning can be performed not only during the planning of a given query, but also during its
execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions
whose values are not known at query planning time, for example, parameters defined in a PREPARE
statement, using a value obtained from a subquery, or using a parameterized value on the inner side of
a nested loop join. Partition pruning during execution can be performed at any of the following times:

• During initialization of the query plan. Partition pruning can be performed here for parameter values
which are known during the initialization phase of execution. Partitions which are pruned during
this stage will not show up in the query's EXPLAIN or EXPLAIN ANALYZE. It is possible to de-
termine the number of partitions which were removed during this phase by observing the “Subplans
Removed” property in the EXPLAIN output.

• During actual execution of the query plan. Partition pruning may also be performed here to remove
partitions using values which are only known during actual query execution. This includes values
from subqueries and values from execution-time parameters such as those from parameterized nest-
ed loop joins. Since the value of these parameters may change many times during the execution of
the query, partition pruning is performed whenever one of the execution parameters being used by
partition pruning changes. Determining if partitions were pruned during this phase requires careful
inspection of the loops property in the EXPLAIN ANALYZE output. Subplans corresponding to
different partitions may have different values for it depending on how many times each of them
was pruned during execution. Some may be shown as (never executed) if they were pruned
every time.

Partition pruning can be disabled using the enable_partition_pruning setting.

5.11.5. Partitioning and Constraint Exclusion


Constraint exclusion is a query optimization technique similar to partition pruning. While it is primar-
ily used for partitioning implemented using the legacy inheritance method, it can be used for other
purposes, including with declarative partitioning.

Constraint exclusion works in a very similar way to partition pruning, except that it uses each table's
CHECK constraints — which gives it its name — whereas partition pruning uses the table's partition
bounds, which exist only in the case of declarative partitioning. Another difference is that constraint
exclusion is only applied at plan time; there is no attempt to remove partitions at execution time.

The fact that constraint exclusion uses CHECK constraints, which makes it slow compared to partition
pruning, can sometimes be used as an advantage: because constraints can be defined even on declar-
atively-partitioned tables, in addition to their internal partition bounds, constraint exclusion may be
able to elide additional partitions from the query plan.

The default (and recommended) setting of constraint_exclusion is neither on nor off, but an inter-
mediate setting called partition, which causes the technique to be applied only to queries that are
likely to be working on inheritance partitioned tables. The on setting causes the planner to examine
CHECK constraints in all queries, even simple ones that are unlikely to benefit.
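
For example, the recommended setting can be applied for the current session with:

SET constraint_exclusion = partition;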

The following caveats apply to constraint exclusion:

• Constraint exclusion is only applied during query planning, unlike partition pruning, which can also
be applied during query execution.

• Constraint exclusion only works when the query's WHERE clause contains constants (or externally
supplied parameters). For example, a comparison against a non-immutable function such as CUR-
RENT_TIMESTAMP cannot be optimized, since the planner cannot know which child table the
function's value might fall into at run time.

• Keep the partitioning constraints simple, else the planner may not be able to prove that child tables
might not need to be visited. Use simple equality conditions for list partitioning, or simple range
tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that
partitioning constraints should contain only comparisons of the partitioning column(s) to constants
using B-tree-indexable operators, because only B-tree-indexable column(s) are allowed in the par-
tition key.

• All constraints on all children of the parent table are examined during constraint exclusion, so large
numbers of children are likely to increase query planning time considerably. So the legacy inheri-
tance based partitioning will work well with up to perhaps a hundred child tables; don't try to use
many thousands of children.

5.11.6. Best Practices for Declarative Partitioning


The choice of how to partition a table should be made carefully, as the performance of query planning
and execution can be negatively affected by poor design.

One of the most critical design decisions will be the column or columns by which you partition your
data. Often the best choice will be to partition by the column or set of columns which most commonly
appear in WHERE clauses of queries being executed on the partitioned table. WHERE clauses that are
compatible with the partition bound constraints can be used to prune unneeded partitions. However,
you may be forced into making other decisions by requirements for the PRIMARY KEY or a UNIQUE
constraint. Removal of unwanted data is also a factor to consider when planning your partitioning
strategy. An entire partition can be detached fairly quickly, so it may be beneficial to design the par-
tition strategy in such a way that all data to be removed at once is located in a single partition.

Choosing the target number of partitions that the table should be divided into is also a critical decision
to make. Not having enough partitions may mean that indexes remain too large and that data locality
remains poor which could result in low cache hit ratios. However, dividing the table into too many
partitions can also cause issues. Too many partitions can mean longer query planning times and higher
memory consumption during both query planning and execution, as further described below. When
choosing how to partition your table, it's also important to consider what changes may occur in the
future. For example, if you choose to have one partition per customer and you currently have a small
number of large customers, consider the implications if in several years you instead find yourself with
a large number of small customers. In this case, it may be better to choose to partition by HASH and
choose a reasonable number of partitions rather than trying to partition by LIST and hoping that the
number of customers does not increase beyond what it is practical to partition the data by.
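
A minimal sketch of that HASH approach, with illustrative table and column names:

CREATE TABLE orders (
    order_id bigint,
    customer_id bigint,
    total numeric
) PARTITION BY HASH (customer_id);

CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);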

Sub-partitioning can be useful to further divide partitions that are expected to become larger than other
partitions. Another option is to use range partitioning with multiple columns in the partition key. Either
of these can easily lead to excessive numbers of partitions, so restraint is advisable.

It is important to consider the overhead of partitioning during query planning and execution. The
query planner is generally able to handle partition hierarchies with up to a few thousand partitions
fairly well, provided that typical queries allow the query planner to prune all but a small number
of partitions. Planning times become longer and memory consumption becomes higher when more
partitions remain after the planner performs partition pruning. Another reason to be concerned about
having a large number of partitions is that the server's memory consumption may grow significantly
over time, especially if many sessions touch large numbers of partitions. That's because each partition
requires its metadata to be loaded into the local memory of each session that touches it.

With data warehouse type workloads, it can make sense to use a larger number of partitions than with
an OLTP type workload. Generally, in data warehouses, query planning time is less of a concern as
the majority of processing time is spent during query execution. With either of these two types of
workload, it is important to make the right decisions early, as re-partitioning large quantities of data
can be painfully slow. Simulations of the intended workload are often beneficial for optimizing the
partitioning strategy. Never just assume that more partitions are better than fewer partitions, nor vice-
versa.

5.12. Foreign Data


PostgreSQL implements portions of the SQL/MED specification, allowing you to access data that
resides outside PostgreSQL using regular SQL queries. Such data is referred to as foreign data. (Note
that this usage is not to be confused with foreign keys, which are a type of constraint within the
database.)

Foreign data is accessed with help from a foreign data wrapper. A foreign data wrapper is a library
that can communicate with an external data source, hiding the details of connecting to the data source
and obtaining data from it. There are some foreign data wrappers available as contrib modules; see
Appendix F. Other kinds of foreign data wrappers might be found as third party products. If none of
the existing foreign data wrappers suit your needs, you can write your own; see Chapter 57.

To access foreign data, you need to create a foreign server object, which defines how to connect to
a particular external data source according to the set of options used by its supporting foreign data
wrapper. Then you need to create one or more foreign tables, which define the structure of the remote
data. A foreign table can be used in queries just like a normal table, but a foreign table has no storage
in the PostgreSQL server. Whenever it is used, PostgreSQL asks the foreign data wrapper to fetch data
from the external source, or transmit data to the external source in the case of update commands.

Accessing remote data may require authenticating to the external data source. This information can
be provided by a user mapping, which can provide additional data such as user names and passwords
based on the current PostgreSQL role.
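
For example, here is a sketch of that sequence using the postgres_fdw contrib module; the
server name, connection options, and remote table layout are illustrative:

CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'otherdb');

CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
    OPTIONS (user 'remote_user', password 'secret');

CREATE FOREIGN TABLE remote_products (
    product_no integer,
    name text,
    price numeric
) SERVER remote_pg OPTIONS (schema_name 'public', table_name 'products');

-- queried like a normal table; the wrapper fetches the rows remotely
SELECT * FROM remote_products WHERE price > 10;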

For additional information, see CREATE FOREIGN DATA WRAPPER, CREATE SERVER, CRE-
ATE USER MAPPING, CREATE FOREIGN TABLE, and IMPORT FOREIGN SCHEMA.

5.13. Other Database Objects


Tables are the central objects in a relational database structure, because they hold your data. But they
are not the only objects that exist in a database. Many other kinds of objects can be created to make the
use and management of the data more efficient or convenient. They are not discussed in this chapter,
but we give you a list here so that you are aware of what is possible:

• Views

• Functions, procedures, and operators

• Data types and domains

• Triggers and rewrite rules

Detailed information on these topics appears in Part V.

5.14. Dependency Tracking



When you create complex database structures involving many tables with foreign key constraints,
views, triggers, functions, etc. you implicitly create a net of dependencies between the objects. For
instance, a table with a foreign key constraint depends on the table it references.

To ensure the integrity of the entire database structure, PostgreSQL makes sure that you cannot drop
objects that other objects still depend on. For example, attempting to drop the products table we con-
sidered in Section 5.4.5, with the orders table depending on it, would result in an error message like
this:

DROP TABLE products;
ERROR:  cannot drop table products because other objects depend on it
DETAIL:  constraint orders_product_no_fkey on table orders depends on table products
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The error message contains a useful hint: if you do not want to bother deleting all the dependent objects
individually, you can run:

DROP TABLE products CASCADE;

and all the dependent objects will be removed, as will any objects that depend on them, recursively.
In this case, it doesn't remove the orders table, it only removes the foreign key constraint. It stops
there because nothing depends on the foreign key constraint. (If you want to check what DROP ...
CASCADE will do, run DROP without CASCADE and read the DETAIL output.)

Almost all DROP commands in PostgreSQL support specifying CASCADE. Of course, the nature of
the possible dependencies varies with the type of the object. You can also write RESTRICT instead
of CASCADE to get the default behavior, which is to prevent dropping objects that any other objects
depend on.

Note
According to the SQL standard, specifying either RESTRICT or CASCADE is required in
a DROP command. No database system actually enforces that rule, but whether the default
behavior is RESTRICT or CASCADE varies across systems.

If a DROP command lists multiple objects, CASCADE is only required when there are dependencies
outside the specified group. For example, when saying DROP TABLE tab1, tab2 the existence
of a foreign key referencing tab1 from tab2 would not mean that CASCADE is needed to succeed.
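
For instance, assuming tab2 has a foreign key referencing tab1, this succeeds without CASCADE:

DROP TABLE tab1, tab2;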

For a user-defined function or procedure whose body is defined as a string literal, PostgreSQL tracks
dependencies associated with the function's externally-visible properties, such as its argument and
result types, but not dependencies that could only be known by examining the function body. As an
example, consider this situation:

CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow',
                             'green', 'blue', 'purple');

CREATE TABLE my_colors (color rainbow, note text);

CREATE FUNCTION get_color_note (rainbow) RETURNS text AS
    'SELECT note FROM my_colors WHERE color = $1'
    LANGUAGE SQL;

(See Section 38.5 for an explanation of SQL-language functions.) PostgreSQL will be aware that the
get_color_note function depends on the rainbow type: dropping the type would force dropping
the function, because its argument type would no longer be defined. But PostgreSQL will not consider
get_color_note to depend on the my_colors table, and so will not drop the function if the table
is dropped. While there are disadvantages to this approach, there are also benefits. The function is still
valid in some sense if the table is missing, though executing it would cause an error; creating a new
table of the same name would allow the function to work again.

On the other hand, for a SQL-language function or procedure whose body is written in SQL-standard
style, the body is parsed at function definition time and all dependencies recognized by the parser are
stored. Thus, if we write the function above as

CREATE FUNCTION get_color_note (rainbow) RETURNS text
BEGIN ATOMIC
    SELECT note FROM my_colors WHERE color = $1;
END;

then the function's dependency on the my_colors table will be known and enforced by DROP.

Chapter 6. Data Manipulation
The previous chapter discussed how to create tables and other structures to hold your data. Now it is
time to fill the tables with data. This chapter covers how to insert, update, and delete table data. The
chapter after this will finally explain how to extract your long-lost data from the database.

6.1. Inserting Data


When a table is created, it contains no data. The first thing to do before a database can be of much use
is to insert data. Data is inserted one row at a time. You can also insert more than one row in a single
command, but it is not possible to insert something that is not a complete row. Even if you know only
some column values, a complete row must be created.

To create a new row, use the INSERT command. The command requires the table name and column
values. For example, consider the products table from Chapter 5:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);

An example command to insert a row would be:

INSERT INTO products VALUES (1, 'Cheese', 9.99);

The data values are listed in the order in which the columns appear in the table, separated by commas.
Usually, the data values will be literals (constants), but scalar expressions are also allowed.

The above syntax has the drawback that you need to know the order of the columns in the table. To
avoid this you can also list the columns explicitly. For example, both of the following commands have
the same effect as the one above:

INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
INSERT INTO products (name, price, product_no) VALUES ('Cheese', 9.99, 1);

Many users consider it good practice to always list the column names.

If you don't have values for all the columns, you can omit some of them. In that case, the columns will
be filled with their default values. For example:

INSERT INTO products (product_no, name) VALUES (1, 'Cheese');
INSERT INTO products VALUES (1, 'Cheese');

The second form is a PostgreSQL extension. It fills the columns from the left with as many values as
are given, and the rest will be defaulted.

For clarity, you can also request default values explicitly, for individual columns or for the entire row:

INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', DEFAULT);
INSERT INTO products DEFAULT VALUES;

You can insert multiple rows in a single command:

INSERT INTO products (product_no, name, price) VALUES
    (1, 'Cheese', 9.99),
    (2, 'Bread', 1.99),
    (3, 'Milk', 2.99);

It is also possible to insert the result of a query (which might be no rows, one row, or many rows):

INSERT INTO products (product_no, name, price)
    SELECT product_no, name, price FROM new_products
    WHERE release_date = 'today';

This provides the full power of the SQL query mechanism (Chapter 7) for computing the rows to be
inserted.

Tip
When inserting a lot of data at the same time, consider using the COPY command. It is not
as flexible as the INSERT command, but is more efficient. Refer to Section 14.4 for more
information on improving bulk loading performance.
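
A minimal sketch of such a bulk load (the file path is illustrative; COPY ... FROM reads a
server-side file, while psql's \copy variant reads a file on the client):

COPY products (product_no, name, price)
    FROM '/tmp/products.csv' WITH (FORMAT csv);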

6.2. Updating Data


The modification of data that is already in the database is referred to as updating. You can update
individual rows, all the rows in a table, or a subset of all rows. Each column can be updated separately;
the other columns are not affected.

To update existing rows, use the UPDATE command. This requires three pieces of information:

1. The name of the table and column to update
2. The new value of the column
3. Which row(s) to update

Recall from Chapter 5 that SQL does not, in general, provide a unique identifier for rows. Therefore it
is not always possible to directly specify which row to update. Instead, you specify which conditions
a row must meet in order to be updated. Only if you have a primary key in the table (independent
of whether you declared it or not) can you reliably address individual rows by choosing a condition
that matches the primary key. Graphical database access tools rely on this fact to allow you to update
rows individually.

For example, this command updates all products that have a price of 5 to have a price of 10:

UPDATE products SET price = 10 WHERE price = 5;

This might cause zero, one, or many rows to be updated. It is not an error to attempt an update that
does not match any rows.

Let's look at that command in detail. First is the key word UPDATE followed by the table name. As
usual, the table name can be schema-qualified, otherwise it is looked up in the path. Next is the key
word SET followed by the column name, an equal sign, and the new column value. The new column
value can be any scalar expression, not just a constant. For example, if you want to raise the price of
all products by 10% you could use:


UPDATE products SET price = price * 1.10;

As you see, the expression for the new value can refer to the existing value(s) in the row. We also left
out the WHERE clause. If it is omitted, it means that all rows in the table are updated. If it is present,
only those rows that match the WHERE condition are updated. Note that the equals sign in the SET
clause is an assignment while the one in the WHERE clause is a comparison, but this does not create any
ambiguity. Of course, the WHERE condition does not have to be an equality test. Many other operators
are available (see Chapter 9). But the expression needs to evaluate to a Boolean result.

You can update more than one column in an UPDATE command by listing more than one assignment
in the SET clause. For example:

UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;

6.3. Deleting Data


So far we have explained how to add data to tables and how to change data. What remains is to discuss
how to remove data that is no longer needed. Just as adding data is only possible in whole rows, you
can only remove entire rows from a table. In the previous section we explained that SQL does not
provide a way to directly address individual rows. Therefore, removing rows can only be done by
specifying conditions that the rows to be removed have to match. If you have a primary key in the table
then you can specify the exact row. But you can also remove groups of rows matching a condition,
or you can remove all rows in the table at once.

You use the DELETE command to remove rows; the syntax is very similar to the UPDATE command.
For instance, to remove all rows from the products table that have a price of 10, use:

DELETE FROM products WHERE price = 10;

If you simply write:

DELETE FROM products;

then all rows in the table will be deleted! Caveat programmer.

6.4. Returning Data from Modified Rows


Sometimes it is useful to obtain data from modified rows while they are being manipulated. The
INSERT, UPDATE, and DELETE commands all have an optional RETURNING clause that supports
this. Use of RETURNING avoids performing an extra database query to collect the data, and is espe-
cially valuable when it would otherwise be difficult to identify the modified rows reliably.

The allowed contents of a RETURNING clause are the same as a SELECT command's output list (see
Section 7.3). It can contain column names of the command's target table, or value expressions using
those columns. A common shorthand is RETURNING *, which selects all columns of the target table
in order.

In an INSERT, the data available to RETURNING is the row as it was inserted. This is not so useful in
trivial inserts, since it would just repeat the data provided by the client. But it can be very handy when
relying on computed default values. For example, when using a serial column to provide unique
identifiers, RETURNING can return the ID assigned to a new row:

CREATE TABLE users (firstname text, lastname text, id serial primary key);


INSERT INTO users (firstname, lastname) VALUES ('Joe', 'Cool')
    RETURNING id;

The RETURNING clause is also very useful with INSERT ... SELECT.
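
For example, a sketch reusing the hypothetical new_products table from the previous section:

INSERT INTO products (product_no, name, price)
    SELECT product_no, name, price FROM new_products
    WHERE release_date = 'today'
    RETURNING product_no, name;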

In an UPDATE, the data available to RETURNING is the new content of the modified row. For example:

UPDATE products SET price = price * 1.10
    WHERE price <= 99.99
    RETURNING name, price AS new_price;

In a DELETE, the data available to RETURNING is the content of the deleted row. For example:

DELETE FROM products
    WHERE obsoletion_date = 'today'
    RETURNING *;

If there are triggers (Chapter 39) on the target table, the data available to RETURNING is the row as
modified by the triggers. Thus, inspecting columns computed by triggers is another common use-case
for RETURNING.

Chapter 7. Queries
The previous chapters explained how to create tables, how to fill them with data, and how to manipulate
that data. Now we finally discuss how to retrieve the data from the database.

7.1. Overview
The process of retrieving or the command to retrieve data from a database is called a query. In SQL
the SELECT command is used to specify queries. The general syntax of the SELECT command is

[WITH with_queries] SELECT select_list FROM table_expression
    [sort_specification]

The following sections describe the details of the select list, the table expression, and the sort specifi-
cation. WITH queries are treated last since they are an advanced feature.

A simple kind of query has the form:

SELECT * FROM table1;

Assuming that there is a table called table1, this command would retrieve all rows and all user-
defined columns from table1. (The method of retrieval depends on the client application. For ex-
ample, the psql program will display an ASCII-art table on the screen, while client libraries will offer
functions to extract individual values from the query result.) The select list specification * means all
columns that the table expression happens to provide. A select list can also select a subset of the avail-
able columns or make calculations using the columns. For example, if table1 has columns named
a, b, and c (and perhaps others) you can make the following query:

SELECT a, b + c FROM table1;

(assuming that b and c are of a numerical data type). See Section 7.3 for more details.

FROM table1 is a simple kind of table expression: it reads just one table. In general, table expres-
sions can be complex constructs of base tables, joins, and subqueries. But you can also omit the table
expression entirely and use the SELECT command as a calculator:

SELECT 3 * 4;

This is more useful if the expressions in the select list return varying results. For example, you could
call a function this way:

SELECT random();

7.2. Table Expressions


A table expression computes a table. The table expression contains a FROM clause that is optionally
followed by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions simply refer to a
table on disk, a so-called base table, but more complex expressions can be used to modify or combine
base tables in various ways.

The optional WHERE, GROUP BY, and HAVING clauses in the table expression specify a pipeline of
successive transformations performed on the table derived in the FROM clause. All these transformations
produce a virtual table that provides the rows that are passed to the select list to compute the
output rows of the query.

7.2.1. The FROM Clause


The FROM clause derives a table from one or more other tables given in a comma-separated table
reference list.

FROM table_reference [, table_reference [, ...]]

A table reference can be a table name (possibly schema-qualified), or a derived table such as a sub-
query, a JOIN construct, or complex combinations of these. If more than one table reference is listed
in the FROM clause, the tables are cross-joined (that is, the Cartesian product of their rows is formed;
see below). The result of the FROM list is an intermediate virtual table that can then be subject to trans-
formations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall
table expression.

When a table reference names a table that is the parent of a table inheritance hierarchy, the table
reference produces rows of not only that table but all of its descendant tables, unless the key word
ONLY precedes the table name. However, the reference produces only the columns that appear in the
named table — any columns added in subtables are ignored.

Instead of writing ONLY before the table name, you can write * after the table name to explicitly
specify that descendant tables are included. There is no real reason to use this syntax any more, be-
cause searching descendant tables is now always the default behavior. However, it is supported for
compatibility with older releases.

7.2.1.1. Joined Tables


A joined table is a table derived from two other (real or derived) tables according to the rules of the
particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is

T1 join_type T2 [ join_condition ]

Joins of all types can be chained together, or nested: either or both T1 and T2 can be joined tables.
Parentheses can be used around JOIN clauses to control the join order. In the absence of parentheses,
JOIN clauses nest left-to-right.

Join Types
Cross join

T1 CROSS JOIN T2

For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined
table will contain a row consisting of all columns in T1 followed by all columns in T2. If the
tables have N and M rows respectively, the joined table will have N * M rows.

FROM T1 CROSS JOIN T2 is equivalent to FROM T1 INNER JOIN T2 ON TRUE (see
below). It is also equivalent to FROM T1, T2.

Note
This latter equivalence does not hold exactly when more than two tables appear, because
JOIN binds more tightly than comma. For example FROM T1 CROSS JOIN T2
INNER JOIN T3 ON condition is not the same as FROM T1, T2 INNER JOIN
T3 ON condition because the condition can reference T1 in the first case but
not the second.

Qualified joins

T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2

The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT,
and FULL imply an outer join.

The join condition is specified in the ON or USING clause, or implicitly by the word NATURAL.
The join condition determines which rows from the two source tables are considered to “match”,
as explained in detail below.

The possible types of qualified join are:

INNER JOIN

For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join
condition with R1.

LEFT OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join
condition with any row in T2, a joined row is added with null values in columns of T2. Thus,
the joined table always has at least one row for each row in T1.

RIGHT OUTER JOIN

First, an inner join is performed. Then, for each row in T2 that does not satisfy the join
condition with any row in T1, a joined row is added with null values in columns of T1. This
is the converse of a left join: the result table will always have a row for each row in T2.

FULL OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join
condition with any row in T2, a joined row is added with null values in columns of T2. Also,
for each row of T2 that does not satisfy the join condition with any row in T1, a joined row
with null values in the columns of T1 is added.

The ON clause is the most general kind of join condition: it takes a Boolean value expression of
the same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON
expression evaluates to true.

The USING clause is a shorthand that allows you to take advantage of the specific situation where
both sides of the join use the same name for the joining column(s). It takes a comma-separated
list of the shared column names and forms a join condition that includes an equality comparison
for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition
ON T1.a = T2.a AND T1.b = T2.b.

Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print
both of the matched columns, since they must have equal values. While JOIN ON produces all
columns from T1 followed by all columns from T2, JOIN USING produces one output column
for each of the listed column pairs (in the listed order), followed by any remaining columns from
T1, followed by any remaining columns from T2.


Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of all column
names that appear in both input tables. As with USING, these columns appear only once in the
output table. If there are no common column names, NATURAL JOIN behaves like JOIN ...
ON TRUE, producing a cross-product join.

Note
USING is reasonably safe from column changes in the joined relations since only the listed
columns are combined. NATURAL is considerably more risky since any schema changes
to either relation that cause a new matching column name to be present will cause the join
to combine that new column as well.

To put this together, assume we have tables t1:

 num | name
-----+------
   1 | a
   2 | b
   3 | c

and t2:

 num | value
-----+-------
   1 | xxx
   3 | yyy
   5 | zzz

then we get the following results for the various joins:

=> SELECT * FROM t1 CROSS JOIN t2;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   1 | a    |   3 | yyy
   1 | a    |   5 | zzz
   2 | b    |   1 | xxx
   2 | b    |   3 | yyy
   2 | b    |   5 | zzz
   3 | c    |   1 | xxx
   3 | c    |   3 | yyy
   3 | c    |   5 | zzz
(9 rows)

=> SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
(2 rows)

=> SELECT * FROM t1 INNER JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 NATURAL INNER JOIN t2;
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
(3 rows)

=> SELECT * FROM t1 LEFT JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   2 | b    |
   3 | c    | yyy
(3 rows)

=> SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
     |      |   5 | zzz
(3 rows)

=> SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
     |      |   5 | zzz
(4 rows)

The join condition specified with ON can also contain conditions that do not relate directly to the join.
This can prove useful for some queries but needs to be thought out carefully. For example:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |     |
(3 rows)

Notice that placing the restriction in the WHERE clause produces a different result:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
(1 row)

This is because a restriction placed in the ON clause is processed before the join, while a restriction
placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it
matters a lot with outer joins.

7.2.1.2. Table and Column Aliases


A temporary name can be given to tables and complex table references to be used for references to
the derived table in the rest of the query. This is called a table alias.

To create a table alias, write

FROM table_reference AS alias

or

FROM table_reference alias

The AS key word is optional noise. alias can be any identifier.

A typical application of table aliases is to assign short identifiers to long table names to keep the join
clauses readable. For example:

SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a
    ON s.id = a.num;

The alias becomes the new name of the table reference so far as the current query is concerned — it
is not allowed to refer to the table by the original name elsewhere in the query. Thus, this is not valid:

SELECT * FROM my_table AS m WHERE my_table.a > 5; -- wrong

Table aliases are mainly for notational convenience, but it is necessary to use them when joining a
table to itself, e.g.:

SELECT * FROM people AS mother JOIN people AS child
    ON mother.id = child.mother_id;

Additionally, an alias is required if the table reference is a subquery (see Section 7.2.1.3).

Parentheses are used to resolve ambiguities. In the following example, the first statement assigns the
alias b to the second instance of my_table, but the second statement assigns the alias to the result
of the join:

SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...

Another form of table aliasing gives temporary names to the columns of the table, as well as the table
itself:


FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )

If fewer column aliases are specified than the actual table has columns, the remaining columns are not
renamed. This syntax is especially useful for self-joins or subqueries.

When an alias is applied to the output of a JOIN clause, the alias hides the original name(s) within
the JOIN. For example:

SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...

is valid SQL, but:

SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c

is not valid; the table alias a is not visible outside the alias c.

7.2.1.3. Subqueries
Subqueries specifying a derived table must be enclosed in parentheses and must be assigned a table
alias name (as in Section 7.2.1.2). For example:

FROM (SELECT * FROM table1) AS alias_name

This example is equivalent to FROM table1 AS alias_name. More interesting cases, which
cannot be reduced to a plain join, arise when the subquery involves grouping or aggregation.

A subquery can also be a VALUES list:

FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
    AS names(first, last)

Again, a table alias is required. Assigning alias names to the columns of the VALUES list is optional,
but is good practice. For more information see Section 7.7.

7.2.1.4. Table Functions


Table functions are functions that produce a set of rows, made up of either base data types (scalar
types) or composite data types (table rows). They are used like a table, view, or subquery in the FROM
clause of a query. Columns returned by table functions can be included in SELECT, JOIN, or WHERE
clauses in the same manner as columns of a table, view, or subquery.

Table functions may also be combined using the ROWS FROM syntax, with the results returned in
parallel columns; the number of result rows in this case is that of the largest function result, with
smaller results padded with null values to match.

function_call [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]
ROWS FROM( function_call [, ... ] ) [WITH ORDINALITY]
    [[AS] table_alias [(column_alias [, ... ])]]

If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added
to the function result columns. This column numbers the rows of the function result set, starting from
1. (This is a generalization of the SQL-standard syntax for UNNEST ... WITH ORDINALITY.)
By default, the ordinal column is called ordinality, but a different column name can be assigned
to it using an AS clause.
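
For example, a sketch applying WITH ORDINALITY to unnest (the alias names are arbitrary):

=> SELECT * FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(letter, idx);
 letter | idx
--------+-----
 a      |   1
 b      |   2
 c      |   3
(3 rows)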


The special table function UNNEST may be called with any number of array parameters, and it returns
a corresponding number of columns, as if UNNEST (Section 9.19) had been called on each parameter
separately and combined using the ROWS FROM construct.

UNNEST( array_expression [, ... ] ) [WITH ORDINALITY]
    [[AS] table_alias [(column_alias [, ... ])]]

If no table_alias is specified, the function name is used as the table name; in the case of a ROWS
FROM() construct, the first function's name is used.

If column aliases are not supplied, then for a function returning a base data type, the column name is
also the same as the function name. For a function returning a composite type, the result columns get
the names of the individual attributes of the type.

Some examples:

CREATE TABLE foo (fooid int, foosubid int, fooname text);

CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
    SELECT * FROM foo WHERE fooid = $1;
$$ LANGUAGE SQL;

SELECT * FROM getfoo(1) AS t1;

SELECT * FROM foo
    WHERE foosubid IN (
        SELECT foosubid
        FROM getfoo(foo.fooid) z
        WHERE z.fooid = foo.fooid
    );

CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);

SELECT * FROM vw_getfoo;

In some cases it is useful to define table functions that can return different column sets depending on
how they are invoked. To support this, the table function can be declared as returning the pseudo-type
record with no OUT parameters. When such a function is used in a query, the expected row structure
must be specified in the query itself, so that the system can know how to parse and plan the query.
This syntax looks like:

function_call [AS] alias (column_definition [, ... ])
function_call AS [alias] (column_definition [, ... ])
ROWS FROM( ... function_call AS (column_definition [, ... ]) [, ... ] )

When not using the ROWS FROM() syntax, the column_definition list replaces the column
alias list that could otherwise be attached to the FROM item; the names in the column definitions serve
as column aliases. When using the ROWS FROM() syntax, a column_definition list can be
attached to each member function separately; or if there is only one member function and no WITH
ORDINALITY clause, a column_definition list can be written in place of a column alias list
following ROWS FROM().

Consider this example:

SELECT *
    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';

The dblink function (part of the dblink module) executes a remote query. It is declared to return
record since it might be used for any kind of query. The actual column set must be specified in the
calling query so that the parser knows, for example, what * should expand to.

This example uses ROWS FROM:

SELECT *
FROM ROWS FROM
(
json_to_recordset('[{"a":40,"b":"foo"},
{"a":"100","b":"bar"}]')
AS (a INTEGER, b TEXT),
generate_series(1, 3)
) AS x (p, q, s)
ORDER BY p;

  p  |  q  | s
-----+-----+---
  40 | foo | 1
 100 | bar | 2
     |     | 3

It joins two functions into a single FROM target. json_to_recordset() is instructed to return
two columns, the first integer and the second text. The result of generate_series() is used
directly. The ORDER BY clause sorts the column values as integers.

7.2.1.5. LATERAL Subqueries


Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to ref-
erence columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated
independently and so cannot cross-reference any other FROM item.)

Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions
the key word is optional; the function's arguments can contain references to columns provided by
preceding FROM items in any case.

A LATERAL item can appear at the top level in the FROM list, or within a JOIN tree. In the latter case
it can also refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each
row of the FROM item providing the cross-referenced column(s), or set of rows of multiple FROM
items providing the columns, the LATERAL item is evaluated using that row or row set's values of
the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is
repeated for each row or set of rows from the column source table(s).

A trivial example of LATERAL is

SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;

This is not especially useful since it has exactly the same result as the more conventional


SELECT * FROM foo, bar WHERE bar.id = foo.bar_id;

LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s)
to be joined. A common application is providing an argument value for a set-returning function. For
example, supposing that vertices(polygon) returns the set of vertices of a polygon, we could
identify close-together vertices of polygons stored in a table with:

SELECT p1.id, p2.id, v1, v2
    FROM polygons p1, polygons p2,
         LATERAL vertices(p1.poly) v1,
         LATERAL vertices(p2.poly) v2
    WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

This query could also be written

SELECT p1.id, p2.id, v1, v2
    FROM polygons p1 CROSS JOIN LATERAL vertices(p1.poly) v1,
         polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
    WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;

or in several other equivalent formulations. (As already mentioned, the LATERAL key word is unnec-
essary in this example, but we use it for clarity.)

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear
in the result even if the LATERAL subquery produces no rows for them. For example, if
get_product_names() returns the names of products made by a manufacturer, but some manufacturers in
our table currently produce no products, we could find out which ones those are like this:

SELECT m.name
    FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname
        ON true
    WHERE pname IS NULL;

7.2.2. The WHERE Clause


The syntax of the WHERE clause is

WHERE search_condition

where search_condition is any value expression (see Section 4.2) that returns a value of type
boolean.

After the processing of the FROM clause is done, each row of the derived virtual table is checked
against the search condition. If the result of the condition is true, the row is kept in the output table,
otherwise (i.e., if the result is false or null) it is discarded. The search condition typically references
at least one column of the table generated in the FROM clause; this is not required, but otherwise the
WHERE clause will be fairly useless.

Note
The join condition of an inner join can be written either in the WHERE clause or in the JOIN
clause. For example, these table expressions are equivalent:

FROM a, b WHERE a.id = b.id AND b.val > 5


and:

FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5

or perhaps even:

FROM a NATURAL JOIN b WHERE b.val > 5

Which one of these you use is mainly a matter of style. The JOIN syntax in the FROM clause
is probably not as portable to other SQL database management systems, even though it is in
the SQL standard. For outer joins there is no choice: they must be done in the FROM clause.
The ON or USING clause of an outer join is not equivalent to a WHERE condition, because
it results in the addition of rows (for unmatched input rows) as well as the removal of rows
in the final result.

Here are some examples of WHERE clauses:

SELECT ... FROM fdt WHERE c1 > 5

SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)

SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)

SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)

SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100

SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)

fdt is the table derived in the FROM clause. Rows that do not meet the search condition of the WHERE
clause are eliminated from fdt. Notice the use of scalar subqueries as value expressions. Just like any
other query, the subqueries can employ complex table expressions. Notice also how fdt is referenced
in the subqueries. Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a column
in the derived input table of the subquery. But qualifying the column name adds clarity even when
it is not needed. This example shows how the column naming scope of an outer query extends into
its inner queries.

7.2.3. The GROUP BY and HAVING Clauses


After passing the WHERE filter, the derived input table might be subject to grouping, using the GROUP
BY clause, and elimination of group rows using the HAVING clause.

SELECT select_list
FROM ...
[WHERE ...]
GROUP BY grouping_column_reference
[, grouping_column_reference]...

The GROUP BY clause is used to group together those rows in a table that have the same values in
all the columns listed. The order in which the columns are listed does not matter. The effect is to
combine each set of rows having common values into one group row that represents all rows in the
group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to
these groups. For instance:

=> SELECT * FROM test1;
 x | y
---+---
 a | 3
 c | 2
 b | 5
 a | 1
(4 rows)

=> SELECT x FROM test1 GROUP BY x;
 x
---
 a
 b
 c
(3 rows)

In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because
there is no single value for the column y that could be associated with each group. The grouped-by
columns can be referenced in the select list since they have a single value in each group.

In general, if a table is grouped, columns that are not listed in GROUP BY cannot be referenced except
in aggregate expressions. An example with aggregate expressions is:

=> SELECT x, sum(y) FROM test1 GROUP BY x;
 x | sum
---+-----
 a |   4
 b |   5
 c |   2
(3 rows)

Here sum is an aggregate function that computes a single value over the entire group. More information
about the available aggregate functions can be found in Section 9.21.

Tip
Grouping without aggregate expressions effectively calculates the set of distinct values in a
column. This can also be achieved using the DISTINCT clause (see Section 7.3.3).

Here is another example: it calculates the total sales for each product (rather than the total sales of
all products):

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id, p.name, p.price;

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY
clause since they are referenced in the query select list (but see below). The column s.units does
not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)),
which represents the sales of a product. For each product, the query returns a summary row about all
sales of the product.

If the products table is set up so that, say, product_id is the primary key, then it would be enough to
group by product_id in the above example, since name and price would be functionally dependent
on the product ID, and so there would be no ambiguity about which name and price value to return
for each product ID group.
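
A sketch of that simplification (assuming product_id is declared as the primary key of products):

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id;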

In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends
this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions
instead of simple column names is also allowed.

If a table has been grouped using GROUP BY, but only certain groups are of interest, the HAVING
clause can be used, much like a WHERE clause, to eliminate groups from the result. The syntax is:

SELECT select_list FROM ... [WHERE ...] GROUP BY ...
    HAVING boolean_expression

Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions
(which necessarily involve an aggregate function).

Example:

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

Again, a more realistic example:

SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
    FROM products p LEFT JOIN sales s USING (product_id)
    WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
    GROUP BY product_id, p.name, p.price, p.cost
    HAVING sum(p.price * s.units) > 5000;

In the example above, the WHERE clause is selecting rows by a column that is not grouped (the ex-
pression is only true for sales during the last four weeks), while the HAVING clause restricts the output
to groups with total gross sales over 5000. Note that the aggregate expressions do not necessarily need
to be the same in all parts of the query.

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result
is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING).
The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP
BY clause.


7.2.4. GROUPING SETS, CUBE, and ROLLUP


More complex grouping operations than those described above are possible using the concept of group-
ing sets. The data selected by the FROM and WHERE clauses is grouped separately by each specified
grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then
the results returned. For example:

=> SELECT * FROM items_sold;
 brand | size | sales
-------+------+-------
 Foo   | L    |    10
 Foo   | M    |    20
 Bar   | M    |    15
 Bar   | L    |     5
(4 rows)

=> SELECT brand, size, sum(sales) FROM items_sold
     GROUP BY GROUPING SETS ((brand), (size), ());
 brand | size | sum
-------+------+-----
 Foo   |      |  30
 Bar   |      |  20
       | L    |  15
       | M    |  35
       |      |  50
(5 rows)

Each sublist of GROUPING SETS may specify zero or more columns or expressions and is interpreted
the same way as though it were directly in the GROUP BY clause. An empty grouping set means that
all rows are aggregated down to a single group (which is output even if no input rows were present),
as described above for the case of aggregate functions with no GROUP BY clause.

References to the grouping columns or expressions are replaced by null values in result rows for
grouping sets in which those columns do not appear. To distinguish which grouping a particular output
row resulted from, see Table 9.61.

A shorthand notation is provided for specifying two common types of grouping set. A clause of the
form

ROLLUP ( e1, e2, e3, ... )

represents the given list of expressions and all prefixes of the list including the empty list; thus it is
equivalent to

GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)

This is commonly used for analysis over hierarchical data; e.g., total salary by department, division,
and company-wide total.
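
For instance, a sketch over a hypothetical emp table:

SELECT division, department, sum(salary)
    FROM emp
    GROUP BY ROLLUP (division, department);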

A clause of the form


CUBE ( e1, e2, ... )

represents the given list and all of its possible subsets (i.e., the power set). Thus

CUBE ( a, b, c )

is equivalent to

GROUPING SETS (
    ( a, b, c ),
    ( a, b ),
    ( a, c ),
    ( a ),
    ( b, c ),
    ( b ),
    ( c ),
    ( )
)

The individual elements of a CUBE or ROLLUP clause may be either individual expressions, or sublists
of elements in parentheses. In the latter case, the sublists are treated as single units for the purposes
of generating the individual grouping sets. For example:

CUBE ( (a, b), (c, d) )

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b ),
    ( c, d ),
    ( )
)

and

ROLLUP ( a, (b, c), d )

is equivalent to

GROUPING SETS (
    ( a, b, c, d ),
    ( a, b, c ),
    ( a ),
    ( )
)

The CUBE and ROLLUP constructs can be used either directly in the GROUP BY clause, or nested
inside a GROUPING SETS clause. If one GROUPING SETS clause is nested inside another, the
effect is the same as if all the elements of the inner clause had been written directly in the outer clause.

If multiple grouping items are specified in a single GROUP BY clause, then the final list of grouping
sets is the cross product of the individual items. For example:


GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c, d), (a, b, c, e),
    (a, b, d), (a, b, e),
    (a, c, d), (a, c, e),
    (a, d), (a, e)
)

When specifying multiple grouping items together, the final set of grouping sets might contain du-
plicates. For example:

GROUP BY ROLLUP (a, b), ROLLUP (a, c)

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a, b),
    (a, c),
    (a),
    (a),
    (a, c),
    (a),
    ()
)

If these duplicates are undesirable, they can be removed using the DISTINCT clause directly on the
GROUP BY. Therefore:

GROUP BY DISTINCT ROLLUP (a, b), ROLLUP (a, c)

is equivalent to

GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a, c),
    (a),
    ()
)

This is not the same as using SELECT DISTINCT because the output rows may still contain dupli-
cates. If any of the ungrouped columns contains NULL, it will be indistinguishable from the NULL
used when that same column is grouped.

Note
The construct (a, b) is normally recognized in expressions as a row constructor. Within the
GROUP BY clause, this does not apply at the top levels of expressions, and (a, b) is parsed
as a list of expressions as described above. If for some reason you need a row constructor in
a grouping expression, use ROW(a, b).

7.2.5. Window Function Processing


If the query contains any window functions (see Section 3.5, Section 9.22 and Section 4.2.8), these
functions are evaluated after any grouping, aggregation, and HAVING filtering is performed. That is, if
the query uses any aggregates, GROUP BY, or HAVING, then the rows seen by the window functions
are the group rows instead of the original table rows from FROM/WHERE.

When multiple window functions are used, all the window functions having syntactically equivalent
PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated
in a single pass over the data. Therefore they will see the same sort ordering, even if the ORDER BY
does not uniquely determine an ordering. However, no guarantees are made about the evaluation of
functions having different PARTITION BY or ORDER BY specifications. (In such cases a sort step is
typically required between the passes of window function evaluations, and the sort is not guaranteed
to preserve ordering of rows that its ORDER BY sees as equivalent.)

Currently, window functions always require presorted data, and so the query output will be ordered
according to one or another of the window functions' PARTITION BY/ORDER BY clauses. It is not
recommended to rely on this, however. Use an explicit top-level ORDER BY clause if you want to be
sure the results are sorted in a particular way.
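
For example, a sketch over a hypothetical empsalary table, with an explicit top-level ORDER BY:

SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
    FROM empsalary
    ORDER BY depname, salary DESC;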

7.3. Select Lists


As shown in the previous section, the table expression in the SELECT command constructs an inter-
mediate virtual table by possibly combining tables, views, eliminating rows, grouping, etc. This table
is finally passed on to processing by the select list. The select list determines which columns of the
intermediate table are actually output.

7.3.1. Select-List Items


The simplest kind of select list is * which emits all columns that the table expression produces. Oth-
erwise, a select list is a comma-separated list of value expressions (as defined in Section 4.2). For
instance, it could be a list of column names:

SELECT a, b, c FROM ...

The column names a, b, and c are either the actual names of the columns of tables referenced in the
FROM clause, or the aliases given to them as explained in Section 7.2.1.2. The name space available
in the select list is the same as in the WHERE clause, unless grouping is used, in which case it is the
same as in the HAVING clause.

If more than one table has a column of the same name, the table name must also be given, as in:

SELECT tbl1.a, tbl2.a, tbl1.b FROM ...

When working with multiple tables, it can also be useful to ask for all the columns of a particular table:

SELECT tbl1.*, tbl2.a FROM ...

See Section 8.16.5 for more about the table_name.* notation.

If an arbitrary value expression is used in the select list, it conceptually adds a new virtual column to
the returned table. The value expression is evaluated once for each result row, with the row's values
substituted for any column references. But the expressions in the select list do not have to reference
any columns in the table expression of the FROM clause; they can be constant arithmetic expressions,
for instance.

7.3.2. Column Labels


The entries in the select list can be assigned names for subsequent processing, such as for use in an
ORDER BY clause or for display by the client application. For example:

SELECT a AS value, b + c AS sum FROM ...

If no output column name is specified using AS, the system assigns a default column name. For simple
column references, this is the name of the referenced column. For function calls, this is the name of
the function. For complex expressions, the system will generate a generic name.

The AS key word is usually optional, but in some cases where the desired column name matches a
PostgreSQL key word, you must write AS or double-quote the column name in order to avoid ambi-
guity. (Appendix C shows which key words require AS to be used as a column label.) For example,
FROM is one such key word, so this does not work:

SELECT a from, b + c AS sum FROM ...

but both of these do:

SELECT a AS from, b + c AS sum FROM ...
SELECT a "from", b + c AS sum FROM ...

For greatest safety against possible future key word additions, it is recommended that you always
either write AS or double-quote the output column name.

Note
The naming of output columns here is different from that done in the FROM clause (see Sec-
tion 7.2.1.2). It is possible to rename the same column twice, but the name assigned in the
select list is the one that will be passed on.

7.3.3. DISTINCT
After the select list has been processed, the result table can optionally be subject to the elimination of
duplicate rows. The DISTINCT key word is written directly after SELECT to specify this:

SELECT DISTINCT select_list ...

(Instead of DISTINCT the key word ALL can be used to specify the default behavior of retaining
all rows.)

Obviously, two rows are considered distinct if they differ in at least one column value. Null values
are considered equal in this comparison.

Alternatively, an arbitrary expression can determine what rows are to be considered distinct:


SELECT DISTINCT ON (expression [, expression ...]) select_list ...

Here expression is an arbitrary value expression that is evaluated for all rows. A set of rows for
which all the expressions are equal are considered duplicates, and only the first row of the set is kept
in the output. Note that the “first row” of a set is unpredictable unless the query is sorted on enough
columns to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT
ON processing occurs after ORDER BY sorting.)

The DISTINCT ON clause is not part of the SQL standard and is sometimes considered bad style
because of the potentially indeterminate nature of its results. With judicious use of GROUP BY and
subqueries in FROM, this construct can be avoided, but it is often the most convenient alternative.
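
For example, a sketch that keeps only the most recent report per location (table and column
names are illustrative):

SELECT DISTINCT ON (location) location, time, report
    FROM weather_reports
    ORDER BY location, time DESC;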

7.4. Combining Queries (UNION, INTERSECT, EXCEPT)

The results of two queries can be combined using the set operations union, intersection, and difference.
The syntax is

query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2

where query1 and query2 are queries that can use any of the features discussed up to this point.

UNION effectively appends the result of query2 to the result of query1 (although there is no guar-
antee that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate
rows from its result, in the same way as DISTINCT, unless UNION ALL is used.

INTERSECT returns all rows that are both in the result of query1 and in the result of query2.
Duplicate rows are eliminated unless INTERSECT ALL is used.

EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is
sometimes called the difference between two queries.) Again, duplicates are eliminated unless EX-
CEPT ALL is used.

In order to calculate the union, intersection, or difference of two queries, the two queries must be
“union compatible”, which means that they return the same number of columns and the corresponding
columns have compatible data types, as described in Section 10.5.
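
As a minimal self-contained illustration, using VALUES lists (described in Section 7.7) as constant
tables, with the result rows shown in comments arriving in no particular order:

VALUES (1), (2), (3) UNION VALUES (2), (3), (4);       -- 1, 2, 3, 4
VALUES (1), (2), (3) INTERSECT VALUES (2), (3), (4);   -- 2, 3
VALUES (1), (2), (3) EXCEPT VALUES (2), (3), (4);      -- 1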

Set operations can be combined, for example

query1 UNION query2 EXCEPT query3

which is equivalent to

(query1 UNION query2) EXCEPT query3

As shown here, you can use parentheses to control the order of evaluation. Without parentheses,
UNION and EXCEPT associate left-to-right, but INTERSECT binds more tightly than those two op-
erators. Thus

query1 UNION query2 INTERSECT query3

means


query1 UNION (query2 INTERSECT query3)

You can also surround an individual query with parentheses. This is important if the query needs
to use any of the clauses discussed in following sections, such as LIMIT. Without parentheses, you'll
get a syntax error, or else the clause will be understood as applying to the output of the set operation
rather than one of its inputs. For example,

SELECT a FROM b UNION SELECT x FROM y LIMIT 10

is accepted, but it means

(SELECT a FROM b UNION SELECT x FROM y) LIMIT 10

not

SELECT a FROM b UNION (SELECT x FROM y LIMIT 10)

7.5. Sorting Rows (ORDER BY)


After a query has produced an output table (after the select list has been processed) it can optionally
be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order
in that case will depend on the scan and join plan types and the order on disk, but it must not be relied
on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.

The ORDER BY clause specifies the sort order:

SELECT select_list
FROM table_expression
ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]
         [, sort_expression2 [ASC | DESC] [NULLS { FIRST | LAST }] ...]

The sort expression(s) can be any expression that would be valid in the query's select list. An example
is:

SELECT a, b FROM table1 ORDER BY a + b, c;

When more than one expression is specified, the later values are used to sort rows that are equal
according to the earlier values. Each expression can be followed by an optional ASC or DESC keyword
to set the sort direction to ascending or descending. ASC order is the default. Ascending order puts
smaller values first, where “smaller” is defined in terms of the < operator. Similarly, descending order
is determined with the > operator.¹

The NULLS FIRST and NULLS LAST options can be used to determine whether nulls appear before
or after non-null values in the sort ordering. By default, null values sort as if larger than any non-null
value; that is, NULLS FIRST is the default for DESC order, and NULLS LAST otherwise.
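
For example, to sort a column in descending order while keeping its null values at the end instead of
the default beginning (t being a hypothetical table):

SELECT * FROM t ORDER BY a DESC NULLS LAST;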

Note that the ordering options are considered independently for each sort column. For example ORDER
BY x, y DESC means ORDER BY x ASC, y DESC, which is not the same as ORDER BY
x DESC, y DESC.
¹ Actually, PostgreSQL uses the default B-tree operator class for the expression's data type to determine the sort ordering for ASC and DESC.
Conventionally, data types will be set up so that the < and > operators correspond to this sort ordering, but a user-defined data type's designer
could choose to do something different.


A sort_expression can also be the column label or number of an output column, as in:

SELECT a + b AS sum, c FROM table1 ORDER BY sum;


SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;

both of which sort by the first output column. Note that an output column name has to stand alone,
that is, it cannot be used in an expression — for example, this is not correct:

SELECT a + b AS sum, c FROM table1 ORDER BY sum + c;          -- wrong

This restriction is made to reduce ambiguity. There is still ambiguity if an ORDER BY item is a simple
name that could match either an output column name or a column from the table expression. The
output column is used in such cases. This would only cause confusion if you use AS to rename an
output column to match some other table column's name.

ORDER BY can be applied to the result of a UNION, INTERSECT, or EXCEPT combination, but in
this case it is only permitted to sort by output column names or numbers, not by expressions.
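
For example, assuming hypothetical tables t1 and t2 with union-compatible first columns:

SELECT a FROM t1 UNION SELECT b FROM t2 ORDER BY 1;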

7.6. LIMIT and OFFSET


LIMIT and OFFSET allow you to retrieve just a portion of the rows that are generated by the rest
of the query:

SELECT select_list
FROM table_expression
[ ORDER BY ... ]
[ LIMIT { number | ALL } ] [ OFFSET number ]

If a limit count is given, no more than that many rows will be returned (but possibly fewer, if the
query itself yields fewer rows). LIMIT ALL is the same as omitting the LIMIT clause, as is LIMIT
with a NULL argument.

OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 is the same as
omitting the OFFSET clause, as is OFFSET with a NULL argument.

If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the
LIMIT rows that are returned.
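
For example, assuming a hypothetical table t with a sortable id column, this retrieves the 41st
through 50th rows of that ordering:

SELECT * FROM t ORDER BY id LIMIT 10 OFFSET 40;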

When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a
unique order. Otherwise you will get an unpredictable subset of the query's rows. You might be asking
for the tenth through twentieth rows, but tenth through twentieth in what ordering? The ordering is
unknown, unless you specified ORDER BY.

The query optimizer takes LIMIT into account when generating query plans, so you are very likely
to get different plans (yielding different row orders) depending on what you give for LIMIT and
OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result
will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This
is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results
of a query in any particular order unless ORDER BY is used to constrain the order.

The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large
OFFSET might be inefficient.

7.7. VALUES Lists



VALUES provides a way to generate a “constant table” that can be used in a query without having to
actually create and populate a table on-disk. The syntax is

VALUES ( expression [, ...] ) [, ...]

Each parenthesized list of expressions generates a row in the table. The lists must all have the same
number of elements (i.e., the number of columns in the table), and corresponding entries in each
list must have compatible data types. The actual data type assigned to each column of the result is
determined using the same rules as for UNION (see Section 10.5).

As an example:

VALUES (1, 'one'), (2, 'two'), (3, 'three');

will return a table of two columns and three rows. It's effectively equivalent to:

SELECT 1 AS column1, 'one' AS column2
UNION ALL
SELECT 2, 'two'
UNION ALL
SELECT 3, 'three';

By default, PostgreSQL assigns the names column1, column2, etc. to the columns of a VALUES
table. The column names are not specified by the SQL standard and different database systems do it
differently, so it's usually better to override the default names with a table alias list, like this:

=> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t
(num,letter);
num | letter
-----+--------
1 | one
2 | two
3 | three
(3 rows)

Syntactically, VALUES followed by expression lists is treated as equivalent to:

SELECT select_list FROM table_expression

and can appear anywhere a SELECT can. For example, you can use it as part of a UNION, or attach a
sort_specification (ORDER BY, LIMIT, and/or OFFSET) to it. VALUES is most commonly
used as the data source in an INSERT command, and next most commonly as a subquery.
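
For example, a multi-row insert (reusing the products table from the data definition examples)
looks like:

INSERT INTO products (product_no, name, price) VALUES
    (1, 'Cheese', 9.99),
    (2, 'Bread', 1.99);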

For more information see VALUES.

7.8. WITH Queries (Common Table Expressions)
WITH provides a way to write auxiliary statements for use in a larger query. These statements, which
are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary
tables that exist just for one query. Each auxiliary statement in a WITH clause can be a SELECT,
INSERT, UPDATE, or DELETE; and the WITH clause itself is attached to a primary statement that
can also be a SELECT, INSERT, UPDATE, or DELETE.


7.8.1. SELECT in WITH


The basic value of SELECT in WITH is to break down complicated queries into simpler parts. An
example is:

WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM
regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

which displays per-product sales totals in only the top sales regions. The WITH clause defines two
auxiliary statements named regional_sales and top_regions, where the output of
regional_sales is used in top_regions and the output of top_regions is used in the primary
SELECT query. This example could have been written without WITH, but we'd have needed two levels
of nested sub-SELECTs. It's a bit easier to follow this way.

7.8.2. Recursive Queries


The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature
that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query
can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION
ALL), then a recursive term, where only the recursive term can contain a reference to the query's own
output. Such a query is executed as follows:

Recursive Query Evaluation


1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows.
Include all remaining rows in the result of the recursive query, and also place them in a temporary
working table.

2. So long as the working table is not empty, repeat these steps:

a. Evaluate the recursive term, substituting the current contents of the working table for the
recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and
rows that duplicate any previous result row. Include all remaining rows in the result of the
recursive query, and also place them in a temporary intermediate table.

b. Replace the contents of the working table with the contents of the intermediate table, then
empty the intermediate table.

Note
While RECURSIVE allows queries to be specified recursively, internally such queries are
evaluated iteratively.

In the example above, the working table has just a single row in each step, and it takes on the values
from 1 through 100 in successive steps. In the 100th step, there is no output because of the WHERE
clause, and so the query terminates.

Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example
is this query to find all the direct and indirect sub-parts of a product, given only a table that shows
immediate inclusions:

WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  UNION ALL
    SELECT p.sub_part, p.part, p.quantity * pr.quantity
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) as total_quantity
FROM included_parts
GROUP BY sub_part

7.8.2.1. Search Order


When computing a tree traversal using a recursive query, you might want to order the results in either
depth-first or breadth-first order. This can be done by computing an ordering column alongside the
other data columns and using that to sort the results at the end. Note that this does not actually control in
which order the query evaluation visits the rows; that is as always in SQL implementation-dependent.
This approach merely provides a convenient way to order the results afterwards.

To create a depth-first order, we compute for each result row an array of rows that we have visited so
far. For example, consider the following query that searches a table tree using a link field:

WITH RECURSIVE search_tree(id, link, data) AS (
    SELECT t.id, t.link, t.data
    FROM tree t
  UNION ALL
    SELECT t.id, t.link, t.data
    FROM tree t, search_tree st
    WHERE t.id = st.link
)
SELECT * FROM search_tree;

To add depth-first ordering information, you can write this:


WITH RECURSIVE search_tree(id, link, data, path) AS (
    SELECT t.id, t.link, t.data, ARRAY[t.id]
    FROM tree t
  UNION ALL
    SELECT t.id, t.link, t.data, path || t.id
    FROM tree t, search_tree st
    WHERE t.id = st.link
)
SELECT * FROM search_tree ORDER BY path;

In the general case where more than one field needs to be used to identify a row, use an array of rows.
For example, if we needed to track fields f1 and f2:

WITH RECURSIVE search_tree(id, link, data, path) AS (
    SELECT t.id, t.link, t.data, ARRAY[ROW(t.f1, t.f2)]
    FROM tree t
  UNION ALL
    SELECT t.id, t.link, t.data, path || ROW(t.f1, t.f2)
    FROM tree t, search_tree st
    WHERE t.id = st.link
)
SELECT * FROM search_tree ORDER BY path;

Tip
Omit the ROW() syntax in the common case where only one field needs to be tracked. This
allows a simple array rather than a composite-type array to be used, gaining efficiency.

To create a breadth-first order, you can add a column that tracks the depth of the search, for example:

WITH RECURSIVE search_tree(id, link, data, depth) AS (
    SELECT t.id, t.link, t.data, 0
    FROM tree t
  UNION ALL
    SELECT t.id, t.link, t.data, depth + 1
    FROM tree t, search_tree st
    WHERE t.id = st.link
)
SELECT * FROM search_tree ORDER BY depth;

To get a stable sort, add data columns as secondary sorting columns.

Tip
The recursive query evaluation algorithm produces its output in breadth-first search order.
However, this is an implementation detail and it is perhaps unsound to rely on it. The order of
the rows within each level is certainly undefined, so some explicit ordering might be desired
in any case.

There is built-in syntax to compute a depth- or breadth-first sort column. For example:

WITH RECURSIVE search_tree(id, link, data) AS (
    SELECT t.id, t.link, t.data
    FROM tree t
  UNION ALL
    SELECT t.id, t.link, t.data
    FROM tree t, search_tree st
    WHERE t.id = st.link
) SEARCH DEPTH FIRST BY id SET ordercol
SELECT * FROM search_tree ORDER BY ordercol;

WITH RECURSIVE search_tree(id, link, data) AS (
    SELECT t.id, t.link, t.data
    FROM tree t
  UNION ALL
    SELECT t.id, t.link, t.data
    FROM tree t, search_tree st
    WHERE t.id = st.link
) SEARCH BREADTH FIRST BY id SET ordercol
SELECT * FROM search_tree ORDER BY ordercol;

This syntax is internally expanded to something similar to the above hand-written forms. The SEARCH
clause specifies whether depth-first or breadth-first search is wanted, the list of columns to track for sorting,
and a column name that will contain the result data that can be used for sorting. That column will
implicitly be added to the output rows of the CTE.

7.8.2.2. Cycle Detection


When working with recursive queries it is important to be sure that the recursive part of the query will
eventually return no tuples, or else the query will loop indefinitely. Sometimes, using UNION instead
of UNION ALL can accomplish this by discarding rows that duplicate previous output rows. However,
often a cycle does not involve output rows that are completely duplicate: it may be necessary to check
just one or a few fields to see if the same point has been reached before. The standard method for
handling such situations is to compute an array of the already-visited values. For example, consider
again the following query that searches a table graph using a link field:

WITH RECURSIVE search_graph(id, link, data, depth) AS (
    SELECT g.id, g.link, g.data, 0
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1
    FROM graph g, search_graph sg
    WHERE g.id = sg.link
)
SELECT * FROM search_graph;

This query will loop if the link relationships contain cycles. Because we require a “depth” output,
just changing UNION ALL to UNION would not eliminate the looping. Instead we need to recognize
whether we have reached the same row again while following a particular path of links. We add two
columns is_cycle and path to the loop-prone query:

WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
    SELECT g.id, g.link, g.data, 0,
        false,
        ARRAY[g.id]
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1,
        g.id = ANY(path),
        path || g.id
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT is_cycle
)
SELECT * FROM search_graph;

Aside from preventing cycles, the array value is often useful in its own right as representing the “path”
taken to reach any particular row.

In the general case where more than one field needs to be checked to recognize a cycle, use an array
of rows. For example, if we needed to compare fields f1 and f2:

WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
    SELECT g.id, g.link, g.data, 0,
        false,
        ARRAY[ROW(g.f1, g.f2)]
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1,
        ROW(g.f1, g.f2) = ANY(path),
        path || ROW(g.f1, g.f2)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT is_cycle
)
SELECT * FROM search_graph;

Tip
Omit the ROW() syntax in the common case where only one field needs to be checked to
recognize a cycle. This allows a simple array rather than a composite-type array to be used,
gaining efficiency.

There is built-in syntax to simplify cycle detection. The above query can also be written like this:

WITH RECURSIVE search_graph(id, link, data, depth) AS (
    SELECT g.id, g.link, g.data, 1
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1
    FROM graph g, search_graph sg
    WHERE g.id = sg.link
) CYCLE id SET is_cycle USING path
SELECT * FROM search_graph;

and it will be internally rewritten to the above form. The CYCLE clause specifies first the list of
columns to track for cycle detection, then a column name that will show whether a cycle has been
detected, and finally the name of another column that will track the path. The cycle and path columns
will implicitly be added to the output rows of the CTE.

Tip
The cycle path column is computed in the same way as the depth-first ordering column shown in
the previous section. A query can have both a SEARCH and a CYCLE clause, but a depth-first
search specification and a cycle detection specification would create redundant computations,
so it's more efficient to just use the CYCLE clause and order by the path column. If breadth-
first ordering is wanted, then specifying both SEARCH and CYCLE can be useful.

A helpful trick for testing queries when you are not certain if they might loop is to place a LIMIT in
the parent query. For example, this query would loop forever without the LIMIT:

WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM t
)
SELECT n FROM t LIMIT 100;

This works because PostgreSQL's implementation evaluates only as many rows of a WITH query as
are actually fetched by the parent query. Using this trick in production is not recommended, because
other systems might work differently. Also, it usually won't work if you make the outer query sort the
recursive query's results or join them to some other table, because in such cases the outer query will
usually try to fetch all of the WITH query's output anyway.

7.8.3. Common Table Expression Materialization


A useful property of WITH queries is that they are normally evaluated only once per execution of the
parent query, even if they are referred to more than once by the parent query or sibling WITH queries.
Thus, expensive calculations that are needed in multiple places can be placed within a WITH query
to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations
of functions with side-effects. However, the other side of this coin is that the optimizer is not able to
push restrictions from the parent query down into a multiply-referenced WITH query, since that might
affect all uses of the WITH query's output when it should affect only one. The multiply-referenced
WITH query will be evaluated as written, without suppression of rows that the parent query might
discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the
query demand only a limited number of rows.)

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT contain-
ing no volatile functions) then it can be folded into the parent query, allowing joint optimization of
the two query levels. By default, this happens if the parent query references the WITH query just
once, but not if it references the WITH query more than once. You can override that decision by
specifying MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT
MATERIALIZED to force it to be merged into the parent query. The latter choice risks duplicate com-
putation of the WITH query, but it can still give a net savings if each usage of the WITH query needs
only a small part of the WITH query's full output.

A simple example of these rules is

WITH w AS (
SELECT * FROM big_table
)
SELECT * FROM w WHERE key = 123;

This WITH query will be folded, producing the same execution plan as

SELECT * FROM big_table WHERE key = 123;

In particular, if there's an index on key, it will probably be used to fetch just the rows having key
= 123. On the other hand, in


WITH w AS (
SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;

the WITH query will be materialized, producing a temporary copy of big_table that is then joined
with itself — without benefit of any index. This query will be executed much more efficiently if
written as

WITH w AS NOT MATERIALIZED (
    SELECT * FROM big_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
WHERE w2.key = 123;

so that the parent query's restrictions can be applied directly to scans of big_table.

An example where NOT MATERIALIZED could be undesirable is

WITH w AS (
SELECT key, very_expensive_function(val) as f FROM some_table
)
SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f;

Here, materialization of the WITH query ensures that very_expensive_function is evaluated
only once per table row, not twice.

The examples above only show WITH being used with SELECT, but it can be attached in the same
way to INSERT, UPDATE, or DELETE. In each case it effectively provides temporary table(s) that
can be referred to in the main command.

7.8.4. Data-Modifying Statements in WITH


You can use data-modifying statements (INSERT, UPDATE, or DELETE) in WITH. This allows you
to perform several different operations in the same query. An example is:

WITH moved_rows AS (
DELETE FROM products
WHERE
"date" >= '2010-10-01' AND
"date" < '2010-11-01'
RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;

This query effectively moves rows from products to products_log. The DELETE in WITH
deletes the specified rows from products, returning their contents by means of its RETURNING
clause; and then the primary query reads that output and inserts it into products_log.

A fine point of the above example is that the WITH clause is attached to the INSERT, not the sub-
SELECT within the INSERT. This is necessary because data-modifying statements are only allowed
in WITH clauses that are attached to the top-level statement. However, normal WITH visibility rules
apply, so it is possible to refer to the WITH statement's output from the sub-SELECT.


Data-modifying statements in WITH usually have RETURNING clauses (see Section 6.4), as shown
in the example above. It is the output of the RETURNING clause, not the target table of the data-mod-
ifying statement, that forms the temporary table that can be referred to by the rest of the query. If a
data-modifying statement in WITH lacks a RETURNING clause, then it forms no temporary table and
cannot be referred to in the rest of the query. Such a statement will be executed nonetheless. A not-
particularly-useful example is:

WITH t AS (
DELETE FROM foo
)
DELETE FROM bar;

This example would remove all rows from tables foo and bar. The number of affected rows reported
to the client would only include rows removed from bar.

Recursive self-references in data-modifying statements are not allowed. In some cases it is possible
to work around this limitation by referring to the output of a recursive WITH, for example:

WITH RECURSIVE included_parts(sub_part, part) AS (
    SELECT sub_part, part FROM parts WHERE part = 'our_product'
  UNION ALL
    SELECT p.sub_part, p.part
    FROM included_parts pr, parts p
    WHERE p.part = pr.sub_part
)
DELETE FROM parts
WHERE part IN (SELECT part FROM included_parts);

This query would remove all direct and indirect subparts of a product.

Data-modifying statements in WITH are executed exactly once, and always to completion, indepen-
dently of whether the primary query reads all (or indeed any) of their output. Notice that this is differ-
ent from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is
carried only as far as the primary query demands its output.

The sub-statements in WITH are executed concurrently with each other and with the main query.
Therefore, when using data-modifying statements in WITH, the order in which the specified updates
actually happen is unpredictable. All the statements are executed with the same snapshot (see Chap-
ter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the
unpredictability of the actual order of row updates, and means that RETURNING data is the only way
to communicate changes between different WITH sub-statements and the main query. An example of
this is that in

WITH t AS (
UPDATE products SET price = price * 1.05
RETURNING *
)
SELECT * FROM products;

the outer SELECT would return the original prices before the action of the UPDATE, while in

WITH t AS (
UPDATE products SET price = price * 1.05
RETURNING *
)
SELECT * FROM t;


the outer SELECT would return the updated data.

Trying to update the same row twice in a single statement is not supported. Only one of the modifica-
tions takes place, but it is not easy (and sometimes not possible) to reliably predict which one. This also
applies to deleting a row that was already updated in the same statement: only the update is performed.
Therefore you should generally avoid trying to modify a single row twice in a single statement. In
particular avoid writing WITH sub-statements that could affect the same rows changed by the main
statement or a sibling sub-statement. The effects of such a statement will not be predictable.
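
For example, a statement of the following shape (sketched on the products table) attempts two
modifications of the same rows, and which price change survives is not predictable:

WITH t AS (
    UPDATE products SET price = price * 1.05 RETURNING *
)
UPDATE products SET price = price * 1.10;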

At present, any table used as the target of a data-modifying statement in WITH must not have a con-
ditional rule, nor an ALSO rule, nor an INSTEAD rule that expands to multiple statements.

Chapter 8. Data Types
PostgreSQL has a rich set of native data types available to users. Users can add new types to Post-
greSQL using the CREATE TYPE command.

Table 8.1 shows all the built-in general-purpose data types. Most of the alternative names listed in
the “Aliases” column are the names used internally by PostgreSQL for historical reasons. In addition,
some internally used or deprecated types are available, but are not listed here.

Table 8.1. Data Types


Name                                      Aliases              Description
bigint                                    int8                 signed eight-byte integer
bigserial                                 serial8              autoincrementing eight-byte integer
bit [ (n) ]                                                    fixed-length bit string
bit varying [ (n) ]                       varbit [ (n) ]       variable-length bit string
boolean                                   bool                 logical Boolean (true/false)
box                                                            rectangular box on a plane
bytea                                                          binary data (“byte array”)
character [ (n) ]                         char [ (n) ]         fixed-length character string
character varying [ (n) ]                 varchar [ (n) ]      variable-length character string
cidr                                                           IPv4 or IPv6 network address
circle                                                         circle on a plane
date                                                           calendar date (year, month, day)
double precision                          float8               double precision floating-point number (8 bytes)
inet                                                           IPv4 or IPv6 host address
integer                                   int, int4            signed four-byte integer
interval [ fields ] [ (p) ]                                    time span
json                                                           textual JSON data
jsonb                                                          binary JSON data, decomposed
line                                                           infinite line on a plane
lseg                                                           line segment on a plane
macaddr                                                        MAC (Media Access Control) address
macaddr8                                                       MAC (Media Access Control) address (EUI-64 format)
money                                                          currency amount
numeric [ (p, s) ]                        decimal [ (p, s) ]   exact numeric of selectable precision
path                                                           geometric path on a plane
pg_lsn                                                         PostgreSQL Log Sequence Number
pg_snapshot                                                    user-level transaction ID snapshot
point                                                          geometric point on a plane
polygon                                                        closed geometric path on a plane
real                                      float4               single precision floating-point number (4 bytes)
smallint                                  int2                 signed two-byte integer
smallserial                               serial2              autoincrementing two-byte integer
serial                                    serial4              autoincrementing four-byte integer
text                                                           variable-length character string
time [ (p) ] [ without time zone ]                             time of day (no time zone)
time [ (p) ] with time zone               timetz               time of day, including time zone
timestamp [ (p) ] [ without time zone ]                        date and time (no time zone)
timestamp [ (p) ] with time zone          timestamptz          date and time, including time zone
tsquery                                                        text search query
tsvector                                                       text search document
txid_snapshot                                                  user-level transaction ID snapshot (deprecated; see pg_snapshot)
uuid                                                           universally unique identifier
xml                                                            XML data

Compatibility
The following types (or spellings thereof) are specified by SQL: bigint, bit, bit vary-
ing, boolean, char, character varying, character, varchar, date, dou-
ble precision, integer, interval, numeric, decimal, real, smallint,
time (with or without time zone), timestamp (with or without time zone), xml.

Each data type has an external representation determined by its input and output functions. Many of the
built-in types have obvious external formats. However, several types are either unique to PostgreSQL,
such as geometric paths, or have several possible formats, such as the date and time types. Some of the
input and output functions are not invertible, i.e., the result of an output function might lose accuracy
when compared to the original input.

8.1. Numeric Types


Numeric types consist of two-, four-, and eight-byte integers, four- and eight-byte floating-point num-
bers, and selectable-precision decimals. Table 8.2 lists the available types.

Table 8.2. Numeric Types


Name               Storage Size   Description                       Range
smallint           2 bytes        small-range integer               -32768 to +32767
integer            4 bytes        typical choice for integer        -2147483648 to +2147483647
bigint             8 bytes        large-range integer               -9223372036854775808 to +9223372036854775807
decimal            variable       user-specified precision, exact   up to 131072 digits before the decimal point;
                                                                    up to 16383 digits after the decimal point
numeric            variable       user-specified precision, exact   up to 131072 digits before the decimal point;
                                                                    up to 16383 digits after the decimal point
real               4 bytes        variable-precision, inexact       6 decimal digits precision
double precision   8 bytes        variable-precision, inexact       15 decimal digits precision
smallserial        2 bytes        small autoincrementing integer    1 to 32767
serial             4 bytes        autoincrementing integer          1 to 2147483647
bigserial          8 bytes        large autoincrementing integer    1 to 9223372036854775807

The syntax of constants for the numeric types is described in Section 4.1.2. The numeric types have a
full set of corresponding arithmetic operators and functions. Refer to Chapter 9 for more information.
The following sections describe the types in detail.

8.1.1. Integer Types


The types smallint, integer, and bigint store whole numbers, that is, numbers without frac-
tional components, of various ranges. Attempts to store values outside of the allowed range will result
in an error.

The type integer is the common choice, as it offers the best balance between range, storage size, and
performance. The smallint type is generally only used if disk space is at a premium. The bigint
type is designed to be used when the range of the integer type is insufficient.

SQL only specifies the integer types integer (or int), smallint, and bigint. The type names
int2, int4, and int8 are extensions, which are also used by some other SQL database systems.

8.1.2. Arbitrary Precision Numbers


The type numeric can store numbers with a very large number of digits. It is especially recommend-
ed for storing monetary amounts and other quantities where exactness is required. Calculations with
numeric values yield exact results where possible, e.g., addition, subtraction, multiplication. How-
ever, calculations on numeric values are very slow compared to the integer types, or to the float-
ing-point types described in the next section.

We use the following terms below: The precision of a numeric is the total count of significant digits
in the whole number, that is, the number of digits to both sides of the decimal point. The scale of a
numeric is the count of decimal digits in the fractional part, to the right of the decimal point. So the
number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.

Both the maximum precision and the maximum scale of a numeric column can be configured. To
declare a column of type numeric use the syntax:

NUMERIC(precision, scale)

The precision must be positive, the scale zero or positive. Alternatively:


NUMERIC(precision)

selects a scale of 0. Specifying:

NUMERIC

without any precision or scale creates an “unconstrained numeric” column in which numeric values
of any length can be stored, up to the implementation limits. A column of this kind will not coerce
input values to any particular scale, whereas numeric columns with a declared scale will coerce
input values to that scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer
precision. We find this a bit useless. If you're concerned about portability, always specify the precision
and scale explicitly.)

Note
The maximum precision that can be explicitly specified in a NUMERIC type declaration is
1000. An unconstrained NUMERIC column is subject to the limits described in Table 8.2.

If the scale of a value to be stored is greater than the declared scale of the column, the system will
round the value to the specified number of fractional digits. Then, if the number of digits to the left of
the decimal point exceeds the declared precision minus the declared scale, an error is raised.
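
For example, with a declared type of NUMERIC(4, 2):

SELECT 23.5141::numeric(4, 2);   -- rounded to 23.51
SELECT 235.14::numeric(4, 2);    -- error: too many digits before the decimal point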

Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared
precision and scale of a column are maximums, not fixed allocations. (In this sense the numeric
type is more akin to varchar(n) than to char(n).) The actual storage requirement is two bytes
for each group of four decimal digits, plus three to eight bytes overhead.

In addition to ordinary numeric values, the numeric type has several special values:

Infinity
-Infinity
NaN

These are adapted from the IEEE 754 standard, and represent “infinity”, “negative infinity”, and “not-
a-number”, respectively. When writing these values as constants in an SQL command, you must put
quotes around them, for example UPDATE table SET x = '-Infinity'. On input, these
strings are recognized in a case-insensitive manner. The infinity values can alternatively be spelled
inf and -inf.

The infinity values behave as per mathematical expectations. For example, Infinity plus any finite
value equals Infinity, as does Infinity plus Infinity; but Infinity minus Infinity
yields NaN (not a number), because it has no well-defined interpretation. Note that an infinity can only
be stored in an unconstrained numeric column, because it notionally exceeds any finite precision
limit.
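
For example:

SELECT 'Infinity'::numeric + 1;                    -- Infinity
SELECT 'Infinity'::numeric - 'Infinity'::numeric;  -- NaN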

The NaN (not a number) value is used to represent undefined calculational results. In general, any
operation with a NaN input yields another NaN. The only exception is when the operation's other inputs
are such that the same output would be obtained if the NaN were to be replaced by any finite or infinite
numeric value; then, that output value is used for NaN too. (An example of this principle is that NaN
raised to the zero power yields one.)

Note
In most implementations of the “not-a-number” concept, NaN is not considered equal to any
other numeric value (including NaN). In order to allow numeric values to be sorted and used
in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN
values.

The types decimal and numeric are equivalent. Both types are part of the SQL standard.

When rounding values, the numeric type rounds ties away from zero, while (on most machines) the
real and double precision types round ties to the nearest even number. For example:

SELECT x,
round(x::numeric) AS num_round,
round(x::double precision) AS dbl_round
FROM generate_series(-3.5, 3.5, 1) as x;
  x   | num_round | dbl_round
------+-----------+-----------
 -3.5 |        -4 |        -4
 -2.5 |        -3 |        -2
 -1.5 |        -2 |        -2
 -0.5 |        -1 |        -0
  0.5 |         1 |         0
  1.5 |         2 |         2
  2.5 |         3 |         2
  3.5 |         4 |         4
(8 rows)

8.1.3. Floating-Point Types


The data types real and double precision are inexact, variable-precision numeric types. On
all currently supported platforms, these types are implementations of IEEE Standard 754 for Binary
Floating-Point Arithmetic (single and double precision, respectively), to the extent that the underlying
processor, operating system, and compiler support it.

Inexact means that some values cannot be converted exactly to the internal format and are stored as
approximations, so that storing and retrieving a value might show slight discrepancies. Managing these
errors and how they propagate through calculations is the subject of an entire branch of mathematics
and computer science and will not be discussed here, except for the following points:

• If you require exact storage and calculations (such as for monetary amounts), use the numeric
type instead.

• If you want to do complicated calculations with these types for anything important, especially if
you rely on certain behavior in boundary cases (infinity, underflow), you should evaluate the im-
plementation carefully.

• Comparing two floating-point values for equality might not always work as expected.

On all currently supported platforms, the real type has a range of around 1E-37 to 1E+37 with a
precision of at least 6 decimal digits. The double precision type has a range of around 1E-307
to 1E+308 with a precision of at least 15 digits. Values that are too large or too small will cause an
error. Rounding might take place if the precision of an input number is too high. Numbers too close
to zero that are not representable as distinct from zero will cause an underflow error.

By default, floating point values are output in text form in their shortest precise decimal representa-
tion; the decimal value produced is closer to the true stored binary value than to any other value rep-
resentable in the same binary precision. (However, the output value is currently never exactly midway
between two representable values, in order to avoid a widespread bug where input routines do not
properly respect the round-to-nearest-even rule.) This value will use at most 17 significant decimal
digits for float8 values, and at most 9 digits for float4 values.


Note
This shortest-precise output format is much faster to generate than the historical rounded for-
mat.

For compatibility with output generated by older versions of PostgreSQL, and to allow the output
precision to be reduced, the extra_float_digits parameter can be used to select rounded decimal output
instead. Setting a value of 0 restores the previous default of rounding the value to 6 (for float4)
or 15 (for float8) significant decimal digits. Setting a negative value reduces the number of digits
further; for example -2 would round output to 4 or 13 digits respectively.

Any value of extra_float_digits greater than 0 selects the shortest-precise format.
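
For example, a sketch of the effect (the exact digits shown can depend on the platform, though IEEE
754 behavior is near-universal):

SET extra_float_digits = -3;
SELECT pi();   -- rounded output: 3.14159265359
RESET extra_float_digits;
SELECT pi();   -- shortest-precise output: 3.141592653589793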

Note
Applications that wanted precise values have historically had to set extra_float_digits to 3 to
obtain them. For maximum compatibility between versions, they should continue to do so.

In addition to ordinary numeric values, the floating-point types have several special values:

Infinity
-Infinity
NaN

These represent the IEEE 754 special values “infinity”, “negative infinity”, and “not-a-number”, re-
spectively. When writing these values as constants in an SQL command, you must put quotes around
them, for example UPDATE table SET x = '-Infinity'. On input, these strings are recog-
nized in a case-insensitive manner. The infinity values can alternatively be spelled inf and -inf.

Note
IEEE 754 specifies that NaN should not compare equal to any other floating-point value (in-
cluding NaN). In order to allow floating-point values to be sorted and used in tree-based in-
dexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.

PostgreSQL also supports the SQL-standard notations float and float(p) for specifying inexact
numeric types. Here, p specifies the minimum acceptable precision in binary digits. PostgreSQL ac-
cepts float(1) to float(24) as selecting the real type, while float(25) to float(53)
select double precision. Values of p outside the allowed range draw an error. float with no
precision specified is taken to mean double precision.

8.1.4. Serial Types


Note
This section describes a PostgreSQL-specific way to create an autoincrementing column. An-
other way is to use the SQL-standard identity column feature, described at CREATE TABLE.

The data types smallserial, serial and bigserial are not true types, but merely a notation-
al convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property
supported by some other databases). In the current implementation, specifying:


CREATE TABLE tablename (
    colname SERIAL
);

is equivalent to specifying:

CREATE SEQUENCE tablename_colname_seq AS integer;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;

Thus, we have created an integer column and arranged for its default values to be assigned from a
sequence generator. A NOT NULL constraint is applied to ensure that a null value cannot be inserted.
(In most cases you would also want to attach a UNIQUE or PRIMARY KEY constraint to prevent
duplicate values from being inserted by accident, but this is not automatic.) Lastly, the sequence is
marked as “owned by” the column, so that it will be dropped if the column or table is dropped.

Note
Because smallserial, serial and bigserial are implemented using sequences, there
may be "holes" or gaps in the sequence of values which appears in the column, even if no rows
are ever deleted. A value allocated from the sequence is still "used up" even if a row containing
that value is never successfully inserted into the table column. This may happen, for example,
if the inserting transaction rolls back. See nextval() in Section 9.17 for details.

To insert the next value of the sequence into the serial column, specify that the serial column
should be assigned its default value. This can be done either by excluding the column from the list of
columns in the INSERT statement, or through the use of the DEFAULT key word.
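
For example, with the single-column table created above, either of these forms assigns the next
sequence value:

INSERT INTO tablename (colname) VALUES (DEFAULT);
INSERT INTO tablename DEFAULT VALUES;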

The type names serial and serial4 are equivalent: both create integer columns. The type
names bigserial and serial8 work the same way, except that they create a bigint column.
bigserial should be used if you anticipate the use of more than 2^31 identifiers over the lifetime of
the table. The type names smallserial and serial2 also work the same way, except that they
create a smallint column.

The sequence created for a serial column is automatically dropped when the owning column is
dropped. You can drop the sequence without dropping the column, but this will force removal of the
column default expression.

8.2. Monetary Types


The money type stores a currency amount with a fixed fractional precision; see Table 8.3. The frac-
tional precision is determined by the database's lc_monetary setting. The range shown in the table
assumes there are two fractional digits. Input is accepted in a variety of formats, including integer
and floating-point literals, as well as typical currency formatting, such as '$1,000.00'. Output is
generally in the latter form but depends on the locale.

Table 8.3. Monetary Types


Name    Storage Size   Description       Range
money   8 bytes        currency amount   -92233720368547758.08 to +92233720368547758.07


Since the output of this data type is locale-sensitive, it might not work to load money data into a
database that has a different setting of lc_monetary. To avoid problems, before restoring a dump
into a new database make sure lc_monetary has the same or equivalent value as in the database
that was dumped.

Values of the numeric, int, and bigint data types can be cast to money. Conversion from the
real and double precision data types can be done by casting to numeric first, for example:

SELECT '12.34'::float8::numeric::money;

However, this is not recommended. Floating point numbers should not be used to handle money due
to the potential for rounding errors.

A money value can be cast to numeric without loss of precision. Conversion to other types could
potentially lose precision, and must also be done in two stages:

SELECT '52093.89'::money::numeric::float8;

Division of a money value by an integer value is performed with truncation of the fractional part
towards zero. To get a rounded result, divide by a floating-point value, or cast the money value to
numeric before dividing and back to money afterwards. (The latter is preferable to avoid risking
precision loss.) When a money value is divided by another money value, the result is double pre-
cision (i.e., a pure number, not money); the currency units cancel each other out in the division.
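
For example (the exact output depends on the lc_monetary locale; a locale with $ and two fractional
digits is assumed here):

SELECT '100.00'::money / 6;                    -- $16.66, fraction truncated
SELECT ('100.00'::money::numeric / 6)::money;  -- $16.67, rounded via numeric
SELECT '100.00'::money / '25.00'::money;       -- 4, a double precision pure number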

8.3. Character Types


Table 8.4. Character Types

Name                                 Description
character varying(n), varchar(n)     variable-length with limit
character(n), char(n)                fixed-length, blank padded
text                                 variable unlimited length

Table 8.4 shows the general-purpose character types available in PostgreSQL.

SQL defines two primary character types: character varying(n) and character(n), where
n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An
attempt to store a longer string into a column of these types will result in an error, unless the excess
characters are all spaces, in which case the string will be truncated to the maximum length. (This
somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than
the declared length, values of type character will be space-padded; values of type character
varying will simply store the shorter string.

If one explicitly casts a value to character varying(n) or character(n), then an over-length
value will be truncated to n characters without raising an error. (This too is required by the
SQL standard.)

The notations varchar(n) and char(n) are aliases for character varying(n) and char-
acter(n), respectively. If specified, the length must be greater than zero and cannot exceed
10485760. character without length specifier is equivalent to character(1). If character
varying is used without length specifier, the type accepts strings of any size. The latter is a Post-
greSQL extension.

In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type
text is not in the SQL standard, several other SQL database management systems have it as well.


Values of type character are physically padded with spaces to the specified width n, and are stored
and displayed that way. However, trailing spaces are treated as semantically insignificant and disre-
garded when comparing two values of type character. In collations where whitespace is signifi-
cant, this behavior can produce unexpected results; for example SELECT 'a '::CHAR(2)
collate "C" < E'a\n'::CHAR(2) returns true, even though C locale would consider a space
to be greater than a newline. Trailing spaces are removed when converting a character value to
one of the other string types. Note that trailing spaces are semantically significant in character
varying and text values, and when using pattern matching, that is LIKE and regular expressions.

The characters that can be stored in any of these data types are determined by the database character set,
which is selected when the database is created. Regardless of the specific character set, the character
with code zero (sometimes called NUL) cannot be stored. For more information refer to Section 24.3.

The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which
includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead
of 1. Long strings are compressed by the system automatically, so the physical requirement on disk
might be less. Very long values are also stored in background tables so that they do not interfere with
rapid access to shorter column values. In any case, the longest possible character string that can be
stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less
than that. It wouldn't be useful to change this because with multibyte character encodings the number
of characters and bytes can be quite different. If you desire to store long strings with no specific upper
limit, use text or character varying without a length specifier, rather than making up an
arbitrary length limit.)

Tip
There is no performance difference among these three types, apart from increased storage
space when using the blank-padded type, and a few extra CPU cycles to check the length
when storing into a length-constrained column. While character(n) has performance ad-
vantages in some other database systems, there is no such advantage in PostgreSQL; in fact
character(n) is usually the slowest of the three because of its additional storage costs. In
most situations text or character varying should be used instead.

Refer to Section 4.1.2.1 for information about the syntax of string literals, and to Chapter 9 for infor-
mation about available operators and functions.

Example 8.1. Using the Character Types

CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- (1)

  a   | char_length
------+-------------
 ok   |           2

CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good ');
INSERT INTO test2 VALUES ('too long');
ERROR:  value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;

   b   | char_length
-------+-------------
 ok    |           2
 good  |           5
 too l |           5

(1) The char_length function is discussed in Section 9.4.

There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. The name type
exists only for the storage of identifiers in the internal system catalogs and is not intended for use
by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator)
but should be referenced using the constant NAMEDATALEN in C source code. The length is set at
compile time (and is therefore adjustable for special uses); the default maximum length might change
in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses
one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.

Table 8.5. Special Character Types


Name     Storage Size   Description
"char"   1 byte         single-byte internal type
name     64 bytes       internal type for object names

8.4. Binary Data Types


The bytea data type allows storage of binary strings; see Table 8.6.

Table 8.6. Binary Data Types


Name    Storage Size                                 Description
bytea   1 or 4 bytes plus the actual binary string   variable-length binary string

A binary string is a sequence of octets (or bytes). Binary strings are distinguished from character
strings in two ways. First, binary strings specifically allow storing octets of value zero and other “non-
printable” octets (usually, octets outside the decimal range 32 to 126). Character strings disallow zero
octets, and also disallow any other octet values and sequences of octet values that are invalid according
to the database's selected character set encoding. Second, operations on binary strings process the
actual bytes, whereas the processing of character strings depends on locale settings. In short, binary
strings are appropriate for storing data that the programmer thinks of as “raw bytes”, whereas character
strings are appropriate for storing text.

The bytea type supports two formats for input and output: “hex” format and PostgreSQL's histori-
cal “escape” format. Both of these are always accepted on input. The output format depends on the
configuration parameter bytea_output; the default is hex. (Note that the hex format was introduced in
PostgreSQL 9.0; earlier versions and some tools don't understand it.)

The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT.
The input format is different from bytea, but the provided functions and operators are mostly the
same.

8.4.1. bytea Hex Format


The “hex” format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first.
The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some
contexts, the initial backslash may need to be escaped by doubling it (see Section 4.1.2.1). For input,
the hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit
pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a
wide range of external applications and protocols, and it tends to be faster to convert than the escape
format, so its use is preferred.

Example:

SET bytea_output = 'hex';

SELECT '\xDEADBEEF'::bytea;
   bytea
------------
 \xdeadbeef

8.4.2. bytea Escape Format


The “escape” format is the traditional PostgreSQL format for the bytea type. It takes the approach of
representing a binary string as a sequence of ASCII characters, while converting those bytes that cannot
be represented as an ASCII character into special escape sequences. If, from the point of view of the
application, representing bytes as characters makes sense, then this representation can be convenient.
But in practice it is usually confusing because it fuzzes up the distinction between binary strings and
character strings, and also the particular escape mechanism that was chosen is somewhat unwieldy.
Therefore, this format should probably be avoided for most new applications.

When entering bytea values in escape format, octets of certain values must be escaped, while all
octet values can be escaped. In general, to escape an octet, convert it into its three-digit octal value and
precede it by a backslash. Backslash itself (octet decimal value 92) can alternatively be represented
by double backslashes. Table 8.7 shows the characters that must be escaped, and gives the alternative
escape sequences where applicable.

Table 8.7. bytea Literal Escaped Octets


Decimal Octet Value      Description              Escaped Input Representation   Example         Hex Representation
0                        zero octet               '\000'                         '\000'::bytea   \x00
39                       single quote             '''' or '\047'                 ''''::bytea     \x27
92                       backslash                '\\' or '\134'                 '\\'::bytea     \x5c
0 to 31 and 127 to 255   “non-printable” octets   '\xxx' (octal value)           '\001'::bytea   \x01

The requirement to escape non-printable octets varies depending on locale settings. In some instances
you can get away with leaving them unescaped.

The reason that single quotes must be doubled, as shown in Table 8.7, is that this is true for any string
literal in an SQL command. The generic string-literal parser consumes the outermost single quotes
and reduces any pair of single quotes to one data character. What the bytea input function sees is just
one single quote, which it treats as a plain data character. However, the bytea input function treats
backslashes as special, and the other behaviors shown in Table 8.7 are implemented by that function.

In some contexts, backslashes must be doubled compared to what is shown above, because the generic
string-literal parser will also reduce pairs of backslashes to one data character; see Section 4.1.2.1.
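
For example, the escaped octets of Table 8.7 could be entered as follows (a minimal sketch, assuming the default standard_conforming_strings = on and the default hex output format):

SELECT '\000'::bytea;  -- zero octet; displayed as \x00
SELECT ''''::bytea;    -- single quote; displayed as \x27
SELECT '\\'::bytea;    -- backslash; displayed as \x5c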

Bytea octets are output in hex format by default. If you change bytea_output to escape, “non-
printable” octets are converted to their equivalent three-digit octal value and preceded by one back-
slash. Most “printable” octets are output by their standard representation in the client character set, e.g.:


SET bytea_output = 'escape';

SELECT 'abc \153\154\155 \052\251\124'::bytea;


     bytea
----------------
 abc klm *\251T

The octet with decimal value 92 (backslash) is doubled in the output. Details are in Table 8.8.

Table 8.8. bytea Output Escaped Octets


Decimal Octet Value      Description               Escaped Output Representation         Example          Output Result
92                       backslash                 \\                                    '\134'::bytea    \\
0 to 31 and 127 to 255   “non-printable” octets    \xxx (octal value)                    '\001'::bytea    \001
32 to 126                “printable” octets        client character set representation   '\176'::bytea    ~

Depending on the front end to PostgreSQL you use, you might have additional work to do in terms of
escaping and unescaping bytea strings. For example, you might also have to escape line feeds and
carriage returns if your interface automatically translates these.

8.5. Date/Time Types


PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9. The operations
available on these data types are described in Section 9.9. Dates are counted according to the Gregorian
calendar, even in years before that calendar was introduced (see Section B.6 for more information).

Table 8.9. Date/Time Types


Name                                      Storage Size   Description                             Low Value          High Value         Resolution
timestamp [ (p) ] [ without time zone ]   8 bytes        both date and time (no time zone)       4713 BC            294276 AD          1 microsecond
timestamp [ (p) ] with time zone          8 bytes        both date and time, with time zone      4713 BC            294276 AD          1 microsecond
date                                      4 bytes        date (no time of day)                   4713 BC            5874897 AD         1 day
time [ (p) ] [ without time zone ]        8 bytes        time of day (no date)                   00:00:00           24:00:00           1 microsecond
time [ (p) ] with time zone               12 bytes       time of day (no date), with time zone   00:00:00+1559      24:00:00-1559      1 microsecond
interval [ fields ] [ (p) ]               16 bytes       time interval                           -178000000 years   178000000 years    1 microsecond


Note
The SQL standard requires that writing just timestamp be equivalent to timestamp
without time zone, and PostgreSQL honors that behavior. timestamptz is accepted
as an abbreviation for timestamp with time zone; this is a PostgreSQL extension.

time, timestamp, and interval accept an optional precision value p which specifies the number
of fractional digits retained in the seconds field. By default, there is no explicit bound on precision.
The allowed range of p is from 0 to 6.

The interval type has an additional option, which is to restrict the set of stored fields by writing
one of these phrases:

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND

Note that if both fields and p are specified, the fields must include SECOND, since the precision
applies only to the seconds.
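
For example, both options can be exercised as follows (a sketch; fractional seconds are rounded, not truncated, when the precision is reduced):

SELECT TIMESTAMP(0) '2004-10-19 10:23:54.789';          -- 2004-10-19 10:23:55
SELECT INTERVAL '1 day 2:03:04.5678' DAY TO SECOND(2);  -- 1 day 02:03:04.57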

The type time with time zone is defined by the SQL standard, but the definition exhibits prop-
erties which lead to questionable usefulness. In most cases, a combination of date, time, time-
stamp without time zone, and timestamp with time zone should provide a complete
range of date/time functionality required by any application.

8.5.1. Date/Time Input


Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compati-
ble, traditional POSTGRES, and others. For some formats, ordering of day, month, and year in date
input is ambiguous and there is support for specifying the expected ordering of these fields. Set the
DateStyle parameter to MDY to select month-day-year interpretation, DMY to select day-month-year
interpretation, or YMD to select year-month-day interpretation.
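
For example (a sketch showing the same input under two settings):

SET datestyle TO 'DMY';
SELECT '1/8/1999'::date;   -- 1999-08-01: day-month-year
SET datestyle TO 'MDY';
SELECT '1/8/1999'::date;   -- 1999-01-08: month-day-year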

PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See Appen-
dix B for the exact parsing rules of date/time input and for the recognized text fields including months,
days of the week, and time zones.

Remember that any date or time literal input needs to be enclosed in single quotes, like text strings.
Refer to Section 4.1.2.7 for more information. SQL requires the following syntax

type [ (p) ] 'value'

where p is an optional precision specification giving the number of fractional digits in the seconds
field. Precision can be specified for time, timestamp, and interval types, and can range from
0 to 6. If no precision is specified in a constant specification, it defaults to the precision of the literal
value (but not more than 6 digits).


8.5.1.1. Dates
Table 8.10 shows some possible inputs for the date type.

Table 8.10. Date Input


Example Description
1999-01-08 ISO 8601; January 8 in any mode (recommended format)
January 8, 1999 unambiguous in any datestyle input mode
1/8/1999 January 8 in MDY mode; August 1 in DMY mode
1/18/1999 January 18 in MDY mode; rejected in other modes
01/02/03 January 2, 2003 in MDY mode; February 1, 2003 in DMY mode;
February 3, 2001 in YMD mode
1999-Jan-08 January 8 in any mode
Jan-08-1999 January 8 in any mode
08-Jan-1999 January 8 in any mode
99-Jan-08 January 8 in YMD mode, else error
08-Jan-99 January 8, except error in YMD mode
Jan-08-99 January 8, except error in YMD mode
19990108 ISO 8601; January 8, 1999 in any mode
990108 ISO 8601; January 8, 1999 in any mode
1999.008 year and day of year
J2451187 Julian date
January 8, 99 BC year 99 BC

8.5.1.2. Times
The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with
time zone. time alone is equivalent to time without time zone.

Valid input for these types consists of a time of day followed by an optional time zone. (See Table 8.11
and Table 8.12.) If a time zone is specified in the input for time without time zone, it is silently
ignored. You can also specify a date but it will be ignored, except when you use a time zone name
that involves a daylight-savings rule, such as America/New_York. In this case specifying the date
is required in order to determine whether standard or daylight-savings time applies. The appropriate
time zone offset is recorded in the time with time zone value.
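
For example (a sketch; the expected results, shown as comments, assume current IANA rules for the zone):

SELECT TIME WITH TIME ZONE '04:05:06 PST';
-- 04:05:06-08

SELECT TIME WITH TIME ZONE '2003-04-12 04:05:06 America/New_York';
-- 04:05:06-04, because daylight-savings time was in effect on that date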

Table 8.11. Time Input


Example                                 Description
04:05:06.789                            ISO 8601
04:05:06                                ISO 8601
04:05                                   ISO 8601
040506                                  ISO 8601
04:05 AM                                same as 04:05; AM does not affect value
04:05 PM                                same as 16:05; input hour must be <= 12
04:05:06.789-8                          ISO 8601, with time zone as UTC offset
04:05:06-08:00                          ISO 8601, with time zone as UTC offset
04:05-08:00                             ISO 8601, with time zone as UTC offset
040506-08                               ISO 8601, with time zone as UTC offset
040506+0730                             ISO 8601, with fractional-hour time zone as UTC offset
040506+07:30:00                         UTC offset specified to seconds (not allowed in ISO 8601)
04:05:06 PST                            time zone specified by abbreviation
2003-04-12 04:05:06 America/New_York    time zone specified by full name

Table 8.12. Time Zone Input


Example Description
PST Abbreviation (for Pacific Standard Time)
America/New_York Full time zone name
PST8PDT POSIX-style time zone specification
-8:00:00 UTC offset for PST
-8:00 UTC offset for PST (ISO 8601 extended format)
-800 UTC offset for PST (ISO 8601 basic format)
-8 UTC offset for PST (ISO 8601 basic format)
zulu Military abbreviation for UTC
z Short form of zulu (also in ISO 8601)

Refer to Section 8.5.3 for more information on how to specify time zones.

8.5.1.3. Time Stamps


Valid input for the time stamp types consists of the concatenation of a date and a time, followed by
an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the
time zone, but this is not the preferred ordering.) Thus:

1999-01-08 04:05:06

and:

1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the common format:

January 8 04:05:06 1999 PST

is supported.

The SQL standard differentiates timestamp without time zone and timestamp with
time zone literals by the presence of a “+” or “-” symbol and time zone offset after the time. Hence,
according to the standard,


TIMESTAMP '2004-10-19 10:23:54'

is a timestamp without time zone, while

TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL never examines the content of a literal string
before determining its type, and therefore will treat both of the above as timestamp without
time zone. To ensure that a literal is treated as timestamp with time zone, give it the
correct explicit type:

TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'

In a literal that has been determined to be timestamp without time zone, PostgreSQL will
silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields
in the input value, and is not adjusted for time zone.

For timestamp with time zone, the internally stored value is always in UTC (Universal
Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an
explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If
no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the
system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone.

When a timestamp with time zone value is output, it is always converted from UTC to the
current timezone zone, and displayed as local time in that zone. To see the time in another time
zone, either change timezone or use the AT TIME ZONE construct (see Section 9.9.4).

Conversions between timestamp without time zone and timestamp with time zone
normally assume that the timestamp without time zone value should be taken or given as
timezone local time. A different time zone can be specified for the conversion using AT TIME
ZONE.
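
The following sketch illustrates this behavior (the expected results, shown as comments, assume the TimeZone setting given):

SET timezone = 'America/Los_Angeles';

SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';
-- 2004-10-19 01:23:54-07  (stored in UTC, displayed in the session's zone)

SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02' AT TIME ZONE 'Europe/Berlin';
-- 2004-10-19 10:23:54  (a timestamp without time zone, as Berlin local time)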

8.5.1.4. Special Values


PostgreSQL supports several special date/time input values for convenience, as shown in Table 8.13.
The values infinity and -infinity are specially represented inside the system and will be
displayed unchanged; but the others are simply notational shorthands that will be converted to ordinary
date/time values when read. (In particular, now and related strings are converted to a specific time
value as soon as they are read.) All of these values need to be enclosed in single quotes when used
as constants in SQL commands.

Table 8.13. Special Date/Time Inputs


Input String   Valid Types             Description
epoch          date, timestamp         1970-01-01 00:00:00+00 (Unix system time zero)
infinity       date, timestamp         later than all other time stamps
-infinity      date, timestamp         earlier than all other time stamps
now            date, time, timestamp   current transaction's start time
today          date, timestamp         midnight (00:00) today
tomorrow       date, timestamp         midnight (00:00) tomorrow
yesterday      date, timestamp         midnight (00:00) yesterday
allballs       time                    00:00:00.00 UTC


The following SQL-compatible functions can also be used to obtain the current time value for the cor-
responding data type: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME,
LOCALTIMESTAMP. (See Section 9.9.5.) Note that these are SQL functions and are not recognized
in data input strings.

Caution
While the input strings now, today, tomorrow, and yesterday are fine to use in inter-
active SQL commands, they can have surprising behavior when the command is saved to be
executed later, for example in prepared statements, views, and function definitions. The string
can be converted to a specific time value that continues to be used long after it becomes stale.
Use one of the SQL functions instead in such contexts. For example, CURRENT_DATE + 1
is safer than 'tomorrow'::date.

8.5.2. Date/Time Output


The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres),
traditional POSTGRES (Unix date format), or German. The default is the ISO format. (The SQL
standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a historical
accident.) Table 8.14 shows examples of each output style. The output of the date and time types is
generally only the date or time part in accordance with the given examples. However, the POSTGRES
style outputs date-only values in ISO format.

Table 8.14. Date/Time Output Styles


Style Specification   Description              Example
ISO                   ISO 8601, SQL standard   1997-12-17 07:37:16-08
SQL                   traditional style        12/17/1997 07:37:16.00 PST
Postgres              original style           Wed Dec 17 07:37:16 1997 PST
German                regional style           17.12.1997 07:37:16.00 PST

Note
ISO 8601 specifies the use of uppercase letter T to separate the date and time. PostgreSQL
accepts that format on input, but on output it uses a space rather than T, as shown above. This
is for readability and for consistency with RFC 3339 [1] as well as some other database systems.

In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been spec-
ified, otherwise month appears before day. (See Section 8.5.1 for how this setting also affects inter-
pretation of input values.) Table 8.15 shows examples.

Table 8.15. Date Order Conventions


datestyle Setting   Input Ordering   Example Output
SQL, DMY            day/month/year   17/12/1997 15:37:16.00 CET
SQL, MDY            month/day/year   12/17/1997 07:37:16.00 PST
Postgres, DMY       day/month/year   Wed 17 Dec 07:37:16 1997 PST

[1] https://tools.ietf.org/html/rfc3339


In the ISO style, the time zone is always shown as a signed numeric offset from UTC, with positive
sign used for zones east of Greenwich. The offset will be shown as hh (hours only) if it is an integral
number of hours, else as hh:mm if it is an integral number of minutes, else as hh:mm:ss. (The third case
is not possible with any modern time zone standard, but it can appear when working with timestamps
that predate the adoption of standardized time zones.) In the other date styles, the time zone is shown
as an alphabetic abbreviation if one is in common use in the current zone. Otherwise it appears as a
signed numeric offset in ISO 8601 basic format (hh or hhmm).

The date/time style can be selected by the user using the SET datestyle command, the DateStyle
parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment vari-
able on the server or client.

The formatting function to_char (see Section 9.8) is also available as a more flexible way to format
date/time output.
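
For example (a brief sketch of both facilities):

SET datestyle TO 'German';
SELECT TIMESTAMP '1997-12-17 07:37:16';
-- 17.12.1997 07:37:16

SELECT to_char(TIMESTAMP '1997-12-17 07:37:16', 'YYYY-MM-DD HH24:MI:SS');
-- 1997-12-17 07:37:16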

8.5.3. Time Zones


Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry.
Time zones around the world became somewhat standardized during the 1900s, but continue to be
prone to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL uses the
widely-used IANA (Olson) time zone database for information about historical time zone rules. For
times in the future, the assumption is that the latest known rules for a given time zone will continue
to be observed indefinitely far into the future.

PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However,
the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:

• Although the date type cannot have an associated time zone, the time type can. Time zones in
the real world have little meaning unless associated with a date as well as a time, since the offset
can vary through the year with daylight-saving time boundaries.

• The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible
to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.

To address these difficulties, we recommend using date/time types that contain both date and time
when using time zones. We do not recommend using the type time with time zone (though
it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard).
PostgreSQL assumes your local time zone for any type containing only date or time.

All timezone-aware dates and times are stored internally in UTC. They are converted to local time in
the zone specified by the TimeZone configuration parameter before being displayed to the client.

PostgreSQL allows you to specify time zones in three different forms:

• A full time zone name, for example America/New_York. The recognized time zone names are
listed in the pg_timezone_names view (see Section 52.94). PostgreSQL uses the widely-used
IANA time zone data for this purpose, so the same time zone names are also recognized by other
software.

• A time zone abbreviation, for example PST. Such a specification merely defines a particular offset
from UTC, in contrast to full time zone names which can imply a set of daylight savings transition
rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see
Section 52.93). You cannot set the configuration parameters TimeZone or log_timezone to a time
zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME
ZONE operator.

• In addition to the timezone names and abbreviations, PostgreSQL will accept POSIX-style time
zone specifications, as described in Section B.5. This option is not normally preferable to using a
named time zone, but it may be necessary if no suitable IANA time zone entry is available.


In short, this is the difference between abbreviations and full names: abbreviations represent a specific
offset from UTC, whereas many of the full names imply a local daylight-savings time rule, and so have
two possible UTC offsets. As an example, 2014-06-04 12:00 America/New_York represents
noon local time in New York, which for this particular date was Eastern Daylight Time (UTC-4). So
2014-06-04 12:00 EDT specifies that same time instant. But 2014-06-04 12:00 EST
specifies noon Eastern Standard Time (UTC-5), regardless of whether daylight savings was nominally
in effect on that date.
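
These equivalences can be verified directly; both of the following comparisons should return true:

SELECT TIMESTAMPTZ '2014-06-04 12:00 America/New_York'
     = TIMESTAMPTZ '2014-06-04 12:00 EDT';
SELECT TIMESTAMPTZ '2014-06-04 12:00 EST'
     = TIMESTAMPTZ '2014-06-04 13:00 EDT';  -- EST is one hour behind EDT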

To complicate matters, some jurisdictions have used the same timezone abbreviation to mean different
UTC offsets at different times; for example, in Moscow MSK has meant UTC+3 in some years and
UTC+4 in others. PostgreSQL interprets such abbreviations according to whatever they meant (or had
most recently meant) on the specified date; but, as with the EST example above, this is not necessarily
the same as local civil time on that date.

In all cases, timezone names and abbreviations are recognized case-insensitively. (This is a change
from PostgreSQL versions prior to 8.2, which were case-sensitive in some contexts but not others.)

Neither timezone names nor abbreviations are hard-wired into the server; they are obtained from con-
figuration files stored under .../share/timezone/ and .../share/timezonesets/ of
the installation directory (see Section B.4).

The TimeZone configuration parameter can be set in the file postgresql.conf, or in any of the
other standard ways described in Chapter 20. There are also some special ways to set it:

• The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative
spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax.

• The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command
to the server upon connection.
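
For example:

SET TIME ZONE 'America/New_York';  -- same effect as SET timezone TO 'America/New_York'
SHOW timezone;
-- America/New_York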

8.5.4. Interval Input


interval values can be written using the following verbose syntax:

[@] quantity unit [quantity unit...] [direction]

where quantity is a number (possibly signed); unit is microsecond, millisecond, second,
minute, hour, day, week, month, year, decade, century, millennium, or abbreviations
or plurals of these units; direction can be ago or empty. The at sign (@) is optional noise.
The amounts of the different units are implicitly added with appropriate sign accounting. ago negates
all the fields. This syntax is also used for interval output, if IntervalStyle is set to postgres_verbose.

Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For
example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'. Also,
a combination of years and months can be specified with a dash; for example '200-10' is read the
same as '200 years 10 months'. (These shorter forms are in fact the only ones allowed by the
SQL standard, and are used for output when IntervalStyle is set to sql_standard.)
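
For example, all three of the following spellings denote the same interval:

SELECT INTERVAL '@ 1 day 12 hours 59 min 10 secs';
SELECT INTERVAL '1 day 12 hours 59 min 10 sec';
SELECT INTERVAL '1 12:59:10';
-- each yields 1 day 12:59:10 in the default output style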

Interval values can also be written as ISO 8601 time intervals, using either the “format with designa-
tors” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. The format with
designators looks like this:

P quantity unit [ quantity unit ...] [ T [ quantity unit ...]]

The string must start with a P, and may include a T that introduces the time-of-day units. The available
unit abbreviations are given in Table 8.16. Units may be omitted, and may be specified in any order, but
units smaller than a day must appear after T. In particular, the meaning of M depends on whether
it is before or after T.

Table 8.16. ISO 8601 Interval Unit Abbreviations


Abbreviation Meaning
Y Years
M Months (in the date part)
W Weeks
D Days
H Hours
M Minutes (in the time part)
S Seconds

In the alternative format:

P [ years-months-days ] [ T hours:minutes:seconds ]

the string must begin with P, and a T separates the date and time parts of the interval. The values are
given as numbers similar to ISO 8601 dates.
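
For example, the same interval written in both ISO 8601 forms (a sketch):

SELECT INTERVAL 'P1Y2M3DT4H5M6S';        -- format with designators
SELECT INTERVAL 'P0001-02-03T04:05:06';  -- alternative format
-- both yield 1 year 2 mons 3 days 04:05:06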

When writing an interval constant with a fields specification, or when assigning a string to an in-
terval column that was defined with a fields specification, the interpretation of unmarked quantities
depends on the fields. For example INTERVAL '1' YEAR is read as 1 year, whereas INTER-
VAL '1' means 1 second. Also, field values “to the right” of the least significant field allowed by the
fields specification are silently discarded. For example, writing INTERVAL '1 day 2:03:04'
HOUR TO MINUTE results in dropping the seconds field, but not the day field.
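
For example:

SELECT INTERVAL '1' YEAR;                        -- 1 year
SELECT INTERVAL '1';                             -- 00:00:01 (read as 1 second)
SELECT INTERVAL '1 day 2:03:04' HOUR TO MINUTE;  -- 1 day 02:03:00 (seconds discarded)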

According to the SQL standard all fields of an interval value must have the same sign, so a leading
negative sign applies to all fields; for example the negative sign in the interval literal '-1 2:03:04'
applies to both the days and hour/minute/second parts. PostgreSQL allows the fields to have different
signs, and traditionally treats each field in the textual representation as independently signed, so that
the hour/minute/second part is considered positive in this example. If IntervalStyle is set to
sql_standard then a leading sign is considered to apply to all fields (but only if no additional
signs appear). Otherwise the traditional PostgreSQL interpretation is used. To avoid ambiguity, it's
recommended to attach an explicit sign to each field if any field is negative.
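
The difference can be seen by switching styles (a sketch; IntervalStyle affects input interpretation as well as output):

SET intervalstyle = 'postgres';
SELECT INTERVAL '-1 2:03:04';  -- -1 days +02:03:04 (fields independently signed)

SET intervalstyle = 'sql_standard';
SELECT INTERVAL '-1 2:03:04';  -- -1 2:03:04 (leading sign applies to all fields)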

Field values can have fractional parts: for example, '1.5 weeks' or '01:02:03.45'. However,
because interval internally stores only three integer units (months, days, microseconds), fractional
units must be spilled to smaller units. Fractional parts of units greater than months are truncated to be
an integer number of months, e.g. '1.5 years' becomes '1 year 6 mons'. Fractional parts
of weeks and days are computed to be an integer number of days and microseconds, assuming 30 days
per month and 24 hours per day, e.g., '1.75 months' becomes 1 mon 22 days 12:00:00.
Only seconds will ever be shown as fractional on output.
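
For example:

SELECT INTERVAL '1.5 years';    -- 1 year 6 mons
SELECT INTERVAL '1.75 months';  -- 1 mon 22 days 12:00:00
SELECT INTERVAL '1.5 weeks';    -- 10 days 12:00:00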

Table 8.17 shows some examples of valid interval input.

Table 8.17. Interval Input


Example                                              Description
1-2                                                  SQL standard format: 1 year 2 months
3 4:05:06                                            SQL standard format: 3 days 4 hours 5 minutes 6 seconds
1 year 2 months 3 days 4 hours 5 minutes 6 seconds   Traditional Postgres format: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds
