PostgreSQL
PostgreSQL architecture
    Write-ahead logging (WAL)
Database Objects
    Databases
    Tables
    Constraints
    Indexes
    Character sets and collation
PostgreSQL language
    Datatypes
    SQL commands
    Functions and operators
    Queries
    PL/pgSQL language
    Full text search
Server administration
    Installing PostgreSQL
    Server startup/shutdown
    Server configuration
        File locations
        Memory usage
    Connection handling
    Monitoring the server
        Disk usage
            Vacuuming
        The statistics collector
    Logging
Transaction/Query management
    Locks
        Table-level locks
        Row-level locks
    Using Explain
Importing/Exporting data
    COPY command
User Management/Security
    Roles
        Role attributes
    Privileges
        Default access privileges
    Client authentication
        Authentication methods
    Encryption
Backup/Recovery
    Backups and restores
    WAL archiving
    Troubleshooting/Recovery
        Error codes
High Availability/Scalability
    Replication
        Streaming replication
            Synchronous replication
Tools/Utilities
    psql
    pgAdmin
    Additional modules


POSTGRES ARCHITECTURE

A PostgreSQL session consists of the following cooperating processes (programs):

A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.

The user's client (frontend) application that wants to perform database operations.

Once a database has been created, it can be accessed by:

  • Running the PostgreSQL interactive terminal program, called psql, which allows interactively entering and executing SQL commands
  • Using an existing graphical frontend tool like pgAdmin
  • Writing a custom application, using one of the several available language bindings


    Database file layout

    The configuration and data files used by a database cluster are stored in the cluster's data directory, also referred to as PGDATA. The data directory contains the following control files and subdirectories:

    PG_VERSION			A file containing the major version number of PostgreSQL
    base			Subdirectory containing per-database subdirectories
    current_logfiles		File recording the log file(s) currently written to by the logging collector
    global			Subdirectory containing cluster-wide tables, such as pg_database
    pg_dynshmem			Subdirectory containing files used by the dynamic shared memory subsystem
    pg_logical			Subdirectory containing status data for logical decoding
    pg_snapshots		Subdirectory containing exported snapshots
    pg_stat			Subdirectory containing permanent files for the statistics subsystem
    pg_stat_tmp			Subdirectory containing temporary files for the statistics subsystem
    pg_tblspc			Subdirectory containing symbolic links to tablespaces
    pg_wal			Subdirectory containing WAL (Write Ahead Log) files
    postgresql.auto.conf	A file used for storing configuration parameters that are set by ALTER SYSTEM

    	show data_directory;
    	show config_file;
    For each database in the cluster there is a subdirectory within PGDATA/base, named after the database's OID in pg_database, which is the default location for the database's files and its system catalogs.

    Each table and index is stored in a separate file. For ordinary relations, these files are named after the table or index's filenode number, found in pg_class.relfilenode. But for temporary relations, the file name is of the form tBBB_FFF, where BBB is the backend ID of the backend which created the file and FFF is the filenode number. In addition to the main file, each table and index has a free space map (_fsm) which stores information about free space available in the relation and a visibility map (_vm) to track which pages are known to have no dead tuples. Unlogged tables and indexes have a third fork (_init) known as the initialization fork.
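    For illustration, the files for these forks can be located on disk (a hypothetical table t1 with filenode 16384 in a database with OID 110636 is assumed):
    	select pg_relation_filepath('t1');	- main fork, e.g. base/110636/16384
    	ls -1 $PGDATA/base/110636 | grep 16384	- also lists 16384_fsm and 16384_vm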

    A table that has columns with potentially large entries will have an associated TOAST table, which is used for out-of-line storage of field values that are too large to keep in the table. pg_class.reltoastrelid links from a table to its TOAST table.
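    For example, assuming a table t1 exists, its TOAST table (if any) can be found through pg_class.reltoastrelid:
    	select reltoastrelid::regclass from pg_class where relname = 't1';	- TOAST table of t1, if any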

    The database and table corresponding to the file path 'base/110636/2609' can be identified by
    	select * from pg_database where oid = 110636;
    	SELECT pg_filenode_relation(0, 2609);
    When a table or index exceeds 1 GB, it is divided into gigabyte-sized segments. The first segment's file name is the same as the filenode; subsequent segments are named filenode.1, filenode.2 etc.

    Write-ahead logging (WAL)

    PostgreSQL maintains a write-ahead log (WAL) in the pg_wal subdirectory of the cluster's data directory. It ensures data integrity by logging every change to the database's data files before those changes are applied.

    Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction. The log file is written sequentially, so the cost of syncing the log is much less than the cost of flushing the data pages. When the server is processing many small concurrent transactions, one fsync of the log file may suffice to commit many transactions.
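    As a quick sketch of observing WAL activity (assuming PostgreSQL 10 or later, which the pg_wal naming implies), the current write position and its segment file can be queried:
    	select pg_current_wal_lsn();			- current WAL write location
    	select pg_walfile_name(pg_current_wal_lsn());	- WAL segment file holding that location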

    WAL configuration

    checkpoint_timeout - Maximum time between automatic WAL checkpoints, in seconds. The valid range is between 30 seconds and one day. The default is 5 minutes. Increasing this parameter can increase the amount of time needed for crash recovery.

    fsync - If this parameter is on, the server will try to make sure that updates are physically written to disk, by issuing fsync() system calls or equivalent methods

    max_wal_size - Maximum size to let the WAL grow to between automatic WAL checkpoints. This is a soft limit and WAL size can exceed max_wal_size under heavy load, a failing archive_command or a high wal_keep_segments setting. The default is 1 GB.

    synchronous_commit - Specifies whether transaction commit will wait for WAL records to be written to disk. Default value is on and other valid values are - off, local, remote_apply and remote_write. When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.)

    If synchronous_standby_names is non-empty, this parameter also controls whether or not transaction commits will wait for their WAL records to be replicated to the standby servers.
    on - commits wait until replies from the current synchronous standbys indicate they have received the commit record of the transaction and flushed it to disk
    remote_apply - commits wait until the synchronous standbys indicate they have received the commit record of the transaction and applied it, so that it has become visible to queries on the standbys
    remote_write - commits wait until the synchronous standbys indicate they have received the commit record of the transaction and written it out to their OS. This setting is sufficient to ensure data preservation even if a standby instance crashes, but not if the standby suffers an OS-level crash, since the data has not necessarily reached stable storage on the standby
    local - commits wait only for local flush to disk, not for replication
    If synchronous_standby_names is empty, the settings on, remote_apply, remote_write and local all provide the same synchronization level: transaction commits only wait for local flush to disk.

    It is possible to have some transactions commit synchronously and others asynchronously. For example, a transaction can be made to commit asynchronously when the default is the opposite, by issuing SET LOCAL synchronous_commit TO OFF within the transaction.
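    A minimal sketch of such a transaction (t1 is an assumed example table):
    	BEGIN;
    	SET LOCAL synchronous_commit TO OFF;
    	INSERT INTO t1 (id, name) VALUES (4, 'Ann');
    	COMMIT;		- returns without waiting for the WAL flush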

    wal_buffers - The amount of shared memory used for WAL data not yet written to disk. The default setting of -1 selects a size equal to 1/32 (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment (16MB). This value can be set manually but any positive value less than 32kB will be treated as 32kB.
    The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide any benefit. However, setting to at least a few MB can improve write performance on a busy server where many clients are committing at once.

    wal_level - determines how much information is written to the WAL. This parameter can only be set at server start. Values:
    replica - writes enough data to support WAL archiving and replication (default)
    minimal - removes all logging except the information required to recover from a crash or immediate shutdown
    logical - adds information necessary to support logical decoding
    wal_sync_method - Method used for forcing WAL updates out to disk. If fsync = off, then WAL file updates will not be forced out at all and this setting is irrelevant. Possible values are
  • open_datasync (write WAL files with open() option O_DSYNC)
  • fdatasync (call fdatasync() at each commit)
  • fsync (call fsync() at each commit)
  • fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache)
  • open_sync (write WAL files with open() option O_SYNC)



    DATABASE OBJECTS

    Databases

    The first database, postgres, is always created by the initdb command when the data storage area is initialized. A second database, template1, is also created during database cluster initialization. Whenever a new database is created within the cluster, template1 is cloned, i.e. any changes made in template1 are propagated to all subsequently created databases.

    Another database, template0, contains the same data as the initial contents of template1 and can be used to create a virgin database without the additions made in template1.
    	CREATE DATABASE db1 TEMPLATE template0;
    	
    	SELECT * FROM pg_database;	- list databases
    	\l+
    	
    	SELECT oid, * from pg_database where datname = 'db1';	- show oid
    
    	ALTER DATABASE db1 RENAME TO db2;
    Only the database owner or a superuser can rename a database and non-superuser owners must also have the CREATEDB privilege. The current database cannot be renamed.
    	ALTER DATABASE db1 OWNER TO new_owner;		- must be the owner or superuser
    	ALTER DATABASE db1 SET TABLESPACE tblspc_new;	- change the default tablespace of the database
    	
    	DROP DATABASE db1;	- only the owner or a superuser can drop a database
    The pg_stat_database view contains one row for each database showing database-wide statistics.
    	select * from pg_stat_database;
    Tablespaces

    Tablespaces allow defining locations in the file system where database files can be stored. By using tablespaces, the disk layout of a PostgreSQL installation can be controlled. Tables, indexes and entire databases can be assigned to specific tablespaces.

    Tablespaces also allow for optimizing performance, for example, an index which is heavily used can be placed on a fast disk, such as an expensive solid state device.
    	CREATE TABLESPACE fastspace LOCATION '/ssd1/postgresql/data';
    Two tablespaces are automatically created when the database cluster is initialized. pg_global is used for shared system catalogs. pg_default is the default tablespace of the template1 and template0 databases and will be the default tablespace for other databases, unless overridden in CREATE DATABASE.
    	select * from pg_tablespace;
    The temp_tablespaces parameter determines the placement of temporary tables and indexes, as well as temporary files that are used for purposes such as sorting large data sets. This can be a list of tablespace names so that the load associated with temporary objects can be spread over multiple tablespaces. A random member of the list is picked each time a temporary object is to be created.
    	show default_tablespace;
    	show temp_tablespaces;
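    For example, assuming two tablespaces tmpspc1 and tmpspc2 have been created, temporary objects can be spread across them for the current session:
    	SET temp_tablespaces = tmpspc1, tmpspc2;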
    Schemas

    A database contains one or more named schemas, which in turn contain tables and other objects including data types, functions and operators. The same object name can be used in different schemas without conflict. Unlike databases, schemas are not rigidly separated: a user can access objects in any of the schemas in the database they are connected to.

    Schemas may be used for
  • allowing many users to use one database without interfering with each other
  • organizing database objects into logical groups to make them more manageable
  • putting third-party applications into separate schemas so they do not collide with the names of other objects

    Every new database contains a schema named public. If objects are created without schema names, they are put into the public schema.

    By default, users cannot access any objects in schemas they do not own and to allow that, the owner of the schema must grant the USAGE privilege on the schema and additional privileges on objects in the schema.
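    A sketch of the grants involved, assuming a schema schema1 owned by the current user and a role user1:
    	GRANT USAGE ON SCHEMA schema1 TO user1;
    	GRANT SELECT ON ALL TABLES IN SCHEMA schema1 TO user1;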

    A user can also be allowed to create objects in someone else's schema by granting the CREATE privilege on the schema. By default, everyone has CREATE and USAGE privileges on the schema public. This allows all users that are able to connect to a given database to create objects in its public schema. That privilege can be revoked by
    	REVOKE CREATE ON SCHEMA public FROM PUBLIC;
    Tables

    The maximum number of columns that a table can contain is between 250 and 1600, depending on the column types. PostgreSQL does not support global temporary tables and each session has to create its own temporary tables.
    	CREATE TABLE t1
    	( id int PRIMARY KEY, name char(50));	- create a new table
    	
    	CREATE TEMPORARY TABLE t1
    	( id int PRIMARY KEY, name char(50));	- create a temporary table
    	
    	CREATE TABLE t1 (LIKE t2);		- copy an existing table's column definitions
    	
    	DROP TABLE t1;		- only the table owner, the schema owner or superuser can drop a table
    	
    	select * FROM information_schema.tables WHERE table_schema = 'public' and table_catalog = 'db1';	- list all user tables
    	select * FROM pg_tables WHERE schemaname = 'public';
    	\dt[S+] [pattern]	- list tables
    	\dv[S+]	[pattern]	- list views
    	\dit[S+] [pattern]	- list tables and indexes
    	
    	\d tablename or pattern		- show table structure
    	
    	SELECT oid FROM pg_class WHERE relname = 'tbl1' AND relkind = 'r';	- find oid of a table
    	SELECT pg_relation_filepath('tbl1');	- data file pertaining to the table
    Each table has a primary heap disk file where most of the data is stored. If the table has any columns with potentially-wide values, there are TOAST files associated with the table, used to store values too wide to fit comfortably in the main table. There will be one valid index on the TOAST table, if present. Each table and index is stored in a separate disk file - possibly more than one file, if the file would exceed 1GB.

    Altering tables:
    	ALTER TABLE t1 RENAME TO t2;		- rename table
    
    	ALTER TABLE t1 ALTER COLUMN col1 DROP NOT NULL;   	- make a column nullable
    	
    	ALTER TABLE t1 ADD CONSTRAINT pk_t1 PRIMARY KEY(col1);	- add primary key
    	ALTER TABLE t1 ADD CONSTRAINT fk_t1 FOREIGN KEY(col1) REFERENCES t2(col2);	- add foreign key
    	ALTER TABLE t1 RENAME CONSTRAINT fk1 TO fk2;		- rename constraint
    	ALTER TABLE t1 DROP CONSTRAINT fk1;			- drop constraint
    	
    	ALTER TABLE t1 OWNER TO role1;				- change owner
    	
    	ALTER TABLE t1 DISABLE TRIGGER ALL;			- disable triggers
    	ALTER TABLE t1 ENABLE TRIGGER ALL;
    The pg_stat_all_tables view contains one row for each table in the current database (including TOAST tables), showing statistics about accesses to the table. The pg_stat_user_tables and pg_stat_sys_tables views contain the same information, filtered to only show user and system tables respectively.
    	select schemaname, relname, seq_tup_read, seq_scan from pg_stat_all_tables order by seq_tup_read desc limit 20;	- table scans
    Sequences

    Sequence objects are special single-row tables created with CREATE SEQUENCE and used to generate unique identifiers for rows of a table.
    	select * FROM information_schema.sequences where sequence_schema = 'public' and sequence_catalog = 'db1';	- list all sequences
    	\ds[S+]	[pattern]	- list sequences
    	
    	ALTER SEQUENCE seq1 RENAME TO seq2;		- rename sequence
    Sequence manipulation functions
    	select currval('seq');			return value most recently obtained with nextval for specified sequence
    	select lastval();			return value most recently obtained with nextval for any sequence
    	select nextval('seq');			advance sequence and return new value
    	select setval('seq', bigint);		set sequence's current value
    	select setval('seq', bigint, boolean);	set sequence's current value and is_called flag
    	
    	SELECT setval('seq1', COALESCE((SELECT MAX(id)+1 FROM table1), 1), false);		- reset an out of sync sequence
    Constraints

    Check constraint
    	CREATE TABLE t1 (
    	product_id integer,
    	price numeric CHECK (price > 0),		- column check constraint
    	discounted_price numeric CONSTRAINT positive_price CHECK (discounted_price > 0),	- named check constraint
    	CHECK (price > discounted_price)		- table check constraint
    	);
    Unique constraint
    	CREATE TABLE t1 (
    	a integer,
    	b integer UNIQUE,
    	c integer,
    	UNIQUE (a, c) );
    Primary key
    	CREATE TABLE t1 (
    	a integer,
    	b integer,
    	c integer,
    	PRIMARY KEY (a, c) );
    Foreign keys
    	CREATE TABLE t1 (
    	order_id integer REFERENCES orders(id) ON DELETE RESTRICT,	- prevents deletion of referenced row
    	product_id integer REFERENCES products(id) ON DELETE CASCADE );	- referencing rows are deleted when referenced rows are deleted
    To resolve foreign-key violation errors, triggers can be disabled on the parent table and later enabled. Disabling or enabling internally generated constraint triggers requires superuser privileges.

    Indexes

    PostgreSQL provides several index types - B-tree (default), Hash, GiST, SP-GiST, GIN and BRIN.
    	CREATE INDEX ix_t1_c1 ON schema1.t1 USING BTREE(c1) TABLESPACE pg_default;
    	CREATE UNIQUE INDEX uq_t1_c1 on t1(c1);
    	
    	DROP INDEX ix_t1_c1;   
    	ALTER INDEX ix_t1_c1 RENAME TO ix_t1_c2;
    	
    	SELECT schemaname, tablename, indexname FROM pg_indexes WHERE schemaname='public' and tablename = 't1';   - list indexes for table
    	\di[S+]	[pattern]	- list indexes
    PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table.

    Indexes can be created on functions and scalar expressions on columns as well.
    	CREATE INDEX ix_t1_lower_col1 ON t1 (lower(col1));
    	CREATE INDEX ix_t1_name ON t1 ((first_name || ' ' || last_name));
    The pg_stat_all_indexes view contains one row for each index in the current database, showing statistics about accesses to that specific index. The pg_stat_sys_indexes and pg_stat_user_indexes views contain only rows for indexes on system and user tables respectively.
    	select * from pg_stat_all_indexes where schemaname = 'public' and relname = 't1';
    	select * from pg_stat_all_indexes where schemaname = 'public' and relname = 't1' and idx_scan = 0;	- unused indexes
    Character sets and collation
    	select pg_encoding_to_char(encoding), datcollate, datctype FROM pg_database WHERE datname = 'dbname';
    		datcollate	- LC_COLLATE (string sort order)
    		datctype	- LC_CTYPE (character classification)

    Back



    POSTGRESQL LANGUAGE

    Datatypes

    Binary
    			Storage						Description				
    bytea			1 or 4 bytes + length of binary string		variable-length binary string

    Character
    					Description
    character varying(n), varchar(n)	variable-length with limit
    character(n), char(n)			fixed-length, blank padded
    text					variable unlimited length

    Date/Time
    				Storage		Description				Resolution
    timestamp			8 bytes		date and time (no time zone)		1 microsecond / 14 digits
    timestamp with time zone	8 bytes		date and time, with time zone		1 microsecond / 14 digits
    date				4 bytes		date					1 day
    time				8 bytes		time of day (no date)			1 microsecond / 14 digits
    time with time zone		12 bytes	time of day only, with time zone	1 microsecond / 14 digits
    interval			16 bytes	time interval				1 microsecond / 14 digits

    Numeric
    			Storage 	Description				Range
    smallint 		2 bytes 	small-range integer 			-32768 to +32767 
    integer 		4 bytes 	typical choice for integer 		-2147483648 to +2147483647 
    bigint 			8 bytes 	large-range integer 			-9223372036854775808 to +9223372036854775807 
    decimal 		variable 	user-specified precision, exact 	up to 131072 digits before the decimal point; up to 16383 digits after the decimal point 
    numeric 		variable 	user-specified precision, exact 	up to 131072 digits before the decimal point; up to 16383 digits after the decimal point 
    real 			4 bytes 	variable-precision, inexact 		6 decimal digits precision 
    double precision 	8 bytes 	variable-precision, inexact 		15 decimal digits precision 
    smallserial 		2 bytes 	small autoincrementing integer 		1 to 32767 
    serial 			4 bytes 	autoincrementing integer 		1 to 2147483647 
    bigserial 		8 bytes 	large autoincrementing integer 		1 to 9223372036854775807
    Object Identifiers

    Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables. OIDs are not added to user-created tables, unless WITH OIDS is specified, or the default_with_oids configuration variable is enabled. Type oid represents an object identifier. There are also several alias types for oid: regproc, regprocedure, regoper, regoperator, regclass, and regtype.
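    The alias types make it convenient to convert between OIDs and object names; for example (assuming a table tbl1 exists):
    	SELECT 'tbl1'::regclass::oid;	- look up the OID of a table by name
    	SELECT 2609::regclass;		- display an OID as a relation name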

    UUID - stores 128-bit UUIDs, takes up 16 bytes

    SQL commands

    ANALYZE - collects statistics about the tables in the current database and updates the pg_statistic system catalog. The query planner uses these statistics to determine the most efficient execution plans for queries
    	ANALYZE tbl1;	- analyze a specific table
    CHECKPOINT - forces an immediate WAL checkpoint without waiting for a regular checkpoint scheduled by the system. Data files are updated to reflect the information in the log and flushed to disk. It is not intended for use during normal operation

    CREATE TYPE - registers a new data type for use in the current database
    	CREATE TYPE bug_status AS ENUM ('new', 'open', 'closed');
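    The new type can then be used like any built-in type; a brief sketch with an assumed table bugs:
    	CREATE TABLE bugs (id int, status bug_status);
    	INSERT INTO bugs VALUES (1, 'open');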
    INSERT ON CONFLICT can be used to specify an alternative action to raising a unique constraint or exclusion constraint violation error
    	INSERT INTO t1 (id, name, status) VALUES (3, 'Fred', 1)
    	ON CONFLICT (id) DO UPDATE set name = EXCLUDED.name where t1.status = 1;
    	
    	INSERT INTO t1 (id, name) VALUES (3, 'Fred') ON CONFLICT (id) DO NOTHING;
    RESET - restore the value of a run-time parameter to the default value
    	RESET work_mem;
    	RESET ALL;
    SELECT
    	SELECT * from t1 LIMIT 100 OFFSET 50;	- retrieve only a portion of the rows
    SET - change a run-time parameter for the current session. SET SESSION is equivalent to SET
    	set work_mem = '12MB';
    SHOW - show the value of a run-time parameter
    	SHOW wal_level;
    	SHOW ALL;

    Prepared statements

    PREPARE creates a prepared statement which is parsed, analyzed and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed avoiding parsing and analysis. Prepared statements can be cleaned up using the DEALLOCATE command. Prepared statements only last for the duration of the current database session.
    	PREPARE prepstmt1 (int, date) AS
    	SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid AND l.date = $2;
    	EXECUTE prepstmt1(1, current_date);
    	
    	select * from pg_prepared_statements;	- all prepared statements available in the session
    Functions and operators

    Aggregate functions

    bool_and(expression) - true if all input values are true, otherwise false
    bool_or(expression) - true if at least one input value is true, otherwise false
    string_agg (expression, delimiter) - input values concatenated into a string, separated by delimiter
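    For example, string_agg can collapse a column into one delimited string (assuming a table t1 with a name column):
    	select string_agg(name, ', ') from t1;	- e.g. 'Ann, Bob, Carol'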

    Conversion

    cast(expr as type) - also specified as expr::type
    	SELECT '01-OCT-2015'::date;	- returns '2015-10-01'
    to_char - convert to string
    	to_char(125, '999')		- convert int to string
    	to_char(current_timestamp, 'HH12:MI:SS')	- convert timestamp to string
    to_date - convert string to date
    	to_date('14.05.2011', 'dd.mm.yyyy')	- returns '2011-05-14'
    	to_date('05 Dec 2000', 'DD Mon YYYY')	- returns '2000-12-05'
    to_number - convert string to numeric
    	to_number('12,454.8-', '99G999D9S')
    Date/time

    current_date - current date
    current_time - current time
    current_timestamp - current date and time
    date_part - retrieves subfields from date/time values
    	date_part('day', TIMESTAMP '2001-02-16 20:38:40')	- 16
    	date_part('hour', INTERVAL '4 hours 3 minutes')	- 4
    date_trunc(text, timestamp) - truncate date/time values to specified precision
    	date_trunc('hour',current_timestamp)	- '2018-11-04 15:00:00'
    	date_trunc('year',current_date)		- '2018-01-01 00:00:00'
    extract - retrieves subfields from date/time values
    	extract(day from date '2018-10-14')	- 14
    	extract(day from interval '40 days 1 minute')	- 40
    	extract(month from timestamp '2001-02-16 20:38:40');	- 2
    	extract(century from current_timestamp)	- 21
    now() - current date and time

    Session information
    	select current_database();
    	select current_schema;
    	select current_role;		- equivalent to current_user
    	select current_user;		- user name of current execution context	
    	select pg_backend_pid();	- process ID of the server process attached to the current session
    	select pg_blocking_pids(int);	- process ID(s) that are blocking specified server process ID from acquiring a lock
    	select version();
    	select user;			- equivalent to current_user
    String

    || (concatenation) - str1 || str2
    bit_length(str) - number of bits in string
    char_length(str) - number of characters
    left(str,n) - return first n characters in string
    lower(str) - convert to lower case
    octet_length(str) - number of bytes
    position(substr in str) - location of specified substring
    	position('om' in 'Thomas')	- 3
    right(str,n) - return last n characters
    trim(str) - trim spaces or other specified characters
    	trim(both 'xyz' from 'yxTomxx')	- 'Tom'
    	trim(leading 'xyz' from 'yxTomxx')	- 'Tomxx'
    upper(str) - convert to upper case

    Queries

    Combining Queries

    The results of two queries can be combined using the set operations union, intersection and difference.
    	query1 UNION [ALL] query2
    	query1 INTERSECT [ALL] query2
    	query1 EXCEPT [ALL] query2
    Grouping sets

    Complex grouping operations are possible using grouping sets. The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates are computed for each group just as for simple GROUP BY clauses, and the results are returned.
    	select year, month, sum(sales) from t1
    	group by grouping sets ((year, month), (year));
    Common table expressions (CTE)

    The WITH clause is used to write auxiliary statements for use in a larger query. Each auxiliary statement in a WITH clause can be a SELECT, INSERT, UPDATE or DELETE and the WITH clause itself is attached to a primary statement that can also be a SELECT, INSERT, UPDATE or DELETE.

    Using RECURSIVE, a WITH query can refer to its own output. The query below sums the integers from 1 through 100:
    	WITH RECURSIVE t(n) AS (
    	VALUES (1)
    	UNION ALL
    	SELECT n+1 FROM t WHERE n < 100
    	)
    	SELECT sum(n) FROM t;
    Data-modifying statements in WITH can have a RETURNING clause, which returns their output to the primary query.
    	WITH purged_rows AS (
    	DELETE FROM products WHERE created_date between '2010-10-01' AND '2010-11-01'
    	RETURNING *
    	)
    	INSERT INTO products_log SELECT * FROM purged_rows;

    PL/pgSQL

    PL/pgSQL is a loadable procedural language for the PostgreSQL database that is used to create functions and procedures.

    Declarations

    Parameters - parameters passed to the functions are named with the identifiers $1, $2 etc. Either the identifiers or optionally declared aliases can be used to refer to the parameter values. Aliases can be created by giving a name to the parameter in the create function statement or can be explicitly declared.
    	create function fn_get_tax(amount real) returns real as $$
    	begin	
    		return amount * 0.05; 
    	end;
    	$$ language plpgsql;
    Alias - can be declared for function parameters or a variable
    	create function fn_get_name (integer) returns varchar as $$
    	declare
    		emp_id alias for $1;
    		emp_name varchar(50);
    	begin
    		select name into emp_name from employees where id = emp_id;	- employees is an assumed example table
    		return emp_name;
    	end;
    	$$ language plpgsql;
    Row type - a row-type variable can hold a whole row of a SELECT or FOR query result whose column set matches the declared type of the variable. The individual fields of the row value are accessed using the dot notation, i.e. rowvar.field

    Record variables - are similar to row-type variables, but they have no predefined structure and take on the actual row structure of the row and they are assigned during a SELECT or FOR command.
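    A minimal sketch of a record variable being assigned by a FOR loop (assuming a table t1 with id and name columns):
    	do $$
    	declare
    		rec record;
    	begin
    		for rec in select id, name from t1 loop
    			raise notice '% %', rec.id, rec.name;
    		end loop;
    	end;
    	$$;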

    Control structures

    FOR
    	FOR target IN query LOOP	- iterate through the results of a query
    		statements		- target is a record or row variable or comma-separated list of scalar variables
    	END LOOP
    LOOP - performs an unconditional loop until terminated by an EXIT or RETURN statement
    	LOOP
    		statements
    	END LOOP;
    RETURN - terminates a function and returns a value to the caller
    	RETURN expression
    When a function is declared to return SETOF, the individual items to return are specified by a sequence of RETURN NEXT or RETURN QUERY commands and a final RETURN command with no argument is used to indicate that the function has finished executing.
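    A sketch of a set-returning function using RETURN QUERY (again assuming a table t1 with an integer id column):
    	create function fn_list_ids() returns setof int as $$
    	begin
    		return query select id from t1;
    		return;
    	end;
    	$$ language plpgsql;
    	
    	select * from fn_list_ids();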

    A procedure does not have a return value and can end without a RETURN statement. If a RETURN statement is desired to exit the code early, then NULL must be returned. If a procedure has output parameters, then the output values can be assigned to the parameters as if they were variables.
    	CREATE PROCEDURE triple(INOUT x int)
    	LANGUAGE plpgsql
    	AS $$
    	BEGIN
    		x := x * 3;
    	END;
    	$$;
    	
    	CALL triple(5);
    Statements

    Assignment (:=, =)
    	var1 := var2 * 0.6;
    	rec1.field1 = 20;
    NULL - a placeholder statement that does nothing


    Full text search

    Full text searching is based on the match operator @@, which returns true if a tsvector (document) matches a tsquery (query). It does not matter which data type is written first.

    Within a tsquery, the & (AND), | (OR) and ! (NOT) operators specify how the arguments are to be handled for a match.
    	SELECT 'word1 word2 word3 word4'::tsvector @@ 'word2 & word4'::tsquery;	- returns true
     	SELECT 'word3 & word5'::tsquery @@ 'word1 word2 word3 word4'::tsvector;	- false
    Searching for phrases is possible with the <-> (FOLLOWED BY) tsquery operator, which matches only if its arguments have matches that are adjacent and in the given order.
    	SELECT to_tsvector('fatal error') @@ to_tsquery('fatal <-> error');	- true
    	SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal <-> error');	- false




    SERVER ADMINISTRATION

    Installing/upgrading PostgreSQL

    PostgreSQL can be installed using the source code or pre-built binary packages that are freely available on the web.

    Upgrading via pg_dumpall

  • Backup database using pg_dumpall
    	pg_dumpall > outputfile
  • Shut down old server
  • Rename the old installation folder
    	mv /usr/local/pgsql /usr/local/pgsql.old
  • Install the new version of PostgreSQL
  • Create a new database cluster if needed
    	/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
  • Restore previous pg_hba.conf and any postgresql.conf modifications
  • Start the database server, again using the special database user account
    	/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
  • Restore data from backup using the new psql
    	/usr/local/pgsql/bin/psql -d postgres -f outputfile
    Upgrading via pg_upgrade

    The pg_upgrade module allows an installation to be migrated in-place from one major version to another. Upgrades can be performed in minutes, particularly with --link mode. It requires steps similar to pg_dumpall, e.g. starting/stopping the server and running initdb. A usage sketch follows the options list below.

    Options
    	-b bindir, --old-bindir=bindir		the old PostgreSQL executable directory (environment variable PGBINOLD)
    	-B bindir, --new-bindir=bindir		the new executable directory (PGBINNEW)
    	-c, --check				check clusters only, don't change any data
    	-d configdir, --old-datadir=configdir	the old database cluster configuration directory (PGDATAOLD)
    	-D configdir, --new-datadir=configdir	the new cluster configuration directory (PGDATANEW)
    	-j, --jobs				number of simultaneous processes or threads to use
    	-k, --link				use hard links instead of copying files to the new cluster
    	-o options, --old-options options	options to be passed to the old postgres command (multiple options are appended)
    	-O options, --new-options options	options to be passed to the new postgres command
    	-p port, --old-port=port		the old cluster port number (PGPORTOLD)
    	-P port, --new-port=port		the new cluster port (PGPORTNEW)
    	-r, --retain				retain SQL and log files even after successful completion
    	-U username, --username=username	cluster's install user name (PGUSER)
    	-V,	--version			display version information and exit
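    A hypothetical invocation (paths are placeholders) that validates the clusters without changing any data:
    	pg_upgrade -b /usr/local/pgsql.old/bin -B /usr/local/pgsql/bin -d /usr/local/pgsql.old/data -D /usr/local/pgsql/data --check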

    Server startup/shutdown

    It is advisable to run PostgreSQL under a separate user account (the user name postgres is often used) and this user should only own the data that is managed by the server and should not be shared with other daemons.

    postgres is the database server program. One postgres instance manages the data of exactly one database cluster which is a collection of databases that is stored at a common file system location (the 'data area'). Multiple postgres instances can run on a system as long as they use different data areas and different communication ports.

    postgres accepts the following command-line arguments
    	-D datadir	location of the database files
    	-h hostname	host name or IP address on which to listen for TCP/IP connections from client applications
    	-p port		the TCP/IP port or local Unix domain socket file extension on which to listen for connections
    	-v,--version	print the postgres version and exit
    	-?,--help	show help about arguments and exit
    The simplest way to start the server and leave it running in the foreground is
    	$ postgres -D /usr/local/pgsql/data
    If the -D option is not provided, the server tries to use the data directory named by the PGDATA environment variable and if that variable is not provided either, it fails.

    It is normally better to run postgres in the background by using
    	$ postgres -D /usr/local/pgsql/data >logfile 2>&1 &
    pg_ctl

    pg_ctl is a utility for initializing a PostgreSQL database cluster, starting, stopping or restarting the postgres server or displaying the status of a running server. Although the server can be started manually, pg_ctl encapsulates tasks such as redirecting log output and properly detaching from the terminal and process group. It also provides convenient options for controlled shutdown.
    	Options
    	-D,--pgdata=	the data directory
    	-l,--logfile=	file to append the server log output, is created if it does not exist
    	-m, --mode=	the shutdown mode
    	-V,--version	print pg_ctl version and exit
    	-?,--help	show help about arguments and exit
    
    	pg_ctl start -l logfile		- start server in the background and direct output in the named log file
    	pg_ctl stop			- stop the server
    	pg_ctl restart			- stops and starts the server, reusing the command line options that were passed previously
    	pg_ctl status			- show server status
    The pg_ctl program provides an interface for sending signals to shut down the server. The PID of the postgres process can be found using the ps program or from the file postmaster.pid in the data directory.
    	pg_ctl kill signal_name process_id
    SIGTERM - Smart Shutdown mode. After receiving SIGTERM, the server disallows new connections, but lets existing sessions end their work normally and shuts down only after all sessions terminate
    SIGINT - Fast Shutdown mode. The server disallows new connections and sends all existing server processes SIGTERM causing them to abort their current transactions and exit promptly. It then waits for all server processes to exit and shuts down
    SIGQUIT - Immediate Shutdown mode, recommended only in emergencies. The server will send SIGQUIT to all child processes and wait for them to terminate and if they do not within 5 seconds, SIGKILL is sent. The master server process exits as soon as all child processes have exited, without doing normal database shutdown processing. This will lead to recovery by replaying the WAL log at next startup
    The signals can also be sent using kill on non-Windows systems.
    	$ kill -INT `head -1 /usr/local/pgsql/data/postmaster.pid`	- do a fast shutdown

    Server configuration

    All server parameters are case-insensitive and are kept in the configuration file postgresql.conf, which is normally kept in the data directory. A default copy is installed when the database cluster directory is initialized.
    	# sample parameters
    	log_connections = yes
    	log_destination = 'syslog'
    	search_path = '"$user", public'
    	shared_buffers = 128MB
    The parameters can be managed via SQL.
    	ALTER SYSTEM SET log_statement = all;
    	ALTER SYSTEM RESET log_statement;	- change to default
    	ALTER SYSTEM RESET ALL;		- default all parameters
    	
    	ALTER DATABASE db1 SET log_statement = all;
    	
    	SHOW log_statement;
    	SET log_statement = all;	- set the value only for the current session
    ALTER SYSTEM writes the given parameter setting to the postgresql.auto.conf file, which is read in addition to postgresql.conf. Setting a parameter to DEFAULT or using the RESET variant, removes that entry from the postgresql.auto.conf file.

    File locations

    In addition to the postgresql.conf file, PostgreSQL uses two other manually-edited configuration files, which control client authentication. By default, all three configuration files are stored in the database cluster's data directory but can be placed elsewhere using the below parameters.
    	show data_directory;	- the directory to use for data storage, can only be set at server start
    	show config_file;	- the main server configuration file (customarily called postgresql.conf), only set on the postgres command line
    	show hba_file;		- the configuration file for host-based authentication (customarily called pg_hba.conf), set at server start only
    	show ident_file;	- the configuration file for user name mapping (customarily called pg_ident.conf), set at server start only
    	show external_pid_file; 	- the name of an additional process-ID (PID) file that the server should create for use by server 
    						administration programs, can only be set at server start
    In a default installation, none of the above parameters are set explicitly and the data directory is specified by the -D command-line option or the PGDATA environment variable and the configuration files are all found within the data directory.

    Memory usage

    shared_buffers - Sets the amount of memory the database server uses for shared memory buffers. The default is typically 128MB, but might be less if kernel settings will not support it (as determined during initdb). This parameter can only be set at server start.

    On a dedicated database server, a reasonable starting value for shared_buffers is 25% of the available memory. On some workloads even larger settings for shared_buffers are effective, but because PostgreSQL also relies on the OS cache, it is unlikely that an allocation of more than 40% of RAM to shared_buffers will work better than a smaller amount. Larger settings for shared_buffers usually require a corresponding increase in max_wal_size, in order to spread out the process of writing large quantities of new or changed data over a longer period of time.

    work_mem - Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The default value is 4MB. For a complex query, several sort or hash operations might be running in parallel and each operation will use as much memory. Also, several sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.
    	SHOW shared_buffers;
    	SHOW work_mem;

    Connection handling

    PostgreSQL is implemented using a simple 'process per user' client/server model wherein there is one client process connected to exactly one server process. The master server process (postgres) listens at the specified port for incoming connections and forks a new process for each connection and then the client and the new process communicate without intervention from the master process.

    The server tasks communicate with each other using semaphores and shared memory to ensure data integrity throughout concurrent data access.

    The client process can be any program that understands the PostgreSQL protocol, including those based on the C-language library libpq or independent implementations such as the Java JDBC driver.

    Once a connection is established, the client process can send a query to the server, transmitted as plain text, i.e. there is no parsing done in the client. The server parses the query, creates an execution plan, executes the plan and returns the retrieved rows to the client through the established connection.
    	show port;	- TCP port the server listens on (default 5432)
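    For example, a client can connect over TCP/IP with psql (host, user and database names are placeholders):
    	psql -h db1.example.com -p 5432 -U user1 -d db1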
    The max_connections parameter determines the maximum number of concurrent connections to the database server. The default is typically 100 connections, but might be less if kernel settings do not support it, as determined during initdb. This parameter can only be set at server start.
    	show max_connections;
    	show superuser_reserved_connections;	- connections reserved for superusers (default=3)

    Monitoring the server
    	SELECT version();		- show postgres version 
    
    	select t1.datname AS db_name, pg_size_pretty(pg_database_size(t1.datname)) as db_size from pg_database t1
    		order by pg_database_size(t1.datname) desc;		- db size
    	select table_name, pg_relation_size(quote_ident(table_name)) from information_schema.tables
    		where table_schema = 'public' order by 2 desc;		- large tables	
    		
    	pg_cancel_backend(pid);		- cancel query by giving a SIGINT to the server process
    	pg_terminate_backend(pid);	- gives a SIGTERM to the server process, terminating the process and also the connection
    List all PostgreSQL processes in Unix
    	ps -efl | grep postgres
    Each client connection is a server process and its command value displays in the form
    	postgres: user database host activity
    Disk usage

    Disk space usage can be monitored in three ways
  • using SQL functions
    	select pg_database_size('dbname');	- space used by database
    	select pg_indexes_size(regclass);	- space used by indexes of a table
    	select pg_total_relation_size(regclass);	- total space used by table, including all indexes and TOAST data
  • using the oid2name module
  • by manual inspection of the system catalogs
    	SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;		- largest tables and indexes
    	SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'tbl1';	- disk usage of a table
    	
    	SELECT relname, relpages FROM pg_class, (SELECT reltoastrelid FROM pg_class WHERE relname = 'tbl1') AS ss
    		WHERE oid = ss.reltoastrelid OR oid = (SELECT indexrelid FROM pg_index WHERE indrelid = ss.reltoastrelid)
    	ORDER BY relname;	- space used by TOAST tables
    	
    	SELECT c2.relname, c2.relpages FROM pg_class c, pg_class c2, pg_index i
    		WHERE c.relname = 'tbl1' AND c.oid = i.indrelid AND c2.oid = i.indexrelid
    	ORDER BY c2.relname;	- index sizes
    Vacuuming

    VACUUM reclaims storage occupied by dead tuples. Tuples deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Hence VACUUM needs to be run periodically, especially on frequently-updated tables.

    Apart from recovering disk space, VACUUM also updates data statistics used by the PostgreSQL query planner and updates the visibility map, which speeds up index-only scans. Parameters:
    ANALYZE - updates statistics used by the planner
    FULL - Full vacuum, which can reclaim more space, but takes much longer and exclusively locks the table. Also requires extra disk space, since it writes a new copy of the table and doesn't release the old copy until the operation is complete. To be used only when a significant amount of space needs to be reclaimed from within the table
    VERBOSE - prints a detailed vacuum activity report for each table
    	VACUUM (VERBOSE, ANALYZE) tbl1;		- clean, analyze and print report
    The Autovacuum Daemon

    PostgreSQL has an optional feature called autovacuum, that automates the execution of VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility, hence autovacuum cannot be used unless track_counts is set to true. Autovacuuming is enabled by default and the related configuration parameters are set appropriately.

    If the installation has N databases, a new worker will be launched every autovacuum_naptime/N seconds; for example, with the default autovacuum_naptime of 1 minute and three databases, a new worker starts every 20 seconds.
    	show autovacuum_naptime;	- minimum delay between autovacuum runs on any given database
    	show autovacuum_max_workers;	- maximum worker processes that can run concurrently
    The statistics collector

    The statistics collector is a subsystem that supports collection and reporting of information about server activity, controlled by the following configuration parameters
  • track_activities - enables monitoring of the current command being executed by any server process
  • track_counts - controls whether statistics are collected about table and index accesses
  • track_functions - enables tracking of usage of user-defined functions
  • track_io_timing - enables monitoring of block read and write times

    The statistics collector transmits the collected information to other PostgreSQL processes through temporary files that are stored in the directory named by the stats_temp_directory parameter. When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory, so that statistics can be retained across server restarts. When recovery is performed at server start (e.g. after immediate shutdown, server crash and point-in-time recovery), all statistics counters are reset.
    	SHOW stats_temp_directory;	- pg_stat_tmp, by default
    Several predefined views are available to show the current state of the system and collected statistics.

    Dynamic statistics views

    pg_stat_activity - has one row per server process, showing the current activity of that process, such as state and current query. Columns:
    datid/datname - oid/name of the database connected to
    pid - process id
    usename - name of the logged in user
    client_addr/client_hostname - ip/host name of the connected client
    state - current state of the backend. Possible values are
    	active	- the backend is executing a query
    	idle	- the backend is waiting for a new client command
    	idle in transaction	- the backend is in a transaction but is not currently executing a query
    	idle in transaction (aborted)	- similar to idle in transaction, except one of the statements in the transaction caused an error
    	fastpath function call	- the backend is executing a fast-path function
    	disabled	- this state is reported if track_activities is disabled in this backend
    query - text of the backend's most recent query
    backend_type - type of the backend
    	select * from pg_stat_activity;				- all connections
    	select * from pg_stat_activity where state != 'idle';	- running queries
    	
    	SELECT pid, usename, client_addr, client_port, now() - pg_stat_activity.query_start AS duration,  query,  state FROM pg_stat_activity 
    		where now() - query_start > interval '5 minute' and state != 'idle';	- queries running > 5 minute
    Collected statistics views

    pg_stat_archiver - has just one row showing statistics about the WAL archiver process's activity
    pg_stat_bgwriter - has a single row showing statistics about the background writer process's activity and containing global data for the cluster
    pg_statio_all_tables - has one row for each table in the current database, showing statistics about I/O on that table

    System catalogs

    PostgreSQL's system catalogs (where schema metadata is stored) are regular tables. Most system catalogs are copied from the template database during database creation and are thereafter database-specific. A few catalogs (like pg_database) are physically shared across all databases in a cluster.

    In addition to public and user-created schemas, each database contains a pg_catalog schema which contains the system tables and all the built-in data types, functions, and operators.

    pg_class - tables, indexes, sequences, views
    pg_collation - available collations
    pg_database - available databases
    pg_description - optional descriptions (comments) for each database object, manipulated with the COMMENT command
    pg_index - additional info on indexes
    pg_tablespace - tablespaces, one table per cluster
    pg_type - data types

    There are also several built-in system views that provide convenient access to commonly used queries on the system catalogs or to internal server state.

    The information schema

    The information schema is a schema named information_schema that automatically exists in all databases and consists of a set of views that contain information about the objects defined in the current database.

    All the columns of the information schema views use special data types that are defined in the information schema. These types are
  • cardinal_number - a nonnegative integer
  • character_data - a character string (without specific maximum length)
  • sql_identifier - a character string. This type is used for SQL identifiers, character_data is used for any other kind of text data
  • time_stamp - a domain over the type timestamp with time zone
  • yes_or_no - a character string domain that contains either YES or NO, used to represent boolean data

    Logging

    The following configuration parameters control logging of server messages and errors.

    log_destination - server messages can be logged to stderr (default), csvlog and syslog as specified by this parameter. csvlog log entries are output in CSV format, which is convenient for loading logs into programs.
    	show log_destination;	- default stderr
    When either stderr or csvlog are included, the file current_logfiles is created to record the location of the log files currently in use by the logging collector and the associated logging destination.

    logging_collector - this parameter enables the background process that captures log messages sent to stderr and redirects them into log files. It can only be set at server start.
    	show logging_collector;		- default on
    log_directory - this parameter determines the directory in which log files will be created. It can be specified as an absolute path or relative to the cluster data directory. This parameter can only be set in the postgresql.conf file or on the server command line.
    	show log_directory;		- default log
    log_filename - this parameter sets the file names of the created log files.
    	show log_filename;	- default is postgresql-%Y-%m-%d_%H%M%S.log
    log_rotation_size - this parameter determines the maximum size of an individual log file. After the limit is reached, a new log file will be created. It can be set to zero to disable size-based creation of new log files.
    	show log_rotation_size;
    log_min_duration_statement - logs the duration of statements that ran for at least the specified number of milliseconds. Setting this to 0 prints all statement durations and -1 (the default) disables logging statement durations.
    	show log_min_duration_statement;
    log_statement - controls which SQL statements are logged. Valid values are none (default), ddl, mod and all.
    	show log_statement;




    TRANSACTION/QUERY MANAGEMENT

    Transactions

    START TRANSACTION/BEGIN commands begin a transaction block that terminates with a COMMIT/ROLLBACK. Any SQL command without a START TRANSACTION/BEGIN implicitly begins a block and PostgreSQL issues a COMMIT after each such command (autocommit).
    	START TRANSACTION/BEGIN ISOLATION LEVEL SERIALIZABLE/REPEATABLE READ/READ COMMITTED/READ UNCOMMITTED;
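    A minimal transaction block, assuming a hypothetical accounts table:
    	BEGIN;
    	UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    	UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    	COMMIT;		- or ROLLBACK to undo both updates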
    The SET TRANSACTION command sets the characteristics of the current transaction. If SET TRANSACTION is executed without a prior START TRANSACTION or BEGIN, it issues a warning and has no effect.
    	SET TRANSACTION ISOLATION LEVEL SERIALIZABLE/REPEATABLE READ/READ COMMITTED/READ UNCOMMITTED;	- set isolation level
    SAVEPOINT establishes a new savepoint within the current transaction.
    	SAVEPOINT s1;
    	ROLLBACK TO SAVEPOINT s1;
    	RELEASE SAVEPOINT s1;
    Locks

    PostgreSQL supports table, row and page-level locking. Most commands automatically acquire locks of appropriate modes, e.g. TRUNCATE takes a table-level lock. Locks can also be acquired explicitly with the LOCK command.

    Table-level lock modes

  • ACCESS SHARE - acquired by the SELECT statement on referenced tables. Conflicts only with ACCESS EXCLUSIVE
  • ROW SHARE - The SELECT FOR UPDATE and SELECT FOR SHARE commands acquire a lock of this mode
  • ROW EXCLUSIVE - acquired by any command that modifies data in a table i.e UPDATE, DELETE, INSERT
  • SHARE UPDATE EXCLUSIVE - protects a table against concurrent schema changes and VACUUM runs. Acquired by VACUUM, ANALYZE, CREATE STATISTICS and some ALTER TABLE variants
  • SHARE - protects a table against concurrent data changes. Acquired by CREATE INDEX
  • SHARE ROW EXCLUSIVE - protects a table against concurrent data changes and is self-exclusive so that only one session can hold it at a time. Acquired by CREATE COLLATION, CREATE TRIGGER and many forms of ALTER TABLE
  • EXCLUSIVE - only allows concurrent ACCESS SHARE locks i.e only reads can happen in parallel
  • ACCESS EXCLUSIVE - the default mode for LOCK TABLE statements. Conflicts with all other modes. Acquired by DROP TABLE, TRUNCATE and many forms of ALTER TABLE statement

    LOCK TABLE obtains a table-level lock, waiting if necessary for any conflicting locks to be released. If NOWAIT is specified, LOCK TABLE is aborted if it cannot be acquired immediately and an error is emitted. Once obtained, the lock is held for the remainder of the current transaction and released at transaction end.

    LOCK TABLE must be used within a transaction block (BEGIN and COMMIT/ROLLBACK) and an error results if used outside.
    	LOCK TABLE t1 IN SHARE MODE NOWAIT;	- Obtain a SHARE lock on a primary key table before performing inserts into a foreign key table
    	LOCK TABLE t1 IN SHARE ROW EXCLUSIVE MODE;	- lock a primary key table in SHARE ROW EXCLUSIVE mode before a delete operation
    Row-level lock Modes

  • FOR UPDATE - causes the rows retrieved by the SELECT statement to be locked for update, preventing them from being locked, modified or deleted by other transactions. Within a REPEATABLE READ or SERIALIZABLE transaction, an error will be thrown if a row to be locked has changed since the transaction started
  • FOR NO KEY UPDATE - behaves similarly to FOR UPDATE, except that the lock acquired is weaker. This lock will not block SELECT FOR KEY SHARE commands from acquiring a lock on the same rows. Acquired by any UPDATE that does not acquire a FOR UPDATE lock
  • FOR SHARE - behaves similarly to FOR NO KEY UPDATE, except that it acquires a shared lock rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing UPDATE, DELETE, SELECT FOR UPDATE or SELECT FOR NO KEY UPDATE on these rows but does not prevent them from performing SELECT FOR SHARE or SELECT FOR KEY SHARE
  • FOR KEY SHARE - behaves similarly to FOR SHARE, except that the lock is weaker. SELECT FOR UPDATE is blocked but not SELECT FOR NO KEY UPDATE. A key-shared lock blocks other transactions from performing DELETE or any UPDATE that changes the key values but not other UPDATE and it does not prevent SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE
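    As a minimal sketch of explicit row locking (the accounts table is hypothetical), a read-modify-write sequence can lock its target row first:
    	BEGIN;
    	SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;	- the row stays locked until the transaction ends
    	UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    	COMMIT;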

    Page-level share/exclusive locks are used internally to control read/write access to table pages in the shared buffer pool and are released immediately after a row is fetched/updated.

    pg_locks

    The view pg_locks provides access to information about the locks held by open transactions within the database server. pg_locks contains one row per active lockable object, requested lock mode, and relevant process. The same object might appear many times, if multiple processes are holding or waiting for locks on it.

    Columns in pg_locks:

    locktype - type of object (relation, extend, page, tuple, transactionid, virtualxid, object, userlock or advisory)
    relation - OID of the relation targeted by the lock
    page - page number targeted within the relation
    tuple - tuple number within the page
    mode - lock mode held/desired
    granted - true if lock is held, false if awaited

    	select t.relname,l.locktype,page,virtualtransaction,pid,mode,granted	- list open locks
    	from pg_locks l, pg_stat_all_tables t
    	where l.relation=t.relid order by relation asc;
    The query below joins pg_locks with pg_stat_activity to list the processes waiting for and holding locks (only row-level locks, not object-level locks).
    	SELECT t1.pid AS blocked_pid,
    	t3.usename AS blocked_user,
    	t2.pid AS blocking_pid,
    	t4.usename AS blocking_user,
    	t3.query AS blocked_statement,
    	t4.query AS current_statement_in_blocking_process
    	FROM
    	pg_catalog.pg_locks t1
    	JOIN pg_catalog.pg_stat_activity t3  ON t3.pid = t1.pid
    	JOIN pg_catalog.pg_locks t2	ON t2.locktype = t1.locktype
    		AND t2.DATABASE IS NOT DISTINCT FROM t1.DATABASE
    		AND t2.relation IS NOT DISTINCT FROM t1.relation
    		AND t2.page IS NOT DISTINCT FROM t1.page
    		AND t2.tuple IS NOT DISTINCT FROM t1.tuple
    		AND t2.virtualxid IS NOT DISTINCT FROM t1.virtualxid
    		AND t2.transactionid IS NOT DISTINCT FROM t1.transactionid
    		AND t2.classid IS NOT DISTINCT FROM t1.classid
    		AND t2.objid IS NOT DISTINCT FROM t1.objid
    		AND t2.objsubid IS NOT DISTINCT FROM t1.objsubid
    		AND t2.pid != t1.pid
    	JOIN pg_catalog.pg_stat_activity t4 ON t4.pid = t2.pid
    	WHERE NOT t1.GRANTED;
    Deadlocks

    deadlock_timeout - the amount of time, in milliseconds, to wait on a lock before checking for a deadlock condition. The default is 1000ms. The deadlock check is relatively expensive, so raising this value can be considered on a heavily loaded server. Ideally, the setting should exceed the typical transaction time, to improve the odds that a lock will be released before the waiter decides to check for deadlock. Only superusers can change this setting.
    	SHOW deadlock_timeout;
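    As an illustration, two sessions updating the same rows in opposite order produce a deadlock; after deadlock_timeout, PostgreSQL detects the cycle and aborts one of the transactions (table and values hypothetical):
    	BEGIN;	- session 1
    	UPDATE t1 SET c1 = 1 WHERE id = 1;
    	
    	BEGIN;	- session 2
    	UPDATE t1 SET c1 = 2 WHERE id = 2;
    	UPDATE t1 SET c1 = 2 WHERE id = 1;	- session 2 blocks, waiting for session 1
    	
    	UPDATE t1 SET c1 = 1 WHERE id = 2;	- session 1 now waits for session 2; one of the two is aborted with a deadlock error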
    Using Explain

    The structure of a query plan is a tree of plan nodes. Nodes at the bottom level of the tree are scan nodes which return raw rows from a table. There are different types of scan nodes for different table access methods - sequential scans, index scans and bitmap index scans. There are also non-table row sources such as VALUES clauses and set-returning functions in FROM, which have their own scan node types. If the query requires joining, aggregation, sorting or other operations on the raw rows there will be additional nodes above the scan nodes to perform these operations. Different node types can appear here as well if there is more than one way to perform these operations.

    The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the cost estimates that the planner made for the execution of that node. Additional lines might appear, indented from the node's summary line to show additional properties of the node. The very first line (the summary line for the topmost node) has the estimated total execution cost for the plan which the planner seeks to minimize.
    	EXPLAIN SELECT * FROM t1;
    
    				QUERY PLAN
    	-------------------------------------------------------------
    	 Seq Scan on t1  (cost=0.00..458.00 rows=10000 width=244)
    	 
    		0.00 = Estimated start-up cost, the time expended before the output phase can begin, e.g. time to do the sorting in a sort node
    		458.00 = Estimated total cost for retrieving all available rows
    		10000 = Estimated number of rows output by this plan node
    		244 = Estimated average width of rows output by this plan node (in bytes)
    The above query has no WHERE clause, so the planner chooses a simple sequential scan plan.
    	EXPLAIN SELECT * FROM t1 WHERE id < 100;
    
                                      QUERY PLAN
    	--------------------------------------------------------------
    	Bitmap Heap Scan on t1  (cost=5.07..229.20 rows=101 width=244)
    		Recheck Cond: (id < 100)
    		->  Bitmap Index Scan on t1_id  (cost=0.00..5.04 rows=101 width=0)
    			Index Cond: (id < 100)
    For the above query, the query planner uses a two-step plan - the child plan node uses an index to find the locations of rows matching the index condition and then the upper plan node actually fetches those rows from the table. The reason for using two plan levels is that the upper plan node sorts the row locations identified by the index into physical order before reading them, to minimize the cost of separate fetches. The 'bitmap' in the node names is the mechanism that does the sorting.
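    EXPLAIN ANALYZE actually executes the statement and appends the true run time and row counts per node to the estimates. Note that data-modifying statements really run, so they can be wrapped in a transaction that is rolled back:
    	EXPLAIN ANALYZE SELECT * FROM t1 WHERE id < 100;	- shows actual time, rows and loops for each node
    	
    	BEGIN;
    	EXPLAIN ANALYZE UPDATE t1 SET col1 = 'B' WHERE id < 100;
    	ROLLBACK;	- discard the update's effects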




    IMPORTING/EXPORTING DATA

    COPY

    COPY moves data between PostgreSQL tables and standard file-system files. COPY TO copies the contents of a table or the result of a SELECT query to a file while COPY FROM copies data from a file to a table. If a list of columns is specified, only the data in those columns will be copied to or from the file.

    The file must be accessible by the PostgreSQL user id the server runs as and the name must be specified from the viewpoint of the server. When PROGRAM is specified, the server executes the given command and reads from/writes to the standard output/input of the program. The command must be specified from the server's viewpoint and be executable by the PostgreSQL user. When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

    An input file name can be an absolute or relative path, but an output file name must be an absolute path. File FORMAT can be text (default), csv or binary.
    	COPY t1 FROM '/usr1/tbl1.txt';	- import from file
    	\copy tbl1 from '/usr1/tbl1.txt';	- psql
    	COPY t1 FROM 'c:\downloads\file.csv' WITH (FORMAT csv, DELIMITER ',', QUOTE '''');	- load a CSV file with single quotes around values
    	
    	COPY tbl1 TO '/usr1/tbl1.txt';	- export to file
    	COPY (SELECT * FROM t1 WHERE col1 LIKE 'A%') TO '/home/user1/t1.copy';	- send query output to file
    	\copy (SELECT * FROM t1 limit 100) to '/home/user1/t1.copy' with header delimiter as ',';	- export as csv
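    The PROGRAM and STDIN/STDOUT variants can be sketched as follows (the gzip pipeline and file names are hypothetical):
    	COPY t1 FROM PROGRAM 'gunzip -c /usr1/tbl1.txt.gz';	- read from a command's standard output
    	COPY t1 TO PROGRAM 'gzip > /usr1/tbl1.txt.gz';	- write to a command's standard input
    	COPY t1 FROM STDIN;	- data is transmitted over the client/server connection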




    USER MANAGEMENT/SECURITY

    Database roles

    A role is an entity that can own database objects and have database privileges. A role can be considered as a user, a group of users or both. A role can be granted membership in another role, allowing the member role to use the privileges assigned to that role.
    	CREATE ROLE dbadmin;
    	DROP ROLE role1;
    
    	SELECT * FROM pg_roles;	- list all roles
    	\du+ or \duS+		- psql (S - include system roles)
    
    	ALTER ROLE role1 RENAME TO role2;
    
    	ALTER ROLE role1 WITH PASSWORD 'pwd';	- change password for role
    	\password [role]			- psql
    CREATE/ALTER USER is an alias for CREATE/ALTER ROLE. The only difference is that in CREATE USER, LOGIN is assumed by default, whereas NOLOGIN is assumed when CREATE ROLE is used.

    Role Attributes

    A role can have attributes that define its privileges and interact with the client authentication system.

    Login - Only roles that have the LOGIN attribute can be used for a database connection. A role with the LOGIN attribute is the same as a user
    	CREATE ROLE role1 LOGIN;	- same as CREATE USER role1;
    	ALTER ROLE role1 LOGIN;
    	
    Createdb - Has permission to create databases
    Createrole - A role with CREATEROLE privilege can alter and drop other roles, as well as grant or revoke membership in them
    Superuser - A database superuser bypasses all permission checks, except the right to log in
    	ALTER ROLE role1 SUPERUSER;		- grant/remove superuser attribute
    	ALTER ROLE role1 NOSUPERUSER;
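    Attributes can be combined when the role is created, for example (role name, password and limits hypothetical):
    	CREATE ROLE dev1 LOGIN CREATEDB PASSWORD 'pwd' CONNECTION LIMIT 5 VALID UNTIL '2030-01-01';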

    Default roles

    PostgreSQL provides a set of default roles which provide access to certain commonly needed, privileged capabilities and information. Administrators can GRANT these roles to users and other roles, giving those users access to the specific capabilities and information.

    	Role			Allowed Access
    	----			------- ------
    	pg_read_all_settings	read all configuration variables, even those normally visible only to superusers
    	pg_read_all_stats	read all pg_stat_* views and use various statistics related extensions
    	pg_stat_scan_tables	execute monitoring functions that may take ACCESS SHARE locks on tables, potentially for a long time
    	pg_signal_backend	send signals to other backends (eg: cancel query, terminate)
    	pg_read_server_files	allow reading files from any location the database can access on the server with COPY
    					and other file-access functions
    	pg_write_server_files	allow writing to files in any location the database can access on the server with COPY
    					and other file-access functions
    	pg_execute_server_program	allow executing programs on the database server as the user the database runs as with COPY and other
    						functions which allow executing a server-side program
    	pg_monitor		read/execute various monitoring views and functions. This role is a member of pg_read_all_settings,
    					pg_read_all_stats and pg_stat_scan_tables
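    For example, a monitoring account (name hypothetical) can be given read access to statistics and settings in one step:
    	GRANT pg_monitor TO monitor_user;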
    Privileges

    When an object is created, it is assigned an owner which is normally the role that created the object. For most kinds of objects, the initial state is that only the owner or a superuser can use or change the object. To allow other roles to use it, privileges must be granted.

    The right to drop or alter an object is inherent in the owner and is not a grantable privilege. However, a similar effect can be obtained by granting or revoking membership in the role that owns the object. The owner implicitly has all grant options for the object too.

    List of privileges

    CONNECT - allows the user to connect to the specified database
    CREATE - for databases, allows creation of schemas and for schemas, allows creation of objects within the schema
    REFERENCES - privilege required on both the referenced and referencing columns to create a foreign key constraint
    SELECT - select any column from table. For sequences, allows the use of the currval function
    TEMPORARY/TEMP - allows temporary tables to be created in the database
    TRIGGER - allows the creation of a trigger on table
    UPDATE - allows update of any column of table. For sequences, allows the use of the nextval and setval functions
    USAGE - for schemas, allows access to objects within the schema. For sequences, allows the use of the currval and nextval functions
    	GRANT ALL ON tbl1 TO role1;		- grant access to an existing table
    	GRANT SELECT, INSERT, DELETE, UPDATE ON tbl1 TO role2 WITH GRANT OPTION;	- allow recipient to grant privilege to others
    	
    	REVOKE ALL ON tbl1 FROM role1;
    The special role name PUBLIC can be used to grant a privilege to every role on the system, including those that might be created later. PUBLIC can be thought of as an implicit group that always includes all roles.

    Also, group roles can be set up to help manage privileges when there are many users of a database.

    A role will have the sum of privileges granted directly to it, privileges granted to any role it is a member of and privileges granted to PUBLIC.
    	REVOKE ALL ON ALL TABLES IN SCHEMA public FROM PUBLIC;		- revoke public access, then grant access to all tables
    	GRANT ALL ON ALL TABLES IN SCHEMA public TO role_dba;
    	GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO role_dev;
    	
    	REVOKE ALL ON ALL SEQUENCES IN SCHEMA public FROM PUBLIC;		- revoke public access, then grant access to all sequences
    	GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO role_dba;
    	GRANT SELECT, USAGE ON ALL SEQUENCES IN SCHEMA public TO role_dev;
    
    	GRANT role_dba TO role1, role2;		- grant membership in a role
    	REVOKE role_dba FROM role1;
    
    	SELECT oid, rolname FROM pg_roles WHERE  pg_has_role( 'user1', oid, 'member');  - roles where user is member
    	SELECT r.rolname as username, r1.rolname as "role"		- show members of role_dba
    		FROM pg_catalog.pg_roles r
    		JOIN pg_catalog.pg_auth_members m ON m.member = r.oid
    		JOIN pg_roles r1 ON m.roleid=r1.oid
    		WHERE r.rolcanlogin and r1.rolname = 'role_dba' ORDER BY 1;
    		
    	SELECT grantee, privilege_type FROM information_schema.role_table_grants WHERE table_name='t1';   -  show grants on a table
    	SELECT * FROM information_schema.table_privileges WHERE  grantee = 'user1';	- show all tables that user has privileges
    	
    	\dn+ or \dnS+		- list schemas and associated privileges (S - include system objects)
    	\dp tbl1  or \dp [pattern]	- list tables/views/sequences and associated privileges
    	
    			Schema |  Name   | Type  |   Access privileges   | Column access privileges 
    			--------+---------+-------+-----------------------+--------------------------
    			public | table1  | table | user=arwdDxt/grantor  |
    			       |         |       | =r/grantor            |
    			       |         |       | admin=arw/grantor     |
    
    				rolename=xxx - privileges granted to a role
    					=xxx - privileges granted to PUBLIC
    
    				r -SELECT	w -UPDATE	a -INSERT	d -DELETE	D -TRUNCATE	x -REFERENCES	t -TRIGGER
    				X -EXECUTE	U -USAGE	C -CREATE	c -CONNECT	T -TEMPORARY	arwdDxt - all privileges (for table)
    						* -- grant option for preceding privilege	/yyy - role that granted this privilege
    							
    	REASSIGN OWNED BY role1, role2  TO role_dba;          - reassign ownership
    	SELECT * FROM pg_tables t WHERE t.schemaname = 'public' and t.tableowner = 'role1';
    In a new database, any role is allowed to create objects in the public schema. To remove this possibility, the following may be run after creating the database:
    	REVOKE ALL ON schema public FROM public;
    	GRANT ALL ON schema public TO role_dba;
    	GRANT USAGE ON schema public TO role_dev;
    Default access privileges

    ALTER DEFAULT PRIVILEGES allows setting the privileges that will be applied to objects created in the future and it does not affect privileges assigned to existing objects.

    Default privileges can be changed only for objects that will be created by the current user or by roles that the current user is a member of. The privileges can be set globally, i.e. for all objects created in the current database, or just for objects created in specified schemas.

    The default privileges for any object type normally grant all grantable permissions to the object owner and may grant some privileges to PUBLIC as well. However, this behavior can be changed by altering the global default privileges with ALTER DEFAULT PRIVILEGES.
    	\ddp [pattern]		- list default access privileges
    	
    	ALTER DEFAULT PRIVILEGES GRANT ALL ON TABLES TO role_dba;     - grant access to tables to be created in future by current user
    	ALTER DEFAULT PRIVILEGES GRANT SELECT, DELETE, INSERT, UPDATE ON TABLES TO role_dev;
    	ALTER DEFAULT PRIVILEGES GRANT ALL ON SEQUENCES TO role_dba;
    	ALTER DEFAULT PRIVILEGES FOR ROLE role1 GRANT ALL ON TABLES TO role_dba;     - grant access to tables to be created in future by role1
    	ALTER DEFAULT PRIVILEGES FOR ROLE role1 GRANT USAGE, SELECT ON SEQUENCES TO role_dev;
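    The defaults can also be limited to a single schema rather than the whole database, for example (roles as above):
    	ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO role_dev;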
    Client authentication

    Client authentication is controlled by the configuration file pg_hba.conf (HBA = host-based authentication), stored in the database cluster's data directory. A default pg_hba.conf file is installed when the data directory is initialized by initdb. Each record in pg_hba.conf specifies a connection type, a client IP address range (if relevant for the connection type), a database name, a user name and the authentication method to be used for connections matching these parameters.
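    A sketch of typical pg_hba.conf records (addresses and methods are illustrative):
    	# TYPE	DATABASE	USER	ADDRESS		METHOD
    	local	all		all			peer
    	host	all		all	127.0.0.1/32	scram-sha-256
    	hostssl	all		all	10.0.0.0/24	scram-sha-256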

    Authentication methods

    trust - Allow the connection unconditionally. Allows anyone that can connect to the server to log in as any user they wish, without the need for a password or any other authentication
    reject - Reject the connection unconditionally. Useful for filtering out certain hosts from a group, a reject line could block a specific host from connecting, while a later line allows the remaining hosts in a specific network to connect
    password - Require the client to supply an unencrypted password for authentication. Since the password is sent in clear text over the network, this should not be used on untrusted networks

    Ident authentication

    The ident authentication method works by obtaining the client's OS user name from an ident server and using it as the allowed database user name (with optional user name mapping). This is only supported on TCP/IP connections. When ident is specified for a local (non-TCP/IP) connection, peer authentication will be used instead.

    Virtually every Unix-like OS ships with an ident server that listens on TCP port 113 by default. PostgreSQL can interrogate the ident server on the host of the connecting client and can theoretically determine the OS user for any given connection.

    The following configuration options are supported for ident
  • map - allows for mapping between system and database user names

    User name maps are defined in the ident map file, which by default is named pg_ident.conf and is stored in the cluster's data directory. The ident map file contains lines of the general form
    	map-name OS-username db-username
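    For example, to map an OS account to a database role (names and network hypothetical):
    	# pg_ident.conf
    	devmap	john	user1
    	
    	# pg_hba.conf - reference the map
    	host	all	all	192.168.1.0/24	ident map=devmap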
    Peer authentication

    The peer authentication method works by obtaining the client's operating system user name from the kernel and using it as the allowed database user name (with optional user name mapping). This method is only supported on local connections.

    The following configuration options are supported for peer
  • map - allows for mapping between system and database user names

    Peer authentication is only available on OSs providing the getpeereid() function, the SO_PEERCRED socket parameter or similar mechanisms, like Linux and Solaris.


    Encryption

    Password encryption - Passwords are stored as hashes (determined by password_encryption), so the actual password assigned to the user cannot be determined. If SCRAM or MD5 authentication is used, the unencrypted password is never even temporarily present on the server because the client encrypts it before sending it across the network. SCRAM is preferred, because it is an Internet standard and is more secure than the PostgreSQL-specific MD5 authentication protocol.
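    The algorithm in use can be checked and changed; existing passwords must be re-set afterwards to be re-hashed with the new algorithm:
    	SHOW password_encryption;	- md5 or scram-sha-256
    	ALTER SYSTEM SET password_encryption = 'scram-sha-256';
    	SELECT pg_reload_conf();
    	ALTER ROLE role1 WITH PASSWORD 'pwd';	- stored using the new algorithm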

    Encryption for specific columns - The pgcrypto module allows certain fields to be stored encrypted. The client supplies the decryption key and the data is decrypted on the server and then sent to the client. The decrypted data and the decryption key are present on the server for a brief time while it is being decrypted and communicated between the client and server. This presents a brief moment where the data and keys can be intercepted by someone with full access to the database server.
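    A minimal pgcrypto sketch, assuming a hypothetical table with a bytea column and a key supplied by the client:
    	CREATE EXTENSION pgcrypto;
    	CREATE TABLE secrets (id serial PRIMARY KEY, payload bytea);
    	INSERT INTO secrets (payload) VALUES (pgp_sym_encrypt('card-1234', 'enc-key'));
    	SELECT pgp_sym_decrypt(payload, 'enc-key') FROM secrets;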

    Data partition encryption - Storage encryption can be performed at the file system level or the block level. Linux file system encryption options include eCryptfs and EncFS, and block level or full disk encryption options include dm-crypt + LUKS. Other OSs, including Windows, support this functionality. This prevents unencrypted data from being read if the drives or the entire computer is stolen. It does not protect against attacks while the file system is mounted, because when mounted, the OS provides an unencrypted view of the data.

    Encrypting data across a network - SSL connections encrypt all data sent across the network: the password, queries and the data returned. The pg_hba.conf file allows administrators to specify which hosts can use non-encrypted connections (host) and which require SSL-encrypted connections (hostssl). Also, clients can specify that they connect to servers only via SSL. Stunnel or SSH can also be used to encrypt transmissions.

    SSL host authentication - Both the client and server can provide SSL certificates to each other. It needs some extra configuration on each side and provides stronger verification of identity than the mere use of passwords. It prevents a computer from pretending to be the server just long enough to read the password sent by the client.

    Client-side encryption - The client can encrypt the data and unencrypted data never appears on the database server. Data is encrypted on the client before being sent to the server and data has to be decrypted on the client before being used.




    BACKUP/RECOVERY

    Backups and restores

    pg_dump

    pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently and does not block other users accessing the database (readers or writers). pg_dump extracts a database into an archive or script file. Script files are plain-text files containing SQL statements and can be restored by feeding them to psql. Archive files need to be restored using the pg_restore utility.
    	pg_dump -U username dbname > dump.sql
    	
    	Options:
    	-a,--data-only
    	-c,--clean		create drop statements for the objects
    	-f f1,--file=f1		send output to file
    	-s,--schema-only
    	-t t1 or [pattern],--table=t1 or [pattern]		dump selected tables
    	-T t1 or [pattern],--exclude-table=t1 or [pattern]	exclude tables
    	-x,--no-privileges,--no-acl		do not dump privileges (grant/revoke commands)
    	
    	Command-line options for connection parameters:
    	-d,--dbname=		name of the database to connect to (equivalent to specifying dbname as the first non-option argument)
    	-h,--host=		host name of the machine on which the server is running (default=PGHOST environment variable)
    	-p,--port=		TCP port or Unix domain socket file extension on which the server is listening (default=PGPORT environment variable)
    	-U,--username=		user name to connect as
    	-w,--no-password	do not issue a password prompt
    	-W,--password		prompt for a password before connecting to a database
    	--role=			role name to be used to create the dump
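    To produce a non-text archive for use with pg_restore, a format option can be added (file names hypothetical):
    	pg_dump -U user1 -Fc db1 > db1.bak	- custom-format archive, compressed by default
    	pg_dump -U user1 -Fd -f dumpdir db1	- directory-format archive, supports parallel dumps with -j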
    pg_dump dumps only a single database and it does not dump roles or tablespaces because those are cluster-wide and not per-database. The pg_dumpall program supports dumping of the entire contents of a database cluster. pg_dumpall backs up each database in a given cluster and also cluster-wide data such as role and tablespace definitions.
    	pg_dumpall > dumpfile
    WAL archiving

    The WAL is used in crash recovery by replaying the log entries since the last checkpoint but can also be used as an incremental backup.

    To enable WAL archiving, wal_level needs to be set to replica or higher, archive_mode to on and the shell command to use in the archive_command parameter needs to be specified in the postgresql.conf file.
    	SHOW archive_mode;
    	SHOW archive_command;
    	SHOW archive_timeout;
    archive_mode When enabled, completed WAL segments are sent to archive storage by setting archive_command. There are three modes - off, on and always. During normal operation, there is no difference between on and always, but when set to always the WAL archiver is enabled also during archive recovery or standby mode. In always mode, all files restored from the archive or streamed with streaming replication will be archived (again).

    archive_mode and archive_command are separate variables so that archive_command can be changed without leaving archiving mode. archive_mode can only be set at server start and cannot be enabled when wal_level is set to minimal.
    archive_command The local shell command to execute to archive a completed WAL file segment. This parameter can only be set in the postgresql.conf file or on the server command line. It is ignored unless archive_mode was enabled at server start. If archive_command is an empty string (the default) while archive_mode is enabled, WAL archiving is temporarily disabled, but the server continues to accumulate WAL segment files in the expectation that a command will soon be provided.

    archive_timeout The archive_command is only invoked for completed WAL segments. Hence, if the server generates little WAL traffic, there could be a long delay between the completion of a transaction and its safe recording in archive storage. When this parameter is greater than zero (the default of 0 disables it), the server switches to a new segment file whenever that many seconds have elapsed since the last segment file switch and there has been any database activity. Archived files that are closed early due to a forced switch are still the same length as completely full files, so a very short archive_timeout can bloat the archive storage; settings of a minute or so are reasonable.
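    A minimal sketch of an archiving configuration (the archive directory is hypothetical; the test guards against overwriting an existing file):
    	# postgresql.conf
    	archive_mode = on
    	archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'	- %p = path of the segment, %f = file name
    	archive_timeout = 300	- force a segment switch every 5 minutes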

    Restoring backups

    Text files created by pg_dump can be read in by the psql program. The database needs to be created beforehand.
    	psql dbname < dumpfile
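    Two psql switches are useful during restores (file and database names hypothetical):
    	psql --set ON_ERROR_STOP=on db1 < dumpfile	- stop at the first error instead of continuing
    	psql -1 db1 < dumpfile	- run the whole restore as a single transaction (all or nothing)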
    pg_restore

    pg_restore restores a PostgreSQL database from a non-text archive file created by pg_dump.
    	pg_restore -U user1 -d db1 file.bak
    	
    	Options:
    	-a,--data-only		no schema
    	-C,--create		create the database before restoring
    	-c,--clean		drop objects before creating them
    	-d db1,--dbname=db1	database to restore
    	--disable-triggers	to be used in data-only restores to avoid referential integrity errors
    	-s,--schema-only
    	-t t1,--table=t1	restore specified table only
    	
    	Command line arguments for connection parameters:
    	-h,--host=		host name on which the server is running
    	-p,--port=		tcp port or unix domain socket file extension on which the server is listening
    	-U,--username=		user name to connect as
    	-w,--no-password	do not prompt for password
    	-W,--password		prompt for password
    	--role=			role name to be used to perform the restore
    	
    Troubleshooting/Recovery

    Error codes

    53000: Insufficient memory reserved for statement - unoptimized, expensive query - use Explain
    53100: Could not extend file "base/16394/5139755": No space left on device - disk full




    HIGH AVAILABILITY/SCALABILITY

    Replication

    PostgreSQL offers two types of replication - logical (row level) and physical (block level). Physical replication and/or WAL archiving is preferable for redundancy, high availability, backup and Point-in-time recovery (PITR). Logical replication is well suited to data integration, data movement and data aggregation, for scale-out and for distributed multi-master deployments.

    Log shipping

    Log shipping is the direct movement of WAL records from one database server to another. File-based log shipping is implemented by transferring WAL records one file (16MB) at a time. The bandwidth required varies according to the transaction rate of the primary server.

    Log shipping is asynchronous, i.e. WAL records are shipped after transaction commit, so there is a window of data loss if the primary crashes.

    Streaming Replication

    Streaming replication or record-based log-shipping allows a standby server to stay more up-to-date than file-based log shipping. The standby connects to the primary, which streams WAL records to the standby as they are generated, without waiting for the WAL file to be filled. Streaming replication by default is asynchronous but the delay between a commit and standby update is much smaller than with file-based log shipping.

    Synchronous replication

    Once streaming replication has been configured, configuring synchronous replication requires only one additional configuration step - synchronous_standby_names must be set to a non-empty value. synchronous_commit must also be set to on (default). This configuration will cause each commit to wait for confirmation that the standby has written the commit record to durable storage. synchronous_commit can be set by individual users, can be configured in the configuration file for particular users or databases or dynamically by applications in order to control the durability guarantee on a per-transaction basis.

    Setting synchronous_commit to remote_write will cause each commit to wait for confirmation that the standby has received the commit record and written it out to its own OS, but not for the data to be flushed to disk. This setting provides a weaker guarantee of durability than on does - the standby could lose the data in the event of an OS crash, though not a PostgreSQL crash. However, it is a useful setting in practice because it can decrease the response time for the transaction. Data loss could only occur if both the primary and the standby crash and the database of the primary gets corrupted at the same time.

    Setting synchronous_commit to remote_apply will cause each commit to wait until the current synchronous standbys report that they have replayed the transaction, making it visible to user queries
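    Because synchronous_commit can be set per session or per transaction, an application can relax the durability guarantee selectively, for example (table hypothetical):
    	SET synchronous_commit = remote_apply;	- for the current session
    	
    	BEGIN;
    	SET LOCAL synchronous_commit = off;	- this transaction only
    	INSERT INTO audit_log VALUES ('low-value event');
    	COMMIT;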

    Trigger-Based Master-Standby Replication

    The master server asynchronously sends data changes to the standby server. The standby can answer read-only queries while the master server is running. The standby server is ideal for data warehouse queries.

    Slony-I is an example of this type of replication, with per-table granularity and support for multiple standby servers. Because it updates the standby server asynchronously (in batches), there is possible data loss during failover. Slony-I has the following advantages over the in-built WAL-based replication - master and slave can run different versions of PostgreSQL, only some of the databases can be replicated instead of all, and extra changes to data can be implemented on the slaves.

    Setting up replication

    Configure master (in postgresql.conf)
    	listen_addresses = '10.0.12.1'	- master ip
    	wal_level = hot_standby
    	synchronous_commit = local
    	
    	archive_mode = on
    	archive_command = 'cp %p /var/lib/postgresql/ver/main/archive/%f'	- archive folder needs to be created
    
    	max_wal_senders = 2
    	wal_keep_segments = 10
    
    	synchronous_standby_names = 'pgslave01'
    Create user with replication privilege in master

    Edit pg_hba.conf file and add the following
    	# localhost
    	host    replication     repluser          127.0.0.1/32            md5
    
    	# master IP address
    	host    replication     repluser          10.0.15.10/32            md5
    
    	# slave IP address
    	host    replication     repluser          10.0.15.11/32            md5
    Configure slave (in postgresql.conf)
    	listen_addresses = '10.0.12.2'	- slave ip
    	wal_level = hot_standby
    	synchronous_commit = local
    
    	max_wal_senders = 2
    	wal_keep_segments = 10
    	
    	synchronous_standby_names = 'pgslave01'
    	hot_standby = on
    Copy data from master to slave
    	pg_basebackup -h 10.0.12.1 -U repluser -D /var/lib/postgresql/ver/main -P --xlog
    Create file recovery.conf in the data directory
    	standby_mode = 'on'
    	primary_conninfo = 'host=10.0.12.1 port=5432 user=repluser password=pwd application_name=pgslave01'
    	restore_command = 'cp /var/lib/postgresql/ver/main/archive/%f %p'
    	trigger_file = '/tmp/postgresql.trigger.5432'
    Start master and slave and check status
    	select * from pg_stat_replication;




    TOOLS/UTILITIES

    psql

    psql is a terminal-based front-end to PostgreSQL. It enables typing queries interactively, issuing them to PostgreSQL and seeing the results. Alternatively, input can come from a file or from command-line arguments. psql also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating tasks.

    Connecting to a database
    	psql -h host -p port -U user -W database		- connect to database (-W forces a password prompt)
    If the host name is omitted, psql will connect via a Unix-domain socket to a server on the local host or via TCP/IP to localhost on machines that don't have Unix-domain sockets.

    In normal operation, psql provides a prompt consisting of the name of the database to which psql is currently connected, followed by the string =>, where SQL commands can be typed in, terminated by a semicolon.

    Options
    	--dbname=, -d		database to connect to
    	--echo-all, -a		print all non-empty input lines to standard output
    	--file=, -f		reads commands from a file, rather than standard input
    	--host=, -h		hostname
    	--log-file=, -L		write all query output into file filename, in addition to the normal output destination
    	--output=, -o		put all query output in a file
    	--password, -W		prompt for password
    	--port=, -p		port #
    	--username=, -U		username
    Meta-commands

    psql meta-commands start with a backslash, are processed by psql itself, and are useful for administration and scripting.
    	\conninfo		- information about current connection
    	\encoding [encoding]	- sets the client character set encoding
    	\encoding		- shows current encoding
    	\d[S+] [pattern]	- for each relation (table/view/index/sequence), shows all columns, their types, the tablespace and 
    					associated indexes, constraints, rules and triggers
    	\db[+] [pattern]	- lists tablespaces
    	\ddp [pattern]	- lists default access privilege settings
    	\di[S+] [pattern]	- lists indexes
    	\dp or \z [pattern]	- lists tables, views and sequences with their associated access privileges
    	\ds[S+] [pattern]	- lists sequences
    	\dt[S+] [pattern]	- lists tables
    	\dT[S+] [pattern]	- lists data types (S - include system data types)
    	\dg[S+] or \du[S+] [pattern]	- lists database roles
    	\dv[S+] [pattern]	- lists views
    	\dx [pattern]		- installed extensions
    	\l[+] or \list[+] [pattern]	- list databases in the server
    	\password [user]	- change password
    	\q or \quit		- quits psql or the current script
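    A few other commonly used meta-commands (not an exhaustive list):
    	\i file.sql		- execute commands from a file
    	\o [file]		- send query results to a file (without argument, back to stdout)
    	\timing			- toggle display of query execution time
    	\x			- toggle expanded output
    	\watch [sec]		- re-execute the current query every few seconds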


    pgAdmin

    pgAdmin is a graphical open-source management tool for PostgreSQL.

    The graphs on the Dashboard tab provide usage statistics for the selected server or database, including server sessions, transactions per second, tuples written, tuples fetched and block I/O (the number of blocks read from the filesystem or fetched from the buffer cache, but not the OS cache, for the server or database).

    The Server activity panel displays information about sessions, locks, prepared transactions and server configuration. The icons in the sessions table can be used to terminate a session, cancel a query or view session details.

    Additional modules

    pg_stat_statements - provides a means for tracking execution statistics of all SQL statements executed by a server. The module must be loaded by adding pg_stat_statements to shared_preload_libraries in postgresql.conf, because it requires additional shared memory.

    The statistics gathered by the module are made available via a view named pg_stat_statements.
    	# postgresql.conf			
    	shared_preload_libraries = 'pg_stat_statements'		
    	pg_stat_statements.max = 10000
    	pg_stat_statements.track = all
    
    	CREATE EXTENSION pg_stat_statements;		-- activate on a database
    	
    	select * from pg_stat_statements;	-- view containing statistics collected by pg_stat_statements
    		calls	-- no. of times executed
    		max_time	-- max time spent
    		mean_time	-- mean time spent
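    For example, to list the most expensive statements (column names as in the version described here):
    	SELECT query, calls, total_time, mean_time
    	FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;	- top 5 statements by total time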
    	
    PostGIS

    PostGIS is an extension to PostgreSQL which allows GIS (Geographic Information Systems) objects to be stored in the database. PostGIS includes support for GiST-based R-Tree spatial indexes, and functions for analysis and processing of GIS objects.
    	CREATE EXTENSION postgis;
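    A minimal sketch of storing and querying a point (table, index and coordinates hypothetical):
    	CREATE TABLE places (id serial PRIMARY KEY, name text, geom geometry(Point, 4326));
    	CREATE INDEX places_geom_idx ON places USING GIST (geom);	- GiST-based spatial index
    	INSERT INTO places (name, geom) VALUES ('office', ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326));
    	SELECT name, ST_AsText(geom) FROM places;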


    Maintained by: VINCENT KANDASAMY, Database Architect/Administrator (kandasf@hotmail.com)
    Last updated: Oct 22, 12:12