Category Archives: PostgreSQL

PostgreSQL: Backup and Restoring a Database

Backups are performed by a utility called ‘pg_dump’  while  ‘psql’ can be used to restore from a backup file. Normally, people use the utility ‘pg_restore’, however, this utility requires the user to backup the database using a custom or tarball  format. If you use plain text, which is the default format when using ‘pg_dump’, then you must use ‘psql’ to restore the database. The current database I’m working with is still quite small, weighing in at about 645 megabytes.

To backup the database I typed:

pg_dump -v  -O  --host=www.alexkuo.info --port=5432 --username=alex database_name > database_name.sql

The ‘-v’ flag turns on verbose mode and the -O flag skips any attempt to set a owner to the database. Other options that I tested were the -C flag for adding a create database statement and the ‘-c’ flag for adding drop table statements. I omitted the ‘-C’ and the ‘-c’ flags because I decided to restore into an existing database. Without the specification of file sizes and format, pg_dump will create one plain text file for the backup.

Some file systems have limitations on the maximum file size. NTFS file sizes are limited by the Max volume size of the drive – which is two terabytes, while FAT32 is limited to about four gigabytes. The EXT2 file system has a max file size of two terabytes. Max file sizes for Ext3 and Ext4 vary depending on block sizes, between 16 gigabytes to 2 terabytes. If needed, you can split files by piping into the split utility. The split utility is found only on Linux, Unix, and Cygwin.

To do this type:

pg_dump database_name | split -bytes=100m backup.bak.

This splits the plain text format of the backup into 100 megabyte pieces with the file named as ‘backup.bak.XXX’ where XXX is a digit.

If needed, you can append the backed up files when restoring by typing:

cat backup.bak. | psql -d database_name

If you’re restoring from a single plain text file, type:

psql -d database_name -f database_name.sql

If the database exists, errors will be thrown and the restore will fail – unless you specified the ‘-c’ flag in ‘pg_dump’ to add the ‘drop tables’ statements.

If you used the plain text format for your backup, creating an archive from the backup file can reduce the file size. For myself, zipping the backup file reduced the size from 695 megabytes to 60 megabytes.

For more information on file systems or backing up databases, please consult these links:

PostgreSQL: Notes about Quotes and Identifiers

Last night I was playing around with schemas and found some oddities with quotes. In version 8.0+, you can use ‘$$’ symbols to replace quotes. This gives the programmer some flexibility in terms of having to escape quotes within the text body. For example:

CREATE OR REPLACE FUNCTION NoticeNoEscaping() RETURNS VOID AS
$$
BEGIN
Raise Notice "See, no escaping required";
END;
$$
LANGUAGE 'plpgsql';

When creating tables, I noticed that defining a column name with quotes and without quotes results in two different column names.

Create Table QuotedColumn
(
“Id” Serial,
Primary Key(“Id”)
);

The following would throw an error because of missing quotes.

Create Table QuotedColumn
(
“Id” Serial,
Primary Key(Id)
);

I suggest dropping the quotes all together. To correct the error above, you would type:

Create Table UnQuotedColumn
(
Id Serial,
Primary Key(Id)
);

I also encountered a similar problem while implementing a schema. For this example, a table is being added to the schema, ‘test_schema’.

Create Schema “test_schema”;

Create Table “test_schema”.Customers
(
Id Serial,
Primary Key(Id)
);

When creating schemas using PgAdmin, quotes are automatically insert around the schema name. This results in having to use quotes to define the schema prefix each time a table within the schema is referenced. Try the following instead.

Create Schema test_schema;

Set search_path To test_schema;

Create Table Customers
(
Id Serial,
Primary Key(Id)
);

 In the above snippet, the schema,  ‘test_schema’, can now be referenced without quotes. Not only that, but I also set the schema for all code to automatically reference ‘test_schema’ by using the command ‘Set search_path To <schema name>;’.

If you would like more information on the topics discussed, please consult the following links:

PostgreSQL: PL/PGSQL, Arrays, and Batches

When developing in T-SQL on SQL Server, I sometimes need to update or process multiple rows at once. This involved either multiple round trip calls to the database or sending a delimited string that would eventually be parsed into a matrix and then inserted into a table. Once you figured out the process for such operations, implementing a solution was quick.

In PostgreSQL, the need for such operations isn’t as dire because of the array data type. Instead of parsing and formatting data into a delimited string, arrays can be passed directly to a function (stored procedure). The prime benefit is less code to maintain and standardized functionality. In version 8.2, only basic data types are supported. Composite, user defined, and domain data types are not supported by arrays.

In order to create an array simply type:

Select ARRAY['one', 'two', 'three'];

In order to create a multidimensional array, type:

Select ARRAY[['a','one'], ['b','two'], ['c', 'three']];

Keep in mind when creating arrays, all data types in each element must be the same. Mixing data types in the same array will result in an error.

Now that we know how to create arrays. Let’s create a function to process these arrays. Consult the following example:

CREATE OR REPLACE FUNCTION MultiArrayRead(stuff varchar[][]) RETURNS VOID AS
$$
BEGIN
FOR i in array_lower(stuff, 1) .. array_upper(stuff, 1) LOOP

Raise Notice '%, %', stuff[i][1]::varchar, stuff[i][2]::varchar;

END LOOP;
END;
$$
LANGUAGE 'plpgsql';

If you’ve programmed in any language, the keywords and the structure of this code snippet should be fairly self-explanatory.

The first line can be broken down into the following:

‘CREATE OR REPLACE FUNCTION’ – In T-SQL, you had to define a statement that checked if a stored procedure existed and then called a command to drop the function. In pgsql, this one snippet takes care of detecting and dropping a function (stored procedure) of the same name or signature.

‘MultiArrayRead(stuff varchar[][])’ – ‘MultiArrayRead’ is the function name. The parameters accepted are defined between the parenthesis. In this case a two dimensional matrix of varchar elements called ‘stuff’. Please note that a varchar without a length defined as opposed to a varchar data type with a length( such as ‘varchar(30)’), will accept any length of string… well not any …. up to 1 GB, I think.

‘RETURNS VOID AS’ – Defines the return data type. This function returns nothing.

The ‘$$’ symbols replace the traditional single or double quotes. I’m using these symbols because it frees the developer from having to escape quotes more often within the definition of the function.

The fourth line of the snippet defines the beginning of the For loop. ‘array_lower(array_variable, matrix_column)’ and ‘array_upper(array_variable, matrix_column)’ return the first index  and the last index of the array. If your matrix has more then one column, you can return different bounds for a matrix’s columns by incrementing the second parameter.

I mentioned Batch commands in the title of the article. After passing a matrix and any other variables needed to modify  or retrieve information, iterate through the commands you would like to execute in the loop.

For Example:

CREATE OR REPLACE FUNCTION MultiArrayWrite(stuff varchar[][]) RETURNS VOID AS
$$
BEGIN
FOR i in array_lower(stuff, 1) .. array_upper(stuff, 1) LOOP

Insert into example_table(id, description) values
(stuff[i][1]::int, stuff[i][2]:varchar);

END LOOP;
END;
$$
LANGUAGE 'plpgsql';

The ‘::data_type’ (like in ‘stuff[i][1]::int’ ) appended to each element is used for casting the element to the data type defined in the columns of ‘example_table’.

Calling this function can be done by executing the following:

Select MultiArrayWrite(ARRAY[['1','one'], ['2','two'], ['3', 'three']]);

This is just a brief overview of arrays and batch executions. If you would like more information on these topics, please consult the following links.

PostgreSQL: What is a Domain?

After running PgAdmin III and opening up a database schema, you may notice an item listed as ‘Domain’.  Think of domains as a custom data type that can be put together using data types already found in PostgreSQL along with the ability to set rules or constraints on this data type.

Domains are used to define a commonly used data object found through out a database. For example – A phone number object can be defined as a domain and reused through out a schema. You can define this phone number domain as 13 characters and include a constraint to match a regular expression that checks for a specific format. Once the domain has been defined, you can reuse the same domain in different tables to define the data type used for a specific column.

Launch the Bash shell and open up a sandbox database using psql.

I typed ‘$ psql -d <database_name>’

Next I created a domain for a phone number by typing the following:

CREATE DOMAIN phone_number as CHAR(13)
CHECK( VALUE ~ '((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}' );

Now, let’s add this domain to a new table.

CREATE TABLE customers 
(
id Serial Unique Not Null,
name varchar,
phone phone_number
);

We can now test to see if the constraints are enforced for the phone number.

INSERT INTO CUSTOMERS(name, phone) VALUES('Alex', '(123)123-1234');

INSERT INTO CUSTOMERS(name, phone) VALUES('Bob', '3213231313');

The second insert statement for bob should fail because it does not match the regular expression check for the phone_number data type or domain.  The convenience of creating the phone_number data type is that we can now reuse the same data type in other tables that may need to store phone numbers, giving us a central point of maintenance for adding or removing constraints and checks for this type of data.

If you would like more information on this topic, please check out the following links: