Some numbers that indicate PostgreSQL performance when using JDBC to query and insert data. All of the tests were done on the same quiescent machine.
Read performance
A simple program to run reads looks as follows:
Code Block |
---|
/**
 * @author <a href="mailto:jal@etc.to">Frits Jalvingh</a>
 * Created on 8-6-17.
 */
import java.sql.*;
import java.util.*;

public class PgReadTest {
    public static void main(String args[]) {
        try {
            Class.forName("org.postgresql.Driver");
            String dbURL = "jdbc:postgresql://localhost:5432/staging_12_tst";
            Properties parameters = new Properties();
            parameters.put("user", "jal");
            parameters.put("password", "jal");

            Connection conn = DriverManager.getConnection(dbURL, parameters);
            conn.setAutoCommit(false);                          // Important: allows the driver to stream rows

            String query = "select * from staging.boekingdetail";
            long lStartTime = System.currentTimeMillis();
            int size = 0;
            try(PreparedStatement ps = conn.prepareStatement(query)) {
                ps.setFetchSize(65536);                         // Important: rows fetched per network round trip
                try(ResultSet rs = ps.executeQuery()) {
                    int cols = rs.getMetaData().getColumnCount();
                    while(rs.next()) {
                        for(int i = 1; i <= cols; i++) {
                            Object val = rs.getObject(i);
                        }
                        size++;
                    }
                }
            }

            long lEndTime = System.currentTimeMillis();
            long difference = lEndTime - lStartTime;
            double rps = (double) size / (difference / 1000);   // integer division truncates to whole seconds,
                                                                // which is why identical rates show up below
            System.out.println("Read " + size + " rows in " + difference + " milliseconds, " + rps + " rows per second");
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
} |
Important parameters are fetch size and autocommit. When autocommit is on (which it is by default) the JDBC driver exhibits horrible behavior: it reads all rows into memory before returning from rs.next(), causing huge processing delays and an OutOfMemoryError if many rows are present.
The important thing here is:
- Set autocommit to false, and set the fetch size reasonably high, so that the driver can stream rows from the server (the fragment below shows just these two calls).
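For reference, the essential part reduces to the following fragment. This is a sketch only; it assumes a Connection conn obtained as in the program above, and 65536 is simply the fetch size used in the measurements below.
Code Block |
---|
// Sketch: stream a large result set instead of buffering it client-side.
// Assumes "conn" is a Connection to the test database, as in PgReadTest above.
conn.setAutoCommit(false);                   // without this the pgjdbc driver buffers the whole result set
try(PreparedStatement ps = conn.prepareStatement("select * from staging.boekingdetail")) {
    ps.setFetchSize(65536);                  // rows fetched per network round trip
    try(ResultSet rs = ps.executeQuery()) {
        while(rs.next()) {
            // handle one row at a time; memory use stays bounded
        }
    }
} |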
Numbers, with fetchsize=65536:
- Read 24179781 rows in 32614 milliseconds, 755618.15625 rows per second
- Read 24179781 rows in 32953 milliseconds, 755618.15625 rows per second
Fetchsize=32768:
- Read 24179781 rows in 33048 milliseconds, 732720.6363636364 rows per second
- Read 24179781 rows in 32957 milliseconds, 755618.15625 rows per second
Fetchsize=1024*1024:
- Read 24179781 rows in 48519 milliseconds, 503745.4375 rows per second
- Read 24179781 rows in 48561 milliseconds, 503745.4375 rows per second
Insert tests
A simple program to test for inserts:
Code Block |
---|
/**
 * @author <a href="mailto:jal@etc.to">Frits Jalvingh</a>
 * Created on 8-6-17.
 */
import java.sql.*;
import java.util.*;

public class PgInsertTest {
    public static void main(String args[]) {
        try {
            Class.forName("org.postgresql.Driver");
            String dbURL = "jdbc:postgresql://localhost:5432/datavault_12_tst";
            Properties parameters = new Properties();
            parameters.put("user", "jal");
            parameters.put("password", "jal");

            Connection conn = DriverManager.getConnection(dbURL, parameters);
            conn.setAutoCommit(false);                          // Important

            String query = "insert into datavault.h_boekingdetail(id_h,source_system_id,organisation_id,load_dts,boekingdetailid) values(?,?,?,?,?)";
            long lStartTime = System.currentTimeMillis();
            int size = 0;
            int commitSize = 50_000;                            // rows per transaction
            int commitCount = 0;
            int batchSize = 0;                                  // 0 = no JDBC batching, execute row by row
            int batchCount = 0;
            try(PreparedStatement ps = conn.prepareStatement(query)) {
                long now = System.currentTimeMillis();
                for(int row = 0; row < 1_000_000; row++) {
                    ps.setInt(1, row + 1000_000);
                    ps.setInt(2, 1);
                    ps.setInt(3, 12);
                    ps.setTimestamp(4, new Timestamp(now));
                    ps.setString(5, "boe" + row + "kingdetail" + row);
                    size++;
                    commitCount++;

                    if(batchSize <= 0) {
                        ps.executeUpdate();
                        if(commitCount > commitSize) {
                            commitCount = 0;
                            conn.commit();
                        }
                    } else {
                        batchCount++;
                        ps.addBatch();
                        if(batchCount >= batchSize) {
                            ps.executeBatch();
                            batchCount = 0;
                            if(commitCount > commitSize) {
                                commitCount = 0;
                                conn.commit();
                            }
                        }
                    }
                }
                if(batchSize > 0 && batchCount > 0) {
                    ps.executeBatch();                          // flush the last partial batch
                }
            }
            conn.commit();

            long lEndTime = System.currentTimeMillis();
            long difference = lEndTime - lStartTime;
            double rps = (double) size / (difference / 1000);   // note: integer division of difference
            System.out.println("Inserted " + size + " rows in " + difference + " milliseconds, " + rps + " rows per second");
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
} |
Numbers: without batching and commit size 50.000:
- Inserted 1000000 rows in 29732 milliseconds, 34482.75862068965 rows per second
- Inserted 1000000 rows in 28303 milliseconds, 35714.28571428572 rows per second
With commit size 100.000:
- Inserted 1000000 rows in 29037 milliseconds, 34482.75862068965 rows per second
With commit size 50.000 and batch size 1000:
- Inserted 1000000 rows in 14029 milliseconds, 71428.57142857143 rows per second
- Inserted 1000000 rows in 14292 milliseconds, 71428.57142857143 rows per second
With commit size 50.000 and batch size 10.000:
- Inserted 1000000 rows in 14322 milliseconds, 71428.57142857143 rows per second
- Inserted 1000000 rows in 14272 milliseconds, 71428.57142857143 rows per second
Removing primary key and all indices from the table
Commit size 50.000 and batch size 10.000:
- Inserted 1000000 rows in 7500 milliseconds, 142857.14285714287 rows per second
- Inserted 1000000 rows in 7410 milliseconds, 142857.14285714287 rows per second
A two-fold performance increase.
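How the constraint and index were dropped and recreated is not shown here; below is a sketch of what it could look like over JDBC. The constraint name and the choice of id_h as primary key column are assumptions for illustration only; the real names depend on how the table was created.
Code Block |
---|
// Sketch: drop the primary key (and any other indexes) before the bulk load, recreate them afterwards.
// Assumes a Connection conn as in the test programs; h_boekingdetail_pkey is a hypothetical name.
try(Statement st = conn.createStatement()) {
    st.execute("alter table datavault.h_boekingdetail drop constraint h_boekingdetail_pkey");
}
conn.commit();

// ... run the insert test ...

try(Statement st = conn.createStatement()) {
    st.execute("alter table datavault.h_boekingdetail add constraint h_boekingdetail_pkey primary key (id_h)");
}
conn.commit(); |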
Same settings, but with the table set to UNLOGGED (see the sketch after these numbers):
- Inserted 1000000 rows in 7025 milliseconds, 142857.14285714287 rows per second
- Inserted 1000000 rows in 6940 milliseconds, 166666.66666666666 rows per second
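An existing table can be flipped to UNLOGGED and back since PostgreSQL 9.5, so this variant only needs two extra statements around the load. A sketch follows; keep in mind that an unlogged table is emptied after a crash and is not replicated.
Code Block |
---|
// Sketch: skip WAL for the target table during the bulk load.
try(Statement st = conn.createStatement()) {
    st.execute("alter table datavault.h_boekingdetail set unlogged");
}
conn.commit();

// ... run the insert test ...

try(Statement st = conn.createStatement()) {
    st.execute("alter table datavault.h_boekingdetail set logged");   // rewrites the table and makes it crash-safe again
}
conn.commit(); |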
Enabling "dangerous" options in PostgreSQL
Switching off fsync and full_page_writes had no measurable effect on performance in these tests.
Inserting from the database itself, using pgsql
The following PL/pgSQL function:
Code Block |
---|
CREATE or replace function test() returns void AS $$
DECLARE
count integer;
BEGIN
for count in 1..1000000 loop
insert into datavault.h_test(id_h,source_system_id,organisation_id,load_dts,boekingdetailid)
values(count, 1, 12, now(), 'boe' || count || 'king' || count);
end loop;
END;
$$ LANGUAGE plpgsql; |
can be executed as:
Code Block |
---|
select test();
commit; |
and returns in 4 seconds, which means 250.000 rows per second. This seems to be an upper limit, and is less than half the speed Oracle manages (see below).
Similar checks on Oracle
Writing the same in Oracle is quite a lot faster. Without indexes and PK, but with full transaction logging the performance is:
- Inserted 1000000 rows in 1072 milliseconds, 1000000.0 rows per second
- Inserted 24000000 rows in 47155 milliseconds, 510638.2978723404 rows per second
The fastest way so far: using the PostgreSQL COPY command
The fastest method I found so far to do inserts is to use PostgreSQL's COPY command. I rewrote the earlier Java insert code as follows:
Code Block |
---|
/**
 * @author <a href="mailto:jal@etc.to">Frits Jalvingh</a>
 * Created on 8-6-17.
 */
import org.postgresql.*;
import org.postgresql.copy.*;

import java.io.*;
import java.sql.*;
import java.util.*;

public class PgCopyTest {
    public static void main(String args[]) {
        try {
            Class.forName("org.postgresql.Driver");
            String dbURL = "jdbc:postgresql://localhost:5432/datavault_12_tst";
            Properties parameters = new Properties();
            parameters.put("user", "jal");
            parameters.put("password", "jal");

            Connection conn = DriverManager.getConnection(dbURL, parameters);
            conn.setAutoCommit(false);                          // Important

            String query = "insert into datavault.h_test(id_h,source_system_id,organisation_id,load_dts,boekingdetailid) values(?,?,?,?,?)";
            long lStartTime = System.currentTimeMillis();
            int size = 24_000_000;
            int commitSize = 50_000_000;
            int commitCount = 0;
            int batchSize = 10000;
            int batchCount = 0;

            CopyManager cm = ((PGConnection) conn).getCopyAPI();
            cm.copyIn("copy datavault.h_test from stdin with (format text)", new InsertReader(size));
            conn.commit();

            long lEndTime = System.currentTimeMillis();
            long difference = lEndTime - lStartTime;
            double rps = (double) size / ((double) difference / 1000);
            System.out.println("Inserted " + size + " rows in " + difference + " milliseconds, " + rps + " rows per second");
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}

class InsertReader extends Reader {
    private final int m_size;
    private final StringBuilder m_buffer = new StringBuilder();
    private int m_bufferIndex;
    private int m_count;

    public InsertReader(int count) {
        m_size = count;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if(m_count >= m_size)
            return -1;
        int buflen = m_buffer.length();
        int remainder = buflen - m_bufferIndex;
        if(remainder < len) {
            generateRecords();
            buflen = m_buffer.length();
            remainder = buflen - m_bufferIndex;
        }
        if(remainder > len)
            remainder = len;
        m_buffer.getChars(m_bufferIndex, m_bufferIndex + remainder, cbuf, off);
        m_bufferIndex += remainder;
        return remainder;
    }

    private void generateRecords() {
        m_buffer.replace(0, m_bufferIndex, "");
        m_bufferIndex = 0;
        int left = m_size - m_count;
        if(left > 1000)
            left = 1000;
        while(left-- > 0) {
            addRecord(m_count++);
        }
    }

    private void addRecord(int nr) {
        m_buffer
            .append(Integer.toString(nr))
            .append("\t")
            .append("1")
            .append("\t")
            .append("12")
            .append("\t")
            .append("2017-06-09 00:00:00")
            .append("\t")
            .append("boe" + nr + "king" + nr)
            .append("\n");
    }

    @Override
    public void close() throws IOException {
    }
} |
This code inserts 24 million records in the same table as the other programs. The numbers are impressive:
- Inserted 24000000 rows in 22004 milliseconds, 1090710.7798582076 rows per second
- Inserted 24000000 rows in 20949 milliseconds, 1145639.4099957037 rows per second
This is impressive, and faster than Oracle. But the method has a number of undesirable issues:
- The format in which each record's data must be written is error prone, as we need exact text conversions for all possible types. Dates, for instance, are hard: you need to use the same format as the database server expects, or set the server to a known format (which means other parallel tasks might suffer). See the sketch after this list.
- It is hard to handle transactions.
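To illustrate the first point: with text format every value has to be rendered and escaped by hand. A hedged sketch of such a helper follows; the class and method names are made up, and it only covers the common escapes (tab, newline, carriage return, backslash, NULL), not every type PostgreSQL knows about.
Code Block |
---|
import java.sql.Timestamp;

final class CopyTextFormat {
    // Render one value for "copy ... from stdin with (format text)".
    static String escape(Object value) {
        if(value == null)
            return "\\N";                       // text format spells NULL as \N
        String s = value.toString();            // java.sql.Timestamp.toString() gives "yyyy-mm-dd hh:mm:ss[.f...]",
                                                // an ISO-like form the server normally accepts regardless of DateStyle
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for(int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch(c) {
                case '\\': sb.append("\\\\"); break;   // backslash introduces escapes
                case '\t': sb.append("\\t");  break;   // tab is the column separator
                case '\n': sb.append("\\n");  break;   // newline is the row separator
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);      break;
            }
        }
        return sb.toString();
    }
} |
As for transactions: copyIn runs inside the surrounding JDBC transaction when autocommit is off, so in the test above the whole 24 million row load is a single commit; splitting it into smaller transactions means issuing several smaller copyIn calls yourself.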
Stuff that has little or no effect
For these tests I set the following PostgreSQL config parameters:
Parameter | Description |
---|---|
shared_buffers = 512MB | Changing this value to 8GB has no measurable effect. This seems logical, as inserts should not need that much buffer space; since every run ends with a commit, the time to write to disk is the limiting factor. |
Other parameters that might be of importance were set as follows:
The latter (switching off autovacuum) was done because I noticed it running several times during those inserts; that seemed like a bad plan.
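If switching autovacuum off for the whole server is too crude, the same effect can be had for just the target table with a storage parameter. This is a sketch only, not something measured separately here, and it again assumes a Connection conn as in the test programs.
Code Block |
---|
// Sketch: disable autovacuum for the target table only, instead of for the whole cluster.
try(Statement st = conn.createStatement()) {
    st.execute("alter table datavault.h_boekingdetail set (autovacuum_enabled = false)");
}
conn.commit(); |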
Links
- http://postgres.cz/wiki/PostgreSQL_SQL_Tricks
- http://rostislav-matl.blogspot.nl/2011/08/fast-inserts-to-postgresql-with-jdbc.html
- https://www.depesz.com/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/