Some numbers that indicate PostgreSQL performance when using JDBC to query and insert data. All of the tests were done on the same quiescent machine.

...

Switching off fsync and full_page_writes has no effect on performance.

Using driver option "reWriteBatchedInserts" = "true"

A great tip from Babu Chellappa was to set this option for the driver. In the code above this can be done by adding:

Code Block
parameters.put("reWriteBatchedInserts", "true");

in the top of the method.
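The full insert method is elided above, so as a hedged sketch only: this is roughly the kind of batched-insert loop these numbers come from, with the driver option set. The class name PgBatchInsertTest is mine, the batch size of 10,000 is an assumption (the original value is not shown here), and the column values mirror the COPY example further down.

Code Block
/**
 * Sketch of a batched insert with reWriteBatchedInserts enabled (assumptions noted above).
 */
import java.sql.*;
import java.util.*;

public class PgBatchInsertTest {
   public static void main(String[] args) throws Exception {
      String dbURL = "jdbc:postgresql://localhost:5432/datavault_12_tst";
      Properties parameters = new Properties();
      parameters.put("user", "jal");
      parameters.put("password", "jal");
      parameters.put("reWriteBatchedInserts", "true");     // The tip: let the driver rewrite batches into multi-row inserts

      try(Connection conn = DriverManager.getConnection(dbURL, parameters)) {
         conn.setAutoCommit(false);
         String query = "insert into datavault.h_test(id_h,source_system_id,organisation_id,load_dts,boekingdetailid) values(?,?,?,?,?)";
         try(PreparedStatement ps = conn.prepareStatement(query)) {
            int size = 1_000_000;
            int batchSize = 10_000;                        // assumption: batch size not shown in the original
            long start = System.currentTimeMillis();
            for(int i = 0; i < size; i++) {
               ps.setInt(1, i);
               ps.setInt(2, 1);
               ps.setInt(3, 12);
               ps.setTimestamp(4, Timestamp.valueOf("2017-06-09 00:00:00"));
               ps.setString(5, "boe" + i + "king" + i);
               ps.addBatch();
               if((i + 1) % batchSize == 0)
                  ps.executeBatch();                       // send the accumulated batch to the server
            }
            ps.executeBatch();
            conn.commit();
            long ms = System.currentTimeMillis() - start;
            System.out.println("Inserted " + size + " rows in " + ms + " milliseconds, " + ((double) size / ms * 1000) + " rows per second");
         }
      }
   }
}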

The effects are really good:

  • Inserted 1000000 rows in 2535 milliseconds, 394477.3175542406 rows per second
  • Inserted 1000000 rows in 2553 milliseconds, 391696.0438699569 rows per second

This is quite an increase! Thanks a lot, Babu!

Inserting from the database itself, using pgsql

...

Writing the same in Oracle is quite a lot faster. Without indexes and PK, but with full transaction logging the performance is:

  • Inserted 1000000 rows in 1072 milliseconds, 1000000.0 rows per second
  • Inserted 24000000 rows in 47155 milliseconds, 510638.2978723404 rows per second

The fastest way so far: using the PostgreSQL COPY command

The fastest method I found so far to do inserts is by using PostgreSQL's COPY command. I rewrote the earlier Java insert code as follows:

Code Block
/**
 * @author <a href="mailto:jal@etc.to">Frits Jalvingh</a>
 * Created on 8-6-17.
 */
import org.postgresql.*;
import org.postgresql.copy.*;

import java.io.*;
import java.sql.*;
import java.util.*;

public class PgCopyTest {
   public static void main(String args[]) {
      try {
         Class.forName("org.postgresql.Driver");
         String dbURL = "jdbc:postgresql://localhost:5432/datavault_12_tst";
         Properties parameters = new Properties();
         parameters.put("user", "jal");
         parameters.put("password", "jal");

         Connection conn = DriverManager.getConnection(dbURL, parameters);
         conn.setAutoCommit(false);                // Important: run the whole COPY in a single transaction

         // Target columns, in the order the COPY text rows are generated below:
         // id_h, source_system_id, organisation_id, load_dts, boekingdetailid
         long lStartTime = System.currentTimeMillis();
         int size = 24_000_000;                    // number of rows to stream through COPY


         CopyManager cm = ((PGConnection) conn).getCopyAPI();

         // Stream the generated rows into the table; copyIn keeps reading from the Reader until read() returns -1
         cm.copyIn("copy datavault.h_test from stdin with (format text)", new InsertReader(size));
         conn.commit();

         long lEndTime = System.currentTimeMillis();
         long difference = lEndTime - lStartTime;
         double rps = (double) size / ((double)difference / 1000);
         System.out.println("Inserted " + size + " rows in " + difference + " milliseconds, " + rps + " rows per second");
      } catch(Exception e) {
         e.printStackTrace();
      }
   }
}

class InsertReader extends Reader {
   private final int m_size;

   private final StringBuilder m_buffer = new StringBuilder();

   private int m_bufferIndex;

   private int m_count;

   public InsertReader(int count) {
      m_size = count;
   }

   @Override public int read(char[] cbuf, int off, int len) throws IOException {
      //-- Report end-of-stream only when all records have been generated AND the buffer is fully drained;
      //-- checking m_count alone could drop the tail of the last generated batch for small read lengths.
      if(m_count >= m_size && m_bufferIndex >= m_buffer.length())
         return -1;
      int buflen = m_buffer.length();
      int remainder = buflen - m_bufferIndex;
      if(remainder < len) {
         generateRecords();
         buflen = m_buffer.length();
         remainder = buflen - m_bufferIndex;
      }

      if(remainder > len)
         remainder = len;
      m_buffer.getChars(m_bufferIndex, m_bufferIndex + remainder, cbuf, off);
      m_bufferIndex += remainder;
      return remainder;
   }

   private void generateRecords() {
      m_buffer.replace(0, m_bufferIndex, "");   // Discard the part of the buffer that was already read
      m_bufferIndex = 0;
      int left = m_size - m_count;
      if(left > 1000)
         left = 1000;

      while(left-- > 0) {
         addRecord(m_count++);
      }
   }

   private void addRecord(int nr) {
      //-- One row in COPY text format: tab-separated column values, terminated by a newline
      m_buffer
         .append(Integer.toString(nr))
         .append("\t")
         .append("1")
         .append("\t")
         .append("12")
         .append("\t")
         .append("2017-06-09 00:00:00")
         .append("\t")
         .append("boe" + nr + "king" + nr)
         .append("\n");
   }

   @Override public void close() throws IOException {
   }
}

...

  • The format in which each record's data must be written is error prone, as we need exact conversions for all possible types. Dates, for instance, are hard: you need to use the same format as defined on the database server, or set the server to some known format (which means other parallel tasks might suffer); see the sketch after this list.
  • It is hard to handle transactions.
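
One way to reduce the date-format risk, as a sketch only (not from the original code; it reuses conn, cm, size and InsertReader from the example above): pin the session's DateStyle right before the COPY, so the generated timestamps parse predictably. A session-level SET affects only this connection, which avoids the parallel-task concern of changing the server default.

Code Block
// Hedged sketch: fix the date input format for this session only (PostgreSQL's DateStyle setting),
// so values like "2017-06-09 00:00:00" are parsed as ISO year-month-day regardless of the server default.
try(Statement st = conn.createStatement()) {
   st.execute("set datestyle to 'ISO, YMD'");   // session-local; other connections are unaffected
}
cm.copyIn("copy datavault.h_test from stdin with (format text)", new InsertReader(size));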


Stuff that has little or no effect

Parameter changes

For these tests I set the following PostgreSQL config parameters:

Parameter                 Description
shared_buffers = 512MB    Changing this value to 8GB has no measurable effect. This seems logical, as inserts should not use that much buffer space; since a commit is included, the to-disk time is the limiting factor.


Other parameters that might be of importance were set as follows:


work_mem = 512MB
synchronous_commit = off
commit_delay = 100000
autovacuum_naptime = 10min

The latter was done because I noticed autovacuum running several times during those inserts. That seemed like a bad plan.

Using unlogged tables

Setting the table to unlogged had no apparent effect.
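
For completeness, a minimal sketch of how that experiment can be run from the same JDBC connection (my code, not the original test; ALTER TABLE ... SET UNLOGGED requires PostgreSQL 9.5 or newer):

Code Block
// Sketch: switch the target table to UNLOGGED before the test run (PostgreSQL 9.5+).
// Unlogged tables skip WAL writes, at the price of being truncated after a crash.
try(Statement st = conn.createStatement()) {
   st.execute("alter table datavault.h_test set unlogged");
}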


...