Pentaho: lookup component observations

When caching is enabled, the lookup loads every row into memory. The whole list is loaded by getRows(), which returns a List<Object[]>. Each list entry is a row; each element of the Object[] is a column.

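The row Object[] gets allocated with (resultset.getColumnCount + 10) slots. So for our 4 columns we allocate an Object[14] for each row. This means 16 bytes of array base plus 14 references of 4 bytes each: (16 + 14*4) = 72 bytes per row. For 24 mln rows: 1.647 GB (72 * 24,000,000 = 1,728,000,000 bytes, or about 1,648 MiB, written here as 1.647 GB).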
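The arithmetic above can be written out as a small sketch. The class and method names are made up for illustration; the constants (16-byte array base, 4-byte references, 10 spare slots) are the ones observed in these notes, not values read from the Pentaho source.

```java
final class RowArrayOverhead {
    static final int ARRAY_BASE_BYTES = 16; // 12-byte object header + 4-byte length
    static final int REFERENCE_BYTES = 4;   // compressed object pointer
    static final int SPARE_SLOTS = 10;      // extra slots allocated for each row

    // Bytes used by the Object[] that holds one row, excluding the column values.
    static long rowArrayBytes(int columnCount) {
        return ARRAY_BASE_BYTES + (long) (columnCount + SPARE_SLOTS) * REFERENCE_BYTES;
    }

    public static void main(String[] args) {
        long perRow = rowArrayBytes(4);       // 72
        long total = perRow * 24_000_000L;    // 1,728,000,000
        System.out.println(perRow + " bytes per row, " + total + " bytes for 24 mln rows");
    }
}
```

Without the 10 spare slots the same row would need 16 + 4*4 = 32 bytes, so more than half of the array cost is padding.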

For each data type we need to add storage too. For the bookingdetail line this is:

What           Type    Example                              Calculation  Size (bytes)
key            String  GM|76614952|KOSTENPLAATS|101720375W  40 + 35*2    110
organization   Long                                         24           24
exists_in_dv   Long                                         24           24
surrogate_key  Long                                         24           24
Total                                                                    182

For 24 mln rows this means 4.165 GB of memory (182 * 24,000,000 = 4,368,000,000 bytes, or about 4,166 MiB, written here as 4.165 GB).

Total for the whole cache is 5.812 GB: 1.647 GB of row arrays plus 4.165 GB of column data.
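As a cross-check, the per-row estimate for the bookingdetail line can be reproduced in a few lines. This is only a sketch of the calculation in these notes: the class and method names are hypothetical, and the sizes (40 + 2*n for a String, 24 for a Long, 72 for the row array) are taken from the text, not measured by the code.

```java
final class BookingDetailRowEstimate {
    static final int LONG_WRAPPER_BYTES = 24; // boxed Long, as listed in the size table
    static final int ROW_ARRAY_BYTES = 72;    // Object[14]: 16 + 14 * 4

    // String(n) = 40 + 2*n, the unrounded formula used in these notes.
    static long stringBytes(int length) {
        return 40 + 2L * length;
    }

    // One String key plus three Long columns.
    static long columnDataBytes(String key) {
        return stringBytes(key.length()) + 3L * LONG_WRAPPER_BYTES;
    }

    // Column data plus the Object[] that holds the row.
    static long rowBytes(String key) {
        return columnDataBytes(key) + ROW_ARRAY_BYTES;
    }

    public static void main(String[] args) {
        String key = "GM|76614952|KOSTENPLAATS|101720375W"; // 35 characters
        long rows = 24_000_000L;
        System.out.println("column data per row: " + columnDataBytes(key));  // 182
        System.out.println("total per row:       " + rowBytes(key));         // 254
        System.out.println("whole cache, bytes:  " + rowBytes(key) * rows);  // 6,096,000,000
    }
}
```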


Data collected while loading the cache with the new streaming code:

After loading 8.6 mln records:

After 9.3 mln records:


Experimental code changes

Rewrote the lookup cache load:

  • Load rows in a streaming fashion instead of making a zillion copies.
  • Force use of the read-only cache, which caches badly but is less horrible than DefaultCache.
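The streaming idea in the first bullet can be sketched roughly as follows. This is not the actual Pentaho change: the class, the method, and the use of an Iterator as a stand-in for the database result set are all assumptions made for illustration. The point is only that each row goes straight into the cache as it is read, so no intermediate List<Object[]> of all rows is built.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class StreamingCacheLoad {
    // Consumes rows one at a time and stores each one directly in the cache.
    // 'rows' stands in for whatever delivers rows from the database (hypothetical).
    static Map<Object, Object[]> load(Iterator<Object[]> rows, int keyIndex) {
        Map<Object, Object[]> cache = new HashMap<>();
        while (rows.hasNext()) {
            Object[] row = rows.next();
            cache.put(row[keyIndex], row); // no copy of the row is made
        }
        return cache;
    }
}
```

With this shape, peak memory during the load is roughly the size of the finished cache, instead of the cache plus a full copy of the source list.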

Loading 10 million rows now shows:

The 20 million Object[] instances are caused by the expensive split into lookup data and result data (one Object[][] array for each), which means two Object[] per row for the 10 million rows.

The number of Long objects is caused by wasteful storage of the cached data: 2 of the columns hold small int values, but they are stored as Long instances by reference (at a cost of about 460 MB).
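The 460 MB figure follows from the 24-byte Long wrapper size: 2 columns * 10 million rows * 24 bytes = 480,000,000 bytes, or about 458 MiB. The sketch below repeats that calculation and, as an assumption for comparison only, shows what the same values would take as primitive ints. The class and method names are hypothetical.

```java
final class BoxedLongCost {
    static final int LONG_WRAPPER_BYTES = 24; // one Long instance on the heap
    static final int INT_BYTES = 4;           // one primitive int

    // Heap used by the Long instances alone (references in the row array not counted).
    static long boxedBytes(long rows, int columns) {
        return rows * columns * LONG_WRAPPER_BYTES;
    }

    // Space the same values would need as primitive ints, e.g. in int[] columns.
    static long primitiveIntBytes(long rows, int columns) {
        return rows * columns * INT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(boxedBytes(10_000_000L, 2));        // 480,000,000
        System.out.println(primitiveIntBytes(10_000_000L, 2)); // 80,000,000
    }
}
```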



Sizes of Java structures on a 64-bit JVM

The following seem to be the sizes of Java objects. Please remember that complete Java objects (instances) are always 8-byte aligned, so an Object's size is always a multiple of 8 bytes.

Size (bytes)  Rounded size  What
12            16            An instance of Object (rounded to 16 bytes due to alignment)
4                           An int field in an object
8                           A long or double field in an object
4                           Object pointer (surprising, but probably due to pointer compression)
16            16            Array base size (12 bytes object, 4 bytes length), followed by length * datatype size
40 + 2*n                    String(n): 24 bytes for the String object (object, char[] reference, start, end), 16 bytes for the char[], plus 2*n bytes for the characters
24            24            Long wrapper
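The 8-byte rounding can be applied mechanically. The sketch below is a hypothetical helper that uses the figures from the table (12-byte header, 4-byte references, 16-byte array base, 24-byte String object); it does not measure anything on a real JVM. One consequence: if the char[] inside a String is rounded up as well, the 35-character key comes to 24 + 88 = 112 bytes rather than the 110 given by the unrounded 40 + 2*n formula.

```java
final class ObjectSizes {
    static final int HEADER_BYTES = 12;        // plain object header
    static final int ARRAY_BASE_BYTES = 16;    // header + 4-byte length
    static final int STRING_OBJECT_BYTES = 24; // String object itself, per the table

    // Round up to the next multiple of 8 bytes.
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Long wrapper: header plus one 8-byte long field, rounded up.
    static long longWrapperBytes() {
        return align8(HEADER_BYTES + 8);
    }

    // Array of 'length' elements of 'elementBytes' each, rounded up.
    static long arrayBytes(int length, int elementBytes) {
        return align8(ARRAY_BASE_BYTES + (long) length * elementBytes);
    }

    // String of n chars: String object plus its (rounded) char[].
    static long stringBytes(int n) {
        return STRING_OBJECT_BYTES + arrayBytes(n, 2);
    }

    public static void main(String[] args) {
        System.out.println(align8(HEADER_BYTES)); // 16
        System.out.println(longWrapperBytes());   // 24
        System.out.println(arrayBytes(14, 4));    // 72
        System.out.println(stringBytes(35));      // 112
    }
}
```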