Part 2 – Blackboard Performance Tuning: An Iterative Approach

This is Part 2 of 3 in this series:

Part 1 – Overview & Architecture

Part 2 – JVM Tuning Methods

Part 3 – Additional Tuning Methods

 

JVM PERFORMANCE TUNING

The most significant application performance gains come from properly tuning each Blackboard application server’s JVM, so Part 2 is dedicated to JVM tuning. Before examining JVM tuning options, it is valuable to understand the life cycle of an object in the JVM heap:

  1. Object is created in ‘Eden’
  2. Object is transferred to the free survivor space (there are two survivor spaces) within the ‘young’ generation.
  3. Object is transferred between the survivor spaces until garbage collection frees it or it has been copied a predefined number of times (the default is 31 on older HotSpot JVMs and 15 on more recent ones.)
  4. After the object reaches the threshold it is transferred to the ‘tenured’ generation where it resides until it is ready to be freed by garbage collection.
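
The life cycle above can be illustrated with a toy model. The Python sketch below is not how the JVM is implemented; the function name trace_object and its parameters are hypothetical, and it only lists where a single object would sit after each minor collection for a given tenuring threshold.

```python
def trace_object(minor_gcs_survived, tenuring_threshold):
    """Toy model: list the heap regions one object passes through.

    minor_gcs_survived -- minor collections during which the object is still live
    tenuring_threshold -- survivor-space copies allowed before the object is tenured
    """
    path = ["eden"]                          # step 1: every object starts in Eden
    target = 0                               # index of the currently free survivor space
    copies = 0
    for _ in range(minor_gcs_survived):
        if copies >= tenuring_threshold:
            path.append("tenured")           # step 4: threshold reached
            return path
        path.append("survivor%d" % target)   # steps 2-3: copy to the free space
        target = 1 - target                  # the two survivor spaces swap roles
        copies += 1
    path.append("freed")                     # died before reaching the threshold
    return path


print(trace_object(2, 31))   # short-lived object, high threshold
print(trace_object(5, 2))    # longer-lived object, low threshold
```

A short-lived object ends as "freed" without ever leaving the young generation, while lowering the threshold pushes a longer-lived object into "tenured" sooner.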

The function of this architecture and its relevance to tuning:

  • Accessing the young generation (read, write) is inexpensive, but garbage collection within the young generation is expensive. If there is not enough space in Eden or a survivor space, new objects will be prematurely tenured and their access will be more expensive.
  • Accessing the tenured generation (read, write) is expensive, but garbage collection within the tenured generation is inexpensive. If the tenuring threshold is too high, a large number of objects will die in the young generation and garbage collection will be more expensive.
  • The larger the young generation is, the fewer minor garbage collections will take place. A larger young generation implies a smaller tenured generation, which increases the frequency of major collections.

To attain ideal JVM performance, objects should reside in the young generation for as long as they are actively accessed and should be tenured when they are about to die or access to them drops off. Depending on the nature of the application, it may also be valuable to control the distribution of garbage collection (smaller GCs in the young generation result in shorter pauses than large collections in the tenured generation.) The following settings can be configured to affect this behavior and potentially improve a JVM’s performance. They are set in Blackboard’s main configuration file, blackboard\config\bb-config.properties:

NOTE: The following recommended configurations are the result of analyzing Blackboard’s tuning recommendations and case studies and the author’s experience. These are not the direct recommendations of Blackboard. Furthermore, these are recommended baseline configurations to be adjusted by a test-driven approach to tuning.

REQUIRED CONFIGURATIONS

Heap Size – The size of the heap is the combined size of the young and tenured generation. This value is limited by available hardware and the memory requirements of the OS and other applications. The heap should be as large as possible without constricting the host OS.

Parameters (should be set equal):

bbconfig.min.heapsize.tomcat=<value>

bbconfig.max.heapsize.tomcat=<value>

Permanent Generation Size – The permanent generation holds class metadata and other data that the JVM keeps for its own use, rather than ordinary application objects.

Parameter:  bbconfig.max.permsize.tomcat=<value>

The recommended baseline combined heap and permanent generation size is approximately one-half the total available RAM. For every 4 GB of heap memory, a minimum of 256 MB should be dedicated to the permanent generation. For instance, if the total available RAM is 24 GB, then the heap might be approximately 12 GB and the permanent generation would be 768 MB.
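
The sizing rule above can be written out as a small calculation. This is a minimal Python sketch with a hypothetical function name (baseline_memory). It assumes the heap alone is taken as half of RAM, as in the 24 GB example, and that the permanent generation is rounded up to the next 256 MB step; both are interpretations of the rule of thumb, not Blackboard guidance.

```python
import math


def baseline_memory(total_ram_gb):
    """Return (heap_gb, perm_mb) as a starting point for testing."""
    heap_gb = total_ram_gb / 2                       # about half of total RAM
    # Assumption: 256 MB of permanent generation per started 4 GB of heap.
    perm_mb = max(1, math.ceil(heap_gb / 4)) * 256
    return heap_gb, perm_mb


print(baseline_memory(24))   # the 24 GB example from the text
print(baseline_memory(20))   # a 20 GB server
```

For a 20 GB server this gives a 10 GB heap and 768 MB of permanent generation, which is only a baseline to be adjusted through load testing.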

Stack Size – The size of each thread’s stack. This size should be set based on the requirements of the specific version of the Blackboard application. The default stack size for a given version is sufficient, but Blackboard provides further analysis of suggested stack sizes for various versions in its knowledge base on the Behind the Blackboard website.

Parameter: bbconfig.max.stacksize.tomcat=<value>

Maximum Threads – The maximum number of threads permitted to exist within the heap. This cap should prevent the heap from running out of memory. The total number of threads that can exist without risking overflow is determined by the size of the Java heap and the maximum thread size.

The recommended baseline maximum number of threads is approximately 150 – 170 times the Java heap size (in GB.) Thus, if the heap is 2.5 GB an appropriate maximum threads setting would be 160 * 2.5 or 400 threads.

Parameter: bbconfig.appserver.maxthreads=<value>
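
The thread rule of thumb is simple arithmetic, sketched here in Python. The function names are hypothetical; the multipliers 150, 160, and 170 come from the recommendation above.

```python
def max_threads(heap_gb, threads_per_gb=160):
    """Baseline maximum threads for a heap of the given size in GB."""
    return int(round(heap_gb * threads_per_gb))


def max_threads_range(heap_gb, low=150, high=170):
    """Lower and upper ends of the recommended baseline range."""
    return max_threads(heap_gb, low), max_threads(heap_gb, high)


print(max_threads(2.5))          # the 2.5 GB example from the text
print(max_threads_range(10))     # range for a 10 GB heap
```

Any value inside the returned range is a reasonable place to start; the final setting should come from load testing.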

OPTIONAL CONFIGURATIONS

Young to Tenured Generation Ratio – Determines the amount of space available to each generation. The young generation should be large enough to ensure that objects are not prematurely tenured. It is important to fine-tune this ratio, but it is often better to configure a slightly larger young generation than necessary rather than risk objects being prematurely tenured.

The recommended baseline is a young generation of approximately one-quarter of the heap, which is a tenured-to-young ratio of about 3:1 (the case study below uses a 2500m young generation in a 10000m heap.) It is often more desirable to specify the size of the young generation explicitly because it allows for a more precise configuration (the tenured generation will be automatically sized to heap minus young generation.)

Parameters:

-XX:NewRatio=<value> – The ratio of the tenured generation to the young generation.

-XX:NewSize=<value> – The initial size of the young generation, given in bytes unless a k, m, or g suffix is used (for example 2500m). The tenured generation is inferred from the heap and young generation sizes.

-XX:MaxNewSize=<value> – The maximum size of the young generation, in the same units. Setting it equal to NewSize fixes the size of the young generation.
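
The two ways of sizing the young generation are related by simple arithmetic: NewRatio is tenured divided by young, so the heap consists of NewRatio + 1 young-sized parts. The Python sketch below uses hypothetical helper names and ignores the rounding and alignment a real JVM applies.

```python
def young_mb_from_new_ratio(heap_mb, new_ratio):
    """Approximate young generation size implied by -XX:NewRatio."""
    # NewRatio = tenured / young, so heap = young * (new_ratio + 1).
    return heap_mb // (new_ratio + 1)


def explicit_young_flags(young_mb):
    """Flags that pin the young generation at one explicit size."""
    # Equal NewSize and MaxNewSize keep the young generation from resizing.
    return "-XX:NewSize=%dm -XX:MaxNewSize=%dm" % (young_mb, young_mb)


young = young_mb_from_new_ratio(10000, 3)
print(young, explicit_young_flags(young))
```

With a 10000m heap, a NewRatio of 3 and an explicit 2500m young generation describe roughly the same split; the explicit form states the size directly.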

Survivor Space Size – Determines the amount of space available in each survivor space (and indirectly the size of Eden.) The survivor spaces should be large enough to ensure that objects are not prematurely tenured. The default value is adequate for a baseline configuration.

Parameter:

-XX:SurvivorRatio=<value> – The ratio of the size of Eden to the size of each survivor space (there are two.) For example, a value of 10 means that each survivor space is 1/10 the size of Eden and therefore 1/12 the size of the young generation: 1/12 (space 0) + 1/12 (space 1) + 10/12 (Eden.)
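
The survivor ratio arithmetic can be checked with a few lines of Python. The function name young_layout is hypothetical, and the result is the idealized split; a running JVM may round these sizes.

```python
def young_layout(young_mb, survivor_ratio):
    """Return (eden_mb, survivor_mb) for one young generation size."""
    # The young generation is Eden plus two survivor spaces, and Eden is
    # survivor_ratio times the size of one survivor space.
    survivor_mb = young_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    return eden_mb, survivor_mb


eden, survivor = young_layout(2400, 10)
print(eden, survivor)   # Eden is 10/12 and each survivor space 1/12 of 2400 MB
```

Raising the ratio shrinks the survivor spaces in favor of Eden, which is why it interacts with the tenuring threshold described next.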

Tenuring Threshold – Determines the number of times that an object can be copied between survivor spaces before being tenured. This value should be high enough to ensure that as many objects as possible stay in the young generation as long as they are actively needed, but low enough to ensure that objects are tenured when they are no longer needed or about to die. The default value is adequate for a baseline configuration.

Parameter:

-XX:MaxTenuringThreshold=<value> – The number of times that an object is copied before being tenured (the default is 31 on older HotSpot JVMs and 15 on more recent ones.)

Garbage Collector Type – The size, lifespan, and manipulation of the objects that reside within the heap determine the effectiveness of garbage collection. There are a variety of garbage collectors to suit different applications (e.g. throughput collector, concurrent low-pause collector, incremental low-pause collector.) The default collector is adequate for a baseline configuration; however, after testing, most institutions find that the concurrent low-pause collector is the most effective choice for Blackboard.

Parameters:

-XX:+UseParallelGC – The throughput collector uses the default tenured collector and a parallel version of the young generation collector.

-XX:+UseConcMarkSweepGC – The concurrent low-pause collector performs tenured collection concurrently with the execution of the application. The application is briefly paused during collection.

-XX:+UseParNewGC – Can be used in conjunction with the concurrent low-pause collector. This switch enables parallel young generation garbage collection in conjunction with the concurrent collections and is valuable in multi-core/multi-processor environments.

-Xincgc – The incremental low-pause collector collects a portion of the tenured generation at each minor collection. It is slower than the default collector but minimizes long pauses from major collections.

Note that -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, and -Xincgc are mutually exclusive. Using any of these switches together will result in unpredictable behavior.
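
A quick sanity check can catch conflicting collector switches before a configuration is deployed. The Python sketch below is a hypothetical helper that only inspects an options string; it does not query the JVM.

```python
# The three switches the text above describes as mutually exclusive.
EXCLUSIVE = ("-XX:+UseParallelGC", "-XX:+UseConcMarkSweepGC", "-Xincgc")


def selected_collectors(options):
    """Return the mutually exclusive collector switches present in options."""
    flags = options.split()
    return [flag for flag in EXCLUSIVE if flag in flags]


def collectors_are_compatible(options):
    """True when at most one of the exclusive collector switches is set."""
    return len(selected_collectors(options)) <= 1


print(collectors_are_compatible("-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"))
print(collectors_are_compatible("-XX:+UseParallelGC -Xincgc"))
```

-XX:+UseParNewGC is deliberately left out of the exclusive list because it is meant to be combined with the concurrent collector.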


Additional Switches – There are a very large number of additional JVM tuning options, most of which are not covered here. This article covers the options found to be most common and valuable for tuning the Blackboard application. Sun’s website should be consulted for more information and an exhaustive list of these options (e.g. http://blogs.sun.com/watt/resource/jvm-options-list.html.) None of these additional options should be introduced to the baseline configuration. They should be added and tested one at a time to determine their impact on performance.

Common Parameters:

-XX:+UseTLAB – Enables thread-local object allocation and is faster than the default atomic operation.

-XX:+UseISM – Enables intimate shared memory, which reduces the overhead of virtual to physical address translation when using larger heaps (Solaris only.)

-XX:+CMSParallelRemarkEnabled (only used in combination with ParNewGC) – Decreases remark pauses.

-XX:ParallelGCThreads=<value> – The number of parallel threads that the JVM uses to perform garbage collection in the young generation (the default is the number of processors.)

 

CASE STUDY

You are the Blackboard Server Administrator for a large institution of several hundred thousand users. Assuming that the application is already in place, tune the Blackboard application and its environment. The environment is as follows:

8 application servers

2 collaboration servers

8 processors per server (3.2 GHz)

20 GB RAM per server

Windows 2008 R2

Blackboard 9.1 Enterprise with Content and Community


The following could be an appropriate baseline configuration for each server:

Heap – 10000m

Perm Gen Size – 768m

Stack Size – 320k

Max threads – 1500

Young Generation – 2500m

Additional Switches – none

 

This would appear as the following in bb-config.properties:

bbconfig.appserver.maxthreads=1500

bbconfig.min.heapsize.tomcat=10000m

bbconfig.max.heapsize.tomcat=10000m

bbconfig.max.permsize.tomcat=768m

bbconfig.max.stacksize.tomcat=320k

bbconfig.jvm.options.extra.tomcat=-XX:NewSize=2500m -XX:MaxNewSize=2500m
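
When the same baseline has to be applied to several servers, the property lines can be generated rather than typed by hand. The Python sketch below uses a hypothetical function name and only the property keys shown above; it writes the same heap value to both heap properties so they stay equal.

```python
def render_bb_config(heap_mb, perm_mb, stack_kb, max_threads, extra_options):
    """Return the JVM-related bb-config.properties lines as one string."""
    lines = [
        "bbconfig.appserver.maxthreads=%d" % max_threads,
        "bbconfig.min.heapsize.tomcat=%dm" % heap_mb,   # min and max heap kept equal
        "bbconfig.max.heapsize.tomcat=%dm" % heap_mb,
        "bbconfig.max.permsize.tomcat=%dm" % perm_mb,
        "bbconfig.max.stacksize.tomcat=%dk" % stack_kb,
        "bbconfig.jvm.options.extra.tomcat=" + " ".join(extra_options),
    ]
    return "\n".join(lines)


print(render_bb_config(10000, 768, 320, 1500,
                       ["-XX:NewSize=2500m", "-XX:MaxNewSize=2500m"]))
```

The output reproduces the baseline above and can be pasted into bb-config.properties for review before it is applied.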

 

The next step is to load test the application with the baseline configuration. The application’s responsiveness during the load test should be recorded. It is also valuable to analyze and record garbage collection performance, which is logged in blackboard\logs\tomcat\gc.log:

Total time for which application threads were stopped: 0.0000696 seconds

Application time: 119.2037722 seconds

Total time for which application threads were stopped: 0.0003771 seconds

Application time: 600.1297291 seconds

Total time for which application threads were stopped: 0.0006816 seconds

Application time: 0.0184405 seconds

Total time for which application threads were stopped: 0.0000854 seconds

Application time: 335.6940502 seconds

81399.961: [GC 81399.961: [ParNew: 249951K->3801K(276480K), 0.0107200 secs] 585391K->339241K(1198080K), 0.0108022 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]

Total time for which application threads were stopped: 0.0112471 seconds

Application time: 0.0004188 seconds

Total time for which application threads were stopped: 0.0001240 seconds

Application time: 0.0000176 seconds

Total time for which application threads were stopped: 0.0000670 seconds

Application time: 0.0000276 seconds

Total time for which application threads were stopped: 0.0000674 seconds

Application time: 0.0000260 seconds

Total time for which application threads were stopped: 0.0000676 seconds

Application time: 0.0000139 seconds

“Total time for which application threads were stopped” indicates an application pause.

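“81399.961: [GC 81399.961: [ParNew: 249951K->3801K(276480K), 0.0107200 secs] 585391K->339241K(1198080K), 0.0108022 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]” indicates a minor (young generation) garbage collection performed by the ParNew collector. The first set of figures is the young generation’s occupancy before and after the collection and its capacity; the second set is the same for the whole heap.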
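
Recording these figures by hand becomes tedious over a long load test. The Python sketch below summarizes a log in the format shown above. It assumes exactly that line format (the regular expressions are written against the excerpt, not a general GC log grammar), and the function and key names are hypothetical.

```python
import re

PAUSE = re.compile(r"Total time for which application threads were stopped: ([\d.]+) seconds")
RUNNING = re.compile(r"Application time: ([\d.]+) seconds")
PARNEW = re.compile(r"\[ParNew: (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def summarize_gc_log(lines):
    """Summarize pauses and ParNew entries from gc.log lines."""
    pauses, running, young = [], [], []
    for line in lines:
        match = PAUSE.search(line)
        if match:
            pauses.append(float(match.group(1)))
            continue
        match = RUNNING.search(line)
        if match:
            running.append(float(match.group(1)))
            continue
        match = PARNEW.search(line)
        if match:
            # Young generation occupancy before/after (KB), capacity (KB), seconds.
            before, after, capacity = (int(match.group(i)) for i in (1, 2, 3))
            young.append((before, after, capacity, float(match.group(4))))
    total_paused = sum(pauses)
    total = total_paused + sum(running)
    return {
        "pause_count": len(pauses),
        "longest_pause": max(pauses) if pauses else 0.0,
        "paused_fraction": total_paused / total if total else 0.0,
        "young_collections": young,
    }


sample = [
    "Application time: 335.6940502 seconds",
    "81399.961: [GC 81399.961: [ParNew: 249951K->3801K(276480K), 0.0107200 secs] "
    "585391K->339241K(1198080K), 0.0108022 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]",
    "Total time for which application threads were stopped: 0.0112471 seconds",
]
print(summarize_gc_log(sample))
```

In practice the lines would be read from blackboard\logs\tomcat\gc.log. The longest pause and the paused fraction are convenient single numbers to compare between test runs.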

After the baseline configuration is tested and the results recorded, tests of individual variables can begin. The most obvious method of tuning is to change the value of one variable, load test, and repeat. However, a great deal of time can be saved in a load-balanced environment with identical application servers (ideally clones.) Since the test applies a controlled load, setting the load balancer to a round-robin routing configuration will apply an identical load to the identical servers. Configuring a single variable differently on each node allows the complete testing of that variable with a single load test.

In our example, one might choose to tune the size of the young generation first, setting it as follows on the eight application servers: 2300, 2350, 2400, 2450, 2550, 2600, 2650, 2700 (the baseline test for 2500 already exists.) If ideal performance does not fall in this range (performance continues to increase from 2300 to 2700 or from 2700 to 2300), another test should be performed in another range. If the test of, say, ‘2650’ performed the best, one might wish to narrow the value further by running a test of: 2610, 2620, 2630, 2640, 2660, 2670, 2680, 2690.
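
The per-server values for such a test can be generated instead of worked out by hand. The Python sketch below uses a hypothetical function name; it spaces the values evenly around a baseline and skips the baseline itself, since that result already exists. It is shown centered on the 2500m young generation from the baseline configuration.

```python
def sweep_values(baseline, step, servers):
    """Evenly spaced test values around baseline, one per server."""
    half = servers // 2
    below = [baseline - step * i for i in range(half, 0, -1)]
    above = [baseline + step * i for i in range(1, servers - half + 1)]
    return below + above          # the baseline value itself is not repeated


print(sweep_values(2500, 50, 8))   # coarse pass around the baseline
print(sweep_values(2650, 10, 8))   # finer pass around the best coarse value
```

The second call shows the narrowing step: once one value from the coarse pass looks best, the same function produces a tighter range around it.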

Once the first variable has been adequately tested and tuned, it should be set back to its baseline configuration and a test of the next variable should begin. This process should proceed until all variables that the admin wishes to test are tuned. Optionally, the system can be tuned further by using the new configuration as a new baseline and repeating the testing/tuning process as many times as needed. This helps to account for the unpredictable cumulative effects that multiple configuration changes can have. For example, the configuration of two separate variables may independently improve performance, but the combined change may not yield the same performance gain or, worse, may cause a performance loss.
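
One round of this process can be outlined in Python. Everything here is a hypothetical sketch: tune_round and its arguments are illustrative names, and measure stands in for running a load test and returning a response time (lower is better), which in reality is a manual, time-consuming step.

```python
def tune_round(baseline, candidates, measure):
    """Test each variable on its own against the baseline.

    baseline   -- dict of variable name -> current value
    candidates -- dict of variable name -> list of values to test
    measure    -- function(config) -> response time (stand-in for a load test)
    Returns a proposed new baseline holding the best value found per variable.
    """
    proposed = dict(baseline)
    for name, values in candidates.items():
        scores = {}
        for value in values:
            config = dict(baseline)      # every other variable stays at baseline
            config[name] = value
            scores[value] = measure(config)
        proposed[name] = min(scores, key=scores.get)
    return proposed


# Synthetic stand-in for a load test, used only to make the sketch runnable.
def fake_measure(config):
    return abs(config["young_mb"] - 2700) + abs(config["survivor_ratio"] - 16)


start = {"young_mb": 2500, "survivor_ratio": 8}
options = {"young_mb": [2300, 2500, 2700], "survivor_ratio": [8, 16]}
print(tune_round(start, options, fake_measure))
```

Because each variable is scored independently, the proposed configuration is itself untested as a combination. It should be load tested as a whole before it becomes the baseline for the next round.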

A sample configuration of the tuned JVM might be:

bbconfig.appserver.maxthreads=1500

bbconfig.min.heapsize.tomcat=10000m

bbconfig.max.heapsize.tomcat=10000m

bbconfig.max.permsize.tomcat=768m

bbconfig.max.stacksize.tomcat=320k

bbconfig.jvm.options.extra.tomcat=-XX:NewSize=2500m -XX:MaxNewSize=2500m -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:ParallelGCThreads=6 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=16 -XX:MaxTenuringThreshold=28

About blackboardcodemonkey
My name is Tony West. I am currently a Blackboard server administrator and software developer for a consortium of universities in the mid-west. I was employed by Blackboard from 2009 – 2010 out of Rochester, New York. I am passionate about application development and administration.
