Part 3 – Blackboard Performance Tuning: An Iterative Approach

This is Part 3 of 3 in this series:

Part 1 – Overview & Architecture

Part 2 – JVM Tuning Methods

Part 3 – Additional Tuning Methods

 

In Part 1 the following was presented as the top-down list of resources related to the Blackboard application that affect its performance:

  1. Software
    • Blackboard Application
    • JVM
    • Tomcat
    • Web Server (IIS, Apache)
  2. Operating System and Services
  3. Server Hardware
  4. Network Architecture and Hardware

Part 2 discussed in detail tuning a Java Virtual Machine for use with the Blackboard application. The JVM was discussed first because tuning it offers greater potential performance gains than tuning any other single item on the list. This section contains suggestions for tuning the other items on the list. Please note that this section and this series are not exhaustive. For complete tuning practices please consult the documentation for each individual technology. There are additional resources listed at the end of this section.

 

BLACKBOARD APPLICATION TUNING

Garbage Collection Timeout – This value determines how long a garbage collection is permitted to run before it is forced to terminate (default is 30 seconds.) A system under a heavy load will sometimes require a garbage collection that takes longer than the default timeout permits. If this occurs the garbage collection will time out and the heap will eventually run out of memory, causing the JVM (and subsequently the Blackboard application) to crash. A good practice is to double the GC timeout to 60 seconds, although this may need further adjustment for some systems.

The timeout is set by the wrapper.ping.timeout parameter and must be changed in both apps/tomcat/conf/wrapper.conf and config/tomcat/conf/wrapper.conf.bb.

Other Timeouts – Timeouts for sessions, assessments, and so on can be set within Blackboard. Generally, shorter timeouts result in improved performance but diminished stability, and longer timeouts result in diminished performance but improved stability.

Assessment Timeout – For example “<session-timeout>20</session-timeout>” can be set within blackboard/webapps/assessment/WEB-INF/web.xml. See http://kb.blackboard.com/display/KB/Preventing+assessment+timeouts

Session Timeout – See http://kb.blackboard.com/display/KB/Modifying+the+default+timeout+session.

OPERATING SYSTEM TUNING (i.e. Windows Server)

Processor Scheduling – Performance gain may be achieved if processor scheduling is set to “Background Services.”

Disable Unused Services – Performance gain may be achieved by disabling any unnecessary services.

Paging File – Paging file size may be considered as another variable in the tuning process to test against the baseline configuration. Properly tuning the paging file size may achieve performance gains.

TCP Chimney – Disabling TCP Chimney may achieve performance gain. This can be accomplished by running “netsh int tcp set global chimney=disabled” at a command prompt.

WEB SERVER TUNING (i.e. IIS)

Allow Persistent Connections – Represented by the parameter bbconfig.webserver.keepalive in bb-config.properties. The recommended setting is 1.

Persistent Connection Timeout – Represented by the parameter bbconfig.webserver.keepalivetimeout in bb-config.properties. The recommended setting is 15.

HTTP Compression – Represented by the parameter bbconfig.webserver.compression in bb-config.properties. The recommended setting is ‘Yes’.

 

ADDITIONAL RESOURCES

Blackboard

http://kb.blackboard.com/display/KB/Windows+2008+Performance+Guide

http://kb.blackboard.com/display/KB/Windows+2003+Performance+Guide

http://kb.blackboard.com/display/KB/The+Java+Service+Wrapper

http://kb.blackboard.com/display/KB/Preventing+assessment+timeouts

http://kb.blackboard.com/display/KB/Modifying+the+default+timeout+session

https://behind.blackboard.com

Document Library > Blackboard Learn 9.1: Performance Optimization Guide

Document Library > Hardware Sizing Guides

Java

http://blogs.sun.com/watt/resource/jvm-options-list.html

http://www.oracle.com/technetwork/java/tuning-139912.html

Windows Server

http://www.microsoft.com/whdc/system/sysperf/perf_tun_srv.mspx

http://www.microsoft.com/whdc/system/sysperf/Perf_tun_srv-R2.mspx


Part 2 – Blackboard Performance Tuning: An Iterative Approach

JVM PERFORMANCE TUNING

The most significant application performance gains come from properly tuning each Blackboard application server’s JVM, so Part 2 is dedicated to JVM tuning. Before examining JVM tuning options, it is valuable to understand the life cycle of an object in the JVM heap:

  1. Object is created in ‘Eden’
  2. Object is transferred to the free survivor space (there are two survivor spaces) within the ‘young’ generation.
  3. Object is transferred between the survivor spaces until garbage collection frees it or it has been copied a predefined number of times (default is 31.)
  4. After the object reaches the threshold it is transferred to the ‘tenured’ generation where it resides until it is ready to be freed by garbage collection.

The function of this architecture and its relevance to tuning:

  • Accessing the young generation (read, write) is inexpensive but garbage collection within the young generation is expensive. If there is not enough space in Eden or a survivor space new objects will be prematurely tenured and their access will be more expensive.
  • Accessing the tenured generation (read, write) is expensive but garbage collection within the tenured generation is inexpensive. If the tenuring threshold is too high then a large number of objects will die in the young generation and garbage collection will be more expensive.
  • The larger the young generation is the fewer minor garbage collections will take place. A larger young generation implies a smaller tenured generation which increases the frequency of major collections.

To attain ideal JVM performance, objects should reside in the young generation for as long as they are actively accessed and should be tenured when they are about to die or access to them drops off. Depending on the nature of the application it may be valuable to control the distribution of garbage collection as well (smaller GCs in the young generation result in shorter pauses than large collections in the tenured generation.) The following can be configured to affect the behavior and potentially improve the performance of a JVM (these settings can be configured in Blackboard’s main configuration file, blackboard\config\bb-config.properties):

NOTE: The following recommended configurations are the result of analyzing Blackboard’s tuning recommendations and case studies and the author’s experience. These are not the direct recommendations of Blackboard. Furthermore, these are recommended baseline configurations to be adjusted by a test-driven approach to tuning.

REQUIRED CONFIGURATIONS

Heap Size – The size of the heap is the combined size of the young and tenured generations. This value is limited by available hardware and the memory requirements of the OS and other applications. The heap should be as large as possible without constricting the host OS.

Parameters (should be set equal):

bbconfig.min.heapsize.tomcat=<value>

bbconfig.max.heapsize.tomcat=<value>

Permanent Generation Size – The permanent generation contains permanent objects and global variables dedicated to the JVM.

Parameter:  bbconfig.max.permsize.tomcat=<value>

The recommended baseline combined heap and permanent generation size is approximately one-half the total available RAM. For every 4 GB of heap memory a minimum of 256 MB should be dedicated to the permanent generation. For instance, if the total available RAM is 24 GB, then the heap size might be 12 GB and the permanent generation would be 768 MB.
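The sizing rule above can be sketched as a small calculation. This is only an illustration in Python, not a Blackboard tool: the function name `baseline_memory` is hypothetical, and rounding the permanent generation up to whole 256 MB blocks is an assumption made for the example.

```python
import math

def baseline_memory(total_ram_gb):
    """Return (heap_mb, permgen_mb) for a baseline configuration.

    Assumes the rule of thumb from the text: the heap is roughly half of
    the available RAM, and every 4 GB of heap (rounded up, an assumption)
    gets 256 MB of permanent generation.
    """
    heap_gb = total_ram_gb / 2
    permgen_mb = math.ceil(heap_gb / 4) * 256
    return int(heap_gb * 1024), permgen_mb

heap_mb, permgen_mb = baseline_memory(24)
print(heap_mb, permgen_mb)  # 12288 768
```

The result is a starting point for load testing rather than a final value; the available RAM, the OS, and other applications on the host may call for a smaller heap.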

Stack Size – The stack size of each thread. This size should be set based on the requirements of the specific version of the Blackboard application. The default stack size for a given version is sufficient, but Blackboard provides further analysis of suggested stack sizes for various versions in their knowledge base on the Behind the Blackboard website.

Parameter: bbconfig.max.stacksize.tomcat=<value>

Maximum Threads – The maximum number of threads permitted to exist within the heap. This cap should prevent the heap from running out of memory. The total number of threads that can exist without risking overflow is determined by the size of the Java heap and the maximum thread size.

The recommended baseline maximum number of threads is approximately 150 – 170 times the Java heap size (in GB.) Thus, if the heap is 2.5 GB an appropriate maximum threads setting would be 160 * 2.5 or 400 threads.

Parameter: bbconfig.appserver.maxthreads=<value>
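The thread rule of thumb is simple arithmetic, sketched below in Python. The helper name `baseline_max_threads` and the default of 160 threads per GB (the midpoint of the 150 – 170 range) are assumptions chosen for the illustration.

```python
def baseline_max_threads(heap_gb, threads_per_gb=160):
    """Baseline thread cap: roughly 150-170 threads per GB of Java heap."""
    if not 150 <= threads_per_gb <= 170:
        raise ValueError("the rule of thumb is 150-170 threads per GB of heap")
    return round(heap_gb * threads_per_gb)

print(baseline_max_threads(2.5))      # 400
print(baseline_max_threads(10, 150))  # 1500
```

The second call uses the low end of the range with a heap of roughly 10 GB, which gives the same 1500 threads used in the case study.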

OPTIONAL CONFIGURATIONS

Young to Tenured Generation Ratio – Determines the amount of space available to each generation. The young generation should be large enough to ensure that objects are not prematurely tenured. It is important to fine tune this ratio, but it is often better to configure a slightly larger young generation than necessary rather than risk objects being prematurely tenured.

The recommended baseline is a young generation of approximately one quarter of the heap, which corresponds to a tenured-to-young ratio of about 3:1 (as in the case study, where a 10000m heap has a 2500m young generation.) It is often more desirable to specify the size of the young generation explicitly because it allows for a more precise configuration (the tenured generation will be automatically sized to heap minus young generation.)

Parameters:

-XX:NewRatio=<value> – The ratio of the tenured generation to the young generation.

-XX:NewSize=<value> – The initial size of the young generation, given with a unit suffix such as 2500m for megabytes (tenured generation is inferred from heap and young generation sizes.)

-XX:MaxNewSize=<value> – The maximum size of the young generation, given with a unit suffix such as 2500m for megabytes (tenured generation is inferred from heap and young generation sizes.)

Survivor Space Size – Determines the amount of space available in each survivor space (and indirectly the size of Eden.) The survivor spaces should be large enough to ensure that objects are not prematurely tenured. The default value is adequate for a baseline configuration.

Parameter:

-XX:SurvivorRatio=<value> – The ratio of the size of Eden to the size of each survivor space (there are two.) For example, a value of 10 means that each survivor space is 1/10 the size of Eden and therefore 1/12 the size of the young generation: 1/12 (space 0) + 1/12 (space 1) + 10/12 (Eden.)
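The survivor ratio arithmetic can be checked with a short Python sketch. The function name `young_generation_layout` and the 2400 MB young generation are hypothetical values chosen to keep the numbers round; a real JVM also aligns these sizes internally, so actual values may differ slightly.

```python
def young_generation_layout(young_mb, survivor_ratio):
    """Split the young generation into Eden and two survivor spaces.

    SurvivorRatio is the size of Eden divided by the size of one survivor
    space, so the young generation is (survivor_ratio + 2) survivor-sized parts.
    """
    survivor_mb = young_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    return eden_mb, survivor_mb

eden, survivor = young_generation_layout(2400, 10)
print(eden, survivor)  # 2000.0 200.0
```

With a ratio of 10, each survivor space is 1/12 of the young generation (200 MB of 2400 MB) and Eden takes the remaining 10/12.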

Tenuring Threshold – Determines the number of times that an object can be copied between survivor spaces before being tenured. This value should be high enough to ensure that as many objects as possible stay in the young generation as long as they are actively needed, but low enough to ensure that objects are tenured when they are no longer needed or about to die. The default value is adequate for a baseline configuration.

Parameter:

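-XX:MaxTenuringThreshold=<value> – The number of times that an object is copied before being tenured (the default is 31; note that newer HotSpot releases cap this value at 15, so check the documentation for the JVM version in use.)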

Garbage Collector Type – The size, lifespan, and manipulation of the objects that reside within the heap determine the effectiveness of garbage collection. There are a variety of garbage collectors to suit different applications (e.g. throughput collector, concurrent low-pause collector, incremental low-pause collector.) The default collector is adequate for a baseline configuration; however, after testing, most institutions find that the concurrent low-pause collector is the most effective choice for Blackboard.

Parameters:

-XX:+UseParallelGC – The throughput collector uses the default tenured collector and a parallel version of the young generation collector.

-XX:+UseConcMarkSweepGC – The concurrent low-pause collector performs tenured collection concurrently with the execution of the application. The application is briefly paused during collection.

-XX:+UseParNewGC – Can be used in conjunction with the concurrent low-pause collector. This switch enables parallel young generation garbage collection in conjunction with the concurrent collections and is valuable in multi-core\processor environments.

-Xincgc – The incremental low-pause collector collects a portion of the tenured generation at each minor collection. It is slower than the default collector but minimizes long pauses from major collections.

Note that -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, and -Xincgc are mutually exclusive. Using any of these switches together will result in unpredictable behavior.
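Because a conflicting pair of switches is easy to introduce while experimenting, a small check can be run against the extra options string before a test. The Python sketch below is only an illustration: the function name `conflicting_collectors` is hypothetical, and the check covers only the three switches named above.

```python
# The three collector switches the text describes as mutually exclusive.
EXCLUSIVE_COLLECTORS = ("-XX:+UseParallelGC", "-XX:+UseConcMarkSweepGC", "-Xincgc")

def conflicting_collectors(jvm_options):
    """Return the mutually exclusive collector switches found together in an options string."""
    tokens = jvm_options.split()
    present = [flag for flag in EXCLUSIVE_COLLECTORS if flag in tokens]
    # A single collector switch is fine; two or more is a conflict.
    return present if len(present) > 1 else []

print(conflicting_collectors("-XX:NewSize=2500m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"))
print(conflicting_collectors("-XX:+UseParallelGC -Xincgc"))
```

The first call returns an empty list because -XX:+UseParNewGC is meant to be combined with the concurrent collector; the second returns both conflicting switches.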


Additional Switches – There are a very large number of additional JVM tuning options, most of which are not covered here. This article covers the options found to be most common and valuable when tuning the Blackboard application. Sun’s website should be consulted for more information and an exhaustive list of these options (e.g. http://blogs.sun.com/watt/resource/jvm-options-list.html). None of these additional options should be introduced to the baseline configuration. They should be added and tested one at a time to determine their impact on performance.

Common Parameters:

-XX:+UseTLAB – Enables thread-local object allocation and is faster than the default atomic operation.

-XX:+UseISM – Enables intimate shared memory, which reduces the overhead of virtual to physical address translation when using larger heaps (Solaris only.)

-XX:+CMSParallelRemarkEnabled (only used in combination with ParNewGC) – Decreases remark pauses.

-XX:ParallelGCThreads=<value> – The number of parallel threads that the JVM uses to perform garbage collection in the young generation (the default is the number of processors.)

 

CASE STUDY

You are the Blackboard Server Administrator for a large institution of several hundred thousand users. Assuming that the application is already in place, tune the Blackboard application and its environment. The environment is as follows:

8 application servers

2 collaboration servers

8 processors per server (3.2 GHz)

20 GB RAM per server

Windows 2008 R2

Blackboard 9.1 Enterprise with Content and Community


The following could be an appropriate baseline configuration for each server:

Heap – 10000m

Perm Gen Size – 768m

Stack Size – 320k

Max threads – 1500

Young Generation – 2500m

Additional Switches – none

 

This would appear as the following in bb-config.properties:

bbconfig.appserver.maxthreads=1500

bbconfig.min.heapsize.tomcat=10000m

bbconfig.max.heapsize.tomcat=10000m

bbconfig.max.permsize.tomcat=768m

bbconfig.max.stacksize.tomcat=320k

bbconfig.jvm.options.extra.tomcat=-XX:NewSize=2500m -XX:MaxNewSize=2500m

 

The next step is to load test the application with the baseline configuration. From the load test the application responsiveness should be recorded. It is also valuable to analyze and record garbage collection performance, which is logged in blackboard\logs\tomcat\gc.log:

Total time for which application threads were stopped: 0.0000696 seconds

Application time: 119.2037722 seconds

Total time for which application threads were stopped: 0.0003771 seconds

Application time: 600.1297291 seconds

Total time for which application threads were stopped: 0.0006816 seconds

Application time: 0.0184405 seconds

Total time for which application threads were stopped: 0.0000854 seconds

Application time: 335.6940502 seconds

81399.961: [GC 81399.961: [ParNew: 249951K->3801K(276480K), 0.0107200 secs] 585391K->339241K(1198080K), 0.0108022 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]

Total time for which application threads were stopped: 0.0112471 seconds

Application time: 0.0004188 seconds

Total time for which application threads were stopped: 0.0001240 seconds

Application time: 0.0000176 seconds

Total time for which application threads were stopped: 0.0000670 seconds

Application time: 0.0000276 seconds

Total time for which application threads were stopped: 0.0000674 seconds

Application time: 0.0000260 seconds

Total time for which application threads were stopped: 0.0000676 seconds

Application time: 0.0000139 seconds

“Total time for which application threads were stopped” indicates an application pause.

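“81399.961: [GC 81399.961: [ParNew: 249951K->3801K(276480K), 0.0107200 secs] 585391K->339241K(1198080K), 0.0108022 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]” indicates a minor garbage collection of the young generation (performed here by the ParNew collector); a full garbage collection would be logged as “Full GC” instead.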
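Recording these values by hand becomes tedious across many load tests, so a log summary can be scripted. The Python sketch below is a minimal illustration that assumes the log format shown in the excerpt above; the function name `summarize_gc_log` is hypothetical, and treating any entry that begins with `[Full GC` as a full collection is an assumption about the log format rather than a complete parser.

```python
import re

# Patterns based on the gc.log excerpt shown in this article.
STOPPED = re.compile(r"Total time for which application threads were stopped: ([\d.]+) seconds")
COLLECTION = re.compile(r"\[(Full GC|GC) .*?, ([\d.]+) secs\] \[Times:")

def summarize_gc_log(lines):
    """Return (total pause seconds, list of (kind, seconds) collections)."""
    total_paused = 0.0
    collections = []
    for line in lines:
        stopped = STOPPED.search(line)
        if stopped:
            total_paused += float(stopped.group(1))
            continue
        collection = COLLECTION.search(line)
        if collection:
            kind = "full" if collection.group(1) == "Full GC" else "minor"
            collections.append((kind, float(collection.group(2))))
    return total_paused, collections

sample = [
    "Application time: 335.6940502 seconds",
    "81399.961: [GC 81399.961: [ParNew: 249951K->3801K(276480K), 0.0107200 secs] "
    "585391K->339241K(1198080K), 0.0108022 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]",
    "Total time for which application threads were stopped: 0.0112471 seconds",
]
paused, collections = summarize_gc_log(sample)
print(paused, collections)
```

Comparing the total pause time and the number of minor and full collections between test runs gives a simple way to judge whether a configuration change helped.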

After the baseline configuration is tested and the results recorded, tests of individual variables can begin. The most obvious method of tuning is to change the value of one variable, load test, and repeat. However, it is possible to save a great deal of time in a load-balanced environment with identical application servers (ideally clones.) Since the test applies a controlled load, setting the load balancer to a round-robin routing configuration will apply an identical load to the identical servers. Configuring a single variable differently on each node allows the complete testing of that variable with a single load test.

In our example one might choose to tune the size of the young generation first by setting it as follows on each of the eight application servers: 2300, 2350, 2400, 2450, 2550, 2600, 2650, 2700 (the baseline test for 2500 already exists.) If ideal performance does not fall in this range (performance continues to increase from 2300 to 2700 or from 2700 to 2300) another test should be performed in another range. If the test of, say, ‘2650’ performed the best, one might wish to narrow the value further by running a test of: 2610, 2620, 2630, 2640, 2660, 2670, 2680, 2690.
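Spreading test values across the nodes can be sketched in a few lines of Python. The function name `candidate_values` is hypothetical, and the example assumes the case study's 2500m young generation baseline, a 50 MB step, and eight application servers.

```python
def candidate_values(baseline, step, servers):
    """Spread one test value per server evenly around a baseline, skipping the baseline itself."""
    if servers % 2:
        raise ValueError("use an even number of servers so the values balance around the baseline")
    half = servers // 2
    offsets = [i for i in range(-half, half + 1) if i != 0]
    return [baseline + i * step for i in offsets]

print(candidate_values(2500, 50, 8))
# [2300, 2350, 2400, 2450, 2550, 2600, 2650, 2700]
```

Calling the same function again with the best-performing value as the new center and a smaller step narrows the search in the next load test.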

Once the first variable has been adequately tested and tuned it should be set back to its baseline configuration and a test of the next variable should begin. This process should proceed until all variables that the admin wishes to test are tuned. Optionally, the system can be tuned further by using the new configuration as a new baseline and repeating the testing and tuning process as many times as needed. This helps to account for the unpredictable cumulative effect that changes to multiple variables can have. For example, the configuration of two separate variables may independently improve performance, but the combined change may not yield the same performance gain or, worse, may cause a performance loss.

A sample configuration of the tuned JVM might be:

bbconfig.appserver.maxthreads=1500

bbconfig.min.heapsize.tomcat=10000m

bbconfig.max.heapsize.tomcat=10000m

bbconfig.max.permsize.tomcat=768m

bbconfig.max.stacksize.tomcat=320k

bbconfig.jvm.options.extra.tomcat=-XX:NewSize=2500m -XX:MaxNewSize=2500m -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:ParallelGCThreads=6 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=16 -XX:MaxTenuringThreshold=28

Part 1 – Blackboard Performance Tuning: An Iterative Approach

PREFACE

Blackboard provides performance tuning documentation for their application in the document library at the Behind the Blackboard website (https://behind.blackboard.com). Blackboard’s guidelines are a valuable starting point and are a sufficiently comprehensive solution for smaller institutions. However, a test-driven approach that relies on an iterative cycle of tuning and load tests will yield more substantial performance gains and adapts to the variety of host architectures. The following steps are a top-level overview of an iterative approach to tuning the Blackboard application and its host environment:

  1. Use Blackboard and Java tuning practices as a guideline to develop a baseline configuration.
  2. Isolate a single configuration variable (e.g. garbage collector choice, heap generation ratio, survivor space size, optional switches.)
  3. Load test a variety of configurations for the chosen variable (while maintaining the base configuration for all other variables) to determine the most effective setting.
  4. Repeat 2-3 for all variables.

Following these steps will provide a more efficient configuration than the base configuration. However, changing one variable may alter the effect of a change in another variable. The optional steps below account for the interdependency of the variables and home in on the most effective configuration possible:

  5. Use the tuned configuration arrived at in step 4 as the new baseline configuration.
  6. Repeat steps 2-4.
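The steps above can be sketched as a short loop. This Python example is a rough illustration under stated assumptions, not a tool that ships with Blackboard: `tune`, `toy_measure`, and the variable names are hypothetical, and `toy_measure` is a stand-in that would be replaced by the score from a real load test.

```python
def tune(baseline, candidates, measure):
    """One pass of the iterative approach: tune each variable against the baseline.

    baseline   -- dict of variable name -> baseline value
    candidates -- dict of variable name -> list of values to test
    measure    -- callable(config) -> score, higher is better (e.g. a load test result)
    """
    tuned = dict(baseline)
    for name, values in candidates.items():
        best_value, best_score = baseline[name], measure(baseline)
        for value in values:
            # Change only this variable; all others stay at the baseline.
            trial = dict(baseline, **{name: value})
            score = measure(trial)
            if score > best_score:
                best_value, best_score = value, score
        tuned[name] = best_value
    return tuned

def toy_measure(config):
    # Stand-in for a load test: pretends 2600 MB and a survivor ratio of 16 are ideal.
    return -abs(config["young_mb"] - 2600) - abs(config["survivor_ratio"] - 16)

baseline = {"young_mb": 2500, "survivor_ratio": 8}
candidates = {"young_mb": [2400, 2600, 2700], "survivor_ratio": [6, 16]}

first_pass = tune(baseline, candidates, toy_measure)       # steps 1-4
second_pass = tune(first_pass, candidates, toy_measure)    # optional repeat with the new baseline
print(first_pass, second_pass)
```

Each variable is scored against the unchanged baseline, which mirrors the advice to reset a variable before testing the next one; the second call shows the optional repeat that uses the tuned configuration as the new baseline.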


ARCHITECTURE

To understand effective performance tuning configurations of the Blackboard application and its host environment it is valuable to define the function and architecture of the application and its environment. The following is a top-down list of the resources that impact net performance of the application:

  1. Software
    • Blackboard Application
    • JVM
    • Tomcat
    • Web Server (IIS, Apache)
  2. Operating System and Services
  3. Server Hardware
  4. Network Architecture and Hardware

The Blackboard application is a collection of JavaServer Pages and Servlets (accompanied by occasional Perl.) This code is nothing more than a set of instructions. The performance of the application is determined by the methods used to execute the instructions. It is executed within a virtual machine model called a Java Virtual Machine (JVM.) The JVM provides an environment capable of executing Java bytecode (compiled Java source code) and storing information and complex data structures within its heap.

The web server and Tomcat serve requests to the Blackboard application. The web server hosts websites and delivers webpages to users, in this case the Blackboard website. However, the web server does not directly make requests to or receive responses from the JVM. The web server interfaces with Tomcat, a servlet container, which in turn interfaces with the JVM. Tomcat behaves as a Java-exclusive web server and acts as an adapter between the web server and the JVM.

The Operating System interfaces with the hardware on which it resides to expose the resources to Blackboard. The greatest effect that the operating system has on performance will be in the choice of operating system. This article will not promote any operating system over another and acknowledges that each has advantages (for simplicity this series’ examples will assume Windows Server OS.) Beyond the choice of operating system, OS tuning may impact Blackboard (e.g. processor scheduling, virtual memory configuration / paging file size), and these settings should be considered variables in the tuning process.

Network and server hardware are peripheral to the discussion of tuning the Blackboard application. These resources usually exist by the time tuning is an active topic of concern for an institution. This article will not discuss hardware sizing because the topic is worthy of its own comprehensive discussion. Blackboard’s documentation on hardware sizing can be found on the Behind the Blackboard website (https://behind.blackboard.com). Network and server hardware are mentioned in this discussion because of the influence each has on tuning:

Hardware of individual servers directly affects the resources available to the Blackboard application and related processes. The number of processors and the speed of processors and bus impact processing speed, internal communication, memory management, and garbage collection. Memory speed and quantity affect the potential size and composition of the Blackboard heap (JVM) and Blackboard memory management / garbage collection.

The network determines the load and rate at which resources on different servers can communicate with one another, with external resources, and with the end user. The larger the institution, the greater the potential impact of its network infrastructure on performance, because hardware resources are often discrete and many.