
Scaling up a server

Wednesday, July 24, 2013

We have two scaling options: vertical and horizontal scaling. Vertical scaling relates to adding more resources, such as CPUs, to a single machine. To better utilize the server hardware we can add more WebLogic instances to the machine, which could lead to increased application throughput. To determine whether this is indeed the case we need to benchmark. Benchmarking for scalability is about measuring resource utilization. Good scalability means that service levels can be maintained while the workload is increased. If an application does not scale well, it is not fully utilizing the hardware and, consequently, throughput will degrade. Ideally, a linear increase in load should lead to at most a linear degradation in service levels and performance. Linear scalability can be approached when so-called 'share nothing' clusters are used. The nodes provide the same functionality and know nothing about the other nodes in the cluster (no HTTP session replication). In this case, the computing capacity of the cluster increases almost linearly as more nodes are added to the cluster, provided the back-end information systems, such as a database, are powerful enough.

Applications that 'share nothing' usually share state through the database. The application tier can then scale only until the database becomes the bottleneck. In general, relying on a single shared resource will eventually cause contention for that resource and thus limit scalability. Caching is a good remedy: when we cache data in the application tier we avoid calls to the database (and also avoid relational-to-object data conversions). When using a cluster of server instances we need to maintain multiple caches; this is not a problem for read-only data, but it is for read/write data. Caching solutions such as Coherence provide different kinds of caches, i.e., replicated and partitioned. A replicated cache does not scale well when cache writes are involved, as the data needs to be replicated across all the nodes in the grid. A partitioned cache, on the other hand, scales very well for cache writes, as data ownership is spread throughout the cluster (the system automatically rebalances the data when the number of nodes in the grid changes; we do not need to decide how to partition the data, it comes out of the box). Another plus is that access to the cache requires at most one network hop, which helps maintain linear scalability. A read-access optimization can be made when data can be obtained locally (sticky access); in this case a hybrid solution such as the near cache can be applied. Note that when Coherence*Web is used to cache HTTP session objects, a near cache data structure is used. More information on Coherence*Web can be found in the post Setting-up a WebLogic Cluster that uses Coherence.
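As an illustration (a minimal sketch; the cache name "products" and the key/value used here are assumptions, not part of the posts referenced above), the application code that accesses a Coherence cache looks the same whether the cache configuration maps that name to a replicated, partitioned, or near cache:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CacheAccessExample
{
    public static void main(final String[] args)
    {
        // assumes a cache named "products" is defined in the Coherence cache
        // configuration; whether it is replicated, partitioned, or a near cache
        // is decided there, not in this code
        NamedCache cache = CacheFactory.getCache("products");

        // NamedCache extends java.util.Map, so reads and writes are plain puts and gets
        cache.put("P-001", "Some product data");
        Object value = cache.get("P-001");
        System.out.println("Cached value: " + value);

        CacheFactory.shutdown();
    }
}

Because the topology is a configuration decision, an application can start with a replicated cache and move to a partitioned or near cache as write volume and cluster size grow, without changing the access code.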

Database caching patterns are cache-aside, read/write-through, and write-behind. The cache-aside pattern is discussed in the post Hibernate and Coherence. The read/write-through and write-behind patterns are discussed in the post Coherence and Hibernate: Decoupling the Database.
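To make the cache-aside idea concrete, here is a minimal sketch in plain Java. The ProductRepository class and its loadFromDatabase/writeToDatabase methods are hypothetical placeholders (e.g., for JDBC or Hibernate calls), and a real deployment would typically use a Coherence NamedCache rather than a local map:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProductRepository
{
    // stand-in for a real cache such as a Coherence NamedCache
    private final Map<Long, String> cache = new ConcurrentHashMap<Long, String>();

    public String findProduct(final long id)
    {
        // 1. cache-aside: look in the cache first
        String product = cache.get(id);
        if (product == null)
        {
            // 2. on a miss, load from the database and populate the cache
            product = loadFromDatabase(id);
            cache.put(id, product);
        }
        return product;
    }

    public void updateProduct(final long id, final String product)
    {
        // 3. on an update, write to the database and keep the cache consistent
        writeToDatabase(id, product);
        cache.put(id, product);
    }

    // hypothetical database access methods
    private String loadFromDatabase(final long id) { return "product-" + id; }
    private void writeToDatabase(final long id, final String product) { }
}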

To scale the processing, data grids allow for targeted execution of processing in the nodes, i.e., the map/reduce pattern (used, for example, in Google's MapReduce framework and Apache Hadoop). The core of the map/reduce pattern consists of the following steps:

    Map step – a master node takes some input (a problem), partitions it into smaller sub-problems, and distributes those to worker nodes. Note that a worker node can in turn do the same.
    Reduce step – the master node takes all the answers and combines them to produce the output.

It is a sort of divide-and-conquer algorithm, with the key difference that the map/reduce algorithm handles the data as key-value pairs.
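A minimal, single-JVM sketch of the two steps is shown below (a word count; the class and method names are illustrative, and there is no actual distribution to worker nodes here):

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountMapReduce
{
    // map step: turn one chunk of input into key-value pairs (word, 1);
    // in a grid this would run in parallel on the worker nodes
    static List<Map.Entry<String, Integer>> map(final String chunk)
    {
        final List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (final String word : chunk.split("\\s+"))
        {
            pairs.add(new SimpleEntry<String, Integer>(word, 1));
        }
        return pairs;
    }

    // reduce step: the master combines all intermediate pairs into the final result
    static Map<String, Integer> reduce(final List<Map.Entry<String, Integer>> pairs)
    {
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for (final Map.Entry<String, Integer> pair : pairs)
        {
            final Integer current = counts.get(pair.getKey());
            counts.put(pair.getKey(), current == null ? pair.getValue() : current + pair.getValue());
        }
        return counts;
    }

    public static void main(final String[] args)
    {
        // the 'problem' is partitioned into chunks; each chunk is mapped independently
        final String[] chunks = { "to be or not to be", "to scale or not to scale" };

        final List<Map.Entry<String, Integer>> intermediate = new ArrayList<Map.Entry<String, Integer>>();
        for (final String chunk : chunks)
        {
            intermediate.addAll(map(chunk));
        }

        System.out.println(reduce(intermediate));
    }
}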

Horizontal scaling relates to adding more machines to the environment, which also gives a failover capability that we cannot get with vertical scaling alone. A good approach is to combine both scaling techniques to obtain better CPU utilization and failover capability. The post Setting-up a WebLogic Cluster that spans Multiple Machines shows, as the title suggests, an example of how to set up a cluster that spans multiple machines.

JVM Crash and Native OutOfMemory Exception

Tuesday, July 23, 2013

JVM Crash Investigation

The JVM core dump is the most important file for investigating a JVM crash. By default the core dump will be generated. If the JVM is not able to generate the core dump, the reasons may be the following:

  • There is not enough disk space or quota to write the file in your file system.
  • The JVM does not have permission to create or write a file in the directory.
  • A file with the same name already exists in the directory and is read-only or write-protected.

Unix/Linux-specific: use the limit or ulimit commands to determine whether core dumps are disabled.

For example, on Linux, the command "ulimit -c unlimited" enables core dumps to be written, no matter what their size. Core dump sizes can be restricted if disk space limitations are a concern.
It may be possible to get a thread dump before the process exits. HotSpot supports the option -XX:+ShowMessageBoxOnError; the corresponding JRockit option is -Djrockit.waitonerror. When the JVM is crashing, it may prompt the user "Do you want to debug the problem?". This pauses the process, thereby creating an opportunity to generate a thread dump (a stack trace of every thread in the JVM), attach a debugger, or perform some other debugging activity. However, this does not work in all cases (e.g., in the case of a stack overflow).
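For example (an illustrative invocation; MyServerApp and the process id are placeholders), the JVM can be started with the option enabled, and while the process is paused at the prompt a thread dump can be requested with the JDK's jstack tool, or with kill -3 on Unix/Linux (which prints the thread dump to the JVM's stdout):

java -XX:+ShowMessageBoxOnError -cp . MyServerApp
jstack <pid>
kill -3 <pid>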

Crash because of OutOfMemory: apply the following flag in the JAVA_OPTIONS of your server's start script:
-XX:+HeapDumpOnOutOfMemoryError

 Details: 

1) The -XX:+HeapDumpOnOutOfMemoryError option is available in 1.5.0_07 and 1.4.2_12 and produces a heap dump in the hprof binary format. (By default the dump will be generated in your system's TEMP directory; on a Windows machine you can easily find the TEMP directory by running the command "echo %TEMP%".)

2) Analyse hprof heap dumps using HAT, jhat, or YourKit (which has an hprof import option). hprof heap dumps are platform independent, so you do not need to analyze the dump on the same system that produced it.

3) Running with -XX:+HeapDumpOnOutOfMemoryError does not impact performance; it is simply a flag to indicate that a heap dump should be generated when the first thread throws OutOfMemoryError.

4) -XX:+HeapDumpOnOutOfMemoryError does not work with -XX:+UseConcMarkSweepGC in 1.5.0_07.
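For example (an illustrative command line; the heap size, dump path, and application class are assumptions, and -XX:HeapDumpPath is optional):

java -Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps MyServerApp

When an OutOfMemoryError is thrown, the JVM writes a java_pid<pid>.hprof file to the given directory (or to the default location when no path is set), which can then be opened with jhat or YourKit as described above.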

TestCase: 

public class TestXss
{
    public static void main(final String[] args) throws Exception
    {
        // start the given number of threads
        for (int i = 1; i <= Integer.parseInt(args[0]); i++)
        {
            System.out.println("Starting Thread " + i);
            final Thread t = new Thread("T[" + i + "]")
            {
                @Override
                public void run()
                {
                    try
                    {
                        // keep the thread alive so its native stack stays allocated
                        while (true)
                        {
                            Thread.sleep(1000);
                        }
                    }
                    catch (Exception e)
                    {
                        e.printStackTrace();
                    }
                }
            };
            t.setDaemon(true);
            t.start();
            Thread.sleep(5);
        }
        // wait so the daemon threads are not torn down immediately
        Thread.sleep(1000000);
    }
}
Output of running "java -Xmx1496m -Xss1m TestXss 5000":
Starting Thread 411
Starting Thread 412
Starting Thread 413
Starting Thread 414
Starting Thread 415
Starting Thread 416
Starting Thread 417
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:574)
at TestXss.main(TestXss.java:18)


Note: If you see an OOM in the native space ("unable to create new native thread"), then first of all try to decrease the -Xss size, for example from the default (often 512k) to -Xss256k.

This way you will see that the JVM is able to create more native threads. But beware that if you decrease it below a certain limit, you may start getting "java.lang.StackOverflowError".
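For example, re-running the test case above with a smaller thread stack size (exact thread counts will vary per system):

java -Xmx1496m -Xss256k TestXss 5000

With the smaller -Xss, considerably more threads can be started before the native OutOfMemoryError appears; lowering -Xss further trades this off against the risk of java.lang.StackOverflowError in deeply nested code.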
 
