Search This Blog

Tuesday, January 13, 2009

Creating Dynamic Images with Java and GridGain

At my current client we have a requirement to dynamically create images with a specific text, for example an authorization code. These images are created in Java and encoded as a PNG file. For the generation of text based images I created a library which is available on Google Code, see [1]. These images are generated by a servlet of which the URL is embedded in the HTML as an image tag or in a background of a div element.

During performance tests I saw that the generation of images is a very CPU intensive task. Because there are a lot of users who request a dynamically generated image, this functionality can cripple the CPU's which has a negative impact on the throughput and the stability of the system. Because this is a specific, CPU intensive task, it is a perfect candidate to off-load this process to another machine so the CPU intensive task is distributed over a number of machines.

One way to distribute the processing of CPU intensive tasks is to use the open source grid computing platform GridGain. GridGain is "an open source computational grid framework that enables Java developers to improve general performance of processing intensive applications by splitting and parallelizing the workload." See also [3]. Image generation is a perfect candidate to distribute with GridGain.

I have created the following setup to do some first tests with GridGain in this specific scenario:

  1. Created a web application with a servlet which is called by clients. This servlet is responsible for executing the image generation task on the grid and streaming the result back to the client as a PNG.

  2. Created a GridGain task which creates a job which is responsible for the actual image generation. This job can be executed on any of the nodes participating in the grid.

  3. Used two physical nodes in my first setup, node 1: PentiumM 2 GHz with 2 GB's RAM, node 2: Pentium4 2.2 GHz with 1 GB RAM.

  4. Used Apache JMeter to create a load test [4].


See the following diagram for an overview of my setup:


I used 5 concurrent threads (users) in my JMeter script with a ramp-up time of 0ms which means all 5 users are active at once.

  1. My first test run was used to create a baseline. This version did not use the grid at all and the image generation logic was implemented directly in the servlet. This test was run on node 1. With a couple of tests there was an average of 6.7 transactions per second.

  2. My second test was used to determine the overhead of the grid. I expect the throughput to be a little less than in my first test. The same node was used as in 1. With a couple of tests there was an average of 6.5 transactions per second. Almost as good as without the grid. In this specific scenario the overhead of the grid is negligible. However, this can vary based on your specific situation.

  3. In my third test I added a second node, node 2, to the grid. With a couple of tests there was an average of 9.5 transactions per second. This is a throughput increase of 46%. Pretty impressive. In my test, both nodes had 100% CPU utilization.


This is really impressive considering the amount of work I had to install and configure GridGain, almost nothing. With the default configuration, GridGain uses IP multi-cast to discover all the grid nodes. I just had to start gridGain on the second node and this node automatically participated in the grid. Other strategies can be used to implement the discovery process, for example JMS topics. When a given task is executed for the first time on a given node, the grid takes care of deploying this task on that specific node. No need for manual deployment of tasks. Failover and load balancing of tasks to other nodes when a node has crashed is enabled by default. Everything is also well documented which really helps creating a consistent package.

Conclusion
Distributing CPU intensive tasks across several physical machines is an effective way to increase the throughput and reducing the risk that this process will become a bottleneck or has a negative impact on the stability of the system as a whole. GridGain is a platform which enables (Java) developers to create a computing grid and execute tasks on this grid. When you have computational intensive tasks in your application, make sure you check out GridGain.

Resources
[1] Image Text library
[2] GridGain
[3] Highscalability.com
[4] JMeter

4 comments:

Anonymous said...

I think I'd run the load tests such that the CPU didn't max out on any node. I normally keep the CPU around 80% max when I run scalability tests. Also did you try using terracotta?

dkharlamov said...

IMHO GridGain in this particular case is much easier.

jcraane said...

Hi Matt,

I haven't tried Terracotta for this case. How would you use Terracotta in this particular case? I plan to write another Blog post of the alternatives for distributing cpu intensive processes to other machines. But as dkharlamov says it is Particularly easy with GridGain.

Anonymous said...

you have a nice site.thanks for sharing this site. various kinds of ebooks are available here

http://feboook.blogspot.com