Monday, 6 July 2009

Apache Tomcat clustering, load balancing, failover, session replication, and optimisation

You are all probably familiar with high-availability clusters.

To cut a long story short, high-availability clusters aim to create failure-resistant, reliable, and blisteringly fast systems.

HA clusters most commonly use the following techniques (or should I say "buzzwords"?):
  • clustering
  • load balancing
  • failover
  • session replication
There is plenty of documentation, and there are plenty of forum threads and blog posts about it, but many of them are full of errors (including Apache's official mod_proxy documentation).

Here I show you how to set up clustering, load balancing, failover, and session replication using Apache HTTP Server 2.2.11 and Apache Tomcat 6.0.20.

The infrastructure

I have three machines:
  • public - Apache HTTP Server 2.2.11, load balancer
  • backend1 - Apache Tomcat 6.0.20, web container, IP: 172.16.253.88
  • backend2 - Apache Tomcat 6.0.20, web container, IP: 172.16.253.7
public is a front-end Apache HTTP Server that balances traffic and distributes requests to the two backend Tomcats.

Optimisation

I did some additional things to make my HA cluster even swifter. My tips are:
  • use the AJP communication protocol - AJP is a binary protocol and is far more efficient than the verbose, text-based HTTP protocol

  • use the APR-based Apache Tomcat Native library - it is meant for optimal performance in production environments. When it is enabled, the AJP connector uses a socket poller for keepalive connections, which increases the scalability of the server and significantly reduces the number of processing threads Tomcat needs

  • unload all unnecessary modules from Apache HTTP Server - simply don't waste your CPU and memory on things that won't be used at all

  • optimise JVM parameters - last year I wrote an article about JVM performance tuning for Java EE, you can read it here: Tuning JVM for Java EE development
Simple clustered application

I wrote a very simple web application. I called it test-app.

It basically consists of META-INF/context.xml, WEB-INF/web.xml and index.jsp files.

META-INF/context.xml is used to inform Tomcat that our application is to be clustered.

In order to do so, inside the META-INF/context.xml file, I added the distributable="true" attribute to the <Context /> element:
<?xml version="1.0" encoding="UTF-8"?>
<Context distributable="true" />
My index.jsp file looked something like this:
<html>
<head><title>Cluster test app</title></head>
<body>
<h1>Backend Tomcat: <%= java.net.InetAddress.getLocalHost().getHostAddress() %></h1>
<h1>Session ID: <%=request.getSession().getId()%></h1>
<%
    Long counter = (Long) request.getSession().getAttribute("counter");
    if (counter == null) {
        counter = 0L;
    }
    counter++;
    request.getSession().setAttribute("counter", counter);
%>
<h2>current counter value is ${counter}</h2>
</body>
</html>
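A side note on what distributable implies: Tomcat's clustering documentation requires every session attribute to implement java.io.Serializable, because the attributes have to be shipped to the other node. The counter above is a Long, so it qualifies. The sketch below is only an illustration and not part of test-app; the SerializableCheck class and its roundTrip method are names I made up. It pushes a value through plain Java serialization and back, which is roughly the trip a replicated attribute has to survive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper for illustration; not part of test-app or Tomcat.
class SerializableCheck {

    // Writes the value to a byte array and reads it back as a fresh copy.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // Thrown, for example, when the object graph contains a non-serializable field.
            throw new IllegalStateException("value did not survive serialization", e);
        }
    }

    public static void main(String[] args) {
        Long counter = 3L; // same type as the "counter" session attribute
        Object copy = roundTrip(counter);
        System.out.println(copy + " equal to original: " + counter.equals(copy));
    }
}
```

If you store your own classes in the session, running them through a check like this in a unit test is a cheap way to catch a forgotten "implements Serializable" before the cluster does.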
Apache HTTP Server as a proxy and a load balancer

I installed a brand-new copy of Apache HTTP Server 2.2.11.

I read the documentation about mod_proxy, mod_proxy_ajp, and mod_proxy_balancer.

I opened the conf/httpd.conf file and uncommented these modules:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
Then, at the end of the list of includes, I added:
# Proxy
Include conf/extra/httpd-proxy.conf
Finally, I created the conf/extra/httpd-proxy.conf file and wrote:
<Proxy balancer://wwwcluster>
    BalancerMember ajp://172.16.253.88:8009 route=www1
    BalancerMember ajp://172.16.253.7:8009 route=www2
</Proxy>

ProxyPass /test-app balancer://wwwcluster/test-app stickysession=JSESSIONID

<Location /balancer-manager>
    SetHandler balancer-manager
    Order Deny,Allow
    Deny from all
    Allow from localhost
</Location>
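To make the stickysession setting less magical, here is a rough sketch of the idea behind it: the balancer reads the JSESSIONID value, takes the part after the dot as the route, and sends the request to the member with that route. If there is no route, or that member is down, another member is picked. This is a simplified illustration in Java, not mod_proxy_balancer's actual code; the StickyRouter class and its methods are made up for this post, and the fallback simply takes the first live member, whereas the real balancer applies its own load-balancing method.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of sticky routing; not how mod_proxy_balancer is implemented.
class StickyRouter {
    private final Map<String, String> members = new LinkedHashMap<>(); // route -> member URL
    private final Set<String> down = new LinkedHashSet<>();            // routes in Err state

    StickyRouter add(String route, String url) {
        members.put(route, url);
        return this;
    }

    void markDown(String route) {
        down.add(route);
    }

    // Returns the text after the dot in the session ID, or null if there is none.
    static String routeOf(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        int dot = sessionId.indexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null;
        }
        return sessionId.substring(dot + 1);
    }

    // Prefers the member named by the session's route, otherwise the first live member.
    String choose(String sessionId) {
        String route = routeOf(sessionId);
        if (route != null && members.containsKey(route) && !down.contains(route)) {
            return members.get(route);
        }
        for (Map.Entry<String, String> member : members.entrySet()) {
            if (!down.contains(member.getKey())) {
                return member.getValue();
            }
        }
        return null; // nobody is alive: this corresponds to the 503 case
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter()
                .add("www1", "ajp://172.16.253.88:8009")
                .add("www2", "ajp://172.16.253.7:8009");
        String sessionId = "8D5B2E961C57B0BB1A9BA93B8A41F61C.www1";
        System.out.println(router.choose(sessionId));
        router.markDown("www1"); // simulate killing the first Tomcat
        System.out.println(router.choose(sessionId));
    }
}
```

The point of the model is that the route value in httpd-proxy.conf and the suffix of the session ID have to match exactly, otherwise stickiness silently degrades to plain load balancing.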
Apache Tomcat clustering and routing

On both Tomcats, I opened conf/server.xml and, as the first child of the default <Host /> element, pasted the following <Cluster /> definition:
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="8">
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false" notifyListenersOnReplication="true" />
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4" port="45564" frequency="500" dropTime="3000" />
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto" port="4000" autoBind="100" selectorTimeout="5000"
              maxThreads="6" />
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" />
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector" />
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor" />
    <!-- prints stats on message traffic -->
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor" />
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
         filter="" />
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
  <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener" />
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener" />
</Cluster>
I also added a jvmRoute attribute to the <Engine /> element on each Tomcat: www1 on 172.16.253.88 and www2 on 172.16.253.7, matching the routes in the proxy config:
<Engine name="Catalina" defaultHost="localhost" jvmRoute="www1">
<Engine name="Catalina" defaultHost="localhost" jvmRoute="www2">
Load balancing and sticky sessions

OK, now that I have shown you how I created my test application and configured my servers, you can follow my steps to see how clustering, load balancing, failover and session replication work.

Start Apache HTTP Server.

Go to:
http://localhost/test-app
There will be an error:
503 Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
When you access this URL:
http://localhost/balancer-manager
you will see that both Tomcats have their status set to Err. They are simply stopped and cannot be connected to, hence the Err status.

One thing was weird: the routes were not set as defined in conf/extra/httpd-proxy.conf. They were blank.

Thankfully, they can be changed from within balancer-manager: just click on a cluster member and set the proper routes, www1 and www2, just as in the proxy config file.

First, start the 172.16.253.88 Tomcat:
INFO: Cluster is about to start
2009-07-02 10:36:34 org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/172.16.253.88:4000
2009-07-02 10:36:34 org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
2009-07-02 10:36:34 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
2009-07-02 10:36:35 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
2009-07-02 10:36:35 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
2009-07-02 10:36:36 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8
2009-07-02 10:36:36 org.apache.catalina.ha.session.JvmRouteBinderValve start
INFO: JvmRouteBinderValve started
2009-07-02 10:36:36 org.apache.catalina.ha.session.DeltaManager start
INFO: Register manager /test-app to cluster element Host with name localhost
2009-07-02 10:36:36 org.apache.catalina.ha.session.DeltaManager start
INFO: Starting clustering manager at /test-app
2009-07-02 10:36:36 org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [/test-app]: skipping state transfer. No members active in cluster group.
2009-07-02 10:36:36 org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
2009-07-02 10:36:36 org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
2009-07-02 10:36:36 org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/16 config=null
2009-07-02 10:36:36 org.apache.catalina.startup.Catalina start
INFO: Server startup in 2670 ms
Then, start the 172.16.253.7 Tomcat:
INFO: Cluster is about to start
2009-01-02 10:36:59 org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/172.16.253.7:4000
2009-01-02 10:36:59 org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
2009-01-02 10:36:59 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
2009-01-02 10:37:00 org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 16, -3, 88}:4000,{-84, 16, -3, 88},4000, alive=27016,id={-
67 -32 98 38 58 -88 73 -102 -98 -112 29 -21 -20 103 -28 -103 }, payload={}, command={}, domain={}, ]
2009-01-02 10:37:00 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
2009-01-02 10:37:01 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
2009-01-02 10:37:02 org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8
2009-01-02 10:37:02 org.apache.catalina.ha.session.JvmRouteBinderValve start
INFO: JvmRouteBinderValve started
2009-01-02 10:37:02 org.apache.catalina.ha.session.DeltaManager start
INFO: Register manager /test-app to cluster element Host with name localhost
2009-01-02 10:37:02 org.apache.catalina.ha.session.DeltaManager start
INFO: Starting clustering manager at /test-app
2009-01-02 10:37:02 org.apache.catalina.tribes.io.BufferPool getBufferPool
INFO: Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
2009-01-02 10:37:02 org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
WARNING: Manager [/test-app], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 16, -3, 88}:4000,{-84, 16, -3
, 88},4000, alive=28516,id={-67 -32 98 38 58 -88 73 -102 -98 -112 29 -21 -20 103 -28 -103 }, payload={}, command={}, domain={}, ]. This operation will
timeout if no session state has been received within 60 seconds.
2009-01-02 10:37:02 org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor report
INFO: ThroughputInterceptor Report[
Tx Msg:1 messages
Sent:0,00 MB (total)
Sent:0,00 MB (application)
Time:0,02 seconds
Tx Speed:0,03 MB/sec (total)
TxSpeed:0,03 MB/sec (application)
Error Msg:0
Rx Msg:0 messages
Rx Speed:0,00 MB/sec (since 1st msg)
Received:0,00 MB]

2009-01-02 10:37:02 org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [/test-app]; session state send at 02.01.09 10:37 received in 110 ms.
2009-01-02 10:37:02 org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
2009-01-02 10:37:02 org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
2009-01-02 10:37:02 org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/15 config=null
2009-01-02 10:37:02 org.apache.catalina.startup.Catalina start
INFO: Server startup in 2583 ms
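A side note on reading these logs: Tribes prints member addresses as signed Java bytes, so {-84, 16, -3, 88} is simply 172.16.253.88. The snippet below is my own illustration, not something Tomcat ships, and the MemberAddress class and dotted method are made-up names. It converts such a byte list into the familiar dotted form.

```java
// Hypothetical helper for decoding addresses printed in Tribes log lines.
class MemberAddress {

    // Maps each signed byte value (-128..127) to its unsigned form (0..255) and joins with dots.
    static String dotted(int... signedBytes) {
        StringBuilder text = new StringBuilder();
        for (int value : signedBytes) {
            if (text.length() > 0) {
                text.append('.');
            }
            text.append(value & 0xFF);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Values copied from the "Replication member added" log line.
        System.out.println(dotted(-84, 16, -3, 88));
    }
}
```

This is handy when you need to tell which node a "member added" or "member disappeared" message is talking about.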
After the second Tomcat has started, you should see the following in the console of the first Tomcat (172.16.253.88):
2009-07-02 10:37:01 org.apache.catalina.tribes.io.BufferPool getBufferPool
INFO: Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
2009-07-02 10:37:03 org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor report
INFO: ThroughputInterceptor Report[
Tx Msg:2 messages
Sent:0,00 MB (total)
Sent:0,00 MB (application)
Time:0,02 seconds
Tx Speed:0,07 MB/sec (total)
TxSpeed:0,07 MB/sec (application)
Error Msg:0
Rx Msg:2 messages
Rx Speed:0,00 MB/sec (since 1st msg)
Received:0,00 MB]
Open and refresh a few times:
http://localhost/test-app
you will see:
Backend Tomcat: 172.16.253.88
Session ID: 8D5B2E961C57B0BB1A9BA93B8A41F61C.www1
current counter value is 1

Backend Tomcat: 172.16.253.88
Session ID: 8D5B2E961C57B0BB1A9BA93B8A41F61C.www1
current counter value is 2

Backend Tomcat: 172.16.253.88
Session ID: 8D5B2E961C57B0BB1A9BA93B8A41F61C.www1
current counter value is 3
Load balancing and sticky sessions work.
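If you would rather not click refresh by hand, a small client can do it for you. The sketch below assumes Java 11 or newer (for java.net.http), keeps cookies between requests the way a browser would, and prints which backend answered. The StickyProbe class and its extractField helper are my own inventions, and the helper is tied to the exact HTML that the index.jsp shown earlier produces.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test client; assumes the page layout of the index.jsp from this post.
class StickyProbe {

    // Pulls e.g. "172.16.253.88" out of "<h1>Backend Tomcat: 172.16.253.88</h1>".
    static String extractField(String html, String label) {
        Matcher matcher = Pattern.compile(Pattern.quote(label) + ":\\s*([^<]+)<").matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: StickyProbe <url of test-app>");
            return;
        }
        // The CookieManager stores JSESSIONID and sends it back on later requests.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[0])).GET().build();
        for (int i = 0; i < 5; i++) {
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            System.out.println(extractField(body, "Backend Tomcat")
                    + " " + extractField(body, "Session ID"));
        }
    }
}
```

With sticky sessions working, every printed line should name the same backend and the same session ID.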

You can now refresh:
http://localhost/balancer-manager
to see how many requests and kB were sent to each Tomcat.

Failover and session replication

Let's see how failover behaves.

Kill the 172.16.253.88 Tomcat. In the 172.16.253.7 Tomcat's console you will see:
2009-01-02 10:39:32 org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 16, -3, 88}:4000,{-84, 16, -3, 88},4000,
alive=179535,id={-67 -32 98 38 58 -88 73 -102 -98 -112 29 -21 -20 103 -28 -103 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={},
]]
2009-01-02 10:39:32 org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 16, -3, 88}:4000,{-84, 16, -3, 88},4000, alive=179535,i
d={-67 -32 98 38 58 -88 73 -102 -98 -112 29 -21 -20 103 -28 -103 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]
Refresh:
http://localhost/test-app
you will see:
Backend Tomcat: 172.16.253.7
Session ID: 8D5B2E961C57B0BB1A9BA93B8A41F61C.www2
current counter value is 4

Backend Tomcat: 172.16.253.7
Session ID: 8D5B2E961C57B0BB1A9BA93B8A41F61C.www2
current counter value is 5
All requests now go to the 172.16.253.7 Tomcat.

The session ID is the same as before, but this time its suffix is www2 instead of www1. The counter's value was replicated.
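The suffix change is the work of the JvmRouteBinderValve configured in server.xml: after failover it keeps the random part of the session ID and swaps the route suffix for the jvmRoute of the node that now serves the request. A toy version of that rewrite, written by me for illustration and not taken from Tomcat's source, could look like this (the RouteRebinder class and rebind method are hypothetical names):

```java
// Hypothetical illustration of the session ID rewrite after failover.
class RouteRebinder {

    // Keeps everything before the dot and appends the route of the node now serving the session.
    static String rebind(String sessionId, String localRoute) {
        int dot = sessionId.indexOf('.');
        String base = dot < 0 ? sessionId : sessionId.substring(0, dot);
        return base + "." + localRoute;
    }

    public static void main(String[] args) {
        // Session created on www1, now served by www2.
        System.out.println(rebind("8D5B2E961C57B0BB1A9BA93B8A41F61C.www1", "www2"));
    }
}
```

Because the new suffix is sent back to the browser in the cookie, the balancer keeps the session sticky to the surviving node from then on.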

When you access:
http://localhost/balancer-manager
you will see that 172.16.253.88 Tomcat's status is Err.

When you start the 172.16.253.88 Tomcat again, in the 172.16.253.7 Tomcat's console you will see:
2009-01-02 10:40:40 org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 16, -3, 88}:4000,{-84, 16, -3, 88},4000, alive=1016,id={-2
3 -9 50 91 -8 -56 75 11 -86 -107 -46 -16 75 99 -21 8 }, payload={}, command={}, domain={}, ]
If you open another browser, the load balancer will probably send you to the just-recovered 172.16.253.88 Tomcat.

Summary

I know it looks simple (and it really is simple!), but there is so much rubbish and misleading, outdated information out there to dig through... It always takes some time to find something useful.

If you have any questions, and I think you will, just shoot, but don't kill :)

Cheers,
Łukasz

19 comments:

Josh said...

This is absolutely killer, thanks a lot for taking the time to write it.

kiril said...

This post is very thorough and helpful. I've been looking for tutorial-like post describing apache/tomcat clustering for a while, THANK YOU!

I do have an issue even though I followed all the steps. I didn't read the suggested articles on mod_proxy, mod_proxy_ajp, and mod_proxy_balancer. This is pretty much the only detail I missed, and I am going to catch up on it later tonight.

So long story short:

I have two tomcat instances running on my localhost, but different ports 85, 86. I also included all the configuration accordingly from the tutorial. When I go to Apache 2.2 localhost I see only the message "It Works!"

Should I not expect to be proxied to one of my tomcat nodes?

Also if I go to my Apache 2.2 localhost:80\test-app or balancer-manager I get 404 exception.

Any insights will be greatly appreciated.

Thank you.

Łukasz said...

Hi kiril,

Please do paste your load balancer configuration.

I can't say much if I don't see changes you made.

Cheers,
Łukasz

Ganesan said...

Hey, this is an awesome post. Thanks a lot for posting it.

Here is my problem. I have successfully done the cluster configuration and everything is set. I started my first app server and got the same logs that you mentioned for the first app server. Then I started the second server, but I couldn't see the logs that you posted. When I access the test app, I get a session ID. When I stop my first app server and refresh the browser, I get a new session ID ending in www2. It seems I am missing something somewhere. Could you please help me? The logs are below.

Ganesan said...

FROM APP1
Jul 24, 2009 3:04:27 PM org.apache.catalina.ha.tcp.SimpleTcpCluster start
INFO: Cluster is about to start
Jul 24, 2009 3:04:29 PM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/10.210.23.33:4000
Jul 24, 2009 3:04:32 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
Jul 24, 2009 3:04:32 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Jul 24, 2009 3:04:33 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
Jul 24, 2009 3:04:33 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
Jul 24, 2009 3:04:34 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8
Jul 24, 2009 3:06:28 PM org.apache.catalina.ha.session.DeltaManager start
INFO: Register manager /test-app to cluster element Engine with name Catalina
Jul 24, 2009 3:06:28 PM org.apache.catalina.ha.session.DeltaManager start
INFO: Starting clustering manager at /test-app
Jul 24, 2009 3:06:28 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#/test-app]: skipping state transfer. No members active in cluster group.
Jul 24, 2009 3:06:28 PM org.apache.catalina.ha.session.JvmRouteBinderValve start
INFO: JvmRouteBinderValve started
Jul 24, 2009 3:06:29 PM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
Jul 24, 2009 3:06:33 PM org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
Jul 24, 2009 3:06:33 PM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/2750 config=null
Jul 24, 2009 3:06:35 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 133152 ms

Ganesan said...

FROM APP2:
Jul 24, 2009 5:47:53 AM org.apache.catalina.ha.tcp.SimpleTcpCluster start
INFO: Cluster is about to start
Jul 24, 2009 5:47:53 AM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/10.100.20.159:4000
Jul 24, 2009 5:47:53 AM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
Jul 24, 2009 5:47:53 AM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Jul 24, 2009 5:47:54 AM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
Jul 24, 2009 5:47:54 AM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
Jul 24, 2009 5:47:55 AM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8
Jul 24, 2009 5:47:55 AM org.apache.catalina.ha.session.DeltaManager start
INFO: Register manager /test-app to cluster element Engine with name Catalina
Jul 24, 2009 5:47:55 AM org.apache.catalina.ha.session.DeltaManager start
INFO: Starting clustering manager at /test-app
Jul 24, 2009 5:47:55 AM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#/test-app]: skipping state transfer. No members active in cluster group.
Jul 24, 2009 5:47:55 AM org.apache.catalina.ha.session.JvmRouteBinderValve start
INFO: JvmRouteBinderValve started
Jul 24, 2009 5:47:55 AM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
Jul 24, 2009 5:47:55 AM org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
Jul 24, 2009 5:47:55 AM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/15 config=null
Jul 24, 2009 5:47:55 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 2894 ms

Also, you use the term "tomcat console" in your post. What do you mean by the tomcat console? Is it the log file or the tomcat manager?

Thanks in advance

Ganesan.J

ssamayoa said...

Is it really necessary to configure this?

<Engine name="Catalina" defaultHost="localhost" jvmRoute="www1">

It seems that it only adds a distinctive string at the end of the cookie data.

Regards.

Łukasz said...

Ganesan,

First thing: I need your config (the balancer and proxy config).

Second thing: by "tomcat console" I mean the tomcat log. On Windows the log is printed directly to the console, which is why I wrote "tomcat console".

Cheers,
Łukasz

Łukasz said...

Ssamayoa,

It is absolutely crucial to add the jvmRoute attribute.

Thanks to it, the load balancer knows which server you came from and routes you back to that server. This feature is called sticky sessions.

Cheers,
Łukasz

Almir said...

Thank you for this excellent post!

I am following this configuration step-by-step but i am putting all three apps: apache http and two tomcats on the same server. And, it's a linux box.

Everything looks like it's working fine, however, the problem is with tomcats "detecting" each other when it comes to synchronizing session info. Basically, it looks like both tomcats create their new memberships as they both report
"No members active in cluster group"
Once I kill one tomcat, user from that tomcat is sent to second tomcat but session data is not available therefore, user is "logged out"

I believe the problem is multicast IP and port as this determines the cluster membership and this is what "joins" tomcats into that single cluster.

Anything that I need to additionally ensure to make this work on Linux?

Configuration files are exactly as suggested in your post!

Thanks in advance

Almir

Alexander said...

Lukasz, thank you for writing this tutorial! I've made a lot of progress from where I was after finding your tutorial. You're definitely right about there not being a resource out there that explains this process in nearly as much detail as you have.

I am running into a couple issues, and I'm not sure what the cause is. I'm attempting to have each Tomcat on a separate machine and the load balancing/clustering work over the two of them. Right now the Apache is sitting on Machine1. I'm hoping the issues I'm having are not something related to the network multicasting (or lack of).

If the user starts out on Machine1 in test-app and then I crash Machine1's Tomcat service, it will move over to Machine2 seamlessly. However, if I begin on Machine2 (after having restarted Tomcats, closed browser, etc.) - when I crash it, the session is lost. That is, failover isn't working from 2 to 1. (But it works 100% of the time for 1 to 2. It doesn't seem to matter which is first in the settings files OR which machine's service begins first.)

Machine1 Log on failover from 2 to 1:

Aug 3, 2009 9:29:53 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{0, 0, 0, -100}:4000,{0, 0, 0, -100},4000, alive=652089,id={-2 24 -6 104 99 68 79 78 -69 84 3 64 12 123 -70 122 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]]
Aug 3, 2009 9:29:53 AM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:org.apache.catalina.tribes.membership.MemberImpl[tcp://{0, 0, 0, -100}:4000,{0, 0, 0, -100},4000, alive=652089,id={-2 24 -6 104 99 68 79 78 -69 84 3 64 12 123 -70 122 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]

End Machine1 Log on failover from 2 to 1

Machine2 Log on failover from 2 to 1:

Aug 3, 2009 9:29:52 AM org.apache.coyote.http11.Http11Protocol pause
INFO: Pausing Coyote HTTP/1.1 on http-8080
Aug 3, 2009 9:29:53 AM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
Aug 3, 2009 9:29:53 AM org.apache.catalina.ha.session.JvmRouteBinderValve stop
INFO: JvmRouteBinderValve stopped
Aug 3, 2009 9:29:53 AM org.apache.catalina.ha.session.DeltaManager stop
INFO: Manager [/test-app] expiring sessions upon shutdown
Aug 3, 2009 9:29:53 AM org.apache.catalina.tribes.membership.McastServiceImpl$ReceiverThread run
WARNING: Error receiving mcast package. Sleeping 500ms
java.net.SocketException: socket closed
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.PlainDatagramSocketImpl.receive(Unknown Source)
at java.net.DatagramSocket.receive(Unknown Source)
at org.apache.catalina.tribes.membership.McastServiceImpl.receive(McastServiceImpl.java:314)
at org.apache.catalina.tribes.membership.McastServiceImpl$ReceiverThread.run(McastServiceImpl.java:414)
Aug 3, 2009 9:29:53 AM org.apache.coyote.http11.Http11Protocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-8080

End Machine2 Log on failover from 2 to 1

Alexander said...

I'm not sure of a way to post the httpd-proxy.conf and portions of the server.xml because it is disallowing non-HTML tags. My httpd-proxy.conf is identical though, except for the IP addresses. My <Engine> and <Cluster> elements are also the same.

Additionally, the http://localhost/balancer-manager is not working. I get a HTTP 403 Forbidden error: "You are not authorized to view this page.
You might not have permission to view this directory or page using the credentials you supplied." It seemed like this was a key step.

(I wasn't sure of a better way to be able to post the config files so I removed the opening <'s.)

Thank you for your time.
-Alexander

erik said...

it's so much help for, simple but powerful.... tankQ Łukasz

MoZ said...

Almir, I am having the same issues that you seem to be experiencing, except I am on a windows machine.

After much searching I found that it is to do with a bug that disables LoopbackMode in Tomcat 6.0.20.

See: https://issues.apache.org/bugzilla/show_bug.cgi?id=47308

It will apparently work if the tomcats are on different machines pyshical or virtual.

Łukasz said...

MoZ, thanks for your valuable comment.

It will help many readers!

MoZ said...

Happy to help, although I wish I'd spelt physical properly!

Anyway, I just followed your post with a tomcat instance on a ubuntu vm and it worked like a charm.

If you still need to use two tomcat 6.0.20's on the same machine then you can build them manually using svn as the fixes have been committed.

http://tomcat.apache.org/svn.html

Thanks for the post Łukasz, much appreciated.

enriqueism said...

Hi, very nice post. Could you explain the Apache Portable Runtime (APR) in more detail? I don't understand how it works or how to install it in Tomcat. Thanks, and please keep up your great work.

Łukasz said...

Hi enriqueism,

Here you can find lots of information about APR and Tomcat:

http://tomcat.apache.org/tomcat-6.0-doc/apr.html

http://apr.apache.org/

Cheers,
Łukasz