Sunday, 2 August 2009

shoal/jxta - any good?

Hi,

I'm looking for reusable code to drive a clustered Java application.

Shoal promises
  • peer discovery

  • group membership

  • group messaging

  • small replicated cache
About what I need. Used by Glassfish and seems to have other uses too. There is recent activity on the project. GPL, not ASL, but it's an in-house project only.

Deal? Invest more effort? Hold on! Here's their list of open bugs
  • (36) Messages received not in same order as when sent

  • (61) When members join the group concurrently, join notifications of some members are often duplicated or missed

  • (74) potential to miss FAILURE_NOTIFICATION when multiple instances killed at same time

  • (83) When group leader failed, any member couldn't receive FailureRecovery notification
Now, they've got versions 1.0 and 1.1.
1.1 is said to be a "work in progress".
So is 1.0 okay? I don't know.

Another oddity: a CVS commit message seems to imply (83) has been fixed.
But it's still open in the issue tracker.
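
For what it's worth, a bug like (36) is something an application can guard against on its own side. Below is a minimal sketch, assuming each sender stamps its messages with a sequence number starting at 0 and that one buffer is kept per sender. ReorderBuffer and receive are hypothetical names made up for illustration; none of this is Shoal's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical per-sender reorder buffer (not part of Shoal).
 * Holds back messages that arrive early and releases them in sequence order.
 * Messages must be non-null.
 */
final class ReorderBuffer<M> {
    // Sequence number of the next message we are allowed to deliver.
    private long nextExpected = 0;
    // Messages that arrived ahead of their turn, keyed by sequence number.
    private final Map<Long, M> pending = new HashMap<>();

    /** Takes one received message and returns every message that is now deliverable, in order. */
    synchronized List<M> receive(long seq, M message) {
        List<M> ready = new ArrayList<>();
        if (seq < nextExpected) {
            return ready; // already delivered: a duplicate, drop it
        }
        pending.put(seq, message);
        M next;
        // Drain the buffer for as long as the next expected message is present.
        while ((next = pending.remove(nextExpected)) != null) {
            ready.add(next);
            nextExpected++;
        }
        return ready;
    }
}
```

Feeding it sequence 1 and then sequence 0 yields nothing on the first call and both messages, in order, on the second. The sketch does not bound the buffer or time out on a message that never arrives, so a real version would need both.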



Jxta - the foundation of Shoal - is another unknown.

When I troubleshoot late at night I prefer to have an intimate knowledge of my whole stack.
Jxta is another big (?) thing to learn. Thankfully sources are available, but still...



Does it look easier just to write my clustering code from scratch?
Implement proper Paxos?
Be less dependent on other people's bugs?
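
To get a feel for the size of the job, here is a minimal sketch of just the acceptor role of single-decree Paxos, kept entirely in memory. PaxosAcceptor and Promise are hypothetical names, ballots are assumed to be unique per proposer, and the proposer, learner and networking are left out. Textbook Paxos expects an acceptor to persist its state before replying, so an all-in-RAM variant would have to treat a restarted node as a brand new member.

```java
/**
 * Hypothetical single-decree Paxos acceptor, in memory only.
 * A sketch under the assumptions stated above, not production code.
 */
final class PaxosAcceptor<V> {

    /** Reply to a prepare request. */
    static final class Promise<V> {
        final boolean granted;     // true if the acceptor promised this ballot
        final long acceptedBallot; // ballot of the value accepted so far, -1 if none
        final V acceptedValue;     // value accepted so far, null if none

        Promise(boolean granted, long acceptedBallot, V acceptedValue) {
            this.granted = granted;
            this.acceptedBallot = acceptedBallot;
            this.acceptedValue = acceptedValue;
        }
    }

    private long promisedBallot = -1; // highest ballot promised so far
    private long acceptedBallot = -1; // ballot of the last accepted value
    private V acceptedValue;          // last accepted value

    /** Phase 1: promise to ignore lower ballots and report any value already accepted. */
    synchronized Promise<V> prepare(long ballot) {
        if (ballot > promisedBallot) {
            promisedBallot = ballot;
            return new Promise<>(true, acceptedBallot, acceptedValue);
        }
        return new Promise<>(false, acceptedBallot, acceptedValue);
    }

    /** Phase 2: accept the value unless a higher ballot has been promised in the meantime. */
    synchronized boolean accept(long ballot, V value) {
        if (ballot >= promisedBallot) {
            promisedBallot = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
        return false;
    }
}
```

The acceptor is the easy part. The proposer's retry logic, leader election, membership changes and failure detection are where most of the effort (and most of the bugs) would go.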



P.S. I've had a look at Geronimo clustering as well. They allow cache servers to be in separate JVMs. Hmm.. Somehow my intuition doesn't immediately suggest integrating that code is an easy route either..

P.P.S. ZooKeeper was another obvious candidate. What stops me here is that ZooKeeper seems to have a very hard dependency on disk. Every change is persisted. Node recovery is done by reading the transaction log. It's a mismatch for my goals. I need something blazingly fast living completely in RAM.

2 comments:

SteveL said...

If you look at SmartFrog's SVN site, we have something called Anubis, which is a robust grouping protocol for use on a single site. Used in production, algorithm proven to work by the mathematicians, implementation tested. Read the papers in the doc/ directory to find out more.

Anton Tagunov said...

Thanks a lot Steve!

Anubis looks promising.
I'm looking into it.