mainly marketing articlle by cofiiunder What if you didn't have to do any of this funny business to get scalability and reliability? What if the JVM had access to a service that you could plug into to make its heap durable, arbitrarily large, and shared with every other JVM in your application tier? Enter Terracotta, network-attached, durable virtual heap for the JVM. In the spirit of full-disclosure, I'm a co-founder of Terracotta and work there as a software developer. Terracotta is an infrastructure service that is deployed as a stand-alone server plus a library that plugs into your existing JVMs and transparently clusters your JVM's heap. Terracotta makes some of your JVM heap shared via a network connection to the Terracotta server so that a bunch of JVMs can all access the shared heap as if it were local heap. You can think of it like a network-attached filesystem, but for your object data; see Figure 1.
RP has extremely good performance and scalability properties. Many uses of RP in the Linux Kernel have resulted in several orders of magnitude performance improvement compared to locking and transactional memory. Is it easy to program with? RP is not difficult to program with. Allowing each execution sequence to proceed using its own view of memory, by default, simplifies concurrent programming because it prevents memory from changing spontaneously. Threads are guaranteed to observe coherent memory, i.e., memory will contain values that were actually written at some time in the past. Read paths also exhibit deterministic performance characteristics, since they can not block or retry. This feature simplifies programming of time-sensitive systems. Nevertheless, RP is a new programming paradigm with a new interface and there are several ways to misuse it. Read Copy Update (RCU), an early example of RP, has seen extensive use in the Linux Kernel at over 2000 uses
Google Maps, Yahoo! Mail, Facebook, MySpace, YouTube, and Amazon are examples of Web sites built to scale. They access petabytes of data sending terabits per second to millions of users worldwide. The magnitude is awe-inspiring. Users view these large-scale Web sites from a narrower perspective. The typical user has megabytes of data that are downloaded at a few hundred kilobits per second. Users are not so interested in the massive number of requests per second being served; they care more about their individual requests. As they use these Web applications, they inevitably ask the same question: "Why is this site so slow?" The answer hinges on where development teams focus their performance improvements. Performance for the sake of scalability is rightly focused on the back end. Database tuning, replicating architectures, customized data caching, and so on allow Web servers to handle a greater number of requests.
James Hamilton has published a thorough summary of Facebook's Cassandra, another scalable key-value store for your perusal. It's open source and is described as a "BigTable data model running on a Dynamo-like infrastructure." Cassandra is used in Facebook as an email search system containing 25TB and over 100m mailboxes. # Google Code for Cassandra - A Structured Storage System on a P2P Network # SIGMOD 2008 Presentation. # Video Presentation at Facebook # Facebook Engineering Blog for Cassandra # Anti-RDBMS: A list of distributed key-value stores # Facebook Cassandra Architecture and Design by James Hamilton