James Hamilton has published a thorough summary of Facebook's Cassandra, another scalable key-value store for your perusal. It's open source and is described as a "BigTable data model running on a Dynamo-like infrastructure." Cassandra is used at Facebook as an email search system covering 25TB of data and over 100 million mailboxes.
# Google Code for Cassandra - A Structured Storage System on a P2P Network
# SIGMOD 2008 Presentation
# Video Presentation at Facebook
# Facebook Engineering Blog for Cassandra
# Anti-RDBMS: A list of distributed key-value stores
# Facebook Cassandra Architecture and Design by James Hamilton
Describes the Cassandra data model simply, with examples. This tutorial covers the column, super column, column family, super column family, keyspace, validator, and comparator.
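To make the terms in that list concrete, here is a rough in-memory sketch of how they nest. It is only an illustration under my own assumptions: the class names, the `insert`/`get_row` methods and the "Users" example are hypothetical and are not Cassandra's API. The comparator is modelled as a sort key for column names, and the validator as a check on values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Column:
    """Smallest unit of data: a name, a value and a write timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnFamily:
    """Maps row key -> columns. Hypothetical sketch, not a real API."""
    name: str
    comparator: Callable[[str], object] = str           # sort key for column names
    validator: Callable[[str], bool] = lambda v: True   # accepts or rejects a value
    rows: Dict[str, Dict[str, Column]] = field(default_factory=dict)

    def insert(self, row_key: str, name: str, value: str, timestamp: int) -> None:
        if not self.validator(value):
            raise ValueError(f"value {value!r} rejected by validator")
        row = self.rows.setdefault(row_key, {})
        current = row.get(name)
        # Keep the write with the newest timestamp for a given column name.
        if current is None or timestamp >= current.timestamp:
            row[name] = Column(name, value, timestamp)

    def get_row(self, row_key: str) -> List[Column]:
        # Columns come back ordered by the comparator, not by insertion order.
        row = self.rows.get(row_key, {})
        return sorted(row.values(), key=lambda c: self.comparator(c.name))


@dataclass
class SuperColumnFamily:
    """Adds one level: row key -> super column name -> columns."""
    name: str
    rows: Dict[str, Dict[str, Dict[str, Column]]] = field(default_factory=dict)

    def insert(self, row_key: str, super_name: str, name: str,
               value: str, timestamp: int) -> None:
        super_column = self.rows.setdefault(row_key, {}).setdefault(super_name, {})
        super_column[name] = Column(name, value, timestamp)


@dataclass
class Keyspace:
    """Outermost container: holds column families by name."""
    name: str
    families: Dict[str, object] = field(default_factory=dict)

    def add(self, family):
        self.families[family.name] = family
        return family


# Hypothetical usage: one keyspace, one plain and one super column family.
ks = Keyspace("Demo")
users = ks.add(ColumnFamily("Users"))
users.insert("jsmith", "email", "old@example.com", timestamp=1)
users.insert("jsmith", "email", "new@example.com", timestamp=2)
users.insert("jsmith", "age", "42", timestamp=1)

addresses = ks.add(SuperColumnFamily("Addresses"))
addresses.insert("jsmith", "home", "city", "Seattle", timestamp=1)
```

Passing `comparator=int` instead of the default `str` would order column names such as "9" and "10" numerically rather than lexically, which is roughly the choice a comparator setting expresses.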
IF YOU HAVE TOO MANY COLUMN FAMILIES YOU WILL GET OOM UNLESS YOU RECONFIGURE SOMETHING! PER-MACHINE HEAP SIZE LIMIT!
----
For clock sync:
Added "noapictimer irqpoll" to the end of the kernel line in /boot/menu.lst
Added the following line to /etc/rc.local:
echo "jiffies" > /sys/devices/system/clocksource/clocksource0/current_clocksource
Configured ntp better, e.g. ntp.conf:
# don't give up so easily, even though this should never happen
tinker panic 0
# log your drift and stats and review them periodically
driftfile /var/lib/ntp/ntp.drift
statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
# use lots of NTP servers to get closest synchronization to actual time
server 0.vmware.pool.ntp.org
server 1.vmware.pool.ntp.org
server 2.vmware.pool.ntp.org
server ntp.ubuntu.com
etc.
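For the "review them periodically" part, a small script can summarise the loopstats files that the config above writes under /var/log/ntpstats/. This is a minimal sketch, assuming the usual ntpd loopstats layout where the first four whitespace-separated fields are the modified Julian day, seconds past midnight, clock offset in seconds and frequency offset in PPM. The function names and the sample lines are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class LoopSample(NamedTuple):
    mjd: int          # modified Julian day
    seconds: float    # seconds past midnight
    offset_s: float   # clock offset in seconds
    freq_ppm: float   # frequency offset (drift) in parts per million


def parse_loopstats(lines: Iterable[str]) -> List[LoopSample]:
    """Parse loopstats lines, skipping blank, short or malformed ones."""
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            samples.append(LoopSample(int(fields[0]), float(fields[1]),
                                      float(fields[2]), float(fields[3])))
        except ValueError:
            continue  # not a numeric record, ignore it
    return samples


def worst_offset_ms(samples: Iterable[LoopSample]) -> float:
    """Largest absolute clock offset seen, in milliseconds."""
    return max((abs(s.offset_s) for s in samples), default=0.0) * 1000.0


# Hypothetical sample lines; real ones would be read from a loopstats file.
example_lines = [
    "54000 100.000 0.000500 10.000 0.000100 0.010000 6",
    "54000 200.000 -0.002000 10.500 0.000100 0.010000 6",
    "",
]
example_samples = parse_loopstats(example_lines)
print(f"{len(example_samples)} samples, "
      f"worst offset {worst_offset_ms(example_samples):.3f} ms")
```

To run it against a real log, pass an open file object to `parse_loopstats` instead of the sample list. With `type day` in the filegen lines the files are split per day, so the exact file name to open will vary.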