Practical Cassandra: A Developer's Approach

By Russell Bradberry, Eric Lubow

Published by Addison-Wesley Professional

Published Date: Dec 17, 2013


“Eric and Russell were early adopters of Cassandra at SimpleReach. In Practical Cassandra, you benefit from their experience in the trenches administering Cassandra, developing against it, and building one of the first CQL drivers. If you are deploying Cassandra soon, or you inherited a Cassandra cluster to tend, spend some time with the deployment, performance tuning, and maintenance chapters… If you are new to Cassandra, I highly recommend the chapters on data modeling and CQL.”

From the Foreword by Jonathan Ellis, Apache Cassandra Chair


Build and Deploy Massively Scalable, Super-fast Data Management Applications with Apache Cassandra


Practical Cassandra is the first hands-on developer’s guide to building Cassandra systems and applications that deliver breakthrough speed, scalability, reliability, and performance. Fully up to date, it reflects the latest versions of Cassandra, including Cassandra Query Language (CQL), which dramatically lowers the learning curve for Cassandra developers.


Pioneering Cassandra developers and DataStax MVPs Russell Bradberry and Eric Lubow walk you through every step of building a real production application that can store enormous amounts of structured, semi-structured, and unstructured data. Drawing on their exceptional expertise, Bradberry and Lubow share practical insights into issues ranging from querying to deployment, management, maintenance, monitoring, and troubleshooting.


The authors cover key issues, from architecture to migration, and guide you through crucial decisions about configuration and data modeling. They provide tested sample code, detailed explanations of how Cassandra works “under the covers,” and new case studies from three cutting-edge users: Ooyala, Hailo, and eBay.


Coverage includes


  • Understanding Cassandra’s approach, architecture, key concepts, and primary use cases, and why it’s so blazingly fast
  • Getting Cassandra up and running on single nodes and large clusters
  • Applying the new design patterns, philosophies, and features that make Cassandra such a powerful data store
  • Leveraging CQL to simplify your transition from SQL-based RDBMSes
  • Deploying and provisioning through the cloud or on bare-metal hardware
  • Choosing the right configuration options for each type of workload
  • Tweaking Cassandra to get maximum performance from your hardware, OS, and JVM
  • Mastering Cassandra’s essential tools for maintenance and monitoring
  • Efficiently solving the most common problems with Cassandra deployment, operation, and application development


Table of Contents

Foreword by Jonathan Ellis xiii

Foreword by Paul Dix xv

Preface xvii

Acknowledgments xxi

About the Authors xxiii


Chapter 1: Introduction to Cassandra 1

A Greek Story 1

What Is NoSQL? 2

There’s No Such Thing as “Web Scale” 2


Where Cassandra Fits In 5

What Is Cassandra? 5

Cassandra Terminology 8

Our Hope 9


Chapter 2: Installation 11

Prerequisites 11

Installation 11

Configuration 13

Cluster Setup 15

Summary 16


Chapter 3: Data Modeling 17

The Cassandra Data Model 17

Model Queries—Not Data 19

Collections 22

Summary 25


Chapter 4: CQL 27

A Familiar Way of Doing Things 27

Summary 39


Chapter 5: Deployment and Provisioning 41

Keyspace Creation 41

Replication Strategies 42

Snitches 43

Partitioners 46

Node Layout 48

Firewalls 49

Platforms 49

Summary 50


Chapter 6: Performance Tuning 51

Methodology 51

Tuning 52

System Tuning 62

Solid-State Drives 64

JVM Tuning 65

Summary 67


Chapter 7: Maintenance 69

Understanding nodetool 69

Ring Information 72

ColumnFamily Statistics 73

Thread Pool Statistics 74

Compactions 76

Backup and Restore 79

CommitLog Archiving 81

Summary 82


Chapter 8: Monitoring 83

Logging 83

JMX and MBeans 85

Health Checks 91

Summary 96


Chapter 9: Drivers and Sample Code 99

Java 100

C# 104

Python 108

Ruby 112

Summary 117


Chapter 10: Troubleshooting 119

Toolkit 119

Common Problems 121

Summary 126


Chapter 11: Architecture 127

Meta Keyspaces 127

Gossip Protocol 129

Failure Detection 130

HintedHandoffs 131

Bloom Filters 131

Summary 134


Chapter 12: Case Studies 135

Ooyala 135

Hailo 137

eBay 141

Summary 147


Appendix A: Getting Help 149

Preparing Information 149

IRC 149

Mailing Lists 149


Appendix B: Enterprise Cassandra 151

DataStax 151

Acunu 152

Titan by Aurelius 153

Pentaho 154

Instaclustr 154


Index 157

Purchase Info

ISBN-10: 0-13-344022-2

ISBN-13: 978-0-13-344022-5

Format: eBook (Watermarked)

This eBook includes the following formats, accessible from your Account page after purchase:

EPUB: The open industry format known for its reflowable content and usability on supported mobile devices.

MOBI: The eBook format compatible with the Amazon Kindle and Amazon Kindle applications.

PDF: The popular standard, used most often with the free Adobe® Reader® software.

This eBook requires no passwords or activation to read. We customize your eBook by discreetly watermarking it with your name, making it uniquely yours.

Includes EPUB, MOBI, and PDF

