How to Install and Configure Cassandra
Apache Cassandra is a distributed NoSQL database designed for handling massive amounts of data across many Breeze instances with no single point of failure. It provides linear scalability, tunable consistency, and excels at write-heavy workloads such as event logging, IoT data, and messaging systems.
Installing Cassandra
On your Breeze instance running Ubuntu, add the Apache Cassandra repository (the 50x suite below selects the 5.0 release series) and install the package:
sudo apt install -y apt-transport-https curl gnupg
curl -fsSL https://downloads.apache.org/cassandra/KEYS | sudo gpg --dearmor -o /usr/share/keyrings/cassandra.gpg
echo "deb [signed-by=/usr/share/keyrings/cassandra.gpg] https://debian.cassandra.apache.org 50x main" | \
sudo tee /etc/apt/sources.list.d/cassandra.list
sudo apt update
sudo apt install -y cassandra
Single-Node Configuration
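For development on a single Breeze instance, the defaults work almost unchanged. Open /etc/cassandra/cassandra.yaml and check the settings below. The stock cluster_name is 'Test Cluster'; if you rename it, do so before the node stores any data, because a node refuses to start when the name in cassandra.yaml differs from the one saved in its system tables: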
cluster_name: 'BreezeCluster'
listen_address: localhost
rpc_address: localhost
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1"
endpoint_snitch: SimpleSnitch
data_file_directories:
  - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
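The settings above can be checked from the shell instead of by eye. The following is a minimal sketch, not an official tool: yaml_get and check_setting are hypothetical helper names, and the parser only understands simple "key: value" lines that start in the first column, which is enough for the scalar settings listed here but not for nested entries such as seed_provider.

```shell
# Print the value of a top-level "key: value" line in a YAML file.
# Only simple scalars at column 0 are handled; quotes are kept as written.
yaml_get() {
    awk -v key="$1" '
        index($0, key ":") == 1 {
            value = substr($0, length(key) + 2)
            sub(/^[ \t]+/, "", value)      # drop leading whitespace
            sub(/[ \t]+#.*$/, "", value)   # drop a trailing comment
            sub(/[ \t]+$/, "", value)      # drop trailing whitespace
            print value
            exit
        }
    ' "$2"
}

# Compare one setting against an expected value and report the result.
# Arguments: key, expected value, path to the YAML file.
check_setting() {
    actual=$(yaml_get "$1" "$3")
    if [ "$actual" = "$2" ]; then
        echo "ok   $1: $actual"
    else
        echo "DIFF $1: expected '$2', found '$actual'"
        return 1
    fi
}
```

For example, check_setting listen_address localhost /etc/cassandra/cassandra.yaml prints an "ok" line when the file matches and returns a non-zero status otherwise.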
Multi-Node Cluster Setup
For a production cluster across multiple Breeze instances, configure each node with its own IP and designate seed nodes:
cluster_name: 'BreezeCluster'
num_tokens: 256
listen_address: this-breeze-ip
rpc_address: 0.0.0.0
broadcast_rpc_address: this-breeze-ip
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "breeze1-ip,breeze2-ip"
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: true
Start Cassandra on each node:
sudo systemctl enable cassandra
sudo systemctl start cassandra
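Cassandra keeps initializing for a while after systemctl returns, so a script that continues immediately may find the node not yet ready. The sketch below polls nodetool status until a node row reads UN. The function name wait_for_cassandra and the default timeout and interval are assumptions chosen for illustration.

```shell
# Poll `nodetool status` until at least one node row starts with "UN",
# or give up after the timeout.
# Arguments: timeout in seconds (default 120), poll interval (default 5).
wait_for_cassandra() {
    timeout_s=${1:-120}
    interval_s=${2:-5}
    waited=0
    while [ "$waited" -le "$timeout_s" ]; do
        # nodetool fails while the node is still starting; ignore its errors
        if nodetool status 2>/dev/null | grep '^UN ' >/dev/null; then
            echo "Cassandra is up after ${waited}s"
            return 0
        fi
        sleep "$interval_s"
        waited=$(( waited + interval_s ))
    done
    echo "Cassandra did not report UN within ${timeout_s}s" >&2
    return 1
}
```

Calling wait_for_cassandra 180 right after the start command blocks for up to roughly three minutes and returns non-zero if the node never comes up.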
Check the cluster status:
nodetool status
All nodes should show UN (Up Normal).
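For unattended checks, the UN test can be scripted. This is a sketch that assumes the usual nodetool status layout, where each node row starts with a two-character code (U or D for up/down, or ? when the state is unknown, followed by N, L, J, or M). The function name check_cluster_status is hypothetical.

```shell
# Read `nodetool status` output on stdin, list every node that is not UN,
# print a one-line summary, and return non-zero if any node is not UN
# or if no node rows were found at all.
check_cluster_status() {
    awk '
        $1 ~ /^[UD?][NLJM]$/ {
            total++
            if ($1 != "UN") {
                bad++
                print "not UN: " $1 " " $2
            }
        }
        END {
            printf "%d node(s), %d not UN\n", total, bad
            exit (total == 0 || bad > 0)
        }
    '
}
```

Run it as nodetool status | check_cluster_status; the exit status makes it usable in cron jobs or health checks.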
Creating a Keyspace and Table
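Connect with the CQL shell and create a keyspace. The datacenter name in the replication map must match the name that nodetool status reports (datacenter1 with SimpleSnitch; with GossipingPropertyFileSnitch it is the dc value in cassandra-rackdc.properties). A replication factor of 3 assumes at least three nodes in that datacenter; on a single development node, use 1: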
cqlsh
CREATE KEYSPACE myapp WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
};
USE myapp;
CREATE TABLE events (
    event_id UUID,
    user_id UUID,
    event_type TEXT,
    event_time TIMESTAMP,
    payload TEXT,
    PRIMARY KEY ((user_id), event_time, event_id)
) WITH CLUSTERING ORDER BY (event_time DESC, event_id ASC)
  AND default_time_to_live = 7776000;  -- 90 days TTL
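The partition key (user_id) determines which nodes store a user's events, so load spreads evenly only when activity is similar across users. The clustering columns (event_time DESC, then event_id) keep each partition sorted newest-first, which makes time-range queries for a single user efficient; event_id keeps events with identical timestamps distinct.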
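To illustrate the kind of time-range query this table is built for, here is a small sketch that assembles the CQL text in the shell. The helper names recent_events_query and ttl_seconds, the default LIMIT of 100, and the sample UUID are made up for the example. Arguments are inserted without any escaping, so use it only with trusted input.

```shell
# Build a CQL query for one user's events at or after a given timestamp.
# Arguments: user UUID, timestamp, optional row limit (default 100).
# Rows come back newest-first because of the table's clustering order.
recent_events_query() {
    printf "SELECT event_time, event_type, payload FROM myapp.events WHERE user_id = %s AND event_time >= '%s' LIMIT %s;\n" \
        "$1" "$2" "${3:-100}"
}

# Convert a retention period in days to the seconds that
# default_time_to_live expects (90 days gives 7776000).
ttl_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}
```

The generated statement can be passed to the CQL shell with its -e option, for example cqlsh -e "$(recent_events_query 123e4567-e89b-12d3-a456-426614174000 '2024-01-01 00:00:00+0000')". Because the query restricts the partition key and then a range on the first clustering column, it reads a contiguous slice of one partition.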
Memory and Performance Tuning
Edit /etc/cassandra/jvm-server.options to set the heap size based on your Breeze instance resources:
-Xms4G
-Xmx4G
-Xmn800M
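Set the heap to no more than 50% of available RAM, capped at 8 GB. For a Breeze instance with 8 GB RAM, use a 4 GB heap, and keep -Xms and -Xmx equal so the heap is not resized at runtime. -Xmn sizes the young generation for the CMS collector; leave it out if you run G1. Also adjust concurrent_reads and concurrent_writes in cassandra.yaml: the rule of thumb in the stock configuration comments is 16 times the number of data drives for concurrent_reads and 8 times the number of CPU cores for concurrent_writes.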
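The sizing arithmetic can be sketched as a few shell functions. The function names are hypothetical, and the numbers simply encode the rules of thumb stated above (half of RAM capped at 8 GB, and 8 writes per core); treat the output as a starting point to review, not a tuned configuration.

```shell
# Suggest a heap size in MB: half of RAM, capped at 8192 MB (8 GB).
# Argument: total RAM in MB.
suggest_heap_mb() {
    half=$(( $1 / 2 ))
    if [ "$half" -gt 8192 ]; then
        echo 8192
    else
        echo "$half"
    fi
}

# Suggest concurrent_writes as 8 x the number of CPU cores.
suggest_concurrent_writes() {
    echo $(( $1 * 8 ))
}

# Print matching -Xms/-Xmx lines for jvm-server.options.
print_heap_options() {
    heap=$(suggest_heap_mb "$1")
    printf '%s\n' "-Xms${heap}M" "-Xmx${heap}M"
}

# Example (Linux): feed in the instance's real RAM and core count.
#   ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
#   print_heap_options "$ram_mb"
#   suggest_concurrent_writes "$(nproc)"
```

Note that MemTotal on an instance sold as 8 GB is usually a little under 8192 MB, so the computed heap comes out slightly below 4096 MB; rounding to a convenient value is fine.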
Monitoring with nodetool
Key monitoring commands:
nodetool status # Cluster overview
nodetool info # Node-level details
nodetool tpstats # Thread pool statistics
nodetool tablestats myapp # Table-level statistics (cfstats is the deprecated alias)
nodetool compactionstats # Active compactions
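Compaction runs automatically in the background; repair does not, so it has to be scheduled. Run nodetool repair on every Breeze instance in the cluster during off-peak hours, and make sure each node completes a repair within gc_grace_seconds (10 days by default) so that deleted data cannot reappear. A weekly schedule satisfies that.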
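A scheduled repair is easier to audit when it is wrapped in a script that logs the outcome. The following is one possible sketch: run_repair is a hypothetical function name, the log path is an assumption, and the -pr flag (repair only this node's primary token ranges) is a choice made here, not something the steps above require. With -pr, the job must run on every node to cover the whole ring.

```shell
# Run a primary-range repair of one keyspace and log the outcome.
# Arguments: keyspace name, optional log file.
# Assumption: the log path is writable by the user running the script.
run_repair() {
    keyspace=$1
    log=${2:-/var/log/cassandra/repair.log}
    echo "$(date '+%Y-%m-%d %H:%M:%S') starting repair of $keyspace" >> "$log"
    if nodetool repair -pr "$keyspace" >> "$log" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') repair of $keyspace finished" >> "$log"
    else
        status=$?   # exit status of the failed nodetool call
        echo "$(date '+%Y-%m-%d %H:%M:%S') repair of $keyspace FAILED (exit $status)" >> "$log"
        return "$status"
    fi
}
```

Saved in a script that ends with run_repair myapp, this can be called from a weekly cron entry. Staggering the day or hour across nodes avoids having every node repair at the same time.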