How to change Cassandra configurations

mrios
New Member

We are running OPDK 4.16.01 and would like to know the recommended way to change the Cassandra configuration.

We're trying to increase Xmx to 16GB and to use G1GC instead of CMS. I have tried a few different things, without success so far:

1) Adding tokens to cassandra.properties under /opt/apigee/customer/application. However, I couldn't find the tokens or values in the files under apigee-cassandra/conf.

2) Changing values in the file /opt/apigee/apigee-cassandra/conf/cassandra-env.sh. I read this comment https://community.apigee.com/answers/24212/view.html, but it doesn't explain exactly how to do it. I tried a few things, but they didn't work. Also, when I change the values in cassandra-env.sh, they are overwritten with the defaults as soon as the process is restarted.

One thing I noticed is that the logic in that file (below) always forces the process to start with a maximum heap size of 8GB. We have 48 GB on that box. We don't want to give all of it to that process, but 16GB would make more sense. I understand that the 8GB cap is a Cassandra recommendation, but in our case we would like to raise it.

if [ "$half_system_memory_in_mb" -gt "1024" ]
then
    half_system_memory_in_mb="1024"
fi
if [ "$quarter_system_memory_in_mb" -gt "8192" ]
then
    quarter_system_memory_in_mb="8192"
fi
if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
then
    max_heap_size_in_mb="$half_system_memory_in_mb"
else
    max_heap_size_in_mb="$quarter_system_memory_in_mb"
fi
MAX_HEAP_SIZE="${max_heap_size_in_mb}M"
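To make the arithmetic concrete, the excerpt above can be wrapped in a small function and run for a 48 GB box. This is only a sketch: the function name calculate_max_heap_mb is made up for illustration, and it assumes (as in the stock Cassandra script, but not shown in the excerpt) that the half and quarter values are derived from total system memory in MB.

```shell
# Sketch of the heap-size logic quoted above.
# calculate_max_heap_mb is a hypothetical helper name, not part of cassandra-env.sh.
calculate_max_heap_mb() {
    system_memory_in_mb="$1"
    # Assumption: half and quarter are computed from total RAM, as in stock Cassandra.
    half_system_memory_in_mb=$((system_memory_in_mb / 2))
    quarter_system_memory_in_mb=$((half_system_memory_in_mb / 2))

    # Cap half of RAM at 1 GB and a quarter of RAM at 8 GB.
    if [ "$half_system_memory_in_mb" -gt "1024" ]; then
        half_system_memory_in_mb="1024"
    fi
    if [ "$quarter_system_memory_in_mb" -gt "8192" ]; then
        quarter_system_memory_in_mb="8192"
    fi

    # Take the larger of the two capped values.
    if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]; then
        max_heap_size_in_mb="$half_system_memory_in_mb"
    else
        max_heap_size_in_mb="$quarter_system_memory_in_mb"
    fi
    echo "$max_heap_size_in_mb"
}

calculate_max_heap_mb 49152   # 48 GB box: prints 8192
```

With 48 GB, a quarter of RAM is 12288 MB, which is capped at 8192 MB, so the computed heap never exceeds 8 GB unless MAX_HEAP_SIZE is set explicitly.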

We want to switch to G1GC because running nodetool repair on some nodes produces an OutOfMemoryError, and we are hoping to avoid that with these changes.



@Matias,

The Cassandra version changed from 2.0.15 to 2.1.13 in 1605 and to 2.1.16 in 1701.
I see that you are using Apigee Edge 1601, which is pretty old, and I think you should upgrade to the latest Apigee Edge version ASAP.
Once you are on 1701 or above, you will be on Cassandra 2.1.16, which is certainly better than 2.0.15.

What you are trying to do on 1601 is a significant change, and I don't know whether it will fix your issue.
If you want to increase the heap size and change the GC mechanism, I would recommend that you first upgrade Apigee Edge to the latest version and then assess whether you still need the Cassandra changes you called out.

I also recommend that you contact the Apigee support team to get proper recommendations on these changes and to verify whether something is wrong with your present setup.

BTW, I don't think there is any CWC token for the heap settings, but you can change source/conf/cassandra-env.sh and restart for the new settings to take effect.
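A minimal sketch of what such an edit might look like, assuming the file builds its JVM options by appending to JVM_OPTS the way the stock Cassandra cassandra-env.sh does. The 16G value comes from the question above. The 500 ms pause target is an arbitrary illustrative value, not an Apigee recommendation, and nothing here has been verified against Edge 4.16.01. -Xms, -Xmx, -XX:+UseG1GC and -XX:MaxGCPauseMillis are standard HotSpot options.

```shell
# Sketch only: values below are illustrative, not Apigee-recommended settings.
MAX_HEAP_SIZE="16G"                 # heap size requested in the question

JVM_OPTS="${JVM_OPTS:-}"            # start from whatever options are already set
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

# Use G1 instead of CMS. The existing CMS flags (-XX:+UseParNewGC,
# -XX:+UseConcMarkSweepGC and their related tuning options) have to be
# removed as well, because the JVM refuses to start when two collectors
# are selected at once.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"   # hypothetical pause target

echo "$JVM_OPTS"
```

Setting MAX_HEAP_SIZE explicitly bypasses the automatic calculation, which is why the 8GB cap no longer applies once the variable is assigned before the JVM options are built.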

cc @Baba Krishnankutty

@Matias

Why are you trying to increase the heap? What is the issue you are trying to address?

@Maruti Chand It would be great to upgrade to 17.05 or 17.09, but a decision was made and we'll stick with 16.01 for this Q4. I'll play with the change in source/conf/Cassandra-env. error.png

@Baba Krishnankutty This week, while running nodetool repair on one of the nodes, we hit an OOM error. The 11GB heap dump shows the attached exception. After enabling INFO-level logging, we also found that the GC was working very intensively during the repair.

INFO [ScheduledTasks:1] 2017-10-16 21:10:34,403 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 11848666 ms for 653 collections, 1623982360 used; max is 8375238656
INFO [ScheduledTasks:1] 2017-10-16 17:50:38,283 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 618239 ms for 27 collections, 8374041512 used; max is 8375238656
INFO [ScheduledTasks:1] 2017-10-16 17:40:18,583 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 255634 ms for 11 collections, 8372191000 used; max is 8375238656

Those are just a few of the many GCInspector log entries that we have.
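To put numbers on those entries, a short awk filter can turn each line into an average pause per collection and a heap occupancy percentage. This is a rough sketch: summarize_gc_lines is a made-up helper name, and it assumes the "ms" figure is the total time spent across the reported number of collections.

```shell
# Sketch: summarize GCInspector lines read from stdin.
# summarize_gc_lines is a hypothetical helper, not a Cassandra tool.
summarize_gc_lines() {
    awk '/GC for ConcurrentMarkSweep:/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "ms")           ms    = $(i - 1)   # total GC time in ms
            if ($i == "collections,") count = $(i - 1)   # number of collections
            if ($i == "used;")        used  = $(i - 1)   # heap bytes in use
            if ($i == "is")           max   = $(i + 1)   # max heap bytes
        }
        printf "%d ms/collection, heap %.1f%% full\n", ms / count, 100 * used / max
    }'
}

# Usage with the first entry quoted above:
printf '%s\n' 'INFO [ScheduledTasks:1] 2017-10-16 21:10:34,403 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 11848666 ms for 653 collections, 1623982360 used; max is 8375238656' \
    | summarize_gc_lines
```

Under that assumption, the entries work out to roughly 18 to 23 seconds per CMS collection, and the two earlier entries show the heap essentially full.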

Do you have any recommendation about running the repair with a subrange for the kms keyspace?
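For context, this is roughly what a subrange repair looks like. nodetool repair accepts -st (start token) and -et (end token), so a range can be repaired in smaller pieces. The sketch below only prints the commands, it does not run them. The helper name print_subrange_repairs and the boundary tokens 0, 1000, 2000 and 3000 are made up for illustration; real boundaries would have to come from the ring of the actual cluster.

```shell
# Sketch: print one "nodetool repair" command per consecutive pair of tokens.
# print_subrange_repairs and the tokens used below are hypothetical.
print_subrange_repairs() {
    keyspace="$1"
    shift
    start_token="$1"
    shift
    for end_token in "$@"; do
        echo "nodetool repair -st $start_token -et $end_token $keyspace"
        start_token="$end_token"   # next subrange starts where this one ended
    done
}

# Dry run for the kms keyspace with made-up token boundaries.
print_subrange_repairs kms 0 1000 2000 3000
```

Each printed command covers a smaller slice of data than a full repair, which is the reason for asking whether it would reduce memory pressure during the repair.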

@Matias, looks like there is a bug in 1601 which might be causing this. Can you work with support so that they can give you a script/steps to get that fix in 1601?

cc @cocoandjan