Getting Cassandra PoolTimeoutException when Saving KeyValue Map

I have a custom OAuth JWT API proxy that stores some data in KeyValue Maps. During load testing (30 concurrent users) we get the errors below in the MessageProcessor logs. Are there any Cassandra connection pool settings we can increase to resolve this?

2016-03-31 05:14:12,317 org:OrgName env:staging api:oauth-jwt_rev16_2015_12_11_rev9_2016_03_01 rev:2 policy:Key-Value-Map-Operations-Save-RT-Ticket-TokenFlow Apigee-Main-636 ERROR DATASTORE.CASSANDRA - AstyanaxCassandraClient.logHostPoolInCaseOfErrors() : Cassandra Host Pool under use - All Hosts: xxx.xx.xx.xxx(xxx.xx.xx.xxx):9160,yyy.yy.yy.yyy(yyy.yy.yy.yyy):9160,zzz.zz.zz.zzz(zzz.zz.zz.zzz):9160. Active Hosts: zzz.zz.zz.zzz(zzz.zz.zz.zzz):9160,xxx.xx.xx.xxx(xxx.xx.xx.xxx):9160
2016-03-31 05:14:12,318 org:OrgName env:staging api:oauth-jwt_rev16_2015_12_11_rev9_2016_03_01 rev:2  Apigee-Main-636 ERROR MESSAGING.FLOW - AsyncExecutionStrategy$AsyncExecutionTask.logException() : Exception caught 
com.apigee.datastore.DataAccessException: Error while accessing datastore;Please retry later
	at com.apigee.datastore.client.astyanax.AstyanaxCassandraClient.getColumnValueByCompositeColumns(AstyanaxCassandraClient.java:877) ~[datastore-1.0.0.jar:na]
	at com.apigee.keyvaluemap.dao.nosql.KeyValueMapDaoImpl.getMap(KeyValueMapDaoImpl.java:74) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.getKeyValueMapFromDAO(KeyValueMapServiceImpl.java:346) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.internalGetKeyValueMapIfExists(KeyValueMapServiceImpl.java:351) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.createOrUpdateKeyValueMap(KeyValueMapServiceImpl.java:175) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.steps.keyvaluemapoperations.KeyValueMapOperationsExecution.execute(KeyValueMapOperationsExecution.java:118) ~[keyvaluemap-operations-1.0.0.jar:na]
	at com.apigee.messaging.runtime.steps.StepExecution.execute(StepExecution.java:136) ~[message-processor-1.0.0.jar:na]
	at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:95) [message-flow-1.0.0.jar:na]
	at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:65) [message-flow-1.0.0.jar:na]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_45]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
2016-03-31 05:14:12,787 org:OrgName env:staging api:oauth-jwt_rev16_2015_12_11_rev9_2016_03_01 rev:2 policy:Key-Value-Map-Operations-Save-RT-Ticket-TokenFlow Apigee-Main-657 ERROR DATASTORE.CASSANDRA - AstyanaxCassandraClient.getColumnValueByCompositeColumns() : Exception while  getColumnValueByCompositeColumns; columnFamily:keyvaluemaps_r21, columnFamilyName : intralinks,key:[env, staging, kvmaps, refreshtoken2ticket], columnNamesStart:null, columnNamesEnd:null, timeRange :10000, columns :null, startKey:{} 
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=None(0.0.0.0):0, latency=2115(2115), attempts=2]Timed out waiting for connection
	at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231) ~[astyanax-core-1.56.43.jar:na]
	at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198) ~[astyanax-core-1.56.43.jar:na]
	at com.netflix.astyanax.connectionpool.impl.LeastOutstandingExecuteWithFailover.borrowConnection(LeastOutstandingExecuteWithFailover.java:74) ~[astyanax-core-1.56.43.jar:na]
	at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117) ~[astyanax-core-1.56.43.jar:na]
	at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338) ~[astyanax-core-1.56.43.jar:na]
	at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$1.execute(ThriftColumnFamilyQueryImpl.java:175) ~[astyanax-thrift-1.56.43.jar:na]
	at com.apigee.datastore.client.astyanax.AstyanaxCassandraClient.getColumnValueByCompositeColumns(AstyanaxCassandraClient.java:849) ~[datastore-1.0.0.jar:na]
	at com.apigee.keyvaluemap.dao.nosql.KeyValueMapDaoImpl.getMap(KeyValueMapDaoImpl.java:74) [keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.getKeyValueMapFromDAO(KeyValueMapServiceImpl.java:346) [keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.internalGetKeyValueMapIfExists(KeyValueMapServiceImpl.java:351) [keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.createOrUpdateKeyValueMap(KeyValueMapServiceImpl.java:175) [keyvaluemap-1.0.0.jar:na]
	at com.apigee.steps.keyvaluemapoperations.KeyValueMapOperationsExecution.execute(KeyValueMapOperationsExecution.java:118) [keyvaluemap-operations-1.0.0.jar:na]
	at com.apigee.messaging.runtime.steps.StepExecution.execute(StepExecution.java:136) [message-processor-1.0.0.jar:na]
	at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:95) [message-flow-1.0.0.jar:na]
	at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:65) [message-flow-1.0.0.jar:na]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_45]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
2016-03-31 05:14:12,788 org:OrgName env:staging api:oauth-jwt_rev16_2015_12_11_rev9_2016_03_01 rev:2 policy:Key-Value-Map-Operations-Save-RT-Ticket-TokenFlow Apigee-Main-657 ERROR DATASTORE.CASSANDRA - AstyanaxCassandraClient.logHostPoolInCaseOfErrors() : Cassandra Host Pool under use - All Hosts: xxx.xx.xx.xxx(xxx.xx.xx.xxx):9160,yyy.yy.yy.yyy(yyy.yy.yy.yyy):9160,zzz.zz.zz.zzz(zzz.zz.zz.zzz):9160. Active Hosts: zzz.zz.zz.zzz(zzz.zz.zz.zzz):9160,xxx.xx.xx.xxx(xxx.xx.xx.xxx):9160
2016-03-31 05:14:12,790 org:OrgName env:staging api:oauth-jwt_rev16_2015_12_11_rev9_2016_03_01 rev:2  Apigee-Main-657 ERROR MESSAGING.FLOW - AsyncExecutionStrategy$AsyncExecutionTask.logException() : Exception caught 
com.apigee.datastore.DataAccessException: Error while accessing datastore;Please retry later
	at com.apigee.datastore.client.astyanax.AstyanaxCassandraClient.getColumnValueByCompositeColumns(AstyanaxCassandraClient.java:877) ~[datastore-1.0.0.jar:na]
	at com.apigee.keyvaluemap.dao.nosql.KeyValueMapDaoImpl.getMap(KeyValueMapDaoImpl.java:74) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.getKeyValueMapFromDAO(KeyValueMapServiceImpl.java:346) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.internalGetKeyValueMapIfExists(KeyValueMapServiceImpl.java:351) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.keyvaluemap.service.KeyValueMapServiceImpl.createOrUpdateKeyValueMap(KeyValueMapServiceImpl.java:175) ~[keyvaluemap-1.0.0.jar:na]
	at com.apigee.steps.keyvaluemapoperations.KeyValueMapOperationsExecution.execute(KeyValueMapOperationsExecution.java:118) ~[keyvaluemap-operations-1.0.0.jar:na]
	at com.apigee.messaging.runtime.steps.StepExecution.execute(StepExecution.java:136) ~[message-processor-1.0.0.jar:na]
	at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:95) [message-flow-1.0.0.jar:na]
	at com.apigee.flow.execution.AsyncExecutionStrategy$AsyncExecutionTask.call(AsyncExecutionStrategy.java:65) [message-flow-1.0.0.jar:na]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_45]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]

I see maxactive_cassandra_connections=5 in /apps/apigee4/conf/apigee/message-processor/keyvaluemap-datastore.properties. Is this the max pool size?

Not exactly. It's the max connections per host. So the max pool size = (# of C* hosts) * maxactive_cassandra_connections.

We have 3 Cassandra hosts, so the max pool size should be 15. Maybe that's not sufficient for the load being exerted. We have bumped the setting to 10, so the max pool size is now 30. Will let you know if that worked.
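The arithmetic from the formula above can be written out as a small sketch. The function name `max_pool_size` is hypothetical, and the assumption that the ceiling applies per client process is mine, not something stated in this thread. Note also that the log lines in the question list three hosts under "All Hosts" but only two under "Active Hosts", so the usable ceiling at that moment may have been lower than the nominal one.

```python
def max_pool_size(cassandra_hosts: int, max_active_per_host: int) -> int:
    """Upper bound on pooled connections, per the formula in this thread:
    (# of C* hosts) * maxactive_cassandra_connections.

    Hypothetical helper for illustration; not part of Apigee Edge.
    """
    if cassandra_hosts < 0 or max_active_per_host < 0:
        raise ValueError("host count and per-host limit must be non-negative")
    return cassandra_hosts * max_active_per_host


if __name__ == "__main__":
    print(max_pool_size(3, 5))   # original setting with 3 hosts
    print(max_pool_size(3, 10))  # after raising the setting to 10
    print(max_pool_size(2, 10))  # if only 2 of the 3 hosts are usable (assumption)
```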

Changing maxactive_cassandra_connections from 5 to 10 did not make a difference. We made this change on the Router/Message-Processor servers. Does this change need to be made on the other servers as well?

Is this the correct setting to resolve this issue?

Is there any way to monitor the active connections via JMX using JConsole?

Edge has several components that use C* internally, so please check all "*-datastore.properties" files, along with counter.properties, and modify that value in each. This applies only to the Management Server and Message Processor.
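To see where the value is currently set, a minimal sketch like the following could list every matching file. It assumes the configuration root is /apps/apigee4/conf/apigee (inferred from the path quoted earlier in this thread) and that each file uses the same key name, maxactive_cassandra_connections; both are assumptions to verify on your installation. The function name `find_pool_settings` is hypothetical.

```python
from pathlib import Path

# Key name taken from keyvaluemap-datastore.properties as quoted in this thread.
# Other files may use a different key; that is an assumption to verify.
PROPERTY = "maxactive_cassandra_connections"


def find_pool_settings(conf_dir):
    """Return {file path: value} for each *-datastore.properties or
    counter.properties file under conf_dir that sets PROPERTY."""
    root = Path(conf_dir)
    if not root.is_dir():
        return {}
    candidates = sorted(
        set(root.rglob("*-datastore.properties")) | set(root.rglob("counter.properties"))
    )
    found = {}
    for path in candidates:
        for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
            stripped = line.strip()
            # Skip comments and lines that are not key=value pairs.
            if stripped.startswith("#") or "=" not in stripped:
                continue
            key, _, value = stripped.partition("=")
            if key.strip() == PROPERTY:
                found[str(path)] = value.strip()
    return found


if __name__ == "__main__":
    # Assumed configuration root; adjust to your installation.
    for path, value in find_pool_settings("/apps/apigee4/conf/apigee").items():
        print(f"{path}: {PROPERTY}={value}")
```

The script only reads files; it does not change any setting.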

Thanks @Niraj Tulachan. Do you know which JMX MBean we can monitor to see the number of active connections?

This did not make any difference. We are still getting PoolTimeoutExceptions.

@Niraj Tulachan, since we are getting "Timed out waiting for connection", can we increase the thrift socket timeout in keyvaluemap-datastore.properties?

#To set the Hector's/Astyanax's thrift socket timeout in millis

cassandra.defaults.thrift.socketTimeoutInMillis=10000

Is there any configuration property which sets the MaxTimeoutWhenExhausted of Astyanax connection pool?

I don't think changing the socket timeout will help, since you're getting a PoolTimeoutException. That exception means all the connections in the pool are busy, so the next request cannot get a connection from the pool and eventually times out. If you're encountering it, the best thing is to increase the number of connections.
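The behaviour described above can be illustrated with a toy model. This is not Astyanax code: `BoundedPool`, `PoolTimeout`, and `simulate` are hypothetical names, and a semaphore stands in for the connection pool. The sketch only shows that a caller fails while waiting for a free slot, before any socket is used, which is why a socket timeout would not come into play.

```python
import threading
import time


class PoolTimeout(Exception):
    """Illustrative stand-in for a pool timeout; not the Astyanax class."""


class BoundedPool:
    """Toy pool: at most `size` operations may run at the same time."""

    def __init__(self, size, max_wait_s):
        self._slots = threading.BoundedSemaphore(size)
        self._max_wait_s = max_wait_s

    def run(self, operation):
        # Wait for a free slot; give up after max_wait_s.
        if not self._slots.acquire(timeout=self._max_wait_s):
            raise PoolTimeout("Timed out waiting for connection")
        try:
            return operation()
        finally:
            self._slots.release()


def simulate(pool_size, callers, op_seconds, max_wait_s):
    """Start `callers` threads at once and count successes and pool timeouts."""
    pool = BoundedPool(pool_size, max_wait_s)
    results = {"ok": 0, "timed_out": 0}
    lock = threading.Lock()

    def worker():
        try:
            pool.run(lambda: time.sleep(op_seconds))
            outcome = "ok"
        except PoolTimeout:
            outcome = "timed_out"
        with lock:
            results[outcome] += 1

    threads = [threading.Thread(target=worker) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Fewer slots than concurrent callers: the extra callers give up waiting.
    print(simulate(pool_size=2, callers=6, op_seconds=0.5, max_wait_s=0.1))
    # One slot per caller: nobody has to wait.
    print(simulate(pool_size=6, callers=6, op_seconds=0.5, max_wait_s=0.1))
```

In this model, the timeouts disappear either when the pool has a slot for every concurrent caller or when each operation finishes within the wait limit, so slow operations can exhaust a pool just as a small pool can.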

Thanks @Niraj Tulachan, we will try a higher "maxactive_cassandra_connections".

Is there a way to monitor active connections in the pool so we can set this value accordingly?

Currently, there isn't 😞

Can you please share which file we need to change, and what the maximum connections per host is that we can configure, given the pool size formula above?

Hi all, is there a solution for this as I don't see one in the conversation?