Re: External Access To Cassandra

10-21-2015 06:45 AM

(note: yes, I know about BaaS/Usergrid, we just don't have it installed)

I would like to do some prototyping work and would like to leverage our on-prem cassandra cluster. If I create my own keyspace, is there any real downside in doing so or argument against doing so?

This is not production work, and I fully expect you guys to talk about the negative performance impact if an external system starts to really hog resources from Apigee 🙂

- the prototype services would be external and accessing cassandra, so totally outside of apigee, just using cassandra because it's there.

sarthak

@Jonathan Baney ... More than the performance impact, I would be worried about such activity violating Apigee's T&Cs. Since you would be modifying the Apigee internals in your own way, I think that could complicate Apigee's support for the software in the future.

What about upgrades? What happens if something goes wrong during a future platform upgrade or a hotfix deployment, for example if the upgrade path deletes your keyspace or aborts the upgrade because of unrecognized keyspaces?

I am no legal person, and I have never read Apigee's T&Cs, but I have heard (and it made sense to me) that this sort of unofficial usage of the underlying software violates Apigee's support clauses.

That being said, I am no authority on any such matter, so let's wait for someone more knowledgeable to comment here. I would also suggest you raise a support ticket on this. They will be in a better position to validate whether such work violates our support clauses.

On a side note: you can leverage the same Cassandra cluster for your BaaS/Usergrid. Yes, it will have a performance impact as you mentioned, but at least you do it the official way and also get all the benefits of BaaS.

Good point about the Ts&Cs. What I'll probably end up doing for quickness is actually write a proxy that leverages the cache or key/value map for my reads/writes 🙂 At least until I can get usergrid or a standalone cassandra cluster.

adas

@Jonathan Baney if you are an on-prem customer, you are free to make changes to your systems the way you want, but if this starts impacting the existing Edge functionality, performance or reliability, then you may have a problem. The standard SLA and support agreements may not apply any more, since you modified parts of the system yourself. I would suggest you open a support ticket and figure out the implications.

Now, coming to the actual question: from a technical standpoint this is definitely something you can do. It may not be recommended, but since it's not your production system, you can give it a shot if you understand the risks. Since you are going to create a new keyspace altogether, and have your column families created in that keyspace, you can drop it anytime without impacting the existing schema or data. Things that you should be aware of:

- pounding on this new keyspace can degrade the performance of the existing Edge installation

- if this keyspace grows very large, you will need to manage the data, disk space etc.

- if you frequently create/update/delete data in this keyspace, it will trigger compactions, which can start impacting the overall health of your Cassandra nodes

- make sure there's no dependency on the existing data or column families, and completely isolate this new keyspace so that you can delete it if you start seeing negative impact.

If you really need to store some arbitrary dataset, API BaaS should be the right solution. You can also play around with the KVM in Edge.

Setting up Cassandra

Download the Cassandra release from http://cassandra.apache.org/down...

I downloaded release 1.1.4

Create the directory “/usr/local/cassandra” and expand the TAR in it

You now have “/usr/local/cassandra/apache-cassandra-1.1.4” as the Cassandra home directory

Create the default data and log directories for Cassandra

These are “/var/lib/cassandra” and “/var/log/cassandra”
To use different directories, update the “cassandra.yaml” file in the “conf” sub-directory

Update Cassandra to listen for connections on the local machine’s IP address instead of on “localhost”

This is required to connect to Cassandra from a remote Java client (on a separate Windows 7 machine in my case)
Open “cassandra.yaml” in the “conf” sub-directory and update the variables “listen_address” and “rpc_address”
- listen_address: 192.168.3.133
- rpc_address: 192.168.3.133
- where “192.168.3.133” was the address of the machine running Cassandra in my case. Use your machine’s IP address in your setup.
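
If you are not sure which address to put in "listen_address" and "rpc_address", a small Java program can list the candidates. This is only a sketch, not part of the original setup: the class name ListenAddressCandidates is made up for illustration, and it assumes you want an IPv4 address on an interface that is up and is not the loopback interface. On a machine with several network interfaces it prints several candidates and you still have to pick the right one.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: lists addresses that could be used for
// listen_address / rpc_address in cassandra.yaml.
class ListenAddressCandidates {

    // Returns the IPv4 addresses of all interfaces that are up and not loopback.
    static List<String> candidates() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no network interfaces found
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        // Print each candidate in the form used in cassandra.yaml.
        for (String ip : candidates()) {
            System.out.println("listen_address: " + ip);
            System.out.println("rpc_address: " + ip);
        }
    }
}
```

Run it on the machine that hosts Cassandra and copy the address that your remote client can actually reach.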

(Optional) Set the name of your Cassandra cluster

Open “cassandra.yaml” in the “conf” sub-directory and update the variable “cluster_name”, e.g., cluster_name: 'Test Cluster'

(May be Required) Handling an intermittent Java exception while starting or remotely connecting to Cassandra

In my case this happened because the threads of the JVM started by Cassandra did not have enough stack memory (the -Xss flag sets the per-thread stack size)
Update the file “cassandra-env.sh” in the “conf” sub-directory
- Change the line: JVM_OPTS="$JVM_OPTS -Xss160k"
- Set the -Xss flag to a value higher than 160k, e.g., 256k or 512k

CentOS 6.3 includes a firewall that is on by default and blocks incoming connections

Update the firewall configuration and allow port 9160 through for incoming connections
- 9160 is the default RPC port that Cassandra listens on
- You can change the port in the “cassandra.yaml” file
- Access the firewall from “System / Administration / Firewall” in the CentOS menu
- Add the port under “Other Ports”
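
To confirm from the client machine that the firewall change took effect, you can try a plain TCP connection to the RPC port before involving any Cassandra client library. The sketch below uses only the standard Java library. The class name PortCheck and the 2-second timeout are my own choices for illustration, and the default host and port are just the values used in this walkthrough. A successful connection only shows that the port is open and something is listening on it; it does not prove that the Thrift interface is working.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    // Refused connections and timeouts both surface as IOException and return false.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the address and RPC port from this walkthrough; pass your own as arguments.
        String host = args.length > 0 ? args[0] : "192.168.3.133";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

If this reports false while Cassandra is running, the usual suspects are the firewall rule and the "rpc_address" setting in "cassandra.yaml".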

That’s it! Cassandra is set up
Run Cassandra by executing the command “./bin/cassandra” from the Cassandra home directory

You can also execute the command with the “-f” flag if you want Cassandra to run in the foreground; without the flag it runs in the background

Running the Cassandra CLI

Launch the Cassandra CLI by executing the command “./bin/cassandra-cli” from the Cassandra home directory

You will see a Java exception “org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused”
This is because you have configured Cassandra to listen on the local IP address instead of the default “localhost”

On the CLI, enter the command “connect <Local IP Address>/<RPC Port>;”

In my case: connect 192.168.3.133/9160;

You can review all the CLI commands by entering “?” on the command line
Enter the command “show cluster name;” to verify your cluster name
Find out more about the CLI here: http://www.datastax.com/docs/1.1...

Write and Run your Remote Java Client

NOTE: I did this using Pelops which is a great Java client for Cassandra

Follow the tutorial here: https://github.com/s7/scale7-pelops

I created a Key Space named “test” and a Column Family named “Users”
My sample Java client is below. I ran this in Eclipse Juno.

You will need the following JARs

apache-cassandra-thrift-1.1.4.jar
- From: /usr/local/cassandra/apache-cassandra-1.1.4/lib
commons-pool-1.6.jar
- From: http://commons.apache.org/pool/d...
libthrift-0.7.0.jar
- From: /usr/local/cassandra/apache-cassandra-1.1.4/lib
log4j-1.2.16.jar
- From: http://logging.apache.org/log4j/...
scale7-core-1.3.0.jar
- From: https://github.com/s7/mvnrepo/tr...
scale7-pelops-1.3-1.1.x.jar
- From: https://github.com/s7/mvnrepo/tr...
slf4j-api-1.7.0.jar
- From: http://www.slf4j.org/download.html
slf4j-log4j12-1.7.0.jar
- From: http://www.slf4j.org/download.html
uuid-3.3.jar
- From: http://eaio.com/maven2/com/eaio/...