Limit to entries in an Apigee vault?

Not applicable

I'm looking to use the Apigee vault to store some sensitive information. I then plan to write a Node.js proxy to return some of the data. We need to store about 400K-600K entries in an Apigee vault. Has anybody run into performance issues with Apigee vaults? Is there a limit to the number of entries within a given vault?

1 ACCEPTED SOLUTION

adas
Participant V

@jose.cedeno There are no such limitations as far as the number of entries is concerned, but note that the vault is built on top of key value maps (KVMs), which are stored in Cassandra. In the current implementation, the entire set of keys/values for a complete map is stored as a single JSON blob in one fat row, and that row has a limit of 15MB.

Regarding performance: the current KVM design has serious performance implications owing to the way it's stored in Cassandra, where performance severely degrades as the number of entries grows. Imagine the entire JSON blob being fetched and then selected keys being parsed out of that payload. We have seen performance degrade drastically beyond a certain point; anything over 5-10K entries becomes expensive.
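To see why this degrades, here is a small illustrative sketch (not Apigee's actual storage code; all names are made up): when the whole map lives in one JSON blob, reading a single key still means fetching and parsing the entire blob, so lookup cost grows with total map size rather than with the size of one entry.

```javascript
// Illustrative model only: a KVM stored as one JSON blob in a single row,
// as described above. blobStore and the function names are hypothetical.
const blobStore = {}; // stands in for the single "fat row" in Cassandra

function putAll(mapName, entries) {
  // the whole map is serialized into one blob
  blobStore[mapName] = JSON.stringify(entries);
}

function getOne(mapName, key) {
  // fetching ONE key still requires parsing the ENTIRE blob,
  // so cost scales with the total number of entries in the map
  const all = JSON.parse(blobStore[mapName]);
  return all[key];
}

// build a map with many entries
const entries = {};
for (let i = 0; i < 10000; i++) entries['key' + i] = 'value' + i;
putAll('demo-map', entries);

console.log(getOne('demo-map', 'key42')); // parses all 10000 entries to return one value
```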

We are aware of these limitations, and that's the reason our new data platform, aka "core persistence services" (CPS), addresses these issues by storing the data in a slightly normalized form. This feature is only available to new customers, since a data migration is involved to move existing customers to the new platform. You can read more about CPS here:

All new customers, trial and paid, who were provisioned on or after Oct 2015 should be on this new platform. To check whether your org is CPS-enabled, you can simply make a GET call:

curl -v https://api.enterprise.apigee.com/v1/o/{org}
If you see an org-level property:
<Property name="features.isCpsEnabled">true</Property>

If it's set to true, your org is CPS-enabled and you can leverage the new data platform to take advantage of the CPS micro-services. I hope this was helpful. Please accept my answer if it helped you resolve your query.
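The check above is easy to script. This sketch just scans the management-API response body for the property shown above (the function name and sample string are made up for illustration; in practice the body would come from the curl/GET call):

```javascript
// Hypothetical helper: reports whether an org response contains
// features.isCpsEnabled=true. The <Property> shape matches the
// element shown in the answer above.
function isCpsEnabled(orgXml) {
  const match = orgXml.match(
    /<Property name="features\.isCpsEnabled">\s*(true|false)\s*<\/Property>/
  );
  return match !== null && match[1] === 'true';
}

// sample fragment like the one returned by GET /v1/o/{org}
const sample = '<Property name="features.isCpsEnabled">true</Property>';
console.log(isCpsEnabled(sample)); // true
```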


15 REPLIES


+1, great, detailed answer @arghya das

@arghya das thanks for that detailed answer. I checked, and our organization does have CPS enabled. Does that mean that we should have no problems (performance-wise) with storing 400K-600K entries in an Apigee vault?

Thanks for the detailed response, @arghya das. Our org does have CPS enabled. Are you saying that we should be able to store 400K-600K entries in a vault without running into performance issues fetching items out of the vault later on?

Yes, if you have CPS enabled it should be a lot better. Can you try the same script in your org and see how much of a difference you see?

The developer that was working on this feature directly is out until Monday. I'll have him re-run the script next time that he's in the office. He was just inserting a few records and both of our orgs (free and paid) already had CPS enabled.

The fact that only one record can be uploaded at a time to a vault is not very efficient 😞 .

With CPS you can update each entry individually because the data is stored in normalized form. The limitations I mentioned earlier are not applicable to CPS-enabled orgs. If you are still seeing issues, then we need to look into it.

Not applicable

We wrote a script to start testing the vault to make sure that we can store entries. The script works correctly, but it takes about 3-4 seconds for each value. We encrypt the value before storing it in the vault and generate a random id for each value we store. The random id is stored in a database.

This type of operation for hundreds of thousands of entries is too slow. Is it possible to push more than one entry to an Apigee vault at a time?

Currently not possible. As I mentioned, every time you write, it actually fetches the entire map and writes it back. That's the reason for the slowness.
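That read-modify-write behavior is why bulk loading gets slower as the map grows: each insert re-reads and re-writes everything already stored, so total I/O grows quadratically with the entry count. A toy model (illustrative only, not Apigee code):

```javascript
// Toy model of the write path described above: every insert fetches
// the whole blob, adds one entry, and writes the whole blob back.
let blob = JSON.stringify({});
let bytesMoved = 0; // total bytes read + written across all inserts

function insertOne(key, value) {
  bytesMoved += blob.length;  // read the entire existing blob
  const all = JSON.parse(blob);
  all[key] = value;
  blob = JSON.stringify(all);
  bytesMoved += blob.length;  // write the entire new blob back
}

for (let i = 0; i < 1000; i++) insertOne('k' + i, 'v' + i);

// Per-insert cost grows with everything inserted so far, which is why
// hundreds of thousands of single-entry inserts become so slow.
console.log('final blob bytes:', blob.length, 'total bytes moved:', bytesMoved);
```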

We are also coming up with a feature called encrypted KVM, built on top of the CPS platform, that customers can leverage. It should be faster, more secure, and more reliable. However, it's not live yet. I do not have an ETA on the release date for that feature.

Thank you for the clarification. It cleared up the questions that I had about CPS. I guess, we'll have to use a different backend to store the 400K entries rather than use Apigee vault. I was hoping to be able to leverage the vaults 😞 .

Not applicable

@arghya das We ran the script again and this is the time that we are seeing to store the value in the vault with CPS enabled:

2016/04/18 10:46:14 time to push entry to vault: 3.879209037s

2016/04/18 10:46:17 time to push entry to vault: 3.315861261s

2016/04/18 10:46:21 time to push entry to vault: 3.342456417s

Right now, we have the script inserting just a few values into the vault. Is this performance that we should expect out of the vault? What can you suggest to troubleshoot and improve the performance?

@jose.cedeno It would be helpful to understand how you are capturing these numbers. You mentioned that the script that inserts to the vault does some logic to generate and encrypt the values. Can you try inserting plaintext entries, so that we can rule out any script overhead during the insert and purely capture the response time of the vault inserts? There's no way it should take that long to insert an entry.

@arghya das we capture the time before sending the POST request and right after the response comes back. We are using Go; we'll try a different language and also try curl from the command line.

Curl gives similar numbers. Here it takes 4.112 and 3.719 seconds to add an entry.

% time curl -u "example@example.com:$password" 'https://api.enterprise.apigee.com/v1/o/OUR_ORG/vaults/vault-test/entries' --data '{"name": "curl-test-1", "value": "xxx"}' -H 'Content-type: application/json'
0.01s user 0.00s system 0% cpu 4.112 total

% time curl --verbose -u "example@example.com:$password" 'https://api.enterprise.apigee.com/v1/o/OUR_ORG/vaults/vault-test/entries' --data '{"name": "curl-test-2", "value": "xxx"}' -H 'Content-type: application/json'
0.01s user 0.00s system 0% cpu 3.719 total

As a baseline comparison, hitting Google's homepage takes just 0.137 seconds.

% time curl --silent 'https://google.com' >/dev/null
0.00s user 0.01s system 5% cpu 0.137 total

akoo
Participant V

Hello all, I wanted to add an important note: encrypted KVMs are here. Details are in our documentation: http://docs.apigee.com/api-services/reference/key-value-map-operations-policy. You now have an option for encrypted data without having to use Node.js.
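For anyone landing here later, a policy reading from an encrypted KVM looks roughly like this (a sketch based on the policy reference linked above; the map and key names are placeholders, the map must be created with encryption enabled, and retrieved encrypted values must be assigned to a variable with the `private.` prefix):

```xml
<KeyValueMapOperations name="KVM-Get-Secret" mapIdentifier="my-encrypted-map">
  <!-- map and key names are placeholders for illustration -->
  <Scope>environment</Scope>
  <Get assignTo="private.secret-value">
    <Key>
      <Parameter>my-secret-key</Parameter>
    </Key>
  </Get>
</KeyValueMapOperations>
```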