Our customer wants to store one million user records in a monthly batch process and needs an efficient way of storing the data into on-premises BaaS collection entities. Is there anything like parallel queries, bulk operations, or SQL*Loader-style functionality in BaaS with Apigee Edge? The source data is in JSON format, converted from XML. Any advice or suggestions for the customer would be appreciated.
Answer by Manish S · May 28, 2015 at 05:54 AM
Regarding your query about the number of entities in a POST call: I have tried 50,000 entities in one request (using Node.JS) and it worked fine, so I believe the limit on entities in one POST call is certainly more than 50K. Although I have not tried it, I suspect that with proper infrastructure and large machines you should be able to POST up to 100,000 (1 lakh) in a single call as well.
If you want to POST millions of records, you need to break them into batches of, say, 50K or 1 lakh (depending on your infrastructure) and make multiple requests.
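The batching described above can be sketched in Node.JS roughly like this. The endpoint URL, the use of `fetch()`, and posting the batch as a plain JSON array are illustrative assumptions, not the exact BaaS API contract:

```javascript
// Split a large record set into fixed-size batches and POST each batch
// as one request containing a JSON array of entities.
const BATCH_SIZE = 50000; // batch size tested above

function chunk(records, size) {
  const batches = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}

async function postAll(records, url) {
  for (const batch of chunk(records, BATCH_SIZE)) {
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(batch), // one POST carries many entities
    });
    if (!res.ok) throw new Error(`Batch failed with status ${res.status}`);
  }
}
```

For one million records at 50K per batch, this makes 20 POST calls in total.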
With a PUT call, only one entity can be updated at a time, so for PUT the number of BaaS calls equals the number of entities you need to update.
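So an update pass is simply a loop of single-entity PUTs. A minimal sketch, again assuming a `fetch()`-style client and a hypothetical `baseUrl/uuid` URL pattern:

```javascript
// One PUT per entity: the total call count equals the record count.
async function updateAll(entities, baseUrl) {
  let calls = 0;
  for (const entity of entities) {
    const res = await fetch(`${baseUrl}/${entity.uuid}`, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(entity),
    });
    if (!res.ok) throw new Error(`Update failed for ${entity.uuid}`);
    calls += 1;
  }
  return calls; // = entities.length
}
```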
I hope this helps.
@ManishS, Thank you for the very useful information. Do you also have any idea of the level of concurrency BaaS supports for simultaneous requests? With 1-lakh batches, can we make 10 requests at a time, or do we need to wait for each response before sending the next one sequentially?
It probably also depends on the infrastructure, the capacity of the web server, etc., but I would appreciate any comments you have.
BaaS is optimized for batch write operations, so simultaneous POST calls can be made easily. I have tried 3 concurrent calls from Node.JS in batches of 50 thousand (1.5 lakh entities in total) and it worked perfectly (with only one Cassandra node). With a decent infrastructure setup, I believe you should be able to achieve a lot more.
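The concurrent pattern above can be sketched as a loop that fires a fixed number of batch POSTs in parallel and waits for each group before starting the next. Here `postBatch` is a placeholder for the actual HTTP call, and the default limit of 3 mirrors the test described above; both are assumptions you would tune to your infra:

```javascript
// Post batches with a fixed concurrency limit. Each group of `limit`
// batches is sent in parallel; the next group starts only after the
// whole group has completed.
async function postConcurrently(batches, postBatch, limit = 3) {
  const results = [];
  for (let i = 0; i < batches.length; i += limit) {
    const group = batches.slice(i, i + limit);
    results.push(...await Promise.all(group.map(postBatch)));
  }
  return results;
}
```

Raising `limit` trades client/server load for throughput; measuring at your target scale, as suggested below, is the only reliable way to pick it.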
@Manish S, Understood. Thank you so much for the answer.
Answer by pbhogill · May 15, 2015 at 07:28 PM
Hi @Toshihiro Shibamoto, there is another thread that discusses this topic somewhat. You can use a scripting tool or language to invoke the API and push data into BaaS, and you can also post multiple records to a BaaS collection in a single POST call.
Thank you so much, Prithpal Bhogill. I saw that post earlier, but if I understand correctly it covers a different issue. We just need efficient handling of a very large number of records, without filtering any of them out. Does BaaS have any internal optimization mechanism for processing multiple entities in a single POST/PUT request? If not, the time taken to handle a million records would still be longer than acceptable, and we would need implementation tricks like those mentioned above. The customer is trying to get benchmark data measured at this scale, if available. I would appreciate any further comments.
Answer by alan@apigee.com · May 19, 2015 at 08:30 PM
- Ideally you won't need to do massive numbers of uploads, but per Prithpal, you can POST multiple entities at the same time. Unfortunately, you can't PUT multiple items at a time.
Depending on the use case, it might be more efficient to PUT (update) a small set of records instead of deleting all records and re-POSTing everything.
@alan@apigee.com Thanks a lot for the comments. What do you mean by 'PUT (update) a small set of records' when you also said you 'can't PUT multiple items at a time'? And is there a restriction on the number of entities in a multi-entity POST (is it 1,000 as well)? Finally, what is the best practice for handling one million records to POST/PUT, using a query-language statement or any other means to optimize it?