Analytics data not appearing in UI


Hi guys,

So the issue we currently have is that data from the past week is not appearing in our Analytics UI.

We managed to resolve part of the issue and are now seeing new data appear in the UI; however, data for the past 7 days still seems to be missing.

When checking in Qpid we can see the following:

qpid-stat -q

qpid-queue-stats

So we can see the data is stuck and isn't being consumed. Has anyone seen this before, and does anyone have any recommendations on what we can do to progress this?



Can you paste the output of the qpid-stat -q command and the qpid-queue-stats command? Did you restart the Qpid services?

Hmm, strange - I added them when I created the post but they're not there! I did restart the Qpid services, yes. Here are the two outputs:

qpid-stat -q

5713-qpid-stat-q.png

qpid-queue-stats

5714-queue-stats.png

Could you check on your master Postgres to see whether you have the data?
You can use psql.sh to open a command line and check. The query to verify is:

SELECT count(*), date_trunc('day', client_received_start_timestamp) AS dt FROM analytics."<org>.<env>.fact" GROUP BY dt ORDER BY dt DESC;

Check if you have data for the past 7 days.

@spadmanabhan

I checked on both the master and the slave, and they showed the same results - no data from the 21st through to the 28th! So no data exists in that range.

5717-postgres-missing-data.png


Alex,

Do you have more than one Qpid? If so, can you please provide qpid-stat -q output from all the Qpid instances? From the qpid-stat output you have provided, it seems like messages are getting consumed.

@RGanapavarapu

Hi

It seems that new data is going through the Qpid with the issue, as the stats have changed since the ones I posted yesterday. To me it looks like new data is being processed but not old data. That doesn't quite fit my general MQ knowledge, though: if data were stuck, I'd expect nothing to be processed at all. The node believed not to be processing data is Node 2 (the one with the mismatched stats).

Node1:

5718-node-1-qpidstats.png

Node2:

5719-node-2-qpidstats.png

karthik

@Alex

Is this happening in production/non-production environments? In any case, did any monitoring alerts go off? Were there any scheduled/unscheduled maintenance/network outages that could have impacted this?

Is there any way to go back to elk/log stash and look for requests from this time period?

I've had similar issues (though not exactly the same) and had to bounce both the Qpid and Postgres servers; traffic started collecting again after both restarts.

@Karthik this is happening in non-prod. No monitoring alerts went off, as we only have probes set up. (Is there another way to monitor APIs? If so, please let me know!)

No maintenance or outages that we have been made aware of. The only thing is that Postgres was unavailable for a few days; once we got everything back up and running, we ran into this problem.

We don't have ELK/Logstash, although there were definitely requests during that period: this was spotted when we were asked to investigate a non-prod issue and found no data where there should have been some.

Perhaps we need to do a controlled stop/start in the Apigee-recommended order (we have only stopped and started individual components at different points). It's strange that new data is processing OK, though.

karthik

@Alex

You might want to check out test.apigee.com, where you can set up tests. If you need internal alerting/monitoring, you can use existing enterprise licenses you have in-house, say Nagios, TIBCO Hawk, or AutoSys.

Typically I would set up a monitor/alert for a business endpoint like the one below.

http://yourenterprise.com/v1/businessapi

1) Set up a script that makes a curl call to the above endpoint every 5-10 minutes (a predefined interval, via crontab or equivalent).

2) You expect a 200 OK with a predefined payload. If you get a 2xx response, all is good.

3) If you start getting non-2xx errors, set up the alarms such that three consecutive non-2xx responses trigger an alert to the Ops team; this avoids false alarms from one-off blips.

4) There is also an RMP health-check proxy as part of Apigee best practices. You might want to extend that to point to your backends as well.
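As a rough illustration of steps 1-3, here is a minimal Python sketch. The URL, threshold, and function names (`probe`, `should_alert`) are placeholders, not part of any Apigee tooling; you'd schedule it via crontab and wire the alert branch to your own notification channel (Nagios, email, etc.).

```python
# Hypothetical endpoint monitor: probe a business API and alert only after
# three consecutive non-2xx responses, to avoid false alarms.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

THRESHOLD = 3  # consecutive non-2xx responses before alerting

def probe(url, timeout=10):
    """Return the HTTP status code, or 0 if the request failed outright."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        return exc.code          # server answered, but with an error status
    except URLError:
        return 0                 # DNS/connection failure, timeout, etc.

def should_alert(statuses, threshold=THRESHOLD):
    """True when the last `threshold` probes were all non-2xx."""
    if len(statuses) < threshold:
        return False
    return all(not 200 <= s < 300 for s in statuses[-threshold:])
```

Each cron run would append `probe("http://yourenterprise.com/v1/businessapi")` to a small state file and page the Ops team when `should_alert` on the recent history returns True; test.apigee.com then serves as the external complement to this internal check.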

The overall idea is to know about the issue before the customer does and address it ASAP. So I would set up both internal and external triggers (test.apigee.com being an external one).

Hope this helps!

Cheers

Karthik