What message goes to the MP log when a proxy fails to deploy?

Not applicable

So we had a problem today where a proxy failed to deploy to all message processors. The solution was pretty simple: restart the MPs one at a time and then re-install.

Thing is, I would like to raise a proactive alert that triggers exactly this. So what do I look for in the logs to know that a single MP has suffered this failure?

Note that this is similar behavior to what others see here: https://community.apigee.com/questions/8900/the-revision-is-deployed-and-traffic-can-flow-but-1.html

I just want to be proactive in private cloud - and cull the bad message processors 🙂

4 REPLIES

Not applicable

Hi @Benjamin Goldman, instead of getting that information from the logs, I believe it's better to check the response of this management API:

/v1/organizations/$orgname/apis/$apiname/deployments

which will give you the deployment status on all the MPs, like below:

<APIProxyDeployment name="$apiname">
  <Organization>$orgname</Organization>
  <Environment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="environmentStatusForRevision" name="prod">
    <Revision xsi:type="revisionStatusInEnvironment" name="1">
      <Configuration>
        <BasePath>/</BasePath>
      </Configuration>
      <Server xsi:type="serverDeploymentStatus">
        <UUID>a9aafa4d-4d67-424c-9a3a-d6c7aadcd783</UUID>
        <Status>deployed</Status>
        <Type>message-processor</Type>
      </Server>
      <Server xsi:type="serverDeploymentStatus">
        <UUID>40129622-1602-43a3-911f-be6940c02483</UUID>
        <Status>deployed</Status>
        <Type>router</Type>
      </Server>
      <State>deployed</State>
    </Revision>
  </Environment>
</APIProxyDeployment>

If it is not deployed properly, you will see 'error' in the state for that component, and based on that you can restart the component or investigate what is wrong with it.
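To automate that check, something along these lines could parse the response and list the components that are not in the 'deployed' state. This is only a rough sketch: it assumes the response has the same shape as the sample above, the function name `find_unhealthy_servers` and the sample UUIDs are made up for illustration, and fetching the XML from the management API (authentication, host, and so on) is left out.

```python
import xml.etree.ElementTree as ET


def find_unhealthy_servers(xml_text):
    """Return (uuid, type, status) for each server whose status is not 'deployed'."""
    root = ET.fromstring(xml_text)
    unhealthy = []
    # The Server elements carry no XML namespace of their own, so a plain tag name works.
    for server in root.iter("Server"):
        status = (server.findtext("Status") or "").strip()
        if status != "deployed":
            uuid = (server.findtext("UUID") or "").strip()
            kind = (server.findtext("Type") or "").strip()
            unhealthy.append((uuid, kind, status))
    return unhealthy


# Hypothetical response: same shape as the sample in this thread, with placeholder
# UUIDs and one message processor marked 'error' for illustration.
SAMPLE = """
<APIProxyDeployment name="example-api">
  <Environment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="prod">
    <Revision name="1">
      <Server xsi:type="serverDeploymentStatus">
        <UUID>mp-uuid-1</UUID>
        <Status>deployed</Status>
        <Type>message-processor</Type>
      </Server>
      <Server xsi:type="serverDeploymentStatus">
        <UUID>mp-uuid-2</UUID>
        <Status>error</Status>
        <Type>message-processor</Type>
      </Server>
      <Server xsi:type="serverDeploymentStatus">
        <UUID>router-uuid-1</UUID>
        <Status>deployed</Status>
        <Type>router</Type>
      </Server>
    </Revision>
  </Environment>
</APIProxyDeployment>
"""

for uuid, kind, status in find_unhealthy_servers(SAMPLE):
    print(f"{kind} {uuid} is in state '{status}'")
```

The returned UUIDs can then be fed to whatever restart or alerting script you already have.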

Not applicable

So this works, and this is how I find the MPs that are in trouble when I have a failed proxy push. That's not the problem I'm trying to solve. I'm looking for a way to be proactive about this.

Does the message processor do something when it suddenly can't be reached by the publish mechanism? If it does, that's something I can trap in the logs (or elsewhere) and use to restart the MP proactively as part of a larger cluster, and so minimize bad user experience.

Of course, if I could guarantee that my network never had any problems, that would work too, but my network is way too big to rule out issues.

Think of it this way: I would rather have code recycle my MPs before a user sees a problem than wait for the problem, do what you suggest (which only surfaces anything AFTER a change in deployed code!) and then react to it.

@Benjamin Goldman, I understand what you are saying, but a deployment on an MP can fail for many reasons: connectivity between the MP and Cassandra (C*), ZooKeeper (ZK) or the management server, out-of-memory errors (OOM), disk space, etc.

Maybe you should consider all those scenarios and prepare a checklist.

An easy place to start is to grep for 'Exception' in the message processor and management server logs before any deployment, since any problem will be logged as an Exception. I think this will cover most of the scenarios.

If you think otherwise, or know of any other scenarios where you don't see Exceptions but the deployment fails, sharing them will definitely help everyone here 🙂
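As a starting point for that grep, a small script could count which exception types show up in a log, which makes it easier to spot a recurring pattern. This is a minimal sketch only: the function name `count_exceptions` and the sample log lines are invented for illustration and are not actual Apigee log output, and the regular expression simply picks up any dotted name ending in "Exception".

```python
import re
from collections import Counter

# Matches names such as "java.net.ConnectException" or a bare "Exception".
EXCEPTION_NAME = re.compile(r"\b(?:[A-Za-z_]\w*\.)*\w*Exception\b")


def count_exceptions(lines):
    """Count occurrences of each exception name found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name in EXCEPTION_NAME.findall(line):
            counts[name] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_lines = [
    "2015-01-01 10:00:00 ERROR java.net.ConnectException: Connection refused",
    "2015-01-01 10:00:01 INFO all good",
    "2015-01-01 10:00:02 WARN caught java.net.ConnectException while retrying",
    "2015-01-01 10:00:03 ERROR java.lang.IllegalStateException: bad state",
]

for name, count in count_exceptions(sample_lines).most_common():
    print(f"{count:5d}  {name}")
```

To run it against a real log, pass an open file instead of the list, for example `count_exceptions(open(path, errors="replace"))`, where `path` is wherever your installation writes the MP or management server log.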

So I'm currently dumping all of my Router and MP logs into Elasticsearch (not the rest yet, I'm slow...). Let me mine that tomorrow to see if I find anything with the string "Exception" in it that might be useful.

I'm hoping to find a fingerprint of something here. Like "hey, I can't see ZK...", which can trigger some maintenance script (think along the lines of what Netflix does...). I'll get back to this tomorrow with any results. Maybe you will have more insight based on what I find 🙂