Router restart failing with error: com.apigee.kernel.exceptions.spi.UncheckedException: Router not started because, Load Balancing could not be initialized


Issue: There is a known issue with nginx-based routers where restarting the router fails.

Here is the state of the router after the restart:

apigee-service: edge-router: Not running (DEAD)

The following error message can then be found in the router logs:

2016-06-10 09:25:26,250 main ERROR KERNEL - MicroKernel.deployAll() : MicroKernel.deployAll() : Error in deploying the deployment : MessageProcessorManagementService

com.apigee.kernel.exceptions.spi.UncheckedException: Router not started because, Load Balancing could not be initialized

Fix: Delete all files under /opt/nginx/conf.d and restart the router.
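
For anyone who wants to script the fix, here is a minimal sketch. It is only an illustration, not an official Apigee procedure. Note that it moves the files to a backup directory instead of deleting them, so they can still be inspected afterwards; the function name clear_conf_dir and the backup location are my own assumptions. The router restart itself is not included, so restart the router afterwards the way you normally do.

```python
import shutil
from pathlib import Path


def clear_conf_dir(conf_dir, backup_dir):
    """Move every regular file out of conf_dir into backup_dir.

    Returns the names of the files that were moved, in sorted order.
    Moving (rather than deleting) is a deliberate deviation from the
    fix described above, so the generated files can still be examined.
    """
    conf = Path(conf_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(conf.iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(backup / entry.name))
            moved.append(entry.name)
    return moved


# Example call (hypothetical backup path, adjust to your installation):
#   clear_conf_dir("/opt/nginx/conf.d", "/tmp/nginx-conf.d-backup")
```

Run it as a user that has write access to /opt/nginx/conf.d, then restart the router.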

Comments
Not applicable

Because the files in /opt/nginx/conf.d are recreated every time edge-router runs, we hit this issue every time edge-router is restarted or the machine is rebooted.

After some investigation I came to the conclusion that the script /opt/nginx/scripts/apigee-nginx should run as root (sudo) instead of as user apigee (see the snapshot of /opt/nginx/logs/error.log below). Although the above-mentioned fix works, it feels like it only treats a side effect.

Does anyone know the root cause of this, and is there a more sustainable fix?

2016/06/15 13:00:01 [warn] 4404#4404: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /opt/nginx/conf/nginx.conf:2
2016/06/15 13:00:01 [emerg] 4404#4404: bind() to 172.29.228.144:80 failed (13: Permission denied)
2016/06/15 13:01:05 [warn] 4699#4699: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /opt/nginx/conf/nginx.conf:2
2016/06/15 13:01:05 [warn] 4712#4712: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /opt/nginx/conf/nginx.conf:2
2016/06/15 13:01:05 [alert] 4714#4714: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
2016/06/15 13:01:05 [alert] 4715#4715: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
2016/06/15 13:01:05 [alert] 4716#4716: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
2016/06/15 13:01:05 [alert] 4717#4717: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
2016/06/15 13:01:05 [alert] 4719#4719: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
2016/06/15 13:01:05 [alert] 4720#4720: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
2016/06/15 13:01:05 [alert] 4721#4721: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
2016/06/15 13:01:05 [alert] 4722#4722: setrlimit(RLIMIT_NOFILE, 400000) failed (1: Operation not permitted)
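
To check quickly whether an error.log shows the same permission problems, a small script along these lines might help. It is a sketch that assumes the log lines look like the ones above (date, time, level in square brackets, pid#tid, message); the names LINE_RE, PERMISSION_HINTS and permission_errors are made up for this example.

```python
import re
from collections import Counter

# Matches nginx error.log lines in the format shown above, e.g.
# 2016/06/15 13:00:01 [emerg] 4404#4404: bind() to ... failed (13: Permission denied)
LINE_RE = re.compile(r"^\S+ \S+ \[(?P<level>\w+)\] \d+#\d+: (?P<msg>.*)$")

# Substrings that indicate the process lacked the required privileges.
PERMISSION_HINTS = ("Permission denied", "Operation not permitted")


def permission_errors(lines):
    """Count permission-related messages per nginx log level."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(hint in match.group("msg") for hint in PERMISSION_HINTS):
            counts[match.group("level")] += 1
    return counts


# Example call:
#   with open("/opt/nginx/logs/error.log") as log:
#       print(permission_errors(log))
```

The "user" directive warnings are not counted, because they do not contain either of the two hint strings; only the bind() and setrlimit() failures are.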

Update 2016-06-18:

After further investigation I determined that the issue does not occur as long as on-boarding has not been done yet. At that point the files in /opt/nginx/conf.d are:

-rw-r--r--  1 apigee apigee 1321 Jun 17 10:50 0-default.conf
-rw-r--r--  1 apigee apigee  667 Jun 17 11:15 0-edge-health.conf
-rw-r--r--  1 apigee apigee 1062 Jun 17 10:50 0-map.conf
-rw-r--r--  1 apigee apigee  689 Jun 17 10:50 0-upstream-stats.conf

After executing on-boarding steps and rebooting RMP, the issue is reproduced. Files in /opt/nginx/conf.d after restart are:

-rw-r--r--  1 apigee apigee 1321 Jun 17 10:50 0-default.conf
-rw-r--r--  1 apigee apigee  961 Jun 17 13:25 0-edge-health.conf
-rw-r--r--  1 apigee apigee  422 Jun 17 13:12 0-fallback.conf
-rw-r--r--  1 apigee apigee 1062 Jun 17 10:50 0-map.conf
-rw-r--r--  1 apigee apigee  409 Jun 17 13:12 0-upstream-pools.conf
-rw-r--r--  1 apigee apigee  689 Jun 17 10:50 0-upstream-stats.conf
-rw-r--r--  1 apigee apigee 1781 Jun 17 13:16 tomtom_dev_development.conf.bad
-rw-r--r--  1 apigee apigee 1783 Jun 17 13:12 tomtom_prod_default.conf.bad

I have not yet investigated why the tomtom....conf.bad files have the extension .bad.
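
To spot this situation without reading the directory listing by hand, something like the following could be used. It is a sketch only; the function name find_bad_configs is hypothetical, and it simply lists the files whose names end in .bad.

```python
from pathlib import Path


def find_bad_configs(conf_dir):
    """Return the sorted names of files in conf_dir that end in '.bad'."""
    return sorted(p.name for p in Path(conf_dir).glob("*.bad") if p.is_file())


# Example call:
#   print(find_bad_configs("/opt/nginx/conf.d"))
```

An empty list means no virtual host configuration has been renamed to .bad.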

Update 2016-07-04:

Most likely this (the .bad extension) was caused by the fact that we used port 80 for the virtual host. Port 80 is a privileged port (below 1024), so a process needs root privileges to bind to it, which matches the "bind() ... failed (13: Permission denied)" line in the error log above. When using the recommended ports such as 9001 and 9002 I did not encounter this issue.
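
As a quick sanity check before configuring a virtual host, the port rule can be expressed like this. It is a sketch assuming the conventional Unix boundary of 1024 (on some systems the boundary is configurable); the names PRIVILEGED_PORT_LIMIT and is_privileged_port are made up for this example.

```python
# Conventional Unix boundary: ports below this value normally require
# root privileges to bind. Treated here as an assumption.
PRIVILEGED_PORT_LIMIT = 1024


def is_privileged_port(port):
    """Return True if binding to this TCP port normally requires root."""
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return port < PRIVILEGED_PORT_LIMIT


# Ports mentioned in this thread:
#   is_privileged_port(80)    is True
#   is_privileged_port(9001)  is False
```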

srraorams
Participant II

Is there any other solution apart from removing the files and restarting? I tried this and it didn't work in my case.

Version history
Last update: 06-10-2016 11:59 AM