Migration (from DC-1 to DC-2) -> Management Server startup error

We are moving Apigee Edge 4.16.09 from DC-1 to DC-2, and during the process we encountered an issue while starting the Management Server (MS).

Steps:

DC-1:

Back up all components.

DC-2:

Bootstrap, install the components, and restore the backups to the new DC.

All components are online except the MS, which fails to start because of an OpenLDAP issue.

I'm debugging now, but if anyone has any pointers, please reply.

==

2017-07-21 02:11:30,225  main ERROR KERNEL.DEPLOYMENT - ServiceDeployer.startService() : ServiceDeployer.deploy() : Got a life cycle exception while starting service [SecurityService, Unable to add initialize users and resources] : {}
com.apigee.kernel.exceptions.spi.UncheckedException: Unable to add initialize users and resources
        at com.apigee.rbac.impl.ServiceUtil.nodeForPermissionsExists(ServiceUtil.java:268) ~[rbac-1.0.0.jar:na]
        at com.apigee.rbac.datastore.LdapDataStore.initializeNameSpace(LdapDataStore.java:457) ~[rbac-1.0.0.jar:na]
        at com.apigee.rbac.impl.UserServiceImpl.initialize(UserServiceImpl.java:98) ~[rbac-1.0.0.jar:na]
        at com.apigee.security.usermanagement.UserManager.<init>(UserManager.java:77) ~[security-1.0.0.jar:na]
        at com.apigee.security.SecurityManager.<init>(SecurityManager.java:78) ~[security-1.0.0.jar:na]
        at com.apigee.security.SecurityServiceImpl.start(SecurityServiceImpl.java:77) ~[security-1.0.0.jar:na]
        at com.apigee.kernel.service.deployment.ServiceDeployer.startService(ServiceDeployer.java:168) [microkernel-1.0.0.jar:na]
        at com.apigee.kernel.service.deployment.ServiceDeployer.deploy(ServiceDeployer.java:71) [microkernel-1.0.0.jar:na]
        at com.apigee.kernel.MicroKernel.deployAll(MicroKernel.java:178) [microkernel-1.0.0.jar:na]
        at com.apigee.kernel.MicroKernel.start(MicroKernel.java:139) [microkernel-1.0.0.jar:na]
        at com.apigee.kernel.MicroKernel.start(MicroKernel.java:135) [microkernel-1.0.0.jar:na]
        at com.apigee.kernel.MicroKernel.main(MicroKernel.java:84) [microkernel-1.0.0.jar:na]
Caused by: javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]
        at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3161) ~[na:1.8.0_112]
        at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3082) ~[na:1.8.0_112]
        at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2888) ~[na:1.8.0_112]
        at com.sun.jndi.ldap.LdapCtx.searchAux(LdapCtx.java:1846) ~[na:1.8.0_112]
        at com.sun.jndi.ldap.LdapCtx.c_search(LdapCtx.java:1769) ~[na:1.8.0_112]
        at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(ComponentDirContext.java:392) ~[na:1.8.0_112]
        at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:358) ~[na:1.8.0_112]
        at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:341) ~[na:1.8.0_112]
        at javax.naming.directory.InitialDirContext.search(InitialDirContext.java:267) ~[na:1.8.0_112]
        at com.apigee.rbac.impl.ServiceUtil.nodeForPermissionsExists(ServiceUtil.java:265) ~[rbac-1.0.0.jar:na]
        ... 11 common frames omitted

I'm not sure I follow the process you are using.

If the plan is to back up DC1 and restore it in DC2, you must ensure that the IP addresses do not change and that the components are started in the appropriate order for a cold start.

Order:

  1. Cassandra + Zookeeper nodes
  2. PostgreSQL + Postgres Server nodes
  3. Qpid + Qpid Server nodes
  4. LDAP + Management Server + UI node
  5. Routers + MPs
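
The same order as a hypothetical helper (startup_order is my name for it, not an Apigee tool) that prints one apigee-service command per component, prefixed with its tier number. The component names are assumed to be the standard Edge for Private Cloud ones; run each command on the node(s) where that component is installed, and make sure a tier is up before starting the next one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cold-start order as "<tier> apigee-service <component> start".
# Tiers: 1 Cassandra + ZooKeeper, 2 PostgreSQL + Postgres Server, 3 Qpid + Qpid Server,
#        4 LDAP + Management Server + UI, 5 Routers + Message Processors.
# Component names are assumed to be the standard Edge for Private Cloud ones.
startup_order() {
  local tiers=(
    "apigee-cassandra apigee-zookeeper"
    "apigee-postgresql edge-postgres-server"
    "apigee-qpidd edge-qpid-server"
    "apigee-openldap edge-management-server edge-ui"
    "edge-router edge-message-processor"
  )
  local tier component n=1
  for tier in "${tiers[@]}"; do
    for component in $tier; do   # unquoted on purpose: split the tier into component names
      echo "$n apigee-service $component start"
    done
    n=$((n + 1))
  done
}
```

Running startup_order prints eleven lines, starting with "1 apigee-service apigee-cassandra start"; it only prints the commands, so nothing is started by accident.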

Hello Maudrit,

We will soon decommission the servers in DC1 and move to DC2, so we are migrating the Apigee Edge components from DC1 to DC2.

The IP addresses in the two DCs will be different, not the same.

We have used this approach in the past to migrate from DC1 to DC2, with @Paul Mibus's help.

Most of the steps worked, but while starting the MS we are facing an issue related to OpenLDAP.

Details are in Case # 1430582.

When do we get the above error? How do we back up LDAP and restore it in the new DC to resolve the issue?

Once the MS is up, we will correct the datastore registrations (internal IPs) in the new DC.

-Vinay

Yes, you need to replace the IP addresses using the recommended steps for each component.

I'll look at the ticket and provide feedback.

Please update the ticket. We were able to back up and restore OpenLDAP, but we see the error below once we log in to the UI. In ZooKeeper (zk) we don't see any organization details, etc.

When do we receive the error below? What permissions is it looking for?

==

Error fetching permissions for orgadmin, Error fetching permissions for orgadmin, Error fetching Environments

==

-Vinay

Hi All,

Any update on the above issue? How did you resolve it? We are also facing a similar issue.

2019-06-07 12:13:38,483 org: env: target: contextId: action: main DEBUG SERVICES.RBAC - ServiceUtil.nodeForUserRolesExists() : ServiceUtil.nodeForUserRolesExists():Checking userroles node exist in NameSpace:SystemNamespace
2019-06-07 12:13:38,484 org: env: target: contextId: action: main DEBUG SERVICES.RBAC - ServiceUtil.createParentNodeForUserRoles() : ServiceUtil.createParentNodeForUserRoles():Creating parent node for userroles in NameSpace:SystemNamespace
2019-06-07 12:13:38,490 org: env: target: contextId: action: main ERROR KERNEL.DEPLOYMENT - ServiceDeployer.startService() : ServiceDeployer.deploy() : Got a life cycle exception while starting service [SecurityService, Unable to add initialize users and resources] : {}
com.apigee.kernel.exceptions.spi.UncheckedException: Unable to add initialize users and resources
        at com.apigee.rbac.impl.ServiceUtil.createParentNodeForUserRoles(ServiceUtil.java:275)
        at com.apigee.rbac.datastore.LdapDataStore.initializeNameSpace(LdapDataStore.java:639)
        at com.apigee.rbac.impl.UserServiceImpl.initialize(UserServiceImpl.java:125)
        at com.apigee.security.usermanagement.UserManager.<init>(UserManager.java:88)
        at com.apigee.security.SecurityManagerImpl.<init>(SecurityManagerImpl.java:81)
        at com.apigee.security.SecurityServiceImpl.start(SecurityServiceImpl.java:98)
        at com.apigee.kernel.service.deployment.ServiceDeployer.startService(ServiceDeployer.java:168)
        at com.apigee.kernel.service.deployment.ServiceDeployer.deploy(ServiceDeployer.java:71)
        at com.apigee.kernel.MicroKernel.deployAll(MicroKernel.java:190)
        at com.apigee.kernel.MicroKernel.start(MicroKernel.java:151)
        at com.apigee.kernel.MicroKernel.start(MicroKernel.java:146)
        at com.apigee.kernel.MicroKernel.main(MicroKernel.java:95)
2019-06-07 12:13:38,491 org: env: target: contextId: action: main ERROR KERNEL - MicroKernel.deployAll() : MicroKernel.deployAll() : Error in deploying the deployment : SecurityService
com.apigee.kernel.exceptions.spi.UncheckedException: Unable to add initialize users and resources
        at com.apigee.rbac.impl.ServiceUtil.createParentNodeForUserRoles(ServiceUtil.java:275)
        at com.apigee.rbac.datastore.LdapDataStore.initializeNameSpace(LdapDataStore.java:639)
        at com.apigee.rbac.impl.UserServiceImpl.initialize(UserServiceImpl.java:125)
        at com.apigee.security.usermanagement.UserManager.<init>(UserManager.java:88)
        at com.apigee.security.SecurityManagerImpl.<init>(SecurityManagerImpl.java:81)
        at com.apigee.security.SecurityServiceImpl.start(SecurityServiceImpl.java:98)
        at com.apigee.kernel.service.deployment.ServiceDeployer.startService(ServiceDeployer.java:168)
        at com.apigee.kernel.service.deployment.ServiceDeployer.deploy(ServiceDeployer.java:71)
        at com.apigee.kernel.MicroKernel.deployAll(MicroKernel.java:190)
        at com.apigee.kernel.MicroKernel.start(MicroKernel.java:151)
        at com.apigee.kernel.MicroKernel.start(MicroKernel.java:146)
        at com.apigee.kernel.MicroKernel.main(MicroKernel.java:95)
2019-06-07 12:13:38,491 org: env: target: contextId: action: main DEBUG KERNEL - MicroKernel.deployAll() : MicroKernel.deployAll : Terminate on Failure flag is true so halting the boot up process
2019-06-07 12:13:38,492 org: env: target: contextId: action: Thread-2 INFO KERNEL - ShutdownHook.run() : ShutdownHook.run : System shutdown in progress...
2019-06-07 12:13:38,492 org: env: target: contextId: action: Thread-2 DEBUG KERNEL - MicroKernel.unDeployAll() : MicroKernel.unDeployAll() : UnDeploying the deployment : Wipeout
2019-06-07 12:13:38,492 org: env: target: contextId: action: Thread-2 DEBUG KERNEL - MicroKernel.unDeployAll() : MicroKernel.unDeployAll() : UnDeploying the deployment :

Thank you!

We ran into this same issue and just finished working with support to get it sorted out. I put together a cleaned-up version of the steps we went through to get the edge-management-server to a healthy state.

In our case, our LDAP data ended up with incorrect "glue" object classes on some entries, likely due to a sync issue that we have yet to identify. You can check whether you are running into the same issue by following these steps.

Stop apigee-openldap, export your LDAP data, then check for "glue" object types by running these commands on your apigee-openldap node:

apigee-service apigee-openldap stop
slapcat -F /opt/apigee/data/apigee-openldap/slapd.d -l /tmp/glue.ldif
grep glue /tmp/glue.ldif

If you see output like this, you might be in the same situation we were, and hopefully this information will help you repair your LDAP data.

macbook$ grep glue /tmp/glue.ldif
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue

If you don't see any, this fix unfortunately won't apply to you.
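
grep only shows the matching lines, not which entries they belong to. Here is a hypothetical helper (glue_dns is my name for it, not an OpenLDAP tool) that prints the DN of each entry carrying a glue object class, unfolding wrapped LDIF lines first. It assumes plain-text DNs ("dn: ...", not base64-encoded "dn:: ...").

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DN of every entry in an LDIF export (e.g. slapcat output)
# whose objectClass or structuralObjectClass is "glue".
glue_dns() {
  awk '
    # Handle one complete (unfolded) attribute line.
    function handle(l) {
      if (l ~ /^dn: /) dn = substr(l, 5)
      else if (l ~ /^(objectClass|structuralObjectClass): glue$/) glue = 1
    }
    # At the end of an entry, print its DN if it was a glue entry, then reset.
    function end_entry() {
      if (glue && dn != "") print dn
      dn = ""; glue = 0
    }
    # LDIF folds long lines: a line starting with one space continues the previous one.
    /^ / { cur = cur substr($0, 2); next }
    {
      if (cur != "") handle(cur)
      cur = $0
      if ($0 == "") end_entry()
    }
    END { if (cur != "") handle(cur); end_entry() }
  ' "$1"
}
```

For example, glue_dns /tmp/glue.ldif lists the entries you will need to look at in step 1.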

Before doing anything else, please back up all your OpenLDAP data from all instances.


Summary of steps:

  1. Edit the LDIF file to fix any objects that have an objectClass or structuralObjectClass of glue.
  2. Shut down any other LDAP instances that are configured to replicate data.
  3. Stop OpenLDAP, remove the existing DB, load the new data using slapadd, and set file permissions.
  4. Start apigee-openldap and edge-management-server, and check the logs to verify health.
  5. Remove the bad DB from the other LDAP instances, and start the apigee-openldap service.
  6. Test functionality.


1 - Editing the LDIF file

The goal of the repair is to change the objectClass and structuralObjectClass attributes to the correct type and to add any missing attributes. In our case, we only had four entries to repair. You can use a clean, working export to figure out what is missing or wrong, but in our case Apigee support helped us out.

These are the four entries we had to repair:

dn: dc=apigee,dc=com

dn: ou=global,dc=apigee,dc=com

dn: ou=users,ou=global,dc=apigee,dc=com

dn: ou=userroles,ou=global,dc=apigee,dc=com

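I'll go through them one by one. In the original post, modified or added lines were in bold and removed lines were struck through. If that formatting is not visible, note that an entry can have only one structuralObjectClass, so the objectClass: glue and structuralObjectClass: glue lines in each entry are the ones to remove.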

dn: dc=apigee,dc=com
creatorsName: cn=manager,dc=apigee,dc=com
entryUUID: cb3b3086-3c40-1039-90fd-5f32d4271a03
createTimestamp: 20190716181053Z
entryCSN: 20190822152107.007197Z#000000#001#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190822152107Z
objectClass: dcObject
objectClass: organization
dc: apigee
o: Apigee
structuralObjectClass: organization
objectClass: top
structuralObjectClass: glue
contextCSN: 20200608174655.054236Z#000000#001#000000
contextCSN: 20200114150746.195003Z#000000#002#000000

dn: ou=global,dc=apigee,dc=com
entryUUID: cb3dc558-3c40-1039-90fe-5f32d4271a03
creatorsName: cn=manager,dc=apigee,dc=com
createTimestamp: 20190716181053Z
entryCSN: 20190716181053.290392Z#000000#000#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190716181053Z
objectClass: top
objectClass: organizationalUnit
structuralObjectClass: organizationalUnit
ou: global
objectClass: glue
structuralObjectClass: glue

dn: ou=users,ou=global,dc=apigee,dc=com
entryUUID: cb3e43f2-3c40-1039-90ff-5f32d4271a03
creatorsName: cn=manager,dc=apigee,dc=com
createTimestamp: 20190716181053Z
entryCSN: 20190716181053.293634Z#000000#000#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190716181053Z
objectClass: top
objectClass: organizationalUnit
structuralObjectClass: organizationalUnit
ou: users
objectClass: glue
structuralObjectClass: glue

dn: ou=userroles,ou=global,dc=apigee,dc=com
entryUUID: cb3e6422-3c40-1039-9100-5f32d4271a03
creatorsName: cn=manager,dc=apigee,dc=com
createTimestamp: 20190716181053Z
entryCSN: 20190716181053.294458Z#000000#000#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190716181053Z
objectClass: top
objectClass: organizationalUnit
structuralObjectClass: organizationalUnit
ou: userroles
objectClass: glue
structuralObjectClass: glue

Save the file as /tmp/glue_fixed.ldif and go to the next step.
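
If you'd rather not edit the file by hand, here is a hypothetical awk-based helper (strip_glue is my name for it, not an Apigee or OpenLDAP tool) that writes a copy of the export without the objectClass: glue and structuralObjectClass: glue lines. It assumes, as with the four entries above, that each affected entry already carries its real object classes; any entry left with nothing but top is reported on stderr so you can add the missing attributes by hand. Compare the result against a clean export before loading it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an LDIF file to stdout without its glue object-class lines.
# Entries left with no object class other than "top" are reported on stderr,
# because they still need their real attributes added by hand.
strip_glue() {
  awk '
    function end_entry() {
      if (dn != "" && !real) {
        print "WARNING: " dn " has no object class besides top; repair it by hand" > "/dev/stderr"
      }
      dn = ""; real = 0
    }
    /^(objectClass|structuralObjectClass): glue$/ { next }   # drop the glue values
    /^dn: / { dn = substr($0, 5) }
    /^objectClass: / && $2 != "top" { real = 1 }              # entry has a real class
    $0 == "" { end_entry() }                                  # blank line ends an entry
    { print }
    END { end_entry() }
  ' "$1"
}
```

Usage: strip_glue /tmp/glue.ldif > /tmp/glue_fixed.ldif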

2 - Shut down any other replicated OpenLDAP instances.

To prevent another instance from syncing with the one you are working on, stop the service on the remote nodes. If you aren't sure whether it's replicating anywhere, search the LDAP config for 'olcSyncrepl':

grep -ir 'olcSyncrepl:' /opt/apigee/data/apigee-openldap/slapd.d/*
/opt/apigee/data/apigee-openldap/slapd.d/cn=config/olcDatabase={2}bdb.ldif:olcSyncrepl: {0}rid=001 provider=ldap://<remote_sync_ip>:10389/ binddn="cn=manage
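
To pull just the provider URLs out of that output, here is a hypothetical helper (sync_providers is my name for it) that greps every provider= value under the config directory. It assumes each olcSyncrepl value sits on a single line, as in the output above; a folded (wrapped) line would be missed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the replication providers configured in an OpenLDAP
# cn=config directory, i.e. the remote instances to stop before reloading the DB.
sync_providers() {
  local config_dir=${1:-/opt/apigee/data/apigee-openldap/slapd.d}
  # -r recurse, -h no file names, -o only the match, -E extended regex
  grep -rhoE 'provider=ldaps?://[^ ]+' "$config_dir" | sed 's/^provider=//' | sort -u
}
```

Each URL is printed once; stop apigee-openldap on every node it names.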

3 - Remove the existing data and load new data.

Note: make sure you've backed everything up and that apigee-openldap is stopped.

  1. Change to your LDAP data directory, which is probably /opt/apigee/data/apigee-openldap/ldap.
  2. Take a backup of the files in the directory. Once you've done that, remove everything except DB_CONFIG.
  3. Load the fixed LDIF file using slapadd:
    slapadd -F /opt/apigee/data/apigee-openldap/slapd.d/ -l /tmp/glue_fixed.ldif
  4. Ensure that the new files created by the slapadd command are owned by the proper user. In our case, I had to change ownership to the 'apigee' user and group. If permissions are not correct, apigee-openldap may fail to start. From the LDAP data directory, run:
    chown apigee:apigee *
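
Sub-steps 1 and 2 can be scripted. Here is a hypothetical helper (reset_ldap_dir is my name for it, not an Apigee tool) that copies the data directory aside and then deletes everything except DB_CONFIG. The default path is the one from sub-step 1; the timestamped backup location next to it is an assumption, so change it to wherever you keep backups. Run it only while apigee-openldap is stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 3 (sub-steps 1 and 2): copy the OpenLDAP data
# directory aside, then delete everything in it except DB_CONFIG.
# Only run this while apigee-openldap is stopped.
reset_ldap_dir() {
  local dir=${1:-/opt/apigee/data/apigee-openldap/ldap}
  dir=${dir%/}                                   # avoid creating the backup inside $dir
  local backup
  backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
  cp -a "$dir" "$backup" || return 1             # keep a full copy before deleting anything
  find "$dir" -mindepth 1 -maxdepth 1 ! -name DB_CONFIG -exec rm -rf {} +
  echo "$backup"                                 # print where the copy went
}
```

After reset_ldap_dir, cd into the data directory and continue with the slapadd and chown commands from sub-steps 3 and 4. The same helper covers steps 3-1 and 3-2 on the remote nodes in step 5.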

4 - Start the services to validate changes

Now try starting the apigee-openldap and edge-management-server services; both should start at this point. If they do not, check the edge-management-server logs to see whether the error has changed.

5 - Clean up and start any replicated OpenLDAP instances.

If you are replicating to a remote instance, you need to clean it up before starting it again. As always, back up all data before making changes. Follow steps 3-1 and 3-2 to give the node a clean DB, then start the service.

Do not load the ldif file. Replication will handle pushing data.

6 - Test functionality

At this point our environment was back up and appeared healthy. We tested logging in to the edge-ui to make sure our user accounts worked as expected.


Miscellaneous notes

I picked up a few useful things along the way, particularly how to enable OpenLDAP verbose debugging. You can turn it on by modifying this file:

/opt/apigee/apigee-openldap/lib/settings

In the EXTRA_ARGS line shown below, change the debug level at the end (the value after -d, 64 here) to '-1' and restart apigee-openldap. See this page about OpenLDAP debug levels: https://www.openldap.org/doc/admin24/slapdconfig.html

EXTRA_ARGS="-h  ldap://:$LDAP_PORT/ -F $APIGEE_APP_DATADIR/slapd.d/ -u $RUN_USER -d 64"

Note: This setting seems to revert after the server starts, so if the service is restarted, the debug level goes back to the default.
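
The same edit as a hypothetical helper (set_ldap_debug is my name for it), using GNU sed to rewrite the number after -d in the EXTRA_ARGS line and keeping a .orig copy of the file. The default path is the settings file above. Since the setting seems to revert, you may need to rerun it before each restart you want to debug.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the slapd debug level (the value after -d) in the
# EXTRA_ARGS line of the apigee-openldap settings file; keeps a .orig copy.
set_ldap_debug() {
  local level=$1
  local file=${2:-/opt/apigee/apigee-openldap/lib/settings}
  # Replace the trailing "-d <number>" inside the quoted EXTRA_ARGS value.
  sed -i.orig -E "s/(EXTRA_ARGS=.* -d )-?[0-9]+\"/\1${level}\"/" "$file"
}
```

Usage: set_ldap_debug -1, then apigee-service apigee-openldap restart. Running set_ldap_debug 64 puts the original level back.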

This gave me some insights as to what the edge-management-server was trying to do when it throws the error:

'Got a life cycle exception while starting service [SecurityService, Unable to add initialize users and resources'.

In the debug logs, we see it try to add the userroles OU and fail because the entry already exists. My guess is that it doesn't recognize the existing entry as what it expects because its object class is "glue".

Debug Logs:

5f396615 >>> dnPrettyNormal: 
=> ldap_bv2dn(ou=userroles,ou=global,dc=apigee,dc=com,0)
<= ldap_bv2dn(ou=userroles,ou=global,dc=apigee,dc=com)=0 
=> ldap_dn2bv(272)
<= ldap_dn2bv(ou=userroles,ou=global,dc=apigee,dc=com)=0 
=> ldap_dn2bv(272)
<= ldap_dn2bv(ou=userroles,ou=global,dc=apigee,dc=com)=0 
5f396615 <<< dnPrettyNormal: , 
5f396615 conn=1000 op=3 ADD dn="ou=userroles,ou=global,dc=apigee,dc=com"
5f396615 ==> bdb_add: ou=userroles,ou=global,dc=apigee,dc=com
5f396615 oc_check_required entry (ou=userroles,ou=global,dc=apigee,dc=com), objectClass "organizationalUnit"
5f396615 oc_check_allowed type "ou"
5f396615 oc_check_allowed type "objectClass"
5f396615 oc_check_allowed type "structuralObjectClass"
5f396615 slap_queue_csn: queueing 0x7fce34105e00 20200816170005.270911Z#000000#001#000000
5f396615 bdb_add: txn1 id: 800001d3
5f396615 bdb_dn2entry("ou=userroles,ou=global,dc=apigee,dc=com")
5f396615 => bdb_dn2id("ou=userroles,ou=global,dc=apigee,dc=com")
5f396615 <= bdb_dn2id: got id=0x4
5f396615 entry_decode: "ou=userroles,ou=global,dc=apigee,dc=com"
5f396615 <= entry_decode(ou=userroles,ou=global,dc=apigee,dc=com)
5f396615 send_ldap_result: conn=1000 op=3 p=3
5f396615 send_ldap_result: err=68 matched="" text=""
5f396615 send_ldap_response: msgid=4 tag=105 err=68
ber_flush2: 14 bytes to sd 16
  0000:  30 0c 02 01 04 69 07 0a  01 44 04 00 04 00         0....i...D....    
ldap_write: want=14, written=14
  0000:  30 0c 02 01 04 69 07 0a  01 44 04 00 04 00         0....i...D....    
5f396615 conn=1000 op=3 RESULT tag=105 err=68 text=

The lines that stand out are:

5f396615 send_ldap_result: err=68 matched="" text=""
5f396615 conn=1000 op=3 RESULT tag=105 err=68 text=

According to this page (https://ldapwiki.com/wiki/LDAP%20Result%20Codes), LDAP error 68 translates to:

"LDAP_ALREADY_EXISTS - Indicates that the add operation attempted to add an entry that already exists, or that the modify operation attempted to rename an entry to the name of an entry that already exists."

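The raw bytes in the ldap_write dump tell the same story. Here is a hypothetical bash decoder (decode_ldap_result is my name for it) that walks the BER encoding, assuming short-form lengths, which holds for this 14-byte message. The outer 0x30 is the LDAPMessage SEQUENCE, 02 01 04 is messageID 4, 0x69 (105, the tag=105 in the log) is [APPLICATION 9], which RFC 4511 defines as AddResponse, and 0a 01 44 is the ENUMERATED resultCode 68 (entryAlreadyExists). The same RFC lists 32 as noSuchObject, the code in the original 4.16.09 stack trace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the result in a slapd hex dump of an LDAP response,
# e.g. "30 0c 02 01 04 69 07 0a 01 44 04 00 04 00".
# Minimal sketch: assumes short-form BER lengths (true for this small message).
decode_ldap_result() {
  local -a b=("$@")
  local i=2 len k msgid=0 op result=0 name
  (( 16#${b[0]} == 0x30 )) || { echo "not an LDAPMessage SEQUENCE" >&2; return 1; }
  (( 16#${b[i]} == 0x02 )) || { echo "messageID is not an INTEGER" >&2; return 1; }
  len=$(( 16#${b[i+1]} ))
  for (( k = 0; k < len; k++ )); do msgid=$(( msgid * 256 + 16#${b[i+2+k]} )); done
  i=$(( i + 2 + len ))
  op=$(( 16#${b[i]} ))          # protocolOp tag: APPLICATION class, constructed
  i=$(( i + 2 ))                # skip the protocolOp tag and its length byte
  (( 16#${b[i]} == 0x0a )) || { echo "resultCode is not an ENUMERATED" >&2; return 1; }
  len=$(( 16#${b[i+1]} ))
  for (( k = 0; k < len; k++ )); do result=$(( result * 256 + 16#${b[i+2+k]} )); done
  case $result in               # a few result-code names from RFC 4511
    0)  name=success ;;
    32) name=noSuchObject ;;
    68) name=entryAlreadyExists ;;
    *)  name=other ;;
  esac
  echo "msgid=$msgid tag=$op op=$(( op & 0x1f )) resultCode=$result ($name)"
}
```

decode_ldap_result 30 0c 02 01 04 69 07 0a 01 44 04 00 04 00 prints "msgid=4 tag=105 op=9 resultCode=68 (entryAlreadyExists)", matching the msgid=4 tag=105 err=68 in the debug log.
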
@henrytyler - Thanks for the detailed post. We used this fix on 4.51. We're unsure how, or by whom, the glue entries were created, as we only restarted services during a kernel update.