Synchronizer proxy bundle pulling issue

Hello there,

I'm having trouble deploying new proxy revisions to the runtime. A deployment gets stuck at 0% and stays that way indefinitely. I can undeploy the current revision, but I cannot deploy either new or old revisions.

It's an on-prem multi-region setup (two datacenters) with region-dedicated environments and environment groups; each is specific to one region only. Cassandra sync is enabled and reports no problems. No ApigeeIssues are reported. App version is 1.11.

I have checked the synchronizer's access to the management plane, and verified that the service account it uses has the required roles.

I have checked the environment's synchronizer pod: after I restart the pod, some files are gradually saved to /application/var/tmp. There is plenty of filesystem space available on all nodes. The synchronizer logs repeatedly show a block of three SEVERE entries for the following classes.

  • com.apigee.hybrid.runtime.contract.sync.context.MasterArtifactDownloader
  • com.apigee.hybrid.runtime.contract.sync.context.ControlPlaneReplicationContext
  • com.apigee.hybrid.runtime.contract.sync.replicators.ControlPlaneToCassandraContractReplicatorImpl

 

apigee-synchronizer {"level":"SEVERE","thread":"Apigee-Timer-7","mdc":{"action":"SYNC","contextid":"2728","env":"@@REDACTED@@","org":"@@REDACTED@@"},"className":"com.apigee.hybrid.runtime.contract.sync.context.MasterArtifactDownloader","method":"download","severity":"SEVERE","message":"failed to download gs://apigee-@@REDACTED@@ from GCS to /application/var/tmp/artifact_8939489889449048897art. Failing the replication","formattedDate":"2024-04-10T14:39:22.619Z","logger":"MasterArtifactDownloader","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.replication.DownloadException{ code = runtime.contract.sync.DownloadError, message = Error downloading gs://apigee-@@REDACTED@@ cause : Connection reset, associated contexts = []}\n"}
apigee-synchronizer {"level":"SEVERE","thread":"Apigee-Timer-7","mdc":{"action":"SYNC","contextid":"2728","env":"@@REDACTED@@","org":"@@REDACTED@@"},"className":"com.apigee.hybrid.runtime.contract.sync.context.ControlPlaneReplicationContext","method":"download","severity":"SEVERE","message":"Error in downloading uri gs://apigee-@@REDACTED@@ to file /application/var/tmp/artifact_8939489889449048897art","formattedDate":"2024-04-10T14:39:22.619Z","logger":"MasterArtifactDownloader","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.replication.DownloadException{ code = runtime.contract.sync.DownloadError, message = Error downloading gs://apigee-@@REDACTED@@ cause : Connection reset, associated contexts = []}\n"}
apigee-synchronizer {"level":"SEVERE","thread":"Apigee-Timer-7","mdc":{"action":"SYNC","contextid":"2728","env":"@@REDACTED@@","org":"@@REDACTED@@"},"className":"com.apigee.hybrid.runtime.contract.sync.replicators.ControlPlaneToCassandraContractReplicatorImpl","method":"lambda$replicateContract$0","severity":"SEVERE","message":"error in downloading artifact gs://apigee-@@REDACTED@@","formattedDate":"2024-04-10T14:39:22.620Z","logger":"CONTRACT-REPLICATION","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.replication.DownloadException{ code = runtime.contract.sync.DownloadError, message = Error downloading gs://apigee-@@REDACTED@@ cause : Error downloading gs://apigee-@@REDACTED@@ cause : Connection reset, associated contexts = []}\n"}
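To confirm that every failure in a longer log capture has the same root cause, the entries can be tallied by class and cause. Below is a minimal sketch, assuming the logs were saved to a file with one entry per line in the format shown above (an `apigee-synchronizer ` prefix followed by a JSON object). The helper names `root_cause` and `summarize` are made up for illustration.

```python
import json
from collections import Counter

# Prefix seen in front of each JSON entry in the log lines above.
PREFIX = "apigee-synchronizer "


def root_cause(stack_trace):
    """Return the text after the last 'cause : ' marker, up to the next comma."""
    if "cause : " not in stack_trace:
        return "unknown"
    return stack_trace.rsplit("cause : ", 1)[1].split(",", 1)[0].strip()


def summarize(lines):
    """Count SEVERE entries per (short class name, root cause)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith(PREFIX):
            line = line[len(PREFIX):]
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict) or entry.get("level") != "SEVERE":
            continue
        short_class = entry.get("className", "?").rsplit(".", 1)[-1]
        cause = root_cause(entry.get("exceptionStackTrace", ""))
        counts[(short_class, cause)] += 1
    return counts
```

Calling `summarize(open("synchronizer.log"))` (the file name is a placeholder) returns a `Counter`. If "Connection reset" is the only cause across all three classes, that points at the network path rather than at the synchronizer itself.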

 

I checked HTTP connectivity to storage.googleapis.com from the synchronizer pod using curl and I get a proper response. Telnet to storage.googleapis.com 443 ends with "Connection closed by foreign host". To be honest, I don't know how to check gs:// connectivity directly from the pod.
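On checking gs:// connectivity: as far as I understand, gs:// is only a URI scheme, and the objects are fetched over HTTPS from storage.googleapis.com on port 443, so there is no separate protocol to test. What can be tested is whether the TCP connection and the TLS handshake both succeed. Below is a minimal sketch of such a probe. It assumes Python is available somewhere on the same network path (the synchronizer image may not ship it, so a debug pod on the same node is one option), and the function name `probe_tls` is made up for illustration.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Try a TCP connect followed by a TLS handshake.

    Returns (stage, detail), where stage is:
      'tcp' - the TCP connection itself failed (refused, timed out, ...)
      'tls' - TCP connected but the handshake failed (reset, EOF, bad cert)
      'ok'  - handshake completed; detail is the negotiated TLS version
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", str(exc))
    try:
        context = ssl.create_default_context()
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return ("ok", tls.version())
    except OSError as exc:  # ssl.SSLError and ConnectionResetError are OSErrors
        raw.close()
        return ("tls", str(exc))
```

For example, `print(probe_tls("storage.googleapis.com"))` gives a `'tls'` result if a device in between accepts the TCP connection and then resets it. An `'ok'` result only shows that the handshake works. It does not rule out resets later, during a larger download.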

Might this be a firewall case with TCP being blocked? While I'm waiting for my network team to check that on their end I'm asking for any hints here in parallel. Much obliged.
