Cluster-aware event handling in AEM: using TopologyEventListener and topology-aware jobs
Why “cluster‑aware” matters in AEM (especially Cloud Service)
On AEM as a Cloud Service, code always runs in a cluster, instances are ephemeral, and leader election can be in progress when events fire. Your background work must tolerate restarts and resume reliably. If you need a single “primary” to act, use Apache Sling Discovery to identify the leader.
Apache Sling Discovery models your deployment’s topology (instances ↔ clusters). Key types:
- TopologyView: snapshot of the current topology; exposes getLocalInstance() and isCurrent(). [sling.apache.org]
- InstanceDescription: per‑instance descriptor with isLeader() and isLocal() flags. [sling.apache.org]
- ClusterView: the set of instances with one stable leader at a time. [developer.adobe.com]
You obtain the current view via DiscoveryService#getTopology(). [sling.apache.org]
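As an orientation aid, here is a minimal sketch that walks the local ClusterView and logs each instance’s flags; the TopologyLogger class is a hypothetical helper, not part of the Discovery API.
package com.example.core.cluster;
import org.apache.sling.discovery.ClusterView;
import org.apache.sling.discovery.DiscoveryService;
import org.apache.sling.discovery.InstanceDescription;
import org.apache.sling.discovery.TopologyView;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Component(service = TopologyLogger.class)
public class TopologyLogger {
    private static final Logger LOG = LoggerFactory.getLogger(TopologyLogger.class);
    @Reference
    private DiscoveryService discoveryService;
    public void logTopology() {
        // Snapshot of the current topology; isCurrent() is false while a change is in progress
        final TopologyView view = discoveryService.getTopology();
        final ClusterView cluster = view.getLocalInstance().getClusterView();
        for (InstanceDescription instance : cluster.getInstances()) {
            LOG.info("Instance {}: leader={}, local={}",
                    instance.getSlingId(), instance.isLeader(), instance.isLocal());
        }
        LOG.info("Topology view is current: {}", view.isCurrent());
    }
}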
The legacy Granite/JCR ClusterAware interface is deprecated; replace it with TopologyEventListener. [developer.adobe.com]
Quick recap: event options covered previously
- OSGi EventHandler (Sling/OSGi topics): Use for system‑level signals (replication, jobs, resource events). [stacknowledge.in]
- JCR EventListener (repository observation): Low‑level, heavy; not cluster‑safe by default; often replaced by Sling resource observation. [stacknowledge.in]
- Sling ResourceChangeListener: Higher‑level, modern listener for resource changes (preferred over JCR observation in clustered setups). [stacknowledge.in]
- Sling Jobs (JobManager + JobConsumer): Reliable “at least once” background processing, persisted under /var/eventing/jobs and distributed across the topology. [stacknowledge.in]
TopologyEventListener: reacting to cluster changes
Implement TopologyEventListener to be notified when the topology changes (initialization, changing, changed, properties changed). Keep handlers fast and non‑blocking. [sling.apache.org], [apache.goo…source.com]
Minimal example
package com.example.core.cluster;
import org.apache.sling.discovery.TopologyEvent;
import org.apache.sling.discovery.TopologyEventListener;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Component(service = TopologyEventListener.class, immediate = true)
public class ClusterTopologyObserver implements TopologyEventListener {
    private static final Logger LOG = LoggerFactory.getLogger(ClusterTopologyObserver.class);
    @Override
    public void handleTopologyEvent(final TopologyEvent event) {
        switch (event.getType()) {
            case TOPOLOGY_INIT:
                LOG.info("Topology initialized: {}", event);
                // Warm up caches, start leader-only schedulers (see below)
                break;
            case TOPOLOGY_CHANGING:
                LOG.info("Topology changing: {}", event);
                // Pause work that assumes a stable leader
                break;
            case TOPOLOGY_CHANGED:
                LOG.info("Topology changed: {}", event);
                // Re-check leader; reconfigure queues/schedulers
                break;
            case PROPERTIES_CHANGED:
                LOG.info("Topology properties changed: {}", event);
                break;
            default:
                LOG.debug("Unhandled topology event: {}", event);
        }
    }
}
The event types above are defined by the Discovery API; the listener must return quickly and avoid locks.
Leader‑only execution: gate work with Discovery API
When only one node should perform a task (e.g., a job scheduler, deduped producer), gate it with isLeader() from the current topology view:
package com.example.core.cluster;
import org.apache.sling.discovery.DiscoveryService;
import org.apache.sling.discovery.TopologyView;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Component(service = LeaderGate.class)
public class LeaderGate {
    private static final Logger LOG = LoggerFactory.getLogger(LeaderGate.class);
    @Reference
    private DiscoveryService discoveryService; // Obtain current topology view
    public boolean isLeaderNow() {
        // TopologyView is synchronized with TopologyEventListener callbacks
        final TopologyView view = discoveryService.getTopology();
        boolean leader = view.getLocalInstance().isLeader();
        LOG.debug("Leader check: isLeader={} (isCurrent={})", leader, view.isCurrent());
        return leader;
    }
}
- DiscoveryService#getTopology() returns the last discovered view (never null), blocking while a topology event is being delivered to keep state consistent. [sling.apache.org]
- TopologyView#getLocalInstance().isLeader() tells you whether the local instance is the cluster leader. [sling.apache.org], [sling.apache.org]
In AEM as a Cloud Service, designing for leader‑only work aligns with the platform’s guidance to make code cluster‑aware and resilient. [experience….adobe.com]
Reliable background work with Sling Jobs (topology‑aware by design)
Sling Jobs provide persistent, at‑least‑once processing, with storage under /var/eventing/jobs and assignment per instance subtree to avoid concurrent processing across nodes. Only one instance processes a given job at a time; if a consumer fails, the framework retries.
Producer & consumer
// Producer
package com.example.core.jobs;
import java.util.HashMap;
import java.util.Map;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
@Component(immediate = true)
public class ReportJobCreator {
    public static final String TOPIC = "com/example/jobs/report";
    @Reference
    private JobManager jobManager;
    public void enqueueReport(String reportId) {
        Map<String, Object> props = new HashMap<>();
        props.put("reportId", reportId);
        jobManager.addJob(TOPIC, props);
    }
}
// Consumer
package com.example.core.jobs;
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.consumer.JobConsumer;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Component(
    service = JobConsumer.class,
    property = { JobConsumer.PROPERTY_TOPICS + "=" + ReportJobCreator.TOPIC },
    immediate = true
)
public class ReportJobConsumer implements JobConsumer {
    private static final Logger LOG = LoggerFactory.getLogger(ReportJobConsumer.class);
    @Override
    public JobResult process(final Job job) {
        try {
            String id = (String) job.getProperty("reportId");
            // Do the work (idempotent!)
            LOG.info("Generating report {}", id);
            return JobResult.OK;
        } catch (Exception e) {
            LOG.error("Job failed", e);
            return JobResult.FAILED; // Will be retried
        }
    }
}
- At‑least‑once execution; design the consumer to be idempotent.
- Jobs are persisted and assigned under topology‑aware subtrees (e.g., /var/eventing/jobs/assigned/...), which prevents two instances from processing the same job.
- Use JobManager APIs (addJob, createJob) and queue configs to tune concurrency/order.
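As an alternative to addJob(), here is a minimal sketch using the fluent createJob() builder; the ReportJobBuilderExample class name is illustrative.
package com.example.core.jobs;
import java.util.Collections;
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
@Component(service = ReportJobBuilderExample.class)
public class ReportJobBuilderExample {
    @Reference
    private JobManager jobManager;
    public boolean enqueue(String reportId) {
        // createJob() returns a JobBuilder; add() persists the job and returns it (null on failure)
        Job job = jobManager.createJob(ReportJobCreator.TOPIC)
                .properties(Collections.<String, Object>singletonMap("reportId", reportId))
                .add();
        return job != null;
    }
}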
For schedulers that should fire once per cluster, set scheduler.runOn=LEADER, a recommended pattern in clustered AEM.
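A minimal sketch of such a leader‑only scheduled task (the class name, cron expression, and log message are illustrative):
package com.example.core.schedulers;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Component(
    service = Runnable.class,
    property = {
        "scheduler.expression=0 0 2 * * ?",   // every night at 02:00
        "scheduler.runOn=LEADER",             // execute only on the topology leader
        "scheduler.concurrent:Boolean=false"  // never overlap runs
    }
)
public class NightlyBatchScheduler implements Runnable {
    private static final Logger LOG = LoggerFactory.getLogger(NightlyBatchScheduler.class);
    @Override
    public void run() {
        LOG.info("Nightly batch running on the cluster leader");
        // Enqueue Sling jobs here so any instance can do the heavy lifting
    }
}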
Combining OSGi EventHandler with cluster awareness
Replication and distribution events often arrive on every pod. If your handler must act exactly once (e.g., notify an external system), use either:
1. Leader gating inside the handler:
package com.example.core.cluster;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventConstants;
import org.osgi.service.event.EventHandler;
@Component(service = EventHandler.class, immediate = true,
        property = { EventConstants.EVENT_TOPIC + "=" + com.day.cq.replication.ReplicationAction.EVENT_TOPIC })
public class ReplicationEventHandler implements EventHandler {
    @Reference
    private LeaderGate leaderGate;
    @Override
    public void handleEvent(final Event event) {
        if (!leaderGate.isLeaderNow()) {
            return; // Non-leaders ignore the event
        }
        // Safe: execute once per topology (e.g., notify an external system)
    }
}
2. Subscribe to distribution completion topics that reflect successful dissemination, reducing duplicates.
Modern alternative to JCR observation: ResourceChangeListener
Prefer ResourceChangeListener for content changes; it avoids long‑lived sessions and better fits clustered AEM. For local‑only vs distributed events, choose the appropriate listener type and design to avoid duplicate work. [experience….adobe.com]
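A minimal sketch of such a listener (the /content/example path, change types, and class name are illustrative):
package com.example.core.listeners;
import java.util.List;
import org.apache.sling.api.resource.observation.ExternalResourceChangeListener;
import org.apache.sling.api.resource.observation.ResourceChange;
import org.apache.sling.api.resource.observation.ResourceChangeListener;
import org.osgi.service.component.annotations.Component;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Component(
    service = ResourceChangeListener.class,
    property = {
        ResourceChangeListener.PATHS + "=/content/example",
        ResourceChangeListener.CHANGES + "=ADDED",
        ResourceChangeListener.CHANGES + "=CHANGED"
    }
)
// Implementing ExternalResourceChangeListener opts in to changes made on other instances;
// drop it to keep the listener local-only and avoid duplicate work across the cluster.
public class ContentChangeListener implements ResourceChangeListener, ExternalResourceChangeListener {
    private static final Logger LOG = LoggerFactory.getLogger(ContentChangeListener.class);
    @Override
    public void onChange(final List<ResourceChange> changes) {
        for (ResourceChange change : changes) {
            LOG.info("Resource {} {} (external={})",
                    change.getPath(), change.getType(), change.isExternal());
        }
    }
}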
Patterns you can use
1) Leader‑only producers, distributed consumers
- Only the leader enqueues jobs (e.g., a nightly batch).
- Any node may consume, but the framework ensures one instance processes each job (a combined sketch follows this list).
2) Topology‑change hooks for resilience
- On TOPOLOGY_CHANGING, pause cache flushers or producers.
- On TOPOLOGY_CHANGED, refresh leader state and resume.
3) Leader‑only schedulers
- Configure Sling Commons Scheduler with scheduler.runOn=LEADER so a recurring task fires once per cluster.
4) Dedup in event handlers
- Handlers can trigger twice in Cloud Service; guard with leader checks or switch to distribution completion topics.
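A combined sketch for pattern 1, reusing the LeaderGate and job topic from above; the NightlyReportProducer class name and cron expression are illustrative. Unlike scheduler.runOn=LEADER, the explicit gate runs the task on every instance but lets only the leader enqueue.
package com.example.core.cluster;
import java.util.Collections;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
@Component(
    service = Runnable.class,
    property = { "scheduler.expression=0 0 1 * * ?" } // fires on every instance
)
public class NightlyReportProducer implements Runnable {
    @Reference
    private LeaderGate leaderGate;
    @Reference
    private JobManager jobManager;
    @Override
    public void run() {
        if (!leaderGate.isLeaderNow()) {
            return; // non-leaders skip enqueueing; exactly one producer per cluster
        }
        // Any instance may pick the job up; the framework assigns it to a single node
        jobManager.addJob("com/example/jobs/report",
                Collections.<String, Object>singletonMap("reportId", "nightly"));
    }
}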
Migration notes: from ClusterAware to Discovery/Topology events
If you have older code on com.day.cq.jcrclustersupport.ClusterAware, replace it with:
- TopologyEventListener for reacting to cluster changes, and
- DiscoveryService + isLeader() for leader selection.
Operational tips
- Event visibility: OSGi events and job stats are accessible via Sling/AEM consoles; monitor job queues and /var/eventing/jobs.
- Design for retries: Sling Jobs retry on failure; ensure idempotency and checkpointing.
- Cloud Service guidelines: Persist state, avoid local filesystem, plan for instance replacement at any time.
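To complement the consoles, queue health can also be checked programmatically; a minimal sketch using JobManager statistics (the JobQueueHealthLogger class is illustrative):
package com.example.core.ops;
import org.apache.sling.event.jobs.JobManager;
import org.apache.sling.event.jobs.Statistics;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Component(service = JobQueueHealthLogger.class)
public class JobQueueHealthLogger {
    private static final Logger LOG = LoggerFactory.getLogger(JobQueueHealthLogger.class);
    @Reference
    private JobManager jobManager;
    public void logQueueHealth() {
        // Aggregated statistics across all job queues on this instance
        Statistics stats = jobManager.getStatistics();
        LOG.info("Jobs queued={} active={} failed={} cancelled={}",
                stats.getNumberOfQueuedJobs(),
                stats.getNumberOfActiveJobs(),
                stats.getNumberOfFailedJobs(),
                stats.getNumberOfCancelledJobs());
    }
}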


