AEM Link Checker & Transformer

Broken or incorrect links directly impact user experience, SEO, and content quality. AEM Link Checker is a built‑in capability that helps authors and developers automatically validate and manage links authored on pages and transform them if needed.

Link Checker is event-based: whenever content is created or updated under /content, link validation is triggered and the results are stored under /var/linkchecker.


What Is AEM Link Checker?

AEM Link Checker is responsible for:

  • Validating internal and external links authored on pages
  • Providing a centralized view of all external links
  • Marking broken links on Author and handling them safely on Publish
  • Supporting link rewriting / transformation through Sling Rewriter

How Link Checker Works (High‑Level Flow)

  1. An author creates or modifies a page under /content
  2. AEM triggers the Link Checker event listener
  3. Links are extracted and stored under /var/linkchecker
  4. Validation status moves from pending → valid / invalid
  5. Results are reflected in Author UI and Link Checker console

Internal vs External Links

Internal Links

  • Internal links are page or asset URLs that belong to the same AEM instance, typically starting with /content (for example: /content/<project>/us/en/home.html).
  • They are validated immediately when links are added or updated on a page.

External Links

  • External links point outside the AEM domain (for example: https://www.google.com).
  • Validation includes syntax checks and availability checks (subject to configuration and scheduler frequency).

Author vs Publish Behavior

Broken Links on Author

Both internal and external broken links are visually highlighted (commonly red) in the authoring interface for quick remediation.

Broken Links on Publish

Broken links are rendered as plain text (the anchor is effectively removed) to prevent users from following invalid targets.


Link Checker User Interfaces

1. Link Checker Console

Provides an overview of the external links authored across pages, together with their validation status: http://localhost:4502/etc/linkchecker.html

Use this to:

  • Review link status
  • Identify broken or pending links
  • Track validation results

2. JSON Endpoint

Returns all validated links programmatically. http://localhost:4502/var/linkchecker.list.json

Useful for:

  • Automation
  • Reporting
  • Debugging
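For automation or reporting, the endpoint can be called from any HTTP client. The sketch below only builds the request with the JDK's java.net.http API. The class name LinkCheckerListRequest, the use of Basic authentication, and the 30-second timeout are assumptions for illustration and are not part of AEM.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

/** Builds a GET request for the Link Checker JSON listing (hypothetical helper, not an AEM API). */
final class LinkCheckerListRequest {

    private LinkCheckerListRequest() { }

    static HttpRequest build(String baseUrl, String user, String password) {
        // Assumption: the instance accepts Basic authentication, as a local author instance usually does.
        String credentials = user + ":" + password;
        String token = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/var/linkchecker.list.json"))
                .timeout(Duration.ofSeconds(30))
                .header("Authorization", "Basic " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The shape of the returned JSON is not described here, so inspect a real response before writing a parser for it.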

Key OSGi Configurations

Day CQ Link Checker Service

Enables validation of (primarily) external links and manages the lifecycle that moves items from pending to valid/invalid.

  • Scheduler Period: Interval between validation cycles.
  • Link Check Override Patterns: Regex patterns for URLs that should skip validation. For example, adding ^http://www\.google\.com excludes URLs starting with that host from checking (the dots are escaped so that they match literally).
  • Special Link Prefixes: Prefixes that specify a special link that is not checked or rewritten at all.
  • Special Link Patterns: Patterns that specify a special link that is not checked or rewritten at all.
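Because override patterns are regular expressions, small details such as an unescaped dot change what gets excluded. The sketch below compares a loose and a stricter pattern using java.util.regex. The class OverridePatternDemo and the use of find() are assumptions for illustration; how the Link Checker service applies its patterns internally is not shown here.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Demonstrates how regex details affect which URLs an override pattern covers (illustrative only). */
final class OverridePatternDemo {

    private OverridePatternDemo() { }

    static boolean isOverridden(String url, List<Pattern> patterns) {
        // Assumption: a pattern applies if it is found in the URL; '^' anchors it to the start.
        return patterns.stream().anyMatch(p -> p.matcher(url).find());
    }

    public static void main(String[] args) {
        // Unescaped dots match any character, so this also covers look-alike hosts.
        List<Pattern> loose = List.of(Pattern.compile("^http://www.google.com"));
        // Escaped dots plus a boundary after the host restrict the match to that host.
        List<Pattern> strict = List.of(Pattern.compile("^http://www\\.google\\.com(/|$)"));

        String lookAlike = "http://www.google.com.evil.example/";
        System.out.println(isOverridden(lookAlike, loose));
        System.out.println(isOverridden(lookAlike, strict));
    }
}
```

Testing candidate patterns this way before adding them to the OSGi configuration helps avoid excluding more URLs than intended.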

After the scheduled task runs, status updates are reflected in the Link Checker UI: http://localhost:4502/etc/linkchecker.html

Day CQ Link Checker Transformer

Responsible for rewriting/transforming links during rendering.

  • Skip Validate Href: Skip link validation for Link Checker Transformer.
  • Disable Rewriting: Stop URL rewriting/transformations while keeping checks.
  • Disable Checking: Turn off link checking entirely on a given environment.
  • Strip HTML Extension: If checked, all links with a .html or .htm extension are rewritten and their extension is removed.
  • Rewrite Elements: List of tag:attribute pairs that should be transformed (e.g., a:href, img:src).
  • Blacklisted Paths: Paths for which the extension is not stripped.
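To make the effect of Strip HTML Extension and the excluded paths concrete, the sketch below reproduces the described behavior in plain Java. It is not AEM's implementation: the class HtmlExtensionStripper, the prefix-based matching of excluded paths, and the handling of query strings and fragments are all assumptions.

```java
import java.util.List;

/** Illustrates the described effect of "Strip HTML Extension"; not AEM's actual implementation. */
final class HtmlExtensionStripper {

    private HtmlExtensionStripper() { }

    static String strip(String href, List<String> excludedPathPrefixes) {
        // Assumption: an excluded ("blacklisted") path is matched as a prefix of the link.
        for (String prefix : excludedPathPrefixes) {
            if (href.startsWith(prefix)) {
                return href;
            }
        }
        // Keep any query string or fragment intact while trimming the path part.
        int cut = firstIndexOfAny(href, '?', '#');
        String path = cut < 0 ? href : href.substring(0, cut);
        String suffix = cut < 0 ? "" : href.substring(cut);
        if (path.endsWith(".html")) {
            path = path.substring(0, path.length() - ".html".length());
        } else if (path.endsWith(".htm")) {
            path = path.substring(0, path.length() - ".htm".length());
        }
        return path + suffix;
    }

    // Returns the smallest non-negative index of either character, or -1 if neither occurs.
    private static int firstIndexOfAny(String s, char a, char b) {
        int ia = s.indexOf(a);
        int ib = s.indexOf(b);
        if (ia < 0) {
            return ib;
        }
        if (ib < 0) {
            return ia;
        }
        return Math.min(ia, ib);
    }
}
```

With these assumptions, /content/site/en/home.html becomes /content/site/en/home, while a link under an excluded path is returned unchanged.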

Disabling Link Checker in Markup (Selective Bypass)

You can selectively bypass the checker for specific tags:

1. Mark as valid (skip checks and treat the link as valid):

<a href="https://ww.my-url.com" x-cq-linkchecker="valid">Home</a>

2. Skip completely (do not validate or transform this link):

<a href="https://ww.my-url.com" x-cq-linkchecker="skip">Home</a>
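If you also write a custom transformer, you may want it to leave links carrying this marker alone. The sketch below reads the attribute from SAX attributes, which is the form in which a transformer sees an element. The class LinkCheckerMarkers and the decision to honor the marker in custom code are assumptions, not built-in AEM behavior.

```java
import org.xml.sax.Attributes;

/** Reads the x-cq-linkchecker marker from an element's SAX attributes (hypothetical helper). */
final class LinkCheckerMarkers {

    static final String ATTRIBUTE = "x-cq-linkchecker";

    private LinkCheckerMarkers() { }

    /** True when the element is marked with x-cq-linkchecker="skip". */
    static boolean shouldSkip(Attributes atts) {
        // getValue returns null when the attribute is absent; equalsIgnoreCase(null) is false.
        return "skip".equalsIgnoreCase(atts.getValue(ATTRIBUTE));
    }

    /** True when the element is marked with x-cq-linkchecker="valid". */
    static boolean isMarkedValid(Attributes atts) {
        return "valid".equalsIgnoreCase(atts.getValue(ATTRIBUTE));
    }
}
```

A custom startElement() could call shouldSkip(atts) first and pass the attributes through untouched when it returns true.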

Link Transformation / Rewriter Implementation

Link transformation lets you modify URLs at render time, for example by rewriting image URLs to a CDN host or by shortening paths through removal of the /content/<project> prefix.
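The two rewrites mentioned here can be expressed as small pure functions that a transformer calls for each href or src value. The following is a minimal sketch; the class LinkRewriteRules, its method names, and the CDN host used in the usage note are hypothetical.

```java
/** Example rewrite rules for use inside a custom transformer (hypothetical helper). */
final class LinkRewriteRules {

    private LinkRewriteRules() { }

    /** Removes a content root such as /content/project1 from the start of an internal link. */
    static String shortenContentPath(String href, String contentRoot) {
        if (href.equals(contentRoot)) {
            return "/";
        }
        // Require a trailing slash so that /content/project10 is not treated as /content/project1.
        if (href.startsWith(contentRoot + "/")) {
            return href.substring(contentRoot.length());
        }
        return href;
    }

    /** Prefixes a root-relative image path with a CDN host; other URLs are left untouched. */
    static String toCdn(String src, String cdnHost) {
        // Protocol-relative URLs ("//host/...") already name a host, so they are skipped.
        if (src.startsWith("/") && !src.startsWith("//")) {
            return cdnHost + src;
        }
        return src;
    }
}
```

For example, shortenContentPath("/content/project1/us/en/home.html", "/content/project1") returns "/us/en/home.html", and toCdn("/content/dam/a.png", "https://cdn.example.com") returns "https://cdn.example.com/content/dam/a.png". If paths are shortened this way, the dispatcher or Sling mappings must resolve the shortened paths back to the full content paths.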

Configuration Steps

1. Create a rewriter configuration node in the repository, typically under /apps/<project>/config/rewriter, so that it is applied on both author and publish.

2. Ensure the configuration sets:

  • enabled = true
  • Project‑specific include paths
  • A unique transformerTypes value (e.g., my-rewriter). This name must match the pipeline.type property of your custom transformer factory.

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:cq="http://www.day.com/jcr/cq/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    jcr:primaryType="sling:Folder"
    enabled="{Boolean}true"
    generatorType="htmlparser"
    transformerTypes="[my-rewriter]"
    serializerType="htmlwriter"
    order="100"
    contentTypes="[text/html]"
    paths="[/content/project1,/content/project2]">
    <generator-htmlparser
        jcr:primaryType="nt:unstructured"
        includeTags="[A,/A,IMG,/IMG]" />
</jcr:root>

You can use /libs/cq/config/rewriter/default as a reference template for your configuration.

3. Create the Transformer (Example: Custom Transformer Skeleton)

Create a factory class implementing org.apache.sling.rewriter.TransformerFactory and a transformer class implementing org.apache.sling.rewriter.Transformer, each in its own file. The transformer receives SAX events for the elements emitted by the generator and can adjust URLs on matching elements and attributes in startElement().

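// MyRewriter.java
import org.apache.sling.rewriter.Transformer;
import org.apache.sling.rewriter.TransformerFactory;
import org.osgi.service.component.annotations.Component;
import org.xml.sax.helpers.AttributesImpl;

// Registers the transformer under the pipeline type referenced by transformerTypes.
@Component(
        immediate = true,
        service = TransformerFactory.class,
        property = {
                "pipeline.type=my-rewriter"
        })
public class MyRewriter implements TransformerFactory {
    @Override
    public Transformer createTransformer() {
        // Return a fresh transformer instance each time the factory is asked for one.
        return new MyRewriterTransformer(AttributesImpl::new);
    }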
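}

// MyRewriterTransformer.java
import java.io.IOException;
import java.util.function.Supplier;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.rewriter.ProcessingComponentConfiguration;
import org.apache.sling.rewriter.ProcessingContext;
import org.apache.sling.rewriter.Transformer;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class MyRewriterTransformer implements Transformer {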

    private ContentHandler contentHandler;

    private final Supplier<AttributesImpl> attributeSupplier;

    public MyRewriterTransformer(final Supplier<AttributesImpl> attributeSupplier) {
        this.attributeSupplier = attributeSupplier;
    }

    @Override
    public void setDocumentLocator(final Locator locator) {
        contentHandler.setDocumentLocator(locator);
    }

    @Override
    public void startDocument() throws SAXException {
        contentHandler.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        contentHandler.endDocument();
    }

    @Override
    public void startPrefixMapping(final String prefix, final String uri) throws SAXException {
        contentHandler.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(final String prefix) throws SAXException {
        contentHandler.endPrefixMapping(prefix);
    }

    @Override
    public void startElement(final String uri, final String localName, final String name,
                             final Attributes atts) throws SAXException {
        Attributes out = atts;
        if (atts.getIndex("href") > -1 && name.equalsIgnoreCase("a")) {
            // Rewrite the href of anchor elements on a copy of the attributes.
            final int hrefIndex = atts.getIndex("href");
            final String original = atts.getValue(hrefIndex);
            final String rewritten = original; // placeholder: apply your rewriting logic here
            final AttributesImpl modified = attributeSupplier.get();
            modified.setAttributes(atts);
            modified.setValue(hrefIndex, rewritten);
            out = modified;
        } else if (atts.getIndex("src") > -1 && name.equalsIgnoreCase("img")) {
            // Rewrite the src of image elements on a copy of the attributes.
            final int srcIndex = atts.getIndex("src");
            final String original = atts.getValue(srcIndex);
            final String rewritten = original; // placeholder: apply your rewriting logic here
            final AttributesImpl modified = attributeSupplier.get();
            modified.setAttributes(atts);
            modified.setValue(srcIndex, rewritten);
            out = modified;
        }
        // Always forward the event so the rest of the pipeline receives the element.
        contentHandler.startElement(uri, localName, name, out);
    }

    @Override
    public void endElement(final String uri, final String localName, final String name) throws SAXException {
        contentHandler.endElement(uri, localName, name);
    }

    @Override
    public void characters(final char[] ch, final int start, final int length) throws SAXException {
        contentHandler.characters(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(final char[] ch, final int start, final int length) throws SAXException {
        contentHandler.ignorableWhitespace(ch, start, length);
    }

    @Override
    public void processingInstruction(final String target, final String data) throws SAXException {
        contentHandler.processingInstruction(target, data);
    }

    @Override
    public void skippedEntity(final String name) throws SAXException {
        contentHandler.skippedEntity(name);
    }

    @Override
    public void dispose() {
        //in case any resources need to be released, it can be done here
    }

    @Override
    public void init(final ProcessingContext processContext,
                     final ProcessingComponentConfiguration processingComponentConfiguration)
            throws IOException {
        // The request is available here in case the rewriting logic needs information from it.
        SlingHttpServletRequest slingRequest = processContext.getRequest();
    }

    @Override
    public void setContentHandler(final ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
    }
}

4. Verify Rewriter Registration

See all active Sling rewriters on the instance: http://localhost:4502/system/console/status-slingrewriter


Operational Guidance & Best Practices

  • Avoid enabling Link Checker at scale on very large, highly dynamic repositories. The event volume under /content can produce heavy processing and large /var/linkchecker trees.
  • Tune the scheduler period to balance freshness with system overhead.
  • Whitelist/override noisy hosts using Link Check Override Patterns to reduce false positives and unnecessary calls.
  • Use selective bypass (x-cq-linkchecker="valid" or "skip") for known‑good, ephemeral, or dynamically generated links.
  • Configure per environment: Depending on your QA workflow, consider disabling checking or rewriting in non‑critical environments, or keeping them enabled only there.
  • Instrument and monitor: Keep an eye on repository growth in /var/linkchecker and request logs if availability checks are enabled.

When to Prefer Custom Rewriting Over Link Checker

  • You need complex, deterministic URL rules (e.g., path shortening, locale‑aware rewrite, CDN routing) independent of link validity.
  • You want zero runtime validation overhead and rely on CI checks or external link‑checking tools.
  • You must guarantee consistent URLs across author/publish without waiting for scheduled validations.