Designing a Claude Workflow for CSRD Compliance

The European Union’s Corporate Sustainability Reporting Directive is, even after the 2026 rollback, the largest regulatory synthesis workload most sustainability teams will face this decade.

Twelve standards. A double materiality assessment. Cross-references against GRI, IFRS S1 and S2, and sector-specific guidance. Value-chain disclosures most companies cannot cleanly source. An XBRL tagging burden at the end.

For the companies still in scope under the CSRD as amended by Directive (EU) 2026/470, the question is no longer whether to use AI on the work. It is what shape of AI workflow actually fits the work, given what existing tools already cover and what they leave on the table.

The market has answers, of a sort. Vendor platforms (Watershed, Persefoni, Sphera, Normative, Microsoft Cloud for Sustainability) handle the data plumbing layer well: emissions calculation, factor libraries, ingestion at scale, XBRL output. Big Four advisory practices sell AI-enabled CSRD methodology, with the proprietary tools kept private. What neither category fully covers is the practitioner judgment layer where most of the engagement hours sit. Cross-referencing requirements against evidence. Framing impacts, risks, and opportunities. Drafting narratives that survive limited assurance. Mapping clause to clause across frameworks.

This article is the architecture I would build to sit in that gap. A multi-stage Claude Code workflow, with structured outputs between stages, an adversarial review step, and an audit trail built in. It is also the design for a Claude Code skill that would encode the workflow. Whether such a skill is the right form factor alongside the platforms is a live question. The article makes the case for why it has a place.

Key Takeaways

  • CSRD has shrunk but not disappeared. Directive (EU) 2026/470 raised the scope thresholds to more than 1,000 employees and more than €450M net turnover. For the companies still in scope, the synthesis workload is materially unchanged.
  • Vendor tools cover data plumbing. They do not cover the judgment layer. Cross-referencing, materiality framing, narrative drafting, and audit defence sit in the gap between platforms and practitioner.
  • A multi-stage Claude workflow with adversarial review survives audit. A single-shot prompt does not. The architecture matters more than the model choice.
  • A Claude Code skill could encode this workflow. Skill metadata at 100 tokens, full workflow at 5,000, sub-agents per stage, an audit trail by default.
  • The most dangerous failure mode is not fabricated standards. It is fabricated quotes from real sources, or real ESRS references with invented paragraph numbers. The adversarial review step exists specifically to catch this.

CSRD in Mid-2026: What Actually Applies

For anyone returning to the directive after the political turbulence of 2025, the picture in June 2026 is roughly this.

Omnibus I, formally adopted by the Council on 24 February 2026 and in force since 18 March 2026, materially narrowed CSRD scope. The new thresholds are over 1,000 employees and over €450 million net turnover.

Listed SMEs are now fully exempt. Wave 1 reporters that fall below the new thresholds can be exempted by Member States from reporting on financial years 2025 and 2026. The new scope applies to financial years starting on or after 1 January 2027.

For everyone else, including most large multinationals, the directive is alive and applies on roughly the same calendar.

Twelve standards make up ESRS Set 1. Two are cross-cutting: ESRS 1 General Requirements and ESRS 2 General Disclosures, the only standard whose disclosures are mandatory regardless of materiality. The other ten are topical. ESRS E1 through E5 on environmental matters (climate, pollution, water, biodiversity, circular economy). ESRS S1 through S4 on social matters. ESRS G1 on business conduct. Each topical standard is subject to the double materiality filter.

The simplification path is also in motion. In November 2025, EFRAG published draft Simplified ESRS that reduce mandatory data points by approximately 61% and allow a top-down materiality process. The European Commission’s public consultation on the draft delegated acts closed on 3 June 2026, with adoption expected shortly after, followed by a two-month Parliament and Council scrutiny period. Mandatory application is planned from financial year 2027, with voluntary early application for FY 2026.

The Omnibus rollback is real, but it does not solve the consultant’s problem. For companies still in scope, the workload per report has not meaningfully decreased. The simplifications reduce data points, not the volume of synthesis. The materiality methodology and audit trail standards have, if anything, become more important.

Where the Synthesis Bottleneck Actually Is

EFRAG’s own cost-benefit work on Wave 1 reporting identified three top cost drivers. Standards interpretation. Value-chain data collection. Materiality assessment.

Each of these is a synthesis problem at root. A practitioner with finite hours has to cross-reference an external corpus (ESRS standards plus Application Requirements paragraphs plus interoperability mappings to GRI and IFRS S1 and S2) against an internal corpus (existing policies, prior reports, board minutes, stakeholder maps, transition plans).

The output is two artefacts. A defensible materiality matrix. A complete sustainability statement.

A typical mid-sized CSRD engagement involves perhaps 500 to 1,000 hours of human cross-referencing work that no one wants to do. The cost of doing it badly is high (audit pushback, possible regulatory penalty, lost client trust). The cost of skipping it is higher.

AI is structurally the right tool for cross-referencing. Two corpora, mapped against each other, with structured outputs. That is exactly what large language models do well.

The temptation is to paste the standards into ChatGPT, paste the company’s prior report alongside, and ask for a draft. People are doing this right now. It produces text that looks reasonable, occasionally cites the wrong ESRS paragraph or invents one, and produces no audit trail.

The failure mode is invisible until the auditor or the regulator asks where a specific claim came from, and the practitioner has nothing.

The Workflow Architecture

The architecture that would survive limited assurance under CSRD is multi-stage, with each stage running as a sub-agent with its own isolated context window.

The full workflow looks roughly like this, drawn from the multi-stage methodology I have documented for circular economy research and adapted for CSRD’s specific shape.

Stage 1: Standards ingestion
ESRS Set 1, with topical standards parsed into a structured map of disclosure requirements and Application Requirements paragraphs. Interoperability mappings to GRI and IFRS S1 and S2 attached.
Output: a queryable reference layer that does not require the model to remember the standards verbatim.

Stage 2: Company evidence intake
Internal documentation (policies, prior sustainability reports, board minutes, materiality history, stakeholder maps, transition plans) ingested into a structured company-specific reference.
Output: a tagged evidence layer with provenance for every claim.

Stage 3: Cross-reference and gap detection
For each ESRS disclosure requirement that has passed the materiality filter, identify which company evidence supports it, which evidence is missing, and which evidence contradicts the proposed disclosure.
Output: a structured gap report with citations to both the ESRS clause and the underlying company document.

Stage 4: Double materiality sparring
The model proposes candidate impacts, risks, and opportunities (IROs) for the company, with reasoning, drawn from sector context, value-chain positioning, and stakeholder inputs. The practitioner challenges each one. The model is a sparring partner, not the source of truth.
Output: a documented materiality matrix with named rationale.

Stage 5: Disclosure narrative drafting
Per topical ESRS, drafts the disclosure narrative against the company evidence, in the format ESRS 2 expects (description, policies, actions, targets, metrics). Each claim links back to its source document.
Output: per-standard draft narratives.

Stage 6: Adversarial review
A second sub-agent argues against the draft. Specifically: which citations might be fabricated, which company claims are not supported by the underlying evidence, which legal interpretations look confident but wrong, which numbers contradict the financial statement footnotes.
Output: a verification report flagging items for human resolution.

The key word is structured. Each stage feeds the next as structured data, not as conversational prose.

The output of stage 3 is a JSON-shaped gap report, not a long paragraph the next stage has to re-parse. This matters for two reasons.

First, the audit trail is built-in. Every claim points to its evidence source.

Second, the cost economics work. Each sub-agent runs in 5,000 to 15,000 tokens of focused context, not 100,000 of sprawling conversation. The architecture I documented in the energy article applies cleanly here.
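
To make "JSON-shaped" concrete, here is a minimal Python sketch of what a stage 3 gap report could look like. The field names (`esrs_clause`, `supporting`, `contradicting`, `status`) and the file name are illustrative assumptions, not a published schema. The disclosure requirement identifiers are used only as sample values.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class EvidenceRef:
    """Pointer to a passage in a company document (hypothetical fields)."""
    document: str  # e.g. a file name in the evidence pack
    locator: str   # page, section, or minute number


@dataclass
class GapItem:
    """One ESRS disclosure requirement and the evidence found for it."""
    esrs_clause: str
    supporting: list[EvidenceRef] = field(default_factory=list)
    contradicting: list[EvidenceRef] = field(default_factory=list)

    @property
    def status(self) -> str:
        # Contradiction outranks support: it needs human resolution first.
        if self.contradicting:
            return "contradicted"
        return "supported" if self.supporting else "missing"


def to_gap_report(items: list[GapItem]) -> str:
    """Serialise the stage 3 output as JSON for the next stage to consume."""
    rows = [{**asdict(item), "status": item.status} for item in items]
    return json.dumps({"stage": 3, "items": rows}, indent=2)


items = [
    GapItem("ESRS E1-1", supporting=[EvidenceRef("transition_plan.pdf", "section 2")]),
    GapItem("ESRS S1-1"),  # nothing found in the evidence pack
]
report = json.loads(to_gap_report(items))
```

The next stage reads `report["items"]` directly instead of re-parsing prose, and every row already carries the clause and the document locator that the audit trail needs.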

What This Would Look Like as a Claude Code Skill

The workflow above is the design for a Claude Code skill, in the same shape as the consulting skills already public on the open-source skills hub. A CSRD skill would have six components.

Skill metadata (~100 tokens)

Loaded on every Claude session for instant recognition when a CSRD task starts. Describes the skill purpose, the trigger conditions, and the inputs it expects (the standards reference plus the company evidence pack).

Skill body (~5,000 tokens)

Loaded only when the skill becomes relevant. Contains the six-stage pipeline, the structured-output schemas between stages, the adversarial-review prompts, and the audit-trail format.

Sub-agent definitions

One per stage. Each sub-agent has its own isolated context window, its own prompt, its own output schema. The orchestrator passes structured artefacts between them.
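
As a rough illustration of that hand-off, the sketch below models a sub-agent as a prompt, a minimal output check, and a runner, with an orchestrator that passes each stage's artefact to the next. Everything here is hypothetical: `SubAgent`, `orchestrate`, and the stub runners are my names, the key check is a stand-in for a real output schema, and the stubs replace actual model calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

Artefact = dict  # structured output passed between stages


@dataclass(frozen=True)
class SubAgent:
    name: str
    prompt: str
    required_keys: tuple[str, ...]            # minimal stand-in for an output schema
    run: Callable[[str, Artefact], Artefact]  # (prompt, input artefact) -> output


def orchestrate(agents: list[SubAgent], initial: Artefact) -> list[Artefact]:
    """Run each stage on the previous stage's artefact and keep every output."""
    trail: list[Artefact] = []
    current = initial
    for agent in agents:
        output = agent.run(agent.prompt, current)
        missing = [k for k in agent.required_keys if k not in output]
        if missing:
            raise ValueError(f"{agent.name} output is missing keys: {missing}")
        trail.append({"stage": agent.name, **output})
        current = output
    return trail


# Stub runners stand in for real model calls.
def fake_gap_detection(prompt: str, artefact: Artefact) -> Artefact:
    return {"gaps": [c for c in artefact["clauses"] if c not in artefact["evidence"]]}


def fake_drafting(prompt: str, artefact: Artefact) -> Artefact:
    return {"drafts": {}, "open_gaps": artefact["gaps"]}


agents = [
    SubAgent("gap_detection", "Find missing evidence.", ("gaps",), fake_gap_detection),
    SubAgent("drafting", "Draft per-standard narratives.", ("drafts",), fake_drafting),
]
trail = orchestrate(agents, {"clauses": ["E1-1", "S1-1"], "evidence": {"E1-1": "plan.pdf"}})
```

Each runner sees only its own prompt and the incoming artefact, which is the isolation property the design relies on. The returned `trail` keeps every intermediate output for later inspection.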

Reference layer

The structured ESRS map, GRI and IFRS S1 / S2 interoperability tables, and templates for the company evidence intake. Versioned so the skill works with both ESRS Set 1 and Simplified ESRS as the transition unfolds.

Audit trail format

Every claim in the output narrative links to two anchors. The relevant ESRS clause. The company document that supports the claim. The trail is the deliverable, not an afterthought.
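
A small Python sketch of the two-anchor rule, under the assumption that each narrative claim is stored as a record. The class and field names are illustrative, and the clause and file names are sample values only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    esrs_clause: Optional[str] = None      # anchor 1: the relevant ESRS clause
    source_document: Optional[str] = None  # anchor 2: the supporting company document

    def is_anchored(self) -> bool:
        return bool(self.esrs_clause) and bool(self.source_document)


def split_by_trail(claims: list[Claim]) -> tuple[list[Claim], list[Claim]]:
    """Separate claims with both anchors from claims that must be held back."""
    ready = [c for c in claims if c.is_anchored()]
    held = [c for c in claims if not c.is_anchored()]
    return ready, held


claims = [
    Claim("The board approved the transition plan.", "ESRS E1-1", "board_minutes_q3.pdf"),
    Claim("Suppliers are audited annually.", esrs_clause="ESRS S2-1"),  # no source yet
]
ready, held = split_by_trail(claims)
```

Anything in `held` goes back to the practitioner before it reaches a draft narrative.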

Adversarial review prompts

The most important and easiest-to-overlook part. Specific prompts that ask a second Claude pass to find fabricated citations, unsupported claims, and inconsistencies with the financial statements. Whatever the pass flags is then verified against primary sources.

The architecture follows the same pattern as the five consulting skills already on the hub, which means it is buildable today against a real engagement.

Where AI Genuinely Helps in CSRD Work

From the academic literature on AI in external audits, EFRAG’s own implementation guidance, and the methodology several major firms publish openly, the places where AI augmentation delivers real value are reasonably consistent.

  • Standards synthesis against company evidence. LLMs are unusually well-suited to comparing a long, structured corpus against a heterogeneous evidence pile. The mapping is the work. Humans are slow at it. Models are fast and consistent.
  • Materiality as a sparring partner. EFRAG’s Implementation Guidance 1 explicitly contemplates a top-down approach starting from business model, sector, and value chain. That is exactly the kind of structured reasoning where a model can propose candidate IROs and a human challenges them.
  • Drafting individual disclosure narratives. The disclosure structure under each topical ESRS (description, policies, actions, targets, metrics) is a templating problem. First-pass drafts against company evidence are a defensible task.
  • Gap detection. Cross-referencing each disclosure requirement against what the company has documented surfaces missing evidence early in the engagement, not at the auditor’s first review.
  • Cross-framework reconciliation. The EFRAG-IFRS interoperability guidance published in May 2024 maps ESRS data points against IFRS S1 and S2. Mapping a company’s existing GRI or ISSB disclosures to ESRS, paragraph by paragraph, is finite, structured, and AI-tractable.
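
The sparring pattern in the materiality bullet above could be recorded along these lines. This is a sketch under my own assumptions: the class, the field names, and the sample topics are hypothetical. The one rule it encodes is that a matter counts as material if it is material from the impact perspective, the financial perspective, or both, and that the practitioner, not the model, sets those flags.

```python
from dataclasses import dataclass


@dataclass
class CandidateIRO:
    topic: str
    model_rationale: str              # why the model proposed it
    impact_material: bool = False     # set by the practitioner, not the model
    financial_material: bool = False  # set by the practitioner, not the model
    practitioner_rationale: str = ""

    @property
    def material(self) -> bool:
        # Double materiality: either perspective is enough.
        return self.impact_material or self.financial_material


def materiality_matrix(iros: list[CandidateIRO]) -> list[dict]:
    """Build the matrix, refusing any decision without a named rationale."""
    undocumented = [i.topic for i in iros if not i.practitioner_rationale]
    if undocumented:
        raise ValueError(f"No practitioner rationale for: {undocumented}")
    return [
        {"topic": i.topic, "material": i.material, "rationale": i.practitioner_rationale}
        for i in iros
    ]


iros = [
    CandidateIRO("Water use at upstream suppliers", "Sector context",
                 impact_material=True,
                 practitioner_rationale="Confirmed in supplier interviews"),
    CandidateIRO("Biodiversity at own sites", "Generic sector risk",
                 practitioner_rationale="No sites near sensitive areas"),
]
matrix = materiality_matrix(iros)
```

The model's proposal and the practitioner's decision live in separate fields, so the record shows who decided what.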

None of this replaces a sustainability practitioner. It replaces the most tedious half of their work, which is exactly the right half to replace.

Where AI Fails for CSRD Work

The case for honest limits is short and sharp. In 2025, two separate government assurance reports were caught using AI-generated content that included fabricated academic citations, non-existent legal references, and quotes attributed to people who had never written them. The contracts were six and seven figures. The reports went through internal review. The fabrication was found by external readers, not the firms themselves.

The reason these matter for CSRD work is not that consultancies will produce 237-page reports with invented case studies. The reason is that the failure mode generalises.

A practitioner under deadline pressure asks Claude to draft an ESRS E1 climate disclosure. Claude produces a coherent paragraph that confidently cites “ESRS E1 paragraph 23(c)” when the actual paragraph 23(c) says something different, or does not exist. Or it quotes the EU Taxonomy in a way that almost matches the real text but inverts the meaning. The output looks identical to a well-researched paragraph. The audit trail is the only difference.

Three failure categories show up repeatedly across documented AI-fabrication cases. Citations to fictitious cases or standards. Fabricated citations to real cases or standards. And real quotes from real sources that do not support, or contradict, the proposition. The third category is the most dangerous, because it survives a casual look. Only an adversarial review with verification against primary sources catches it.
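
The first two categories can be partly mechanised by checking every cited clause against the reference layer. The Python sketch below assumes citations are written as "ESRS E1 paragraph 23(c)", and it uses a hand-typed `REFERENCE_LAYER` with placeholder paragraph identifiers that are not the real ESRS numbering. A real map would be parsed from the published standards.

```python
import re

# Hypothetical excerpt of the reference layer: standard -> known paragraph ids.
# The ids are placeholders for illustration, not actual ESRS paragraph numbers.
REFERENCE_LAYER = {
    "ESRS E1": {"14", "16", "16(a)", "16(b)"},
    "ESRS 2": {"1", "2", "3"},
}

# Matches e.g. "ESRS E1 paragraph 16(a)" or "ESRS 2 paragraph 3".
CITATION = re.compile(r"(ESRS [A-Z]?\d)\s+paragraph\s+(\d+(?:\([a-z]\))?)")


def check_citations(draft: str, reference: dict[str, set[str]]) -> list[dict]:
    """Classify each citation in the draft by whether its anchor exists."""
    findings = []
    for standard, paragraph in CITATION.findall(draft):
        if standard not in reference:
            verdict = "unknown standard"
        elif paragraph not in reference[standard]:
            verdict = "unknown paragraph"
        else:
            verdict = "exists, content not yet verified"
        findings.append({"standard": standard, "paragraph": paragraph, "verdict": verdict})
    return findings


draft = (
    "Targets follow ESRS E1 paragraph 16(a). See also ESRS E1 paragraph 23(c) "
    "and ESRS E9 paragraph 4."
)
findings = check_citations(draft, REFERENCE_LAYER)
```

A lookup like this only proves that a clause exists. It says nothing about whether the clause supports the sentence citing it, which is why the third category still needs the adversarial pass and a human reading the primary source.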

Under CSRD’s limited assurance regime, where ESMA’s 2025 European Common Enforcement Priorities single out materiality assessment quality and disclosure scope as supervisory focus areas, that difference is everything.

The Adversarial Review Step Is the Safeguard

The adversarial review stage in the workflow is not optional. It is the safeguard that distinguishes a CSRD workflow from a content generator that happens to handle sustainability vocabulary.

In practical terms, it means a second Claude pass with a specific brief. The prompts that work, drawn from the methodology for adversarial review on grant proposals and circular economy policy work, run roughly along these lines.

  • “List every citation in this draft. For each one, state whether it is verifiable against a primary source you can produce, or whether it is plausible but unverified.” The fabrication hunt. The pass will surface citations that look right but cannot be anchored.
  • “For each claim about the company in this draft, identify the underlying evidence document that supports it. Flag any claim where you cannot point to a specific source.” The unsupported-claim hunt. Catches text that drifts from the evidence pile into invention.
  • “Identify the three most likely interpretive errors in this draft. Where is the language confident but wrong on legal or regulatory grounds?” The confidence-error hunt. Works because the model is much better at identifying problems in other text than at avoiding them in its own generation.
  • “Cross-check the metrics in this draft against the financial statement footnotes attached. Flag any contradiction.” The connectivity check. ESMA cares deeply about connectivity between sustainability and financial statements. Auditors do too.
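
One way to wire the four prompts above into a repeatable pass, sketched in Python. The prompt wording comes from the list above. The dictionary keys, the `adversarial_review` function, and the injected `ask_model` callable are my assumptions. No real model API is called here, so a stub stands in for whichever client a team actually uses.

```python
from typing import Callable

# Prompt wording is taken from the list above; the keys are illustrative labels.
REVIEW_PROMPTS = {
    "fabrication": (
        "List every citation in this draft. For each one, state whether it is "
        "verifiable against a primary source you can produce, or whether it is "
        "plausible but unverified."
    ),
    "unsupported_claims": (
        "For each claim about the company in this draft, identify the underlying "
        "evidence document that supports it. Flag any claim where you cannot "
        "point to a specific source."
    ),
    "confidence_errors": (
        "Identify the three most likely interpretive errors in this draft. Where "
        "is the language confident but wrong on legal or regulatory grounds?"
    ),
    "connectivity": (
        "Cross-check the metrics in this draft against the financial statement "
        "footnotes attached. Flag any contradiction."
    ),
}


def adversarial_review(draft: str, ask_model: Callable[[str], str]) -> dict[str, str]:
    """Send each prompt as its own pass and return the answers keyed by check."""
    return {
        name: ask_model(f"{prompt}\n\n--- DRAFT ---\n{draft}")
        for name, prompt in REVIEW_PROMPTS.items()
    }


# Offline stub so the sketch runs without any model or network access.
calls: list[str] = []


def stub_model(message: str) -> str:
    calls.append(message)
    return "no findings (stub)"


findings = adversarial_review("Example disclosure paragraph.", stub_model)
```

Running each hunt as a separate pass keeps the brief narrow, and the keyed result slots straight into the verification report for human resolution.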

The adversarial review will not catch everything. A 2025 study in the International Journal of Accounting Information Systems found that auditor oversight cannot be removed from AI-assisted work.

Adversarial review combined with human verification of the flagged items should catch most fabrication. The remainder is what auditors and ESMA are paid to find.

A Practical Methodology Checklist

Five principles any team using AI on CSRD work should be able to tick before going live.

  1. Structured outputs between stages. If your AI workflow passes free-flowing prose between steps, you have no audit trail. Each stage should produce JSON-shaped artefacts where every claim has a citation anchor.
  2. An evidence layer separate from the model. Company policies, prior reports, and stakeholder inputs should sit in a structured reference the model queries, not in the prompt context. The model should not be expected to remember the company’s data verbatim.
  3. Adversarial review on every output. Not optional. A second pass with the specific prompts above, on every disclosure narrative before human review.
  4. Primary source verification on every regulatory citation. Every ESRS clause, every GRI standard, every legal reference goes against the primary source. The model is allowed to draft the citation. The human or a second model pass verifies it.
  5. An audit trail that links every claim to two anchors. The applicable ESRS or GRI clause, and the company document that supports the claim. If you cannot produce this trail for a given paragraph, that paragraph does not go to the auditor.

If your team cannot tick all five, the AI workflow is not ready for CSRD. That is not a criticism. It is the floor of what survives limited assurance.

What This Changes for Sustainability Consulting

The market for CSRD consulting in mid-2026 splits into three rough categories. The Big Four are scaling by spending heavily on AI platforms and packaging them for enterprise clients. Specialist software vendors like Watershed, Persefoni, Sphera, and Normative automate the data plumbing layer well but do not replace the practitioner judgment layer where materiality decisions, narrative drafting, and audit defence actually live.

Smaller practices working in the field have, broadly, two options. They can keep doing the synthesis manually and lose ground to firms that have restructured. Or they can build the workflow architecture above, with the safeguards, and deliver work that is faster, cheaper, and more defensible than the firm down the road that pasted the standards into ChatGPT.

I wrote in the AI-augmented consultant piece about the operational consulting shape this implies. The principle applies cleanly to CSRD.

The valuable hours in a CSRD engagement are the stakeholder interviews, the material-topic judgments, the audit defence conversations, the climate transition plan negotiations with the board. The screen-bound hours are the cross-referencing, the gap detection, the templating, and the consistency checking.

AI handles the screen hours. The practitioner stays where the value is.

Want a Skill Like This?

The architecture above is buildable today. If your team is running CSRD work in 2026, in scope under the new thresholds, and would benefit from a skill shaped to your specific engagement and evidence pack, that is the conversation to start.

The skill would include the adversarial review prompts, the audit trail format, the standards reference layer, and the sub-agent definitions adapted to your workflow. That work sits inside the existing AI for Sustainability Research consulting practice.

For consultancies considering the same path independently, the architecture above is enough to start. Everything in this article is replicable. If you want a head start with the specific prompts and sub-agent definitions, get in touch.

Frequently Asked Questions

Can I just use ChatGPT for CSRD work instead?

Technically yes, in the same sense that you can edit a video on a phone. ChatGPT is a single-pass conversational model with no structured outputs between stages, no separate evidence layer, no built-in adversarial review, and no audit trail. For low-stakes drafting it is fine. For audit-bound CSRD work it is the configuration that produced the documented AI-fabrication failures of 2025. The model is not the issue. The architecture around the model is.

Why a Claude Code skill rather than an existing vendor platform?

Different layers of the same stack, not competitors. Vendor platforms (Watershed, Persefoni, Sphera, Normative, Microsoft Cloud for Sustainability) handle data plumbing well: emissions calculation, factor libraries, XBRL output, ingestion at scale. They do not replace the practitioner judgment layer where materiality decisions, disclosure narratives, evidence-to-clause mapping, and audit defence live. A practitioner-shaped Claude skill sits alongside the platforms, not against them. Use both.

Does this workflow work for SMEs not in CSRD scope?

The architecture is general. The materiality assessment, evidence cross-referencing, and disclosure drafting pattern is the same for VSME (the voluntary standard for SMEs), GRI reporting, IFRS S1 and S2 disclosures, and many private supplier assurance regimes. The reference layer changes. The pipeline structure does not.

What about confidential client data?

The same considerations apply to a Claude workflow as to any cloud-based service. Anthropic publishes data handling and retention practices in detail. For genuinely sensitive engagements, the workflow can be run against on-premises or VPC-deployed models. The architecture is model-agnostic in practice.

How does this scale across a consultancy?

The skill metadata sits in shared infrastructure (a Git repository, typically). Each consultant runs it locally against their own engagement context. The methodology travels. The evidence pack does not. This is structurally the same as how engineering teams share libraries while each team works on their own application code.
