Why a text diff is the wrong tool

The diff built into Git, your editor, or the command-line diff utility compares text line by line. That works for source code, where each line is a meaningful unit, but it misleads for XML, whose textual form can vary in many ways that do not change the data:

  • Whitespace and indentation between elements are usually insignificant (mixed content and xml:space="preserve" are the exceptions), but a text diff flags every reindentation as a change.
  • Attribute order is not significant in XML, yet a text diff treats <a x="1" y="2"/> and <a y="2" x="1"/> as different.
  • <tag></tag> and <tag/> are the same empty element, but look different as text.
  • Single versus double quotes around attribute values, and different but equivalent character escapes (such as &#60; and &lt;), both read as edits.

The result is a diff full of noise: dozens of "changes" where the documents are in fact data-identical, drowning the one real difference you were looking for.
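The noise is easy to reproduce. The sketch below uses only the Python standard library; the two sample documents and the signature helper are made up for illustration and are not part of any particular diff tool. It counts the lines a text diff reports as changed, then checks whether the parsed data is actually the same.

```python
import difflib
import xml.etree.ElementTree as ET

# Two serializations of the same data: attribute order, quoting,
# empty-element form, and indentation all differ.
doc_a = """<item id="1" kind='x'>
  <note></note>
</item>
"""
doc_b = """<item kind="x" id="1">
    <note/>
</item>
"""

# Line-based comparison: keep only the lines reported as removed or added,
# skipping the "---" / "+++" file headers.
text_changes = [
    line
    for line in difflib.unified_diff(doc_a.splitlines(), doc_b.splitlines(), lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(len(text_changes), "changed lines in the text diff")


def signature(elem):
    """Reduce an element to a layout-independent tuple (hypothetical helper).

    Simplification: tail text after an element is not compared.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),            # attribute order ignored
        (elem.text or "").strip(),              # indentation ignored
        [signature(child) for child in elem],   # child order kept
    )


same_data = signature(ET.fromstring(doc_a)) == signature(ET.fromstring(doc_b))
print("data-identical:", same_data)
```

The text diff reports every line except the closing tag as changed, while the parsed comparison finds no difference at all.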

What a structure-aware diff does instead

A proper XML diff parses both documents into node trees first, then compares the trees rather than the text. Because it works on structure, it can normalize away everything that doesn't matter — indentation, attribute order, empty-element form — and report only genuine differences:

  • An element that exists on one side but not the other (added or removed).
  • An attribute that was added, removed, or whose value changed.
  • Text content that differs.
  • Child elements that were reordered (element order is significant, unlike attributes).

Consider these two documents. A text diff shows several changed lines; a structure-aware diff reports exactly one change — the price.

Old version:

<book id="bk101"
      category="tech">
  <price>44.95</price>
</book>

New version:

<book category="tech" id="bk101">
    <price>49.95</price>
</book>

The reindentation and the swapped attribute order are correctly ignored; only 44.95 → 49.95 is a real difference.
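A minimal sketch of the tree comparison, applied to the two book documents above, might look like this. It uses Python's standard xml.etree.ElementTree; the compare function and its output format are illustrative assumptions, and real tools use more sophisticated matching than the position-by-position pairing shown here.

```python
import xml.etree.ElementTree as ET

OLD = """<book id="bk101"
      category="tech">
  <price>44.95</price>
</book>"""

NEW = """<book category="tech" id="bk101">
    <price>49.95</price>
</book>"""


def compare(old, new, path=""):
    """Yield human-readable differences between two elements (illustrative)."""
    here = f"{path}/{old.tag}"
    if old.tag != new.tag:
        yield f"{here}: element replaced by <{new.tag}>"
        return
    # Attributes are looked up by name, so their order never matters.
    for name in sorted(old.attrib.keys() | new.attrib.keys()):
        before, after = old.attrib.get(name), new.attrib.get(name)
        if before != after:
            yield f"{here}/@{name}: {before!r} -> {after!r}"
    # Strip surrounding whitespace so reindentation is not reported.
    before, after = (old.text or "").strip(), (new.text or "").strip()
    if before != after:
        yield f"{here}: text {before!r} -> {after!r}"
    # Children are paired by position, so element order stays significant.
    old_children, new_children = list(old), list(new)
    for a, b in zip(old_children, new_children):
        yield from compare(a, b, here)
    for extra in old_children[len(new_children):]:
        yield f"{here}/{extra.tag}: removed"
    for extra in new_children[len(old_children):]:
        yield f"{here}/{extra.tag}: added"


changes = list(compare(ET.fromstring(OLD), ET.fromstring(NEW)))
for change in changes:
    print(change)
```

Run on the two documents, the sketch reports a single difference, the text of /book/price.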

A pragmatic pre-step: normalize, then diff

If you only have a text diff available, you can get much of the benefit by normalizing both files first, then diffing the results. Run each document through the same formatter with the same indent settings so the layout is identical and the text diff surfaces mostly content changes. C14N (XML Canonicalization, the normalization used for XML digital signatures) covers what a formatter usually misses: it sorts attributes and namespace declarations and unifies quoting and empty-element form. It preserves whitespace inside element content, though, so differently indented files still need the formatting pass. A consistent pretty-print covers most everyday cases.
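As a rough sketch of this pre-step, Python 3.8 and later ship a C14N 2.0 implementation as xml.etree.ElementTree.canonicalize. Its strip_text option is an extra beyond plain canonicalization and is used here to discard indentation. The normalize helper and the sample strings are assumptions for illustration.

```python
import difflib
import xml.etree.ElementTree as ET

old_xml = '<book id="bk101" category="tech">\n  <price>44.95</price>\n</book>'
new_xml = '<book category="tech" id="bk101"><price>49.95</price></book>'


def normalize(xml_text):
    """Return canonical XML split into one tag per line (hypothetical helper)."""
    # Canonicalization sorts attributes and unifies quoting and empty-element
    # form. strip_text=True also drops leading and trailing whitespace in text
    # content, which canonicalization on its own would keep.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    # Break between adjacent tags so a line-based diff stays readable.
    return canonical.replace("><", ">\n<").splitlines()


diff = difflib.unified_diff(normalize(old_xml), normalize(new_xml), lineterm="", n=0)
# Keep only added and removed lines, skipping the file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
for line in changed:
    print(line)
```

After normalization, the ordinary text diff shows only the old and new price lines. Note that strip_text would also hide whitespace changes that matter in mixed content, so it suits data-oriented XML better than document-oriented XML.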

Diffing XML programmatically

In Python, the xmldiff library produces a structure-aware diff and can even emit an edit script describing how to turn one document into the other:

from xmldiff import main

# diff_files parses both documents and returns the edit script
# as a list of actions
diffs = main.diff_files("old.xml", "new.xml")
for change in diffs:
    print(change)

In Java, XMLUnit is the standard choice and lets you configure whether element order or whitespace should be considered. For .NET, the XmlDiff class in Microsoft's XML Diff and Patch toolkit serves the same role.

Compare two XML files online

Paste both documents and get a structure-aware comparison — added, removed, and changed nodes only, with whitespace and attribute order normalized. Runs in your browser.

Open XML Diff →

Frequently Asked Questions

Why not just use a normal text diff for XML?

A text diff compares lines, so reformatting, reindenting, or reordering attributes shows up as differences even though the data is identical. A structure-aware XML diff compares the parsed node trees, so it reports only real changes.

Is attribute order significant in XML?

No. The XML specification states that attribute order is not significant. A correct XML diff treats id="1" class="x" and class="x" id="1" as identical, whereas a text diff flags them as different.

Does element order matter when comparing XML?

Usually yes. Unlike attributes, the order of child elements is significant in XML and can carry meaning, so most diff tools treat reordered children as a change. Some schema-aware tools can mark a specific element's children as unordered, but treating element order as significant is the safe default.
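One way to picture the "unordered children" option is a comparison key that sorts the children of selected elements before comparing. The sketch below is an assumption about how such a setting could work, not the behavior of any specific tool; canonical_key, the unordered parameter, and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def canonical_key(elem, unordered=frozenset()):
    """Build a comparable key; children of tags listed in `unordered` are sorted."""
    children = [canonical_key(child, unordered) for child in elem]
    if elem.tag in unordered:
        children.sort()  # order of these children is declared irrelevant
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(children),
    )


a = ET.fromstring("<tags><tag>xml</tag><tag>diff</tag></tags>")
b = ET.fromstring("<tags><tag>diff</tag><tag>xml</tag></tags>")

# Default: element order is significant, so the documents differ.
ordered_equal = canonical_key(a) == canonical_key(b)
# Opt-in: children of <tags> are treated as an unordered set.
unordered_equal = canonical_key(a, {"tags"}) == canonical_key(b, {"tags"})
print(ordered_equal, unordered_equal)
```

Keeping order significant by default and opting in per element mirrors the safe default described above.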

How can I compare XML when I only have a text diff?

Normalize both files first: run each through the same formatter with the same indent settings, and apply C14N as well if attribute order or quoting differs. Then run the text diff on the normalized output. The formatter removes layout noise and C14N removes attribute-order noise, so the diff surfaces only content changes.