The XLSX From Hell: When Shapes Break Everything

This post describes a real project I worked on. Names, specific systems, and identifying details have been changed. The technical problems and approaches are real, but code snippets are simplified illustrations not production code.

The "Simple" Request

"Can you auto-fill this Excel template with data from the database?"

I looked at the file. Beautiful report template. Corporate colors. Professional layout. And shapes rounded rectangles with KPIs, summary boxes with dynamic values, decorative elements throughout.

"Sure," I said. "Should be straightforward."

It was not straightforward.

What I Didn't Know About Excel Files

An .xlsx file is a ZIP containing XML. Unzip it and you'll find:

xl/worksheets/sheet1.xml the cells
xl/sharedStrings.xml text content (sometimes)
xl/drawings/drawing1.xml shapes and their text
Various _rels folders relationships between files

The problem: shapes and cells store text in completely different places.

When someone types {{total}} inside a rounded rectangle shape, that text isn't in the cells. It's buried in drawing XML with a different namespace entirely.

Attempt 1: openpyxl

openpyxl is the standard Python library for Excel. Obvious choice.

Result:

Cell text: Replaced
Shape text: Still showing {{placeholder}}

openpyxl handles cells beautifully. It doesn't touch drawing content. The shapes were invisible to it.

Attempt 2: Patch the Drawing XML Manually

Fine. Use openpyxl for cells, then edit the drawing XML directly. Unzip, find-replace, rezip.

Result:

Cell text: Replaced
Shape text: Replaced
Shapes: GONE. All shapes vanished.

What happened?

The Relationship Web

Excel files have internal references everywhere. The worksheet says "my drawings are in drawing1.xml." When openpyxl saved the file, it rewrote the relationship files but it didn't know about my manual edits. The link got severed.

The shapes were still in the ZIP. Excel just didn't know they existed anymore.

Attempt 3: Skip openpyxl, Pure XML

Don't use any Excel library. Open ZIP, modify XML directly, close ZIP.

New problem: Some cell text wasn't replacing.

The culprit: shared strings. Excel doesn't always store text in cells. Often it stores an index, and the actual text lives in sharedStrings.xml. My placeholder {{region}} wasn't in the cell it was at index 7 in the shared strings table.

I had to search multiple XML files.

Attempt 4: Proper XML Parsing with lxml

Parse the XML with lxml, find text nodes, replace content, serialize back.

Result:

Shapes shifted position
Some text cut off
Formatting broken in random places

What now?

The XML Formatting Trap

When you parse XML and serialize it back, parsers "help" by:

Normalizing whitespace
Reordering attributes
Adding XML declarations
Reformatting indentation

The XML is semantically identical but byte-different.

Excel is picky. Some positioning depends on exact byte offsets. Some attributes have implicit dependencies. The "helpful" reformatting broke things in subtle ways.

I spent two days debugging why shapes moved 3 pixels left after processing.

The Breakthrough: Don't Parse At All

After a week of fighting XML parsers, I had a realization:

I don't need to understand the XML. I just need to swap bytes.

def fill_template(template_path, data, output_path):
    # Read the ZIP
    with zipfile.ZipFile(template_path, 'r') as zin:
        files = {name: zin.read(name) for name in zin.namelist()}
 
    # Build replacements as bytes
    replacements = {
        f'{{{{{key}}}}}'.encode('utf-8'): escape_xml(str(value)).encode('utf-8')
        for key, value in data.items()
    }
 
    # Surgical replacement - no parsing
    for filename in files:
        if filename.endswith('.xml'):
            content = files[filename]
            for placeholder, value in replacements.items():
                content = content.replace(placeholder, value)
            files[filename] = content
 
    # Write the ZIP - structure unchanged
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zout:
        for name, content in files.items():
            zout.writestr(name, content)

That's it. The dumbest possible solution.

Cell text: Works Shape text: Works Shapes: Still there Formatting: Untouched Relationships: Never modified

By never parsing the XML, I never gave any tool the chance to "fix" it. The file structure stayed exactly as Excel created it, with only my placeholder bytes swapped out.

The Gotcha: XML Escaping

One more trap. Values with special characters need escaping:

def escape_xml(text):
    return (text
        .replace('&', '&amp;')
        .replace('<', '&lt;')
        .replace('>', '&gt;')
        .replace('"', '&quot;')
        .replace("'", '&apos;'))

Without this, a value like Sales & Marketing becomes invalid XML and corrupts the file.

What I Learned the Hard Way

Excel files are fragile. They're not just "XML in a ZIP." They're a carefully balanced web where touching one thing breaks another.

Libraries can be too clever. openpyxl, lxml they try to help. For Excel files, that help is often destructive. Sometimes the dumb solution wins.

Test with the real application. The file might open in Python. Might even open in LibreOffice. Doesn't mean Excel accepts it.

Shapes are second-class citizens. Most Excel libraries focus on cells. If your template uses shapes, you're largely on your own.

When to Use This Approach

Do this when:

Template has shapes, charts, or complex formatting
You just need to swap placeholder text for values
Preserving exact formatting is critical

Don't do this when:

You need to add/remove rows dynamically
You're generating Excel from scratch
The template structure changes frequently

The Results

Metric	Before	After
Report generation	2-3 hours (manual)	30 seconds
Formatting issues	Frequent	Zero
Shape corruption	N/A	Zero

The finance team was happy. Their beautiful templates stayed beautiful.

Turning a 4-Hour Report Into a Button Click - The system that uses these templates
Scheduled Jobs That Actually Recover - Automating report generation
When the Frontend Sends a Query as a String - Another "simple" problem that wasn't

The "Simple" Request

What I Didn't Know About Excel Files

Attempt 1: openpyxl

Attempt 2: Patch the Drawing XML Manually

The Relationship Web

Attempt 3: Skip openpyxl, Pure XML

Attempt 4: Proper XML Parsing with lxml

The XML Formatting Trap

The Breakthrough: Don't Parse At All

The Gotcha: XML Escaping

What I Learned the Hard Way

When to Use This Approach

The Results

Related Reading

Aamir Shahzad