This post describes a real project I worked on. Names, specific systems, and identifying details have been changed. The technical problems and approaches are real, but code snippets are simplified illustrations not production code.
The "Simple" Request
"Can you auto-fill this Excel template with data from the database?"
I looked at the file. Beautiful report template. Corporate colors. Professional layout. And shapes rounded rectangles with KPIs, summary boxes with dynamic values, decorative elements throughout.
"Sure," I said. "Should be straightforward."
It was not straightforward.
What I Didn't Know About Excel Files
An .xlsx file is a ZIP containing XML. Unzip it and you'll find:
xl/worksheets/sheet1.xmlthe cellsxl/sharedStrings.xmltext content (sometimes)xl/drawings/drawing1.xmlshapes and their text- Various
_relsfolders relationships between files
The problem: shapes and cells store text in completely different places.
When someone types {{total}} inside a rounded rectangle shape, that text isn't in the cells. It's buried in drawing XML with a different namespace entirely.
Attempt 1: openpyxl
openpyxl is the standard Python library for Excel. Obvious choice.
Result:
- Cell text: Replaced
- Shape text: Still showing
{{placeholder}}
openpyxl handles cells beautifully. It doesn't touch drawing content. The shapes were invisible to it.
Attempt 2: Patch the Drawing XML Manually
Fine. Use openpyxl for cells, then edit the drawing XML directly. Unzip, find-replace, rezip.
Result:
- Cell text: Replaced
- Shape text: Replaced
- Shapes: GONE. All shapes vanished.
What happened?
The Relationship Web
Excel files have internal references everywhere. The worksheet says "my drawings are in drawing1.xml." When openpyxl saved the file, it rewrote the relationship files but it didn't know about my manual edits. The link got severed.
The shapes were still in the ZIP. Excel just didn't know they existed anymore.
Attempt 3: Skip openpyxl, Pure XML
Don't use any Excel library. Open ZIP, modify XML directly, close ZIP.
New problem: Some cell text wasn't replacing.
The culprit: shared strings. Excel doesn't always store text in cells. Often it stores an index, and the actual text lives in sharedStrings.xml. My placeholder {{region}} wasn't in the cell it was at index 7 in the shared strings table.
I had to search multiple XML files.
Attempt 4: Proper XML Parsing with lxml
Parse the XML properly, find text nodes, replace content, serialize back.
Result:
- Shapes shifted position
- Some text cut off
- Formatting broken in random places
What now?
The XML Formatting Trap
When you parse XML and serialize it back, parsers "help" by:
- Normalizing whitespace
- Reordering attributes
- Adding XML declarations
- Reformatting indentation
The XML is semantically identical but byte-different.
Excel is picky. Some positioning depends on exact byte offsets. Some attributes have implicit dependencies. The "helpful" reformatting broke things in subtle ways.
I spent two days debugging why shapes moved 3 pixels left after processing.
The Breakthrough: Don't Parse At All
After a week of fighting XML parsers, I had a realization:
I don't need to understand the XML. I just need to swap bytes.
def fill_template(template_path, data, output_path):
# Read the ZIP
with zipfile.ZipFile(template_path, 'r') as zin:
files = {name: zin.read(name) for name in zin.namelist()}
# Build replacements as bytes
replacements = {
f'{{{{{key}}}}}'.encode('utf-8'): escape_xml(str(value)).encode('utf-8')
for key, value in data.items()
}
# Surgical replacement - no parsing
for filename in files:
if filename.endswith('.xml'):
content = files[filename]
for placeholder, value in replacements.items():
content = content.replace(placeholder, value)
files[filename] = content
# Write the ZIP - structure unchanged
with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zout:
for name, content in files.items():
zout.writestr(name, content)That's it. The dumbest possible solution.
Cell text: Works Shape text: Works Shapes: Still there Formatting: Untouched Relationships: Never modified
By never parsing the XML, I never gave any tool the chance to "fix" it. The file structure stayed exactly as Excel created it, with only my placeholder bytes swapped out.
The Gotcha: XML Escaping
One more trap. Values with special characters need escaping:
def escape_xml(text):
return (text
.replace('&', '&')
.replace('<', '<')
.replace('>', '>')
.replace('"', '"')
.replace("'", '''))Without this, a value like Sales & Marketing becomes invalid XML and corrupts the file.
What I Learned the Hard Way
Excel files are fragile. They're not just "XML in a ZIP." They're a carefully balanced web where touching one thing breaks another.
Libraries can be too clever. openpyxl, lxml they try to help. For Excel files, that help is often destructive. Sometimes the dumb solution wins.
Test with the real application. The file might open in Python. Might even open in LibreOffice. Doesn't mean Excel accepts it.
Shapes are second-class citizens. Most Excel libraries focus on cells. If your template uses shapes, you're largely on your own.
When to Use This Approach
Do this when:
- Template has shapes, charts, or complex formatting
- You just need to swap placeholder text for values
- Preserving exact formatting is critical
Don't do this when:
- You need to add/remove rows dynamically
- You're generating Excel from scratch
- The template structure changes frequently
The Results
| Metric | Before | After |
|---|---|---|
| Report generation | 2-3 hours (manual) | 30 seconds |
| Formatting issues | Frequent | Zero |
| Shape corruption | N/A | Zero |
The finance team was happy. Their beautiful templates stayed beautiful.
Related Reading
- Turning a 4-Hour Report Into a Button Click - The system that uses these templates
- Scheduled Jobs That Actually Recover - Automating report generation
- When the Frontend Sends a Query as a String - Another "simple" problem that wasn't
