How We Built a Multi-Country Accounting Dataset in 12 Weeks
The Problem
A fast-growing accounting technology company approached us with a familiar challenge:
They had a growing library of financial worksheets and workpapers—but no consistent structure behind them.
Formats varied across regions
Logic was inconsistent between sheets
Some workflows worked in isolation, but broke when integrated
There was no clear path to turning this into structured data for product or AI use
They didn’t just need “cleaner spreadsheets.”
They needed a production-ready dataset that could support real-world accounting workflows across multiple countries.
The Objective
Build a standardized, scalable dataset across US and Canadian accounting workflows that could:
Support product development and feature expansion
Be reliably used in live environments (e.g. integrations with platforms like QuickBooks Online and Xero)
Serve as a foundation for future automation and AI initiatives
Our Approach
We structured the engagement into five phases, designed to move from ambiguity to production-ready outputs.
1. Discovery & Dataset Audit
We began by reviewing the client’s existing worksheet library and internal tools.
This included:
Mapping current worksheet structures
Identifying inconsistencies in logic and formatting
Conducting a competitor workpaper analysis to benchmark best practices
The competitor benchmark gave us a target: rather than only improving what existed, we aligned the worksheets to what should exist.
2. Worksheet Review & Functional Validation
Next, we reviewed all US and Canadian worksheets in detail.
Our focus:
Language clarity (are these usable by real accountants?)
Functional accuracy (do calculations and logic hold up?)
Workflow alignment (do these reflect real-world accounting processes?)
This is where most “data projects” fall down—they clean data without validating whether it actually works in practice.
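To make "functional accuracy" concrete, here is a minimal Python sketch of one kind of check: recomputing a worksheet total from its line items and comparing it with the total the sheet reports. The function name, the string inputs, and the one-cent tolerance are illustrative assumptions, not the tooling used on this engagement.

```python
from decimal import Decimal


def check_total(line_items, stated_total, tolerance=Decimal("0.01")):
    """Recompute a total from line items and compare it to the stated total.

    Amounts are passed as strings so Decimal parses them exactly;
    floats would introduce rounding noise. Hypothetical helper.
    """
    computed = sum((Decimal(item) for item in line_items), Decimal("0"))
    difference = abs(computed - Decimal(stated_total))
    return difference <= tolerance, computed


# Example: three line items and the total reported on the sheet.
ok, computed = check_total(["1200.50", "349.25", "80.00"], "1629.75")
print(ok, computed)
```

A check like this only confirms arithmetic. Whether the worksheet reflects how accountants actually work still needs review by someone with domain expertise.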
3. Dataset Scoping & Standardization
Once validated, we defined the future-state dataset:
Which worksheets could be reused
Which required localization (US vs. Canada differences)
Which needed to be built from scratch
We also introduced consistent structure across:
Naming conventions
Data inputs/outputs
Calculation logic
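As a rough illustration of what "consistent structure" can look like in practice, the Python sketch below describes a worksheet by its ID, country, inputs, and outputs, then reports any convention violations. The naming pattern (`<country>_<area>_<name>`), the class names, and the field types are all assumptions made for this example, not the client's actual schema.

```python
import re
from dataclasses import dataclass, field

# Assumed convention: <country>_<area>_<name> in lowercase snake_case.
WORKSHEET_ID = re.compile(r"^(us|ca)_[a-z]+(_[a-z0-9]+)+$")
FIELD_NAME = re.compile(r"^[a-z][a-z0-9_]*$")
CURRENCY = {"us": "USD", "ca": "CAD"}  # reporting currency per country


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: str  # e.g. "decimal", "date", "text"


@dataclass
class WorksheetSpec:
    worksheet_id: str
    country: str  # "us" or "ca"
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def problems(self):
        """Return a list of human-readable convention violations."""
        found = []
        if self.country not in CURRENCY:
            found.append(f"unsupported country: {self.country!r}")
        if not WORKSHEET_ID.match(self.worksheet_id):
            found.append(f"bad worksheet id: {self.worksheet_id!r}")
        elif not self.worksheet_id.startswith(self.country + "_"):
            found.append("id prefix does not match country")
        for spec in self.inputs + self.outputs:
            if not FIELD_NAME.match(spec.name):
                found.append(f"bad field name: {spec.name!r}")
        if not self.outputs:
            found.append("worksheet declares no outputs")
        return found


spec = WorksheetSpec(
    "ca_payroll_accrual",
    "ca",
    inputs=[FieldSpec("gross_wages", "decimal")],
    outputs=[FieldSpec("accrued_liability", "decimal")],
)
print(spec.problems())
```

Declaring inputs and outputs explicitly is what lets the same worksheet definition feed a product feature, an integration, or a training pipeline without re-interpretation.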
4. Dataset Build & Localization
With the structure defined, we moved into the build:
Localized worksheets for US and Canadian requirements
Built new worksheets where gaps existed
Standardized formats to ensure consistency across the dataset
This phase turned fragmented assets into a cohesive, structured dataset.
5. Testing & Workflow Simulation
Finally, we tested the dataset against real-world scenarios:
Simulated accounting workflows
Tested integrations with platforms like QuickBooks Online and Xero
Identified and resolved edge cases
The goal wasn’t just accuracy—it was confidence in production use.
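One simple way to picture a simulated workflow is a trial balance check: post a set of journal entries and confirm that total debits equal total credits. The Python sketch below is a hypothetical illustration of that invariant. It does not call QuickBooks Online or Xero, and the function names and entry format are assumptions.

```python
from collections import defaultdict
from decimal import Decimal


def trial_balance(entries):
    """Net each account from (account, debit, credit) rows given as strings."""
    balances = defaultdict(lambda: Decimal("0"))
    for account, debit, credit in entries:
        balances[account] += Decimal(debit) - Decimal(credit)
    return dict(balances)


def is_balanced(balances):
    """Debits and credits must net to zero across all accounts."""
    return sum(balances.values(), Decimal("0")) == Decimal("0")


# Simulated workflow: record a cash sale, then pay rent from cash.
entries = [
    ("cash", "1000.00", "0"),
    ("revenue", "0", "1000.00"),
    ("rent_expense", "400.00", "0"),
    ("cash", "0", "400.00"),
]
balances = trial_balance(entries)
print(balances["cash"], is_balanced(balances))
```

Edge cases tend to surface when a scenario like this is deliberately broken, for example by dropping one side of an entry and confirming the check fails.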
The Outcome
Within ~12 weeks, the client had:
A standardized, multi-country dataset
Consistent worksheet structures across all use cases
Validated outputs tested against real workflows
A foundation ready for product development and AI training
Why This Matters
Most companies underestimate the gap between “We have data” and “We have usable, production-ready data.”
Bridging that gap requires more than data cleaning.
It requires:
Accounting domain expertise
Structured dataset design
Real-world workflow validation
That’s the difference between a dataset that looks good—and one that actually works.
Final Thought
If you’re building software or AI for accountants, your product is only as strong as the data behind it.
And in accounting, data doesn’t just need to be clean—it needs to be correct, contextual, and tested in the real world.