How We Built a Multi-Country Accounting Dataset in 12 Weeks

The Problem

A fast-growing accounting technology company approached us with a familiar challenge:

They had a growing library of financial worksheets and workpapers—but no consistent structure behind them.

  • Formats varied across regions

  • Logic was inconsistent between sheets

  • Some workflows worked in isolation, but broke when integrated

  • There was no clear path to turning this into structured data for product or AI use

They didn’t just need “cleaner spreadsheets.”

They needed a production-ready dataset that could support real-world accounting workflows across multiple countries.

The Objective

Build a standardized, scalable dataset across US and Canadian accounting workflows that could:

  • Support product development and feature expansion

  • Be used reliably in live environments (e.g., integrations with platforms like QuickBooks Online and Xero)

  • Serve as a foundation for future automation and AI initiatives

Our Approach

We structured the engagement into five phases, designed to move from ambiguity to production-ready outputs.

1. Discovery & Dataset Audit

We began by reviewing the client’s existing worksheet library and internal tools.

This included:

  • Mapping current worksheet structures

  • Identifying inconsistencies in logic and formatting

  • Conducting a competitor workpaper analysis to benchmark best practices

This phase ensured we weren’t just improving what existed—we were aligning to what should exist.

2. Worksheet Review & Functional Validation

Next, we reviewed all US and Canadian worksheets in detail.

Our focus:

  • Language clarity (are these usable by real accountants?)

  • Functional accuracy (do calculations and logic hold up?)

  • Workflow alignment (do these reflect real-world accounting processes?)

This is where most “data projects” fall down—they clean data without validating whether it actually works in practice.
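
As a minimal sketch of what a functional check can look like, the Python snippet below tests one rule any double-entry worksheet should satisfy: total debits equal total credits. The row layout (dictionaries with "debit" and "credit" amounts stored as strings) and the function name `is_balanced` are assumptions for illustration, not the client's actual tooling.

```python
from decimal import Decimal


def is_balanced(lines: list[dict[str, str]]) -> bool:
    """Return True when total debits equal total credits.

    Amounts are parsed as Decimal rather than float so that cents
    add up exactly (with floats, 0.1 + 0.2 != 0.3).
    """
    total_debits = sum((Decimal(line["debit"]) for line in lines), Decimal("0"))
    total_credits = sum((Decimal(line["credit"]) for line in lines), Decimal("0"))
    return total_debits == total_credits


# Hypothetical journal lines from a worksheet under review.
journal = [
    {"account": "Rent expense", "debit": "1200.10", "credit": "0"},
    {"account": "Utilities expense", "debit": "0.20", "credit": "0"},
    {"account": "Cash", "debit": "0", "credit": "1200.30"},
]
print(is_balanced(journal))
```

A check like this only confirms that the arithmetic holds. Whether the worksheet reflects how accountants actually work still needs human review.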

3. Dataset Scoping & Standardization

Once validated, we defined the future-state dataset:

  • Which worksheets could be reused

  • Which required localization (US vs CA differences)

  • Which needed to be built from scratch

We also introduced consistent structure across:

  • Naming conventions

  • Data inputs/outputs

  • Calculation logic
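
To make this concrete, here is a minimal Python sketch of a standardized worksheet definition with a validator. The naming pattern, the field names, and the `WorksheetSpec` and `validate_spec` names are illustrative assumptions, not the client's actual schema.

```python
import re
from dataclasses import dataclass

# Assumed naming convention: <country>_<area>_<worksheet>, lowercase with
# underscores, e.g. "us_payroll_accrual". The real convention may differ.
NAME_PATTERN = re.compile(r"^(us|ca)_[a-z]+(_[a-z]+)+$")


@dataclass(frozen=True)
class WorksheetSpec:
    name: str
    country: str  # "US" or "CA"
    inputs: tuple[str, ...]   # names of the fields the worksheet reads
    outputs: tuple[str, ...]  # names of the fields the worksheet produces


def validate_spec(spec: WorksheetSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec passes."""
    problems = []
    if not NAME_PATTERN.match(spec.name):
        problems.append(f"name {spec.name!r} does not follow the naming convention")
    elif not spec.name.startswith(spec.country.lower() + "_"):
        problems.append(f"name prefix does not match country {spec.country!r}")
    if not spec.inputs:
        problems.append("no inputs declared")
    if not spec.outputs:
        problems.append("no outputs declared")
    overlap = set(spec.inputs) & set(spec.outputs)
    if overlap:
        problems.append(f"fields declared as both input and output: {sorted(overlap)}")
    return problems
```

Running every worksheet definition through a validator like this is one way to keep naming and input/output rules consistent as the dataset grows.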

4. Dataset Build & Localization

With the structure defined, we moved into the build, where we:

  • Localized worksheets for US and Canadian requirements

  • Built new worksheets where gaps existed

  • Standardized formats to ensure consistency across the dataset

This phase turned fragmented assets into a cohesive, structured dataset.

5. Testing & Workflow Simulation

Finally, we tested the dataset in real-world scenarios.

  • Simulated accounting workflows

  • Tested integrations with platforms like QuickBooks Online and Xero

  • Identified and resolved edge cases

The goal wasn’t just accuracy—it was confidence in production use.
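
One kind of edge case this sort of testing tends to surface is rounding. The Python sketch below is a hypothetical example, not taken from the client's worksheets: it spreads an amount evenly across periods and lets the final period absorb the rounding difference, so the parts always sum back to the original total. The function name `spread_evenly` is an assumption for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def spread_evenly(total: Decimal, periods: int) -> list[Decimal]:
    """Split `total` into `periods` amounts rounded to the cent.

    The last period absorbs any rounding difference, so the
    returned amounts always sum exactly to `total`.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1")
    base = (total / periods).quantize(CENT, rounding=ROUND_HALF_UP)
    parts = [base] * (periods - 1)
    parts.append(total - sum(parts, Decimal("0")))
    return parts


schedule = spread_evenly(Decimal("100.00"), 3)
print(schedule)  # the three amounts are 33.33, 33.33, and 33.34
```

A naive version that rounds every period independently would produce three amounts of 33.33 and leave one cent unaccounted for, which is exactly the kind of discrepancy a simulated workflow can catch before production.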

The Outcome

Within ~12 weeks, the client had:

  • A standardized, multi-country dataset

  • Consistent worksheet structures across all use cases

  • Validated outputs tested against real workflows

  • A foundation ready for product development and AI training

Why This Matters

Most companies underestimate the gap between:

“We have data”
and
“We have usable, production-ready data”

Bridging that gap requires more than data cleaning.

It requires:

  • Accounting domain expertise

  • Structured dataset design

  • Real-world workflow validation

That’s the difference between a dataset that looks good—and one that actually works.

Final Thought

If you’re building software or AI for accountants, your product is only as strong as the data behind it.

And in accounting, data doesn’t just need to be clean—it needs to be correct, contextual, and tested in the real world.
