The only thing worse than putting research out into the world and getting pushback is putting research out and hearing crickets.

Last month, I covered new research from Albertsons Media Collective, Ovative Group, and Northwestern's Kellogg School of Management for my column at The Drum. The headline finding: across 42 real campaigns, iROAS varied by an average of 6.5x depending solely on how the measurement was done. In 83% of those campaigns, the result could flip from positive to negative based on methodology alone.
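To see why methodology alone can swing a number that much, it helps to remember what iROAS is: incremental revenue (attributed revenue minus an estimated no-ad baseline) divided by ad spend. Different measurement approaches produce different baseline estimates for the same campaign. Here's a minimal sketch of that mechanic; every number and method label below is invented for illustration and is not taken from the study:

```python
# Hypothetical illustration: same campaign, same spend, same attributed
# revenue -- only the baseline estimate changes with the methodology.
ad_spend = 10_000            # dollars spent on the campaign (invented)
attributed_revenue = 55_000  # revenue tied to exposed shoppers (invented)

# Three invented baseline estimates, standing in for three methodologies
baselines = {
    "loose control":     15_000,  # low baseline -> big apparent lift
    "matched audience":  40_000,
    "time-series model": 60_000,  # high baseline -> negative lift
}

# iROAS = (attributed revenue - baseline) / ad spend
results = {
    method: (attributed_revenue - baseline) / ad_spend
    for method, baseline in baselines.items()
}

for method, iroas in results.items():
    print(f"{method:18s} iROAS = {iroas:+.2f}")
# loose control      iROAS = +4.00
# matched audience   iROAS = +1.50
# time-series model  iROAS = -0.50
```

Nothing about the campaign changed between those three lines, yet the reported iROAS ranges from strongly positive to negative. That is the shape of the study's finding, even if the specific magnitudes here are made up.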

The piece hit a nerve. Over 13,000 impressions on LinkedIn, and a comment thread that turned into a proper methodological debate — the kind you rarely see in retail media, where most discourse stays politely surface-level.

So I reached out to the research team and asked if they'd respond on the record. They said yes right away. That willingness to engage — not retreat — is worth noting on its own.

What the industry said

Three threads stood out from the response.

Prof. Koen Pauwels, Distinguished Professor of Marketing at Northeastern University and a former Principal Research Scientist at Amazon, raised a methodological scope question. He pointed out that the three approaches compared in the study are quite different from one another — not the minor tweaks the headline might suggest — and asked why aggregate approaches like marketing mix modeling and geo-experiments weren't included. It's a fair challenge, and one the authors chose not to address directly (more on that below). (Pauwels writes the excellent Amazon Days Substack, worth subscribing to.)

Venkat Raman, Co-Founder and CEO of Aryma Labs, went further. He argued that BSTS (Bayesian structural time series) isn't a causal method at all — that it's a forecasting algorithm being misapplied, and that the study's conclusions are built on shaky foundations. His comment drew the most engagement in the thread, including a respectful rebuttal from Moody Khan, VP of RMN Measurement Strategy at Circana, who cited peer-reviewed literature supporting BSTS for causal inference. I'm not going to adjudicate that debate. But the fact that it's happening publicly, with named practitioners staking out positions, is healthy for an industry that usually keeps these arguments behind closed doors.

Dan Waldman, Senior Technical Product Manager for Ads Reporting & Measurement at Chewy Ads, asked a question that may matter more for day-to-day practitioners than the methodological argument: what are brands actually using iROAS for? Proving that campaigns work, or optimizing and allocating budget in real time? Those are different jobs, and iROAS — a lagging metric that can take weeks to reach statistical significance — is better suited to the first than the second.

SPONSORED

Mirakl Ads is the only retail media solution designed for both 1P & 3P marketplace brands. Why does that matter? 

Marketplace sellers demand a seamless advertiser experience that still offers full-funnel ad formats. And retailers need a flexible solution that allows them to scale their media business.

Learn more

The authors respond

I shared these threads with the research team and asked them to respond. The collective statement from the authors was clear on scope: the paper was never designed to evaluate which methodology is best. It was designed to show that the same campaign, measured different ways, produces wildly different numbers — and that most advertisers don't know which way their results were measured.

On the Pauwels and Raman critiques specifically, the authors pointed back to that framing. They weren't testing whether BSTS is a valid causal method or whether MMM belongs in the comparison. They were demonstrating that variation exists within the approaches retail media networks actually use for campaign-level, post-campaign reporting — the reports brands receive and act on every day.

Neither Pauwels' nor Raman's critique falls within the scope the paper set for itself, and the authors chose not to engage with either one directly. What they did engage with was the practical question: what should a mid-market brand actually do with this information?

Liz Roche, VP of Media & Measurement at Albertsons Media Collective, said that right now, advertisers are being asked to reverse-engineer methodology, which isn't scalable, especially for mid-market brands. "The shift we're advocating for isn't one standardized method, but minimum disclosure standards," she said.

The authors went a step further: what mid-market brands need isn't internal data science capability. It's collective leverage. The more brands ask basic questions (Was the audience filtered before matching? What was the control group size? Has the methodology changed since our last campaign?), the more networks are pushed to answer consistently. The paper's Appendix D gives brands those questions ready to use, no statistician required.

The conversation that matters

I get pitched a lot of research and white papers. Most of it tells you what you already knew, wrapped in a new data set. This team did something different: they picked a specific, uncomfortable question, put real numbers behind it, and when the industry pushed back, they engaged.

No single paper is going to untangle retail media measurement. But this team isn't trying to eat the whole elephant. They're working through it bite by bite, ROAS Demystified last year, iROAS Demystified this year, and at each step giving brands something concrete to take away. That's worth more than another panel where everyone agrees measurement is broken and then goes back to their dashboards on Monday.


Read more from me on this topic!

Why ROAS Refuses To Die
The retail media industry’s ROAS addiction isn’t a knowledge problem — it’s a collective action problem.
Whoever Owns the Budget Determines What Retail Media Is Allowed to Be
Most brands are structurally incapable of spending brand marketing dollars through retail media, even when they want to.
Brands demand retail media standards. Retailers say it’s too hard. Do they kind of have a point?
Just as every home cook has their own bolognese recipe, every retail media network has developed its own approach to advertising, measurement, and reporting.