Now, About that Benchmark: Anchoring Comparables
With momentum slowing around Pillar Two, multinationals will likely continue defending many routine intercompany policies with a time-honored and oft-disparaged relic: the humble benchmark study.
While one can argue that routine profit safe harbors may yet flourish in certain jurisdictions, it’s likely that in most countries, tailored benchmarks will persist as the tip of the spear in future audits of policies supported by transactional profit methods (e.g., TNMM/CPM, RPM).
In this context, one of the most common shortcuts in transfer pricing practice is reliance on ‘Library’ or ‘Off the Shelf’ benchmarks—or subtle variations thereof. These are essentially pre-screened sets of comparables that purport to broadly represent a tested business function without necessarily having been crafted with direct reference to the actual personnel roles, business activities, risks, and assets of the affiliate selected as the ‘tested party.’ Or even prepared recently, for that matter. Be aware, too, that your advisor is unlikely to flag that they are using a library benchmark.
Such benchmarks may appear tidy and efficient. Their generous interquartile ranges may be wide enough to drive a proverbial truck through. Most importantly, they can be cheap(-ish). But will they withstand scrutiny from a motivated tax auditor?
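To put the ‘wide range’ point in concrete terms, consider a quick sketch (in Python, with entirely hypothetical markup figures) of how an interquartile range is typically computed from a set of comparables’ results. Jurisdictions differ on the exact percentile convention, but the basic arithmetic looks like this:

```python
import numpy as np

# Hypothetical net cost plus markups (%) for a ten-company 'library' set.
# These figures are illustrative only, not drawn from any real benchmark.
markups = np.array([2.1, 3.4, 4.0, 4.8, 5.5, 6.2, 7.0, 8.3, 9.1, 12.5])

# A simple interquartile range; real reports may apply jurisdiction-specific
# percentile rules rather than numpy's default linear interpolation.
q1, median, q3 = np.percentile(markups, [25, 50, 75])
print(f"Q1: {q1:.1f}%  Median: {median:.1f}%  Q3: {q3:.1f}%")
```

A Q1-to-Q3 spread of several hundred basis points will ‘cover’ almost any plausible policy rate, which is precisely why a wide range is evidence of convenience, not of comparability.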
Benchmarks Without Context May Miss the Mark
In theory, an economic analysis—in this case, the aforementioned humble benchmark—is only as defensible as the functional analysis that supports it. (See just about any page of the OECD Guidelines for more on this...) Without that anchor, comparables selection may effectively be an exercise in general functional category-matching rather than a proper comparability analysis of the transaction at hand.
Perhaps the most iconic library benchmark example is the ‘general admin’ or ‘back office’ services benchmark. This type of set is designed to be broad, including a handful of companies from each of a variety of common corporate functions, and it boasts a range that conveniently covers most typical markup policy rates. But is there really a suitable connection between each of the search and screening criteria applied and the pages of functional analysis in the prior report section? That linkage is the whole point of the exercise! At its most basic: if a legal function is performed and is a component of the services cost pool, were legal services firms even sought? Marcomm and PR functions are a similar case. The overall search takes on a rather different scale and dimension if there is a mismatch in common, key functions like these, which tend to trigger large numbers of potential comparables requiring screening.
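One way to make that linkage explicit, and auditable, is a simple cross-check between the functions feeding the cost pool and the activity codes actually searched. The sketch below is purely illustrative; the function labels and NACE codes are invented for this example, and real searches derive their criteria case by case:

```python
# Hypothetical cross-check: do the search's activity codes cover every
# function that feeds the services cost pool? All mappings are invented
# for illustration; real searches select NACE/SIC codes case by case.
functions_in_cost_pool = {"accounting", "hr", "it support", "legal", "marcomm"}

# Activity codes actually screened in the (hypothetical) library search.
search_scope = {
    "accounting": ["NACE 69.20"],
    "hr": ["NACE 78.10"],
    "it support": ["NACE 62.02"],
    # Note: no legal (69.10) or marcomm/PR (70.21, 73.11) codes were searched.
}

uncovered = functions_in_cost_pool - search_scope.keys()
if uncovered:
    print(f"Cost pool functions with no matching search criteria: {sorted(uncovered)}")
```

If this check prints anything, the search strategy and the functional analysis are telling two different stories.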
Another iconic example can be found in basic distribution sets. While distribution benchmarks can often be refined through adjustments (e.g., for working capital differences), a mismatch between the types of products and functions reflected in a controlled transaction and what is in the initial scope of a search strategy can be difficult to explain after the fact. This mismatch happens more often than one might think. Not all consumer goods or industrial product distribution models are the same.
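For completeness, here is a minimal sketch of the kind of working capital adjustment described in the Annex to Chapter III of the OECD Guidelines. All figures, and the single blended interest rate, are hypothetical assumptions for illustration:

```python
# Minimal sketch of a working capital adjustment along the lines of the
# OECD Guidelines' Annex to Chapter III. All figures are hypothetical.
def working_capital_adjustment(comp_margin, comp_wc_to_sales,
                               tested_wc_to_sales, interest_rate):
    """Adjust a comparable's operating margin for the difference in working
    capital intensity (receivables + inventory - payables, scaled by sales)
    between the comparable and the tested party."""
    return comp_margin + interest_rate * (tested_wc_to_sales - comp_wc_to_sales)

# Hypothetical distributor comparable: 4.0% operating margin, working
# capital at 25% of sales; the tested party carries only 10% of sales.
adjusted = working_capital_adjustment(
    comp_margin=0.040,
    comp_wc_to_sales=0.25,
    tested_wc_to_sales=0.10,
    interest_rate=0.05,  # assumed short-term borrowing rate
)
print(f"Adjusted comparable margin: {adjusted:.2%}")  # 3.25%
```

Note the direction of the adjustment: a comparable carrying more working capital than the tested party sees its margin adjusted downward, since part of its observed return compensates financing rather than the distribution function itself. No adjustment of this kind, however, can cure a set whose products and functions were mismatched from the start.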
A curious omission from many benchmarks in transfer pricing reports is a definitive statement explaining how the search and screening criteria were derived from the relevant facts learned during the functional analysis. Do your reports have a statement like this?
Benchmarks that Unravel
In both of the above examples, it is entirely possible that companies selected for the final ‘Library’ set share features with one or more companies that were rejected despite closely matching the tested party’s functions and products or services. When this occurs, a library benchmark may, ironically, hand a tax authority a roadmap to either pick apart and disregard the benchmark, or exploit the elements that do hold together to generate a very different set and range of profitability.
How does this happen? If a company in the final benchmark set is as similar to a company eliminated during (qualitative) screening as it is to your tested party (albeit eliminated for an unrelated reason in the original screening), it may be hard to explain why the one is in and the other is out. (Oops…) Tax authorities face less time pressure and likely have a detailed review process in place, so they are well placed to spot those missed matches.
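A taxpayer can run the same consistency check before the auditor does. The sketch below flags accepted comparables whose business descriptors overlap heavily with companies rejected at qualitative review; the company names, descriptor tags, and similarity threshold are all invented for illustration:

```python
# Illustrative consistency check: flag accepted comparables whose business
# descriptions overlap heavily with companies rejected at qualitative review.
# Names, tags, and the 0.5 threshold are invented for this sketch.
accepted = {"CompA": {"warehousing", "consumer electronics", "wholesale"}}
rejected = {
    "CompX": ({"warehousing", "consumer electronics", "retail"}, "retail sales"),
}

for a_name, a_tags in accepted.items():
    for r_name, (r_tags, reason) in rejected.items():
        # Jaccard similarity: shared tags over all tags across both companies.
        overlap = len(a_tags & r_tags) / len(a_tags | r_tags)
        if overlap >= 0.5:
            print(f"{a_name} overlaps {overlap:.0%} with rejected {r_name} "
                  f"(rejection reason: {reason}) -- revisit the screening notes")
```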
Act on Local Expectations
When TP compliance budgets are tight, it’s probably tempting to pause on policy upgrades. But local authorities will continue to apply their own interpretations and demand substance behind your positions. A well-articulated functional analysis—reflected in the core design of your benchmarks, not just adjacent to them—remains a central bulwark of your defenses.
Transfer pricing doesn't happen in a vacuum. And neither should benchmarking. Defensible compliance runs through a clear, customized understanding of what each entity actually does. Anything less may just be a guess with a spreadsheet.