Now, About that Benchmark: Anchoring Comparables

With momentum slowing around Pillar Two safe harbors, multinationals will likely continue to defend many of their intercompany policies with a time-honored and oft-disparaged relic: the humble benchmark study.

While one can argue that safe harbors may yet flourish in certain jurisdictions, tailored benchmarks will likely persist as the tip of the spear in future audits of policies supported by margin-based methods (e.g., the TNMM/CPM and resale price method).

In this context, one of the most common shortcuts in transfer pricing practice is reliance on 'library' or 'off-the-shelf' benchmarks, or subtle variations thereof. These are essentially pre-packaged sets of comparables that purport to broadly represent a tested business function without necessarily having been crafted with direct reference to the actual personnel roles, business activities, risks, and assets of the affiliate selected as the 'tested party.' Or even having been prepared recently, for that matter.

Such benchmarks may appear tidy and efficient. Their generous ranges of profitability indicators may be wide enough to drive the proverbial truck through. Most importantly, they can be cheap(-ish). But will they fall short when your next audit brings real scrutiny?

Benchmarks Without Context May Miss the Mark

In theory, an economic analysis (in this case, the aforementioned humble benchmark) is only as defensible as the functional analysis that supports it. (See just about any page of the OECD Guidelines for more on this.) Without that anchor, the selection of comparables may effectively be an exercise in general functional category-matching rather than a proper comparability analysis of a given transaction.

Perhaps the most iconic library benchmark example is the 'general admin' or 'back office' services benchmark. This type of set is designed to be broad, including a token few companies from each of a variety of common support functions, and it boasts a wide range that covers typical markup rates. But is there really a suitable connection between each of the search and screening criteria applied and the pages of functional analysis in the prior report section? For instance, if a legal function is performed and its costs form a component of the cost pool, did the search seek out legal services firms? Marketing communications and PR functions are a similar case. The overall search takes on a rather different scale and dimension when there is a mismatch in common, key functions like these, which tend to surface large numbers of potential comparables requiring screening. A simple cross-check, sketched below, can flag such gaps early.
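To make that cross-check concrete, here is a minimal sketch in Python of one way to compare the functions in a tested party's cost pool against the keywords actually used in the search strategy. All function labels and keywords below are hypothetical illustrations, not a prescribed taxonomy; a real check would map to the actual functional analysis and the search fields of the database used.

```python
# Minimal sketch: cross-check the functions in a tested party's cost pool
# against the keywords actually used in the comparable search strategy.
# All function labels and keywords here are hypothetical illustrations.

cost_pool_functions = {"accounting", "legal", "it_support", "marcomm"}

search_keywords = {
    "accounting": ["bookkeeping", "payroll services", "accounts payable"],
    "it_support": ["help desk", "managed IT services"],
    # Note: no keywords were defined for "legal" or "marcomm".
}

missing = cost_pool_functions - set(search_keywords)
if missing:
    print(f"Cost pool functions with no matching search criteria: {missing}")
```

Trivial as it looks, this is the question an auditor will ask function by function, so documenting the answer up front is cheap insurance.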

Another iconic example can be found in basic distribution sets. While distribution benchmarks can often be refined through adjustments (e.g., for working capital differences), a mismatch between the types of products and functions reflected in a controlled transaction and those within the initial scope of a search strategy can be difficult to explain after the fact. This mismatch happens more often than one might think. Not all consumer goods or industrial product distribution models are the same.
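On the working capital point, a simplified sketch of the style of adjustment described in the Annex to Chapter III of the OECD Guidelines may help show what can, and cannot, be fixed after the fact. The balance sheet figures and the interest rate below are purely illustrative assumptions.

```python
# Simplified sketch of a working capital adjustment along the lines of the
# OECD Guidelines (Annex to Chapter III). All figures are illustrative.

def working_capital_intensity(receivables, inventory, payables, sales):
    """Net working capital as a share of sales."""
    return (receivables + inventory - payables) / sales

def adjusted_margin(comp_margin, tested_wc, comp_wc, interest_rate):
    """Adjust a comparable's operating margin for the difference in
    working capital intensity versus the tested party."""
    return comp_margin + (tested_wc - comp_wc) * interest_rate

tested_wc = working_capital_intensity(receivables=150, inventory=80, payables=60, sales=1000)
comp_wc = working_capital_intensity(receivables=90, inventory=40, payables=70, sales=1000)

# Moves the comparable's margin toward what it might have earned carrying
# the tested party's working capital levels.
print(adjusted_margin(comp_margin=0.045, tested_wc=tested_wc, comp_wc=comp_wc, interest_rate=0.05))
```

Note what the adjustment does: it corrects for differences in capital carried, not for selling fundamentally different products through a fundamentally different model. That gap has to be addressed in the search design itself.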

A curious omission from many benchmarks in transfer pricing reports is a definitive statement explaining how the search and screening criteria were derived from relevant facts learned during the functional analysis. Does yours have one?

Benchmarks that Unravel

In both of the above examples, it is very possible that the companies ultimately selected for a 'library' set share features with one or more companies that were rejected, even though those rejected companies' functions, products, or services in fact closely match the tested party. When this occurs, a library benchmark may ironically hand a tax authority a roadmap: either to pick apart and disregard the benchmark, or to exploit the elements that do fit and generate a very different set and range of profitability.

How does this happen? If a company accepted into the benchmark set resembles the tested party no more closely than it resembles one or more companies eliminated during (qualitative) screening, it may be hard to explain why those companies were excluded. Tax authorities face less time pressure and likely have a detailed review process in place, so they are well positioned to spot those missed matches. A crude version of the consistency check a reviewer might run is sketched below.
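As a rough illustration of how such inconsistencies surface, here is a hedged sketch of a similarity score computed across accepted and rejected candidates. The features, company names, and equal weighting are all hypothetical simplifications of what is, in practice, a qualitative judgment.

```python
# Sketch of a consistency check across screening decisions: score each
# candidate's similarity to the tested party on a few (hypothetical)
# features and compare accepted vs. rejected candidates.

TESTED_PARTY = {"products": "industrial fasteners", "customer_base": "B2B", "owns_inventory": True}

def similarity(candidate, reference=TESTED_PARTY):
    """Crude 0-1 score: share of reference features the candidate matches."""
    matches = sum(candidate.get(k) == v for k, v in reference.items())
    return matches / len(reference)

accepted = [{"name": "CompA", "products": "electrical components", "customer_base": "B2B", "owns_inventory": True}]
rejected = [{"name": "CompB", "products": "industrial fasteners", "customer_base": "B2B", "owns_inventory": True}]

for comp in accepted + rejected:
    print(comp["name"], round(similarity(comp), 2))
# If a rejected candidate outscores an accepted one, the working papers
# should document why; otherwise the set invites exactly this unraveling.
```

No tax authority runs this literal script, but the underlying exercise of lining up accepted and rejected candidates side by side is exactly what a thorough reviewer does.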

Act on Local Expectations

When allocating a limited TP compliance budget, it’s probably tempting to press pause on policy upgrades. But the reality is that local authorities will continue to apply their own interpretations and demand substance behind your positions. A well-articulated functional analysis—reflected in the core design of your benchmarks, not just adjacent to them—remains a central bulwark of your defenses.

Transfer pricing doesn't happen in a vacuum. And neither should benchmarking. Defensible compliance runs through a clear, customized understanding of what each entity actually does. Anything less may just be a guess with a spreadsheet.
