This section analyzes PEAR's effectiveness by measuring consensus across six recognized explainer agreement metrics, including pairwise rank agreement, rank correlation, and feature agreement. PEAR training not only increases agreement between the explainers used in the loss (Grad and IntGrad); it also generalizes to explainers not seen during training, such as LIME and SHAP.

The Trade-Off Between Accuracy and Agreement in AI Models

Abstract and 1. Introduction

1.1 Post Hoc Explanation

1.2 The Disagreement Problem

1.3 Encouraging Explanation Consensus

  2. Related Work

  3. PEAR: Post Hoc Explainer Agreement Regularizer

  4. The Efficacy of Consensus Training

    4.1 Agreement Metrics

    4.2 Improving Consensus Metrics

    4.3 Consistency At What Cost?

    4.4 Are the Explanations Still Valuable?

    4.5 Consensus and Linearity

    4.6 Two Loss Terms

  5. Discussion

    5.1 Future Work

    5.2 Conclusion, Acknowledgements, and References

Appendix

4.1 Agreement Metrics

In their work on the disagreement problem, Krishna et al. [15] introduce six metrics to measure the amount of agreement between post hoc feature attributions. Let [𝐸1(𝑥)]𝑖 , [𝐸2(𝑥)]𝑖 be the attribution scores from explainers for the 𝑖-th feature of an input 𝑥. A feature’s rank is its index when features are ordered by the absolute value of their attribution scores. A feature is considered in the top-𝑘 most important features if its rank is in the top-𝑘. For example, if the importance scores for a point 𝑥 = [𝑥1, 𝑥2, 𝑥3, 𝑥4], output by one explainer are 𝐸1(𝑥) = [0.1, −0.9, 0.3, −0.2], then the most important feature is 𝑥2 and its rank is 1 (for this explainer).
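To make the ranking concrete, here is a minimal Python sketch (our own illustrative code, not the authors'), which recovers ranks from attribution scores by ordering features by absolute value as defined above:

```python
# A minimal sketch (not the authors' code): recover feature ranks from
# attribution scores, ordering by absolute value as defined above.
import numpy as np

def ranks(attributions: np.ndarray) -> np.ndarray:
    """Rank 1 = most important feature (largest |attribution|)."""
    order = np.argsort(-np.abs(attributions))  # indices from most to least important
    r = np.empty_like(order)
    r[order] = np.arange(1, len(attributions) + 1)
    return r

E1 = np.array([0.1, -0.9, 0.3, -0.2])
print(ranks(E1))  # [4 1 2 3]: x2 has rank 1, matching the example above
```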

Feature Agreement counts the number of features 𝑥𝑖 such that [𝐸1(𝑥)]𝑖 and [𝐸2(𝑥)]𝑖 are both in the top-𝑘. Rank Agreement counts the number of features in the top-𝑘 with the same rank in 𝐸1(𝑥) and 𝐸2(𝑥). Sign Agreement counts the number of features in the top-𝑘 such that [𝐸1(𝑥)]𝑖 and [𝐸2(𝑥)]𝑖 have the same sign. Signed Rank Agreement counts the number of features in the top-𝑘 such that [𝐸1(𝑥)]𝑖 and [𝐸2(𝑥)]𝑖 agree on both sign and rank. Rank Correlation is the Spearman rank correlation coefficient between 𝐸1(𝑥) and 𝐸2(𝑥), computed over all features rather than just the top-𝑘. Lastly, Pairwise Rank Agreement counts the number of pairs of features (𝑥𝑖, 𝑥𝑗) such that 𝐸1 and 𝐸2 agree on whether 𝑥𝑖 or 𝑥𝑗 is more important. All of these metrics are normalized as fractions and thus range from 0 to 1, except Rank Correlation, which is a correlation measurement and ranges from −1 to +1. Their formal definitions are provided in Appendix A.3.
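Under these definitions the metrics are straightforward to compute. The following is an illustrative sketch (our naming, not the reference code of Krishna et al. [15]), reusing the `ranks` helper from the previous snippet, with top-𝑘 membership determined by absolute attribution:

```python
# Illustrative implementations of the six agreement metrics, assuming the
# definitions above (our naming; not the reference code of Krishna et al. [15]).
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def topk(attr, k):
    """Indices of the k features with the largest |attribution|."""
    return set(np.argsort(-np.abs(attr))[:k])

def feature_agreement(e1, e2, k):
    return len(topk(e1, k) & topk(e2, k)) / k

def rank_agreement(e1, e2, k):
    r1, r2 = ranks(e1), ranks(e2)  # ranks() from the previous sketch
    return sum(r1[i] == r2[i] for i in topk(e1, k) & topk(e2, k)) / k

def sign_agreement(e1, e2, k):
    return sum(np.sign(e1[i]) == np.sign(e2[i])
               for i in topk(e1, k) & topk(e2, k)) / k

def signed_rank_agreement(e1, e2, k):
    r1, r2 = ranks(e1), ranks(e2)
    return sum(r1[i] == r2[i] and np.sign(e1[i]) == np.sign(e2[i])
               for i in topk(e1, k) & topk(e2, k)) / k

def rank_correlation(e1, e2):
    return spearmanr(e1, e2).correlation  # over all features

def pairwise_rank_agreement(e1, e2):
    pairs = list(combinations(range(len(e1)), 2))
    return sum((abs(e1[i]) > abs(e1[j])) == (abs(e2[i]) > abs(e2[j]))
               for i, j in pairs) / len(pairs)
```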

In the results that follow, we use all of the metrics defined above and reference which one is used where appropriate. To measure the agreement between each pair of explainers, we average the metric over the test data. Both agreement and accuracy measurements are averaged over several trials (see Appendices A.6 and A.5 for error bars).
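Concretely, the per-pair averaging can be sketched as follows (hypothetical helper names; `explainers` is assumed to map explainer names to attribution functions):

```python
# Sketch: average an agreement metric over test points for every pair of
# explainers, yielding a matrix like the ones in Figure 4 (our naming).
import numpy as np
from itertools import combinations

def agreement_matrix(explainers, X_test, metric, **metric_kwargs):
    names = list(explainers)
    M = np.eye(len(names))  # each explainer agrees perfectly with itself
    for (i, a), (j, b) in combinations(list(enumerate(names)), 2):
        vals = [metric(explainers[a](x), explainers[b](x), **metric_kwargs)
                for x in X_test]
        M[i, j] = M[j, i] = float(np.mean(vals))
    return names, M
```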

4.2 Improving Consensus Metrics

The intention of our consensus loss term is to improve agreement metrics. While the objective function explicitly includes only two explainers, we show generalization to unseen explainers as well as to the unseen test data. For example, we train for agreement between Grad and IntGrad and observe an increase in consensus between LIME and SHAP.

To evaluate the improvement in agreement metrics when using our consensus loss term, we compute explanations from each explainer on models trained naturally and on models trained with our consensus loss term with 𝜆 = 0.5.
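As a rough illustration of what such training looks like, here is a hedged PyTorch sketch. The exact PEAR objective is defined in Section 3; the inline attribution approximations, the cosine-based disagreement penalty, and the convex combination below are simplifying assumptions of ours, not the paper's precise formulation.

```python
# A hedged sketch of consensus-regularized training (NOT the exact PEAR
# objective from Section 3; the disagreement penalty here is one plausible
# choice). Grad and IntGrad attributions are approximated inline.
import torch
import torch.nn.functional as F

def grad_attr(model, x):
    """Plain input-gradient attribution w.r.t. the top predicted class."""
    x = x.clone().requires_grad_(True)
    score = model(x).max(dim=1).values.sum()
    return torch.autograd.grad(score, x, create_graph=True)[0]

def intgrad_attr(model, x, steps=8):
    """Coarse integrated gradients from a zero baseline."""
    total = torch.zeros_like(x)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        total = total + grad_attr(model, alpha * x)
    return x * total / steps

def consensus_training_loss(model, x, y, lam=0.5):
    task = F.cross_entropy(model(x), y)  # standard task loss
    a1, a2 = grad_attr(model, x), intgrad_attr(model, x)
    # disagreement penalty: 1 - cosine similarity between attribution vectors
    disagree = (1 - F.cosine_similarity(a1.flatten(1), a2.flatten(1))).mean()
    return (1 - lam) * task + lam * disagree
```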

In Figure 4, using a visualization tool developed by Krishna et al. [15], we show how we evaluate the change in an agreement metric (pairwise rank agreement) between all pairs of explainers on the California Housing data.

Hypothesis: We can increase consensus by deliberately training for post hoc explainer agreement.

Through our experiments, we observe improved agreement metrics on unseen data and on unseen pairs of explainers. In Figure 4 we show a representative example where Pairwise Rank Agreement between Grad and IntGrad improves from 87% to 96% on unseen data. Moreover, we can look at two other explainers and see that agreement between SmoothGrad and LIME improves from 56% to 79%. This shows generalization both to unseen data and to explainers other than those explicitly used in the loss term. In Appendix A.5, we show disagreement matrices across all of our datasets and all six agreement metrics.

4.3 Consistency At What Cost?

While training for consensus works to boost agreement, a question remains: How accurate are these models?

To address this question, we first point out that there is a trade-off: more consensus comes at the cost of accuracy. With this in mind, we posit that there is a Pareto frontier on the accuracy-agreement axes. While we cannot assert that our models are on the Pareto frontier, we plot trade-off curves which represent the trajectory through accuracy-agreement space that is carved out by changing 𝜆.

Hypothesis: We can increase consensus with an acceptable drop in accuracy.

While this hypothesis is phrased as a subjective claim, in reality we define acceptable performance as better than a linear model, as explained at the beginning of Section 4. We see across all three datasets that increasing the consensus loss weight 𝜆 leads to higher pairwise rank agreement between LIME and SHAP. Moreover, even with high values of 𝜆, the accuracy stays well above that of linear models, indicating that the loss in performance is acceptable. This experiment therefore supports the hypothesis.
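The sweep itself is simple: train one model per 𝜆 value and record both coordinates. Below is a self-contained sketch on synthetic data (illustrative only; the paper's experiments use the datasets, models, and agreement metrics described above), reusing `consensus_training_loss` from the earlier sketch:

```python
# Sketch: sweep lambda to trace an accuracy-agreement trade-off curve like
# Figure 5. Synthetic data and a tiny MLP keep this self-contained; it reuses
# consensus_training_loss from the sketch above.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 8)
y = (X[:, 0] + X[:, 1] > 0).long()  # toy binary labels

curve = []
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(100):
        opt.zero_grad()
        consensus_training_loss(model, X, y, lam=lam).backward()
        opt.step()
    acc = (model(X).argmax(dim=1) == y).float().mean().item()
    curve.append((lam, acc))  # pair each point with an agreement metric in practice
print(curve)
```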

The results plotted in Figure 5 demonstrate that a practitioner concerned with agreement can tune 𝜆 to meet their needs of accuracy and agreement. This figure serves in part to illuminate why our hyperparameter choice is sensible: 𝜆 gives us control to slide along the trade-off curve, making post hoc explanation disagreement more of a controllable model parameter so that practitioners have more flexibility to make context-specific model design decisions.

Figure 4: When models are trained naturally, we see disagreement among post hoc explainers (left). However, when trained with our loss function, we see a boost in agreement with only a small cost in accuracy (right). This can be observed visually by the increase in saturation or in more detail by comparing the numbers in corresponding squares.

Figure 5: The trade-off curves of consensus and accuracy. Increasing the consensus comes with a drop in accuracy, and the trade-off is such that we can achieve more agreement and still outperform linear baselines. Moreover, as we vary the 𝜆 value, we move along the trade-off curve. In all three plots we measure agreement with the pairwise rank agreement metric and we show that increased consensus comes with a drop in accuracy, but all of our models are still more accurate than the linear baseline, indicated by the vertical dashed line (the shaded region shows ± one standard error).


:::info Authors:

(1) Avi Schwarzschild, University of Maryland, College Park, Maryland, USA and Work completed while working at Arthur (avi1umd.edu);

(2) Max Cembalest, Arthur, New York City, New York, USA;

(3) Karthik Rao, Arthur, New York City, New York, USA;

(4) Keegan Hines, Arthur, New York City, New York, USA;

(5) John Dickerson†, Arthur, New York City, New York, USA (john@arthur.ai).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

