We present AgriARC, the first adversarial multimodal benchmark designed to evaluate whether agricultural AI systems can reason about novel farming situations beyond their training distribution. Grounded in real Indian farm data from more than 200 farmers across 12 states and 8 agro-climatic zones, AgriARC comprises 500 test cases spanning 6 reasoning categories.
Our evaluation shows that leading agricultural AI systems achieve an average score of 1.47 out of 3.00 on the static track, just under half of the maximum, leaving substantial room for improvement in agricultural reasoning. Cross-region transfer and scheme stacking are the most challenging categories: even the best systems score below 1.2 on them. We additionally introduce a Digital Twin simulation track (AgriARC Live) that tests sequential decision-making over a full crop season.