James Ding
Feb 18, 2026 20:38
Harvey's BigLaw Bench: International doubles the benchmark's size, testing AI legal capabilities across jurisdictions as model scores hit 90% on core tasks.
Harvey AI released BigLaw Bench: International on February 18, more than doubling its public benchmark dataset with new evaluations for the UK, Australian, and Spanish legal systems. The expansion marks the first major update since Harvey announced plans to scale BLB fivefold earlier this month.
The timing matters. Leading foundation models now score roughly 90% on BLB's core legal tasks, up from around 60% in 2024. But Harvey's internal research shows that performance degrades when models tackle jurisdiction-specific work. BLB: International aims to quantify exactly where that localization gap lies.
Six Task Categories Under the Microscope
Harvey built the benchmark around six workflows its enterprise clients actually use: drafting, long-document analysis, document comparison, public research, multi-document analysis, and extraction. Each task was designed by local practitioners in collaboration with Mercor, then cross-reviewed by Harvey's applied legal researchers.
The scenarios get specific. One UK task asks models to advise on FCA enforcement risks when a CSO sells shares before a failed drug trial is announced. A Spanish task involves analyzing CNMC antitrust exposure for tech companies caught in a no-poach agreement. Australian tasks include FIRB approval determinations for infrastructure fund acquisitions.
"The goal of BLB: International is to help understand and remediate where foundation models struggle to localize effectively on core AI tasks," Harvey stated in the announcement.
Why This Matters for Enterprise AI Adoption
Law firms operating across borders face a real problem: an AI assistant that handles Delaware corporate law brilliantly might struggle with UK financial regulation or Spanish competition law. Without standardized benchmarks, there is no way to verify consistent quality across offices.
Harvey's approach of building jurisdiction-specific tasks with more than two dozen local experts creates a baseline for measuring that consistency. The company also plans to extend BLB: Arena, its preference-based evaluation system launched in November 2025, to international markets.
More countries are coming. Harvey indicated it will keep building local expert cohorts and deepening existing datasets based on customer feedback. For legal tech buyers evaluating AI vendors, BLB: International provides something that did not exist before: a standardized way to compare model performance on real legal work across multiple jurisdictions.
Image source: Shutterstock


