Evaluations

Evaluating accuracy in LawY

We understand the legal profession demands the highest standards of accuracy. As your trusted AI legal research assistant, we're committed to continually improving. That's why, in addition to manual evaluation by our team of lawyers, we've introduced an automated system that evaluates our AI answers alongside other leading platforms. We're committed to publishing these results to give you transparency into the accuracy we deliver.

Published: 1st August 2025

The methodology

We are excited to introduce an automated evaluation system that benchmarks LawY and other AI platforms against a human-verified source of truth. This helps us continually raise accuracy, shape the direction of LawY's development, and offer our users transparency.

300 sample questions

We randomly selected 300 legal questions spanning all of our major jurisdictions and areas of law.

3 platforms to compare

Each question was answered by 3 different AI solutions: LawY, Gemini 2.5 Pro, and ChatGPT 4.1.

1 source of truth

'Golden Answers' were AI answers reviewed and corrected by an experienced 'lawyer-in-the-loop'.

Note: To ensure objectivity, we intentionally excluded all golden answers from LawY's knowledge base of verified answers during testing. This prevented any unfair advantage and ensured our evaluations reflect genuine performance.
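
For readers who want a concrete picture of how an automated benchmark of this kind can be wired together, here is a minimal sketch in Python. The dataset, the naive grading rule, and all names are illustrative assumptions, not LawY's actual pipeline; the real evaluation graded 300 sampled questions against lawyer-reviewed Golden Answers.

```python
# Minimal sketch of an automated benchmark of the kind described above.
# The dataset, grading rule, and all names are illustrative assumptions,
# not LawY's actual pipeline.

from dataclasses import dataclass

@dataclass
class EvalItem:
    question: str
    golden_answer: str      # lawyer-reviewed 'source of truth'
    candidate_answer: str   # answer from the platform under test

def is_correct(item: EvalItem) -> bool:
    """Judge the candidate answer against the golden answer.

    A production system would use a rubric-driven LLM judge or human
    review; this placeholder does a naive word-overlap check instead.
    """
    golden_terms = set(item.golden_answer.lower().split())
    candidate_terms = set(item.candidate_answer.lower().split())
    return len(golden_terms & candidate_terms) / max(len(golden_terms), 1) > 0.5

def accuracy(items: list[EvalItem]) -> tuple[int, int]:
    # Count how many candidate answers are judged correct.
    correct = sum(is_correct(item) for item in items)
    return correct, len(items)

if __name__ == "__main__":
    # Toy two-item dataset; the real evaluation used 300 sampled questions
    # spanning multiple jurisdictions and areas of law.
    sample = [
        EvalItem("What is the limitation period for contract claims?",
                 "Generally six years from the date of breach.",
                 "Usually six years from the breach of contract."),
        EvalItem("Is a verbal contract enforceable?",
                 "Yes, subject to exceptions such as contracts for land.",
                 "No, contracts must always be written."),
    ]
    correct, total = accuracy(sample)
    print(f"{correct}/{total} correct ({correct / total:.0%})")
```

In a real setup, the grading step is the part that matters: each platform's answer is compared against the same Golden Answer under the same rubric, so the resulting percentages are directly comparable across platforms.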

The results

Our latest evaluation results show that LawY's AI answers scored highest for correctness when compared with leading platforms including Gemini 2.5 Pro and ChatGPT 4.1. As noted above, our lawyer-reviewed Golden Answers served as the source of truth, and they were excluded from LawY's knowledge base for the duration of testing.

LawY: 81% correct (243/300)

Gemini 2.5 Pro: 58% correct (175/300)

ChatGPT 4.1: 56% correct (167/300)