Why Judges Don’t Agree

Question: Why do two judges who evaluate the same caption (such as Music Performance) sometimes end up with very different scores (10 points or more apart), or not even agree on the best band in that category?

Answer: Let me tackle the score question first. While you would expect to see similar numbers for the same caption, it’s not always the case. This can be due to a couple of things.  

  1. One judge may always seem to end up sampling the “wrong” people (depending upon your point of view). Just your luck, they end up looking at the clarinets trying to build that linear form with no reference points, and then they move to the color guard who NEVER can get that 16-count phrase together. They follow that with a sampling of the battery percussion who just put those 10 pages of drill on the field this morning! The other judge, however, focused his/her efforts on the trumpets who worked those diamond cutters to death this week, and then moved to the rifles who proudly displayed the results of the three sectionals they held the previous week. At the end of the show, you can easily have two different views of the program, which may affect the scoring range.
  2. One judge has a slightly different tolerance than the other. Although all judges strive to be consistent in their evaluations, a judge who regularly sees bands from around the country may have a slightly different threshold for error than someone who only judges within Kentucky each weekend. A performance level that earns a 65 from one judge may only warrant a 55 from the other. And keep in mind that the caption score is always divided by 10, so you don’t really have a 10-point difference, but a SINGLE point (and trust me, most judges know this). If the judge who gave you the 55 has you in first place (as did the judge who gave you the 65), chances are his/her numbers are lower for every band throughout the course of the contest.
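
Since the divided-by-10 arithmetic trips people up, here is a minimal illustrative sketch. The function name and the 0–100 raw scale are my assumptions for the example, not any circuit's official tabulation:

```python
# Minimal illustrative sketch (not any circuit's official tabulation):
# a raw caption score on a hypothetical 0-100 scale is divided by 10
# before it factors into the total, so a 10-point raw gap is one point.

def caption_contribution(raw_score: float) -> float:
    """Convert a raw caption score into its divided-by-10 contribution."""
    return raw_score / 10

gap = caption_contribution(65) - caption_contribution(55)
print(gap)  # 1.0 -- the "10-point difference" is a single point
```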

The rankings (ordinals) are sometimes a little harder to explain. If I have a caption counterpart, I always look first at the ordinals within a class or contest. If I’ve placed a group 1st and my colleague has them 4th, I then look at our spread between those ordinals. If we’ve both stacked our top four bands within .8 of a point, we’ve basically stated that those groups are fairly equal and very minor differences affected our rankings. If those bands went right back out and performed again, we could both potentially have them in a completely different order. In this instance, the ordinals combined with the spreads don’t throw up any “yellow flags”. However, if I had my first-place band 2.0 points ahead of the balance of the pack and my colleague had that same band in fourth (3.0 points behind his front runner), we obviously have some differences. We will frequently discuss what we saw in each program and how it affected our scores. At shows that offer critique, we will then go over our numbers with that band’s staff and explain what prompted us to assign those scores. Yes, there are times when a judge just may “miss it”. We are human, just like anyone else. It’s discussions like these, between judges after the show, that help drive a better level of consistency.
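
To make that “yellow flag” reasoning concrete, here is a hypothetical sketch in Python. The function name, the 0.8-point tolerance default, and the band names are all my assumptions for illustration, not an official judging procedure:

```python
# Hypothetical sketch of the ordinal/spread check described above -- not an
# official judging procedure. Each argument maps band name -> one judge's
# score. A disagreement on first place only raises a "yellow flag" when a
# judge separated their top band from the pack by more than `tolerance`.

def yellow_flag(scores_a: dict, scores_b: dict, tolerance: float = 0.8) -> bool:
    top_a = max(scores_a, key=scores_a.get)
    top_b = max(scores_b, key=scores_b.get)
    if top_a == top_b:                      # judges agree on the winner
        return False

    def lead(scores: dict) -> float:
        ranked = sorted(scores.values(), reverse=True)
        return ranked[0] - ranked[1]        # margin over second place

    return lead(scores_a) > tolerance or lead(scores_b) > tolerance

# Two judges flip a tightly packed top two: no flag.
packed_a = {"Band W": 90.0, "Band X": 89.7, "Band Y": 89.2}
packed_b = {"Band X": 88.0, "Band W": 87.8, "Band Y": 87.1}
print(yellow_flag(packed_a, packed_b))  # False

# One judge had a 2.0-point runaway winner the other placed lower: flag it.
spread_a = {"Band W": 92.0, "Band X": 90.0, "Band Y": 89.5}
spread_b = {"Band X": 88.0, "Band W": 87.9, "Band Y": 87.0}
print(yellow_flag(spread_a, spread_b))  # True
```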

– Mark Culp