The current gold-standard method for diagnosing exercise-induced laryngeal obstruction (EILO) is continuous laryngoscopy during exercise (CLE), with severity classified by a visual grade scoring system. We assessed the precision of this approach by evaluating the test–retest reliability of CLE and both inter- and intra-rater variability.

In this prospective case–control study, subjects completed four consecutive treadmill CLE tests under identical conditions. Laryngoscopic video recordings were anonymised and graded by three expert raters. Two months after the initial scoring, the videos were re-randomised and rated again to assess intra-rater agreement.

Twenty subjects (16 cases and four controls) completed four CLE tests. Time to exhaustion increased by 30 s (95% CI 0.02–57.8 s, p<0.05) in the second CLE test compared with the first, but remained unchanged in the subsequent tests. Only one-third of subjects retained their initial diagnosis in all three subsequent tests. Inter-rater agreement on grade scores (weighted Cohen's κ) ranged from 0.16 to 0.45, while intra-rater agreement ranged from 0.30 to 0.67.
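Weighted Cohen's κ credits partial agreement between ordinal grades, so a one-grade disagreement is penalised less than a larger one. The abstract does not state which weighting scheme was used. The following is therefore only a minimal sketch, assuming ordinal grades and offering either linear or quadratic disagreement weights. The function name `weighted_kappa` and its parameters are hypothetical and are not taken from the study. It computes κ = 1 − (observed weighted disagreement) / (chance-expected weighted disagreement).

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters grading the same items on an ordinal scale.

    Hypothetical helper for illustration; the weighting scheme ("linear" or
    "quadratic") is an assumption, since the source does not specify one.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("need at least two ordinal categories")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}  # grade -> ordinal position
    n = len(rater_a)

    # Observed joint proportions of (grade from rater A, grade from rater B)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal grade distributions of each rater
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of grades
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    obs_dis = sum(disagreement(i, j) * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * marg_a[i] * marg_b[j]
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - obs_dis / exp_dis


# Illustrative grades only (not study data): two raters scoring four videos 0-2
print(weighted_kappa([0, 1, 2, 2], [0, 2, 2, 1], [0, 1, 2]))
print(weighted_kappa([0, 1, 2, 2], [0, 2, 2, 1], [0, 1, 2], weights="quadratic"))
```

With these made-up grades, linear weighting gives κ = 3/7 (about 0.43) and quadratic weighting gives κ = 7/11 (about 0.64). The choice of weighting scheme can therefore shift reported agreement noticeably for the same set of ratings.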

The CLE test is central to the diagnostic assessment of patients with EILO. However, the widely adopted visual grade scoring system does not appear to classify the severity of EILO reliably.