Intraobserver and interobserver reliability assessment of tibial plateau fracture classification systems
Anıl Taşkesen1, İsmail Demirkale1, Mustafa Caner Okkaoğlu1, Mahmut Özdemir1, Mustafa Gökhan Bilgili2, Murat Altay1
1Department of Orthopedics and Traumatology, University of Health Sciences, Keçiören Training and Research Hospital, Ankara, Turkey
2Department of Orthopedics and Traumatology, University of Health Sciences, Bakırköy Dr. Sadi Konuk Training and Research Hospital, Istanbul, Turkey
Keywords: Classification; interobserver variation; reliability; tibial plateau fracture
Abstract
Objectives: This study aims to assess the intra- and interobserver reliability of commonly used tibial plateau fracture classification systems.
Patients and methods: This retrospective cohort study included computed tomography (CT) and plain radiographic images (lateral and anteroposterior X-rays) of 60 patients (40 males, 20 females; mean age 45.9 years; range 18 to 80 years) who presented to two orthopaedic clinics between January 2011 and January 2015 with unilateral tibial plateau fractures. All plain X-rays (XR) and CT images were evaluated by four observers on two separate occasions, 1.5 months apart. All fractures were classified according to the Arbeitsgemeinschaft für Osteosynthesefragen/Orthopaedic Trauma Association (AO/OTA), Schatzker, Hohl and Moore, Luo, and revised Duparc systems. Intraobserver reliability was measured with Cohen's kappa (κ) coefficient and interobserver reliability with Fleiss' kappa coefficient.
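The two agreement statistics named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the input layout (one label per case for Cohen's kappa, one list of observer labels per case for Fleiss' kappa), and the handling of the degenerate case where chance agreement equals 1 are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two rating sessions of the same cases (intraobserver use)."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("both sessions must rate the same non-empty set of cases")
    # Observed proportion of cases given the same label in both sessions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal label frequencies of each session.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # assumed convention: a single label was used throughout
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of observer labels per case."""
    n_cases = len(ratings)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    totals = Counter()
    agreement = 0.0
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # Proportion of agreeing observer pairs for this case.
        agreement += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement / n_cases
    # Chance agreement from the overall share of each label.
    p_e = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return 1.0  # assumed convention: a single label was used throughout
    return (p_bar - p_e) / (1.0 - p_e)
```

In a design like the one described, `cohen_kappa` would be applied per observer to that observer's two sessions, and `fleiss_kappa` to the four observers' labels from a single session and imaging modality.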
Results: With the Schatzker classification, interobserver reliability was moderate for XR (κ=0.51) and substantial for CT (κ=0.61). With the AO/OTA classification, interobserver reliability was moderate for both imaging methods (κ=0.43 for XR and κ=0.54 for CT). With the Hohl and Moore classification, interobserver reliability was also moderate for both imaging methods (κ=0.45 for XR and κ=0.51 for CT). The revised Duparc classification showed the lowest interobserver reliability, ranging from fair to moderate (κ=0.27-0.55 for XR and κ=0.44-0.61 for CT). Interobserver reliability for the Luo classification on CT was κ=0.47. Intraobserver reliability for the Luo classification on CT was substantial for observers 1, 2 and 3 (κ=0.67-0.71) and almost perfect for observer 4 (κ=0.84). Intraobserver reliability was substantial for the Schatzker classification and moderate for the other classifications.
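The verbal agreement levels attached to the kappa values above appear consistent with the widely used Landis and Koch bands, although the abstract does not name the scale it used. The sketch below assumes those bands; the function name `agreement_level` is hypothetical.

```python
def agreement_level(kappa):
    """Map a kappa value to a Landis and Koch label (assumed scale)."""
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Under these assumed bands, `agreement_level(0.51)` gives "moderate" and `agreement_level(0.61)` gives "substantial", matching the labels reported for the Schatzker classification on XR and CT.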
Conclusion: Among the classification systems compared in this study, the Schatzker classification was the most reliable, particularly when CT was used. In contrast, the revised Duparc classification showed the lowest reliability, owing to its complexity and its different morphological subtypes.