STUDY OBJECTIVE: Open educational resources such as blogs are increasingly used for medical education. Gestalt is the evaluation method generally applied to these resources; however, little has been published on its reliability. We aimed to evaluate the reliability of gestalt in the assessment of emergency medicine blogs.

METHODS: We identified 60 English-language emergency medicine Web sites that posted clinically oriented blogs between January 1, 2016, and February 24, 2016. Ten Web sites were selected with a random-number generator. Medical students, emergency medicine residents, and emergency medicine attending physicians evaluated the 2 most recent clinical blog posts from each site for quality, using a 7-point Likert scale. The mean gestalt scores of each blog post were compared between groups with Pearson's correlations. Single- and average-measure intraclass correlation coefficients were calculated within groups. A generalizability study evaluated the sources of variance in gestalt ratings, and a decision study calculated the number of raters required to estimate quality reliably (coefficient >0.8).

RESULTS: One hundred twenty-one medical students, 88 residents, and 100 attending physicians (93.6% of enrolled participants) evaluated all 20 blog posts. Single-measure intraclass correlation coefficients within groups were poor to fair (0.36 to 0.40). Average-measure intraclass correlation coefficients indicated greater reliability (0.811 to 0.840). Mean gestalt ratings by attending physicians correlated strongly with those by medical students (r=0.92) and by residents (r=0.99). The generalizability coefficient was 0.91 for the complete data set. The decision study found that 42 gestalt ratings were required to evaluate quality reliably (coefficient >0.8).

CONCLUSION: Mean gestalt quality ratings of blog posts by medical students, residents, and attending physicians correlate strongly with one another, but individual ratings are unreliable. With a sufficient number of raters, mean gestalt ratings provide a community standard for assessment.
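To make the single- and average-measure reliability statistics concrete, the following is a minimal Python sketch of the two-way random-effects intraclass correlation coefficients of Shrout and Fleiss, ICC(2,1) and ICC(2,k), computed from a fully crossed posts × raters matrix. The abstract does not state which ICC formulation was used, so the two-way random model, the rater count, and the simulated 7-point ratings are illustrative assumptions, not the study's data or method.

```python
import numpy as np

def icc_two_way_random(ratings):
    """Shrout & Fleiss ICC(2,1) and ICC(2,k) for a fully crossed
    targets x raters matrix with no missing cells.

    ratings: 2-D array, rows = targets (blog posts), cols = raters.
    Returns (single_measure_icc, average_measure_icc).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape                      # n posts, k raters
    grand = y.mean()
    row_means = y.mean(axis=1)          # per-post means
    col_means = y.mean(axis=0)          # per-rater means

    # Two-way ANOVA mean squares.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = y - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    icc_single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    icc_average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return icc_single, icc_average

# Toy example: 20 posts rated by 5 raters on a 7-point scale.
rng = np.random.default_rng(0)
quality = rng.integers(1, 8, size=20)   # latent post quality
ratings = np.clip(quality[:, None] + rng.integers(-2, 3, size=(20, 5)), 1, 7)
print(icc_two_way_random(ratings))
```

In this formulation the average-measure coefficient grows with the number of raters averaged, which is why the average-measure values reported above (0.811 to 0.840) can be acceptable even though the single-measure values (0.36 to 0.40) are not.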
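The decision study's rater requirement can be illustrated with the simplest generalizability-theory case, a fully crossed post × rater design, where the projected reliability with n raters is Eρ² = σ²_post / (σ²_post + σ²_residual / n). The sketch below solves for the smallest n that reaches a 0.8 target. The variance components are placeholders chosen only to show the arithmetic; the abstract does not report the study's G-study design or variance estimates, so this does not reproduce its analysis.

```python
import math

def required_raters(var_post, var_residual, target=0.8):
    """Smallest rater count n such that the relative generalizability
    coefficient  E(rho^2) = var_post / (var_post + var_residual / n)
    reaches the target reliability."""
    # Rearranging the inequality for n gives:
    n = (target / (1.0 - target)) * (var_residual / var_post)
    return math.ceil(n)

# Placeholder variance components (NOT the study's estimates).
print(required_raters(var_post=0.25, var_residual=2.625))  # -> 42
```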