Swarm intelligence: a novel clinical strategy for improving imaging annotation accuracy, using wisdom of the crowds.
Rutwik Shah1, Bruno Astuto Arouche Nunes1, Tyler Gleason1, Justin Banaga1, Kevin Sweetwood1, Allen Ye1, Will Fletcher1, Rina Patel1, Kevin McGill1, Thomas Link1, Valentina Pedoia1, Sharmila Majumdar1, and Jason Crane1
1Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, United States
Swarm predictions for both cohorts (radiologists and residents) were
closer to the clinical ground truth and outperformed both their own
individually graded labels and the AI predictions. Resident accuracy also
improved as swarm size increased (three versus five participants).
Figure 1: A) Sagittal cube sequences evaluated for meniscal lesions (arrow
pointing to a posterior horn tear in the medial meniscus). B) Swarm platform
interface used to derive consensus grades for the location of the lesion.
C) Visualization of the decision trajectory of the swarm. Although individual
opinions diverged, the eventual consensus of the group in this example was the
posterior horn of the medial meniscus.
Figure 2: Resident versus ground truth (GT). A) Confusion matrix (CM) for
3-resident majority vote vs GT (kappa: 0.01). B) CM for 3-resident swarm vs GT;
accuracy improves compared to majority vote (kappa: 0.24). C) CM for 5-resident
majority vote vs GT (kappa: 0.05). D) CM for 5-resident swarm vs GT; accuracy
improves compared to majority vote (kappa: 0.37).
Note: The 5-resident swarm was unable to reach a consensus on 1 exam, which
was excluded from CM tabulation.
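
The comparison in Figure 2 rests on three steps: deriving one label per exam from a group of readers, tabulating a confusion matrix against GT, and summarizing agreement with Cohen's kappa. The following is a minimal sketch of those steps, not the authors' pipeline. The function names, the label set, and the example grades are all hypothetical, and the tie-handling rule (treat a tie as "no consensus" and exclude the exam) is an assumption modeled on the note above.

```python
from collections import Counter


def majority_vote(grades):
    """Return the most common grade, or None if first place is tied."""
    counts = Counter(grades).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumption: a tie is treated as "no consensus"
    return counts[0][0]


def confusion_matrix(truth, pred, labels):
    """Rows are ground-truth labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(truth, pred):
        cm[index[t]][index[p]] += 1
    return cm


def cohen_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), from a square confusion matrix."""
    n = sum(sum(row) for row in cm)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    k = len(cm)
    p_o = sum(cm[i][i] for i in range(k)) / n  # observed agreement
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # chance agreement expected from the marginal totals
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels and grades, for illustration only.
    labels = ["none", "anterior", "body", "posterior"]
    ground_truth = ["posterior", "none", "body", "posterior"]
    resident_grades = [
        ["posterior", "body", "posterior"],
        ["none", "none", "anterior"],
        ["body", "posterior", "anterior"],  # three-way tie: no consensus
        ["body", "body", "posterior"],
    ]
    votes = [majority_vote(g) for g in resident_grades]
    # Exclude exams without a consensus before tabulating the matrix.
    kept = [(t, v) for t, v in zip(ground_truth, votes) if v is not None]
    cm = confusion_matrix([t for t, _ in kept], [v for _, v in kept], labels)
    print(cohen_kappa(cm))
```

The same confusion_matrix and cohen_kappa functions would apply unchanged to swarm-derived labels; only the step that produces one label per exam differs.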