Updated: Oct 14
In imaging diagnostics, deep learning models already perform on par with expert radiologists in controlled laboratory test environments. However, most of these models do not perform as well when deployed in real clinical settings, because they suffer from biases caused by limited training data and from a lack of generalizability.
Domain shift is a common challenge to generalizability in machine learning. It occurs when the distribution of images used for training a model does not match the distribution the model encounters when deployed. Imaging characteristics vary from one hospital to another due to differences in equipment, imaging settings and protocols, patient positioning, populations, etc. Even within one hospital, they may vary over time. The images below demonstrate how chest X-rays taken at different hospitals can look very different.
Images from two different datasets and their corresponding pixel intensity histograms
The X-ray images above and their pixel intensity distributions highlight the problem of dataset shift. If we build a model using images from the first distribution and, in a practical environment, the algorithm receives data from a different distribution, there is no guarantee about the quality of the model's results. In completely automated systems, the shift may even go undetected, leading to an unacceptable scenario.
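One simple way to put a number on the kind of histogram mismatch described above is to compare normalized pixel intensity histograms with the Jensen-Shannon divergence. The sketch below is only an illustration under stated assumptions: the pixel values are simulated rather than read from real X-rays, and the helper names, bin count and `SHIFT_THRESHOLD` are hypothetical choices, not values from any deployed system.

```python
import math
import random

def intensity_histogram(pixels, bins=32, max_value=255):
    """Normalized histogram of pixel intensities in [0, max_value]."""
    counts = [0] * bins
    for p in pixels:
        idx = min(int(p * bins / (max_value + 1)), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint histograms."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def simulate_pixels(mean, spread, n, rng):
    """Stand-in for real image data: clipped Gaussian intensities (assumption)."""
    return [min(255.0, max(0.0, rng.gauss(mean, spread))) for _ in range(n)]

rng = random.Random(0)
site_a = simulate_pixels(90, 25, 5000, rng)        # "training" hospital
site_a_again = simulate_pixels(90, 25, 5000, rng)  # new batch, same hospital
site_b = simulate_pixels(150, 40, 5000, rng)       # different hospital

hist_a = intensity_histogram(site_a)
same_site = js_divergence(hist_a, intensity_histogram(site_a_again))
cross_site = js_divergence(hist_a, intensity_histogram(site_b))

SHIFT_THRESHOLD = 0.1  # hypothetical alert level, would need tuning on real data
print(f"same site:  {same_site:.4f}  shift flagged: {same_site > SHIFT_THRESHOLD}")
print(f"cross site: {cross_site:.4f}  shift flagged: {cross_site > SHIFT_THRESHOLD}")
```

A check like this only sees global intensity statistics, so it can miss shifts in anatomy, positioning or patient population that leave the histogram unchanged.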
Several studies demonstrate a significant drop in model performance due to domain shift. Deep learning models can diagnose a wide range of conditions from radiological data, but they are not always robust to a change of site. Most models developed to detect chest ailments from X-rays had an AUC (an indicator of a classifier’s discrimination power) between 0.90 and 0.96 when evaluated on data distributed similarly to the training data. On external data, the AUC dropped significantly, to the range 0.75-0.89. In a prospective study of IDx, a deep learning model developed for detecting diabetic retinopathy, researchers found that sensitivity dropped to 0.87, compared with 0.96 on a publicly available dataset. Segmentation and classification models are at the core of AI-based diagnostics. Architectures like U-Net, a powerful model used for image segmentation, also often face unexpected performance drops in these cases, and classification models based on the DenseNet architecture even more so. Models can be trained on data from various settings to mitigate the domain shift issue, but obtaining adequate data from each site for re-training is often tedious and impractical.
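For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that definition. The labels and scores are made-up toy values chosen to show an internal versus external gap, and do not come from any of the studies mentioned.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy data (assumption): the same six cases scored on two hypothetical test sets.
labels = [1, 1, 1, 0, 0, 0]
internal_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # positives cleanly separated
external_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # one positive ranked below a negative

internal_auc = auc(labels, internal_scores)
external_auc = auc(labels, external_scores)
print(f"internal AUC: {internal_auc:.3f}")
print(f"external AUC: {external_auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration. A rank-based formulation would be preferable for large test sets.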
Industry experts believe deep learning models can adapt to challenges like domain shift if they actively learn from experience using a feedback loop. Human experts are a part of this continuous feedback process: they evaluate the predictions given by the models and either accept or reject the outcome. Through the feedback loop, the model assimilates this information and continuously adapts to new data without requiring re-training on a large dataset. The end result is a fine-tuned model that adapts to new situations quickly and provides accurate results.
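The accept/reject feedback loop described above can be illustrated with a deliberately tiny model. The sketch below uses a one-feature logistic classifier updated by one stochastic gradient step per reviewed case. This is a stand-in chosen for brevity and not a deep network or any specific production method. The simulated "new site" data, the class `OnlineClassifier` and all parameter values are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class OnlineClassifier:
    """Toy 1-D logistic model updated one reviewed case at a time."""

    def __init__(self, weight=1.0, bias=0.0, learning_rate=0.1):
        self.weight = weight
        self.bias = bias
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        return sigmoid(self.weight * x + self.bias)

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_from_feedback(self, x, accepted):
        """Expert accepts or rejects the prediction; a rejection flips the label."""
        predicted = self.predict(x)
        label = predicted if accepted else 1 - predicted
        error = self.predict_proba(x) - label  # gradient of log loss w.r.t. the logit
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error

def sample_case(rng):
    """Simulated shifted site (assumption): classes now separate near x = 2, not x = 0."""
    label = rng.randint(0, 1)
    x = rng.gauss(3.0 if label == 1 else 1.0, 0.5)
    return x, label

def accuracy(model, cases):
    return sum(model.predict(x) == y for x, y in cases) / len(cases)

rng = random.Random(42)
held_out = [sample_case(rng) for _ in range(1000)]

model = OnlineClassifier()  # starting weights stand in for a model fit at the old site
accuracy_before = accuracy(model, held_out)

for _ in range(500):  # 500 expert-reviewed cases from the new site
    x, truth = sample_case(rng)
    accepted = model.predict(x) == truth  # the simulated expert knows the truth
    model.learn_from_feedback(x, accepted)

accuracy_after = accuracy(model, held_out)
print(f"accuracy before feedback: {accuracy_before:.2f}")
print(f"accuracy after feedback:  {accuracy_after:.2f}")
```

The design point is that each update uses only the single reviewed case, so the model adapts incrementally instead of being re-trained from scratch on a large dataset.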
The human-in-the-loop deep learning approach combines the advantages of both radiologists and machine-learning algorithms. The strengths of artificial intelligence (AI) models (i.e. fast automated detection) are leveraged, while human experts fill the gaps in AI capabilities that arise from underlying biases and a potential lack of generalizability. Studies of automated radiology systems advocate a radiology workflow in which healthcare professionals leverage AI. Human experts augmented by AI have been shown to perform better than either humans or AI alone. Industry experts consider workflow integration to be an important step in the adoption of AI in radiology.
Radiology departments are busier than ever, and AI can help improve the efficiency of their operations. Deep-learning algorithms can step in to provide rapid automated diagnosis for high-confidence cases, allowing radiologists to concentrate on more complex cases.
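A minimal sketch of such confidence-based triage is shown below. It assumes the model outputs a probability of abnormality per study and that probabilities very close to 0 or 1 count as "high confidence". The `Prediction` type, the `triage` function and the two cut-offs are hypothetical and would need clinical validation before any real use.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    study_id: str
    probability: float  # model's estimated probability that the study is abnormal

def triage(predictions, low=0.05, high=0.95):
    """Split studies into an automated queue and a radiologist review queue.

    Studies whose probability is at or beyond either cut-off are treated as
    high-confidence; everything in between goes to a radiologist.
    """
    automated, needs_review = [], []
    for pred in predictions:
        if pred.probability <= low or pred.probability >= high:
            automated.append(pred)
        else:
            needs_review.append(pred)
    return automated, needs_review

# Toy worklist (assumption): four studies with made-up model outputs.
worklist = [
    Prediction("study-001", 0.99),
    Prediction("study-002", 0.02),
    Prediction("study-003", 0.60),
    Prediction("study-004", 0.93),
]

automated, needs_review = triage(worklist)
print("automated:   ", [p.study_id for p in automated])
print("needs review:", [p.study_id for p in needs_review])
```

Widening the gap between `low` and `high` sends more studies to radiologists, and narrowing it automates more, so the cut-offs trade workload against risk.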
DeepTek’s implementation of the expert-in-the-loop model
At DeepTek, we use the expert-in-the-loop approach to fill in the gaps in AI capabilities. The expert is a part of all three stages of model development. In stage three of model development, we introduce several methods to detect domain shift. These include anomaly detection algorithms such as adversarial methods (GANs) and convolutional/variational autoencoders. We address the problem of domain shift with a feedback loop in which the expert-in-the-loop reviews a small percentage of AI outcomes and helps the AI actively learn from the feedback. Models learn from new data, and accuracy is not compromised at any stage.
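The idea of having an expert review a small percentage of AI outcomes can be sketched as random audit sampling followed by an agreement check. The code below is a generic illustration and not DeepTek's implementation. The function names, the 5% review fraction, the simulated expert labels and the `AGREEMENT_FLOOR` trigger are all assumptions.

```python
import random

def sample_for_review(case_ids, fraction, rng):
    """Pick a random subset of cases (at least one) for expert review."""
    k = max(1, round(len(case_ids) * fraction))
    return rng.sample(case_ids, k)

def agreement_rate(ai_labels, expert_labels, reviewed_ids):
    """Fraction of reviewed cases where the expert agreed with the AI."""
    agreed = sum(ai_labels[c] == expert_labels[c] for c in reviewed_ids)
    return agreed / len(reviewed_ids)

rng = random.Random(7)

# Toy batch (assumption): 200 cases with made-up AI outputs.
case_ids = [f"case-{i:03d}" for i in range(200)]
ai_labels = {c: rng.randint(0, 1) for c in case_ids}

reviewed = sample_for_review(case_ids, fraction=0.05, rng=rng)

# Simulated expert reads (assumption): the expert overturns roughly 1 in 5 AI calls.
expert_labels = {
    c: ai_labels[c] if rng.random() > 0.2 else 1 - ai_labels[c] for c in reviewed
}

rate = agreement_rate(ai_labels, expert_labels, reviewed)
corrections = [c for c in reviewed if ai_labels[c] != expert_labels[c]]

AGREEMENT_FLOOR = 0.9  # hypothetical trigger for scheduling a fine-tuning round
print(f"reviewed {len(reviewed)} of {len(case_ids)} cases, agreement {rate:.2f}")
print(f"fine-tuning suggested: {rate < AGREEMENT_FLOOR}")
print(f"corrected cases available as new training labels: {len(corrections)}")
```

Only the reviewed cases need expert time, while the disagreements double as fresh labels for the next fine-tuning round.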
We successfully deployed a deep learning model for screening tuberculosis in one of the first prospective studies in India. For a period of six months, the models were fine-tuned every month on fresh batches of data, using expert radiologist annotations as part of the feedback loop. This led to a continuous improvement in model performance. The resulting model, when finally deployed, remained stable and generalized well to an unfamiliar dataset. It correctly captured 90% of tuberculosis-positive cases (its sensitivity) and had an accuracy of 88%.
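To make the two reported figures concrete, the sketch below shows how sensitivity and accuracy are computed from confusion-matrix counts. The counts are invented solely so that the arithmetic reproduces 90% and 88%. They are not the study's data, and the real case mix is not stated here.

```python
def sensitivity(tp, fn):
    """Share of truly positive cases that the model flags as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of truly negative cases that the model flags as negative."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Share of all cases that the model classifies correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts (assumption): 100 TB-positive and 400 TB-negative cases.
tp, fn = 90, 10   # positives caught and positives missed
tn, fp = 350, 50  # negatives cleared and negatives wrongly flagged

print(f"sensitivity: {sensitivity(tp, fn):.2f}")
print(f"specificity: {specificity(tn, fp):.3f}")
print(f"accuracy:    {accuracy(tp, tn, fp, fn):.2f}")
```

Accuracy depends on how many positive and negative cases are in the test set, whereas sensitivity does not, which is why screening results are usually reported with both.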
The expert-in-the-loop approach has the potential to make seamless adoption of AI in radiology practice possible. Our end-to-end radiology workflow optimization platform is working towards this end goal, leveraging both human and artificial intelligence to provide collectively better healthcare.
The majority of industry experts agree that the symbiosis between human experts and AI is here to stay. We conducted a poll on LinkedIn and asked whether AI will replace radiologists in the near future. The results were in favour of AI being used to augment radiologists' capabilities.
1. Yasaka K, Abe O. Deep learning and artificial intelligence in radiology: Current applications and future directions. PLoS Med. 2018;15(11):e1002707. Published 2018 Nov 30. doi:10.1371/journal.pmed.1002707
2. Subbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics. 2020;21(2):345–352. https://doi.org/10.1093/biostatistics/kxz041
3. Ting D, Pasquale L, Peng L, Campbell J, Lee A, Raman R, Tan G, Schmetterer L, Keane P, Wong T. Artificial intelligence and deep learning in ophthalmology. British Journal of Ophthalmology. 2018;103. doi:10.1136/bjophthalmol-2018-313173
4. Janizek J, Erion G, DeGrave A, Lee S-I. An Adversarial Approach for the Robust Classification of Pneumonia from Chest Radiographs. 2020.
5. Chris McIntyre, Radiology Today Magazine (accessed on 28 September 2020).
6. Patel, B.N., Rosenberg, L., Willcox, G. et al. Human–machine partnership with artificial intelligence for chest radiograph diagnosis. npj Digit. Med. 2, 111 (2019). https://doi.org/10.1038/s41746-019-0189-7
7. Bien N, Rajpurkar P, Ball RL, Irvin J, Park A, et al. (2018) Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLOS Medicine 15(11): e1002699. https://doi.org/10.1371/journal.pmed.1002699.