Abnormality Detection in Musculoskeletal Radiographs Using Deep Learning

Updated: Jul 8, 2020

Rohit Lokwani


A brief overview of medical image analysis and techniques to improve your CNN's performance on real-world datasets


Biomedical image analysis is an interdisciplinary field spanning biology, physics, medicine and engineering. It deals with the application of image processing techniques to biological or medical problems. The medical images to be analyzed contain a wealth of information about the anatomical structure under investigation, which can reveal a valid diagnosis and thereby help doctors choose an adequate therapy. Doctors usually analyse these medical images manually through visual interpretation.



Photo from an-introduction-to-biomedical-image-analysis-with-tensorflow-and-dltk showing examples of medical images

From top left to bottom right: multi-sequence brain MRI (T1-weighted, T1 inversion recovery and T2 FLAIR channels); stitched whole-body MRI; planar cardiac ultrasound; chest X-ray; cardiac cine MRI.


Why computer vision and machine learning?

Computer vision methods have long been employed to automatically analyze biomedical images. The recent advent of deep learning has replaced many orthodox machine learning methods because it avoids hand-engineered features, removing a critical source of error from the process. Additionally, fast GPU-accelerated networks allow us to scale analysis to unprecedented amounts of data.


Abnormality Detection

Now, coming to the main topic of this blog: abnormality detection in bones. Diseases and injuries are the major contributing factors to bone abnormalities. Whenever there is an injury to a bone, the physician asks for an X-ray, so with hundreds of patients visiting hospitals every day, a massive number of X-rays are taken on a regular basis. To be specific with the stats, musculoskeletal conditions affect more than 1.7 billion people worldwide and are the most common cause of severe, long-term pain and disability, with 30 million emergency department visits annually and increasing. So, to reduce the radiologist's error rate and speed up the analysis, an AI solution should serve the purpose.


Dataset Used

Last year, Stanford hosted a deep learning competition in which participants were expected to detect bone abnormalities. The dataset is widely known as MURA. MURA is a dataset of musculoskeletal radiographs consisting of 14,863 studies from 12,173 patients, with a total of 40,561 multi-view radiographic images. Each study belongs to one of seven standard upper extremity radiographic study types: elbow, finger, forearm, hand, humerus, shoulder, and wrist. You can download the dataset from the official contest website here.
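For context, here is a minimal loading sketch you could start from once the dataset is downloaded. It assumes the public MURA-v1.1 folder layout, where each study directory name ends in _positive or _negative (please verify against your own copy; the root path below is just an example):

import glob
import os

# example root for one study type; adjust to your local download
DATA_ROOT = 'MURA-v1.1/train/XR_ELBOW'

image_paths, labels = [], []
for path in glob.glob(os.path.join(DATA_ROOT, '*', '*', '*.png')):
    # study folders are named like 'study1_positive' / 'study1_negative';
    # use that suffix as the binary label (1 = abnormal, 0 = normal)
    study_dir = os.path.basename(os.path.dirname(path))
    labels.append(1 if study_dir.endswith('_positive') else 0)
    image_paths.append(path)

print(len(image_paths), 'images,', sum(labels), 'labelled abnormal')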



Source: Stanford Contest site

As AI in the medical domain is booming, I decided to get hands-on experience with this dataset. The models I developed performed well and attained a kappa roughly equivalent to the one attained by the Stanford team, and for some bones they even outperformed it; I'll share the comparison table at the end of this blog. The rest of the blog focuses on the techniques I used to improve the performance of my models. Without doubt, you can try these techniques on other datasets as well.
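The kappa referred to throughout is Cohen's kappa, which measures agreement between the model's binary predictions and the radiologists' labels while correcting for chance agreement. A minimal sketch of computing it with scikit-learn (the label arrays below are placeholders):

from sklearn.metrics import cohen_kappa_score

# y_true: ground-truth study labels, y_pred: binarised model predictions (placeholder arrays)
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
print(cohen_kappa_score(y_true, y_pred))  # 1.0 = perfect agreement, 0.0 = chance-level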


To start with, I built separate models for all seven organs. The models that worked for me were DenseNet169 (the Stanford team used an ensemble of five such models) and DenseNet121. My baseline models were DenseNets pretrained on the ImageNet dataset and fine-tuned on this dataset via transfer learning. But sadly, the results weren't really close to the Stanford kappa; the average kappa I achieved was around 0.5. So I started browsing the internet for various tips and tricks to let your CNNs do the talking for you.
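For illustration, a minimal transfer-learning baseline along these lines can be set up in Keras as follows (the 320x320 input size, the initially frozen backbone and the plain Adam optimizer are assumptions on my part, not necessarily the exact configuration used here):

from keras.applications import DenseNet169
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# ImageNet-pretrained backbone without its classification head
base = DenseNet169(weights='imagenet', include_top=False, input_shape=(320, 320, 3))

# optionally freeze the backbone for the first few epochs of fine-tuning
for layer in base.layers:
    layer.trainable = False

x = GlobalAveragePooling2D()(base.output)
output = Dense(1, activation='sigmoid')(x)  # 1 = abnormal, 0 = normal

model = Model(inputs=base.input, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])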

Following are the techniques I used to improve my models' performance and bring the average kappa up to 0.705.


Learning Rate Schedulers

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging: a value too small may result in a long training process that could get stuck, whereas a value too large may result in learning a sub-optimal set of weights too fast or an unstable training process. So it becomes necessary to find the optimal learning rate. Although libraries like fastai have built-in functions to find an ideal learning rate, here I used the traditional approaches. The two techniques I used were:


1. Step Decay: This technique reduces the learning rate after a predefined number of epochs. You can write your own implementation of it. Mine was as follows:


import math

def step_decay(epoch):
    # drop the learning rate by a factor of 10 every 10 epochs
    initial_lrate = 0.0001
    drop = 0.1
    epochs_drop = 10.0
    lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate

This function basically reduces the learning rate to 1/10th of its initial value after every 10 epochs. It can be included in the callbacks as follows:


from keras.callbacks import LearningRateScheduler

lrate = LearningRateScheduler(step_decay)
callbacks = [lrate]

2. Loss-based Decay: This technique reduces the learning rate once the model stops improving for a predefined number of epochs (the patience). Keras has a built-in callback named ReduceLROnPlateau, which is shown in the code below.


from keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10,
                              min_delta=0.0001, verbose=1, min_lr=0.0000001)
callbacks = [reduce_lr]

The above callback monitors the validation loss and, once it stops decreasing for a patience of 10 epochs, reduces the learning rate by a factor of 0.1. Here, my model had an initial learning rate of 0.001, and through this callback I could take it all the way down to min_lr, which is set to 10⁻⁷ above. For me the latter technique proved to be more effective. To get a detailed overview of how to optimise a model's performance with the help of the learning rate, you could skim through this impressive article.
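Either scheduler only takes effect once the callbacks list is passed to training. A minimal usage sketch, assuming a compiled Keras model and placeholder training/validation arrays:

# validation data is needed so ReduceLROnPlateau can monitor val_loss;
# X_val and Y_val are placeholder names for a held-out split
model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
          epochs=50, batch_size=32, callbacks=callbacks)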


Class Weights Penalisation


This is one of the techniques I came across while reading about optimizing a model's performance when you have a class imbalance. It's not that useful when the class imbalance is not too high, but in my case the positive cases were approximately two-thirds of the negative ones. So, I used this technique to train the model. There are two ways you could do this:


1. Set the class weights manually for each of the classes by calculating the imbalance in the training set.

class_weight = {0: 1, 1: 1.5}
model.fit(X_train, Y_train, epochs=50, batch_size=32, class_weight=class_weight)
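Rather than hardcoding the ratio, the weights can also be derived from the label counts in the training set. A minimal sketch, assuming y_train is a 1-D array of 0/1 labels (a placeholder name):

import numpy as np

# count negatives (label 0) and positives (label 1) in the training labels
neg, pos = np.bincount(np.asarray(y_train))

# weight each class inversely to its frequency so both contribute equally to the loss
class_weight = {0: (neg + pos) / (2.0 * neg), 1: (neg + pos) / (2.0 * pos)}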