The Basic Classification of Thyroid Tumors on UltraSound Images using Deep Learning Methods

S.Serdar Helli
Jan 9, 2022

Thyroid cancer is one of the most common endocrine carcinomas. Because it reveals nodules well and can help distinguish benign from malignant nodules based on their pathological features, ultrasonography has become the most widely used modality for detecting and diagnosing thyroid cancer, compared to CT and MRI.

In this study, the purpose is to classify thyroid tumors on ultrasound images into 6 different TI-RADS categories:

• 2 (Benign) • 3 (Benign)

• 4a (Malignant) • 4b (Malignant)

• 4c (Malignant) • 5 (Malignant)

To this end, we will develop a deep learning algorithm on the labeled ultrasound images and evaluate its performance.

The National University of Colombia presented an open-access database of thyroid ultrasound images. The dataset consists of a set of B-mode ultrasound images, including complete annotations and diagnostic descriptions of suspicious thyroid lesions by expert radiologists [1].
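For orientation, here is a minimal sketch of how a single annotation file can be read. The file name is illustrative, and the only field our pipeline relies on is tirads (the XML files also carry other metadata that we do not use):

import xml.etree.ElementTree as ET

# Minimal sketch: read the TI-RADS label from one annotation file.
# The file name "1.xml" is illustrative.
tree = ET.parse("/content/Data/1.xml")
tirads = tree.findall("./tirads")[-1].text
print(tirads)  # e.g. "4a"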

First, we will import the libraries we will use.

import os
import random
import xml.etree.ElementTree as ET
from io import BytesIO
from zipfile import ZipFile

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import tensorflow as tf
from natsort import natsorted
from PIL import Image

Then we will download and prepare the data.

def download_dataset(save_path):
    print("Downloading...")
    r = requests.get("http://cimalab.intec.co/applications/thyroid/thyroid.zip")
    z = ZipFile(BytesIO(r.content))
    z.extractall(save_path)
    print("Completed...")

# Match XML annotations with their JPEG images
def to_dataframe(path):
    dirs = natsorted(os.listdir(path))
    xml_list = []
    img_list = []
    for name in dirs:
        if '.xml' in name:
            xml_list.append(name)
        else:
            img_list.append(name)
    xml_list = natsorted(xml_list)
    img_list = natsorted(img_list)
    tirads = []
    for xml_name in xml_list:
        tree = ET.parse(path + '/' + xml_name)
        a = tree.findall("./tirads")
        if a[-1].text is not None:
            tirads.append([xml_name, a[-1].text])
    data = []
    for case in tirads:
        stem = case[0][:-4]
        for img_name in img_list:
            if img_name in (stem + '_1.jpg', stem + '_2.jpg', stem + '_3.jpg'):
                data.append([img_name, case[1]])

    df = pd.DataFrame(data, columns=['Jpeg_Name', 'Tirads'])
    return df

The dataset is imbalanced, and several images contain thyroid cancers that are not labeled. In addition, some images contain two thyroid ultrasound views from the same subject. Before training, the images were preprocessed. In data preparation, we first normalized the images; second, we cropped each image to its biggest contour. The images also carry burned-in text about the tumor classification, which was removed to prevent bias.

Figure 1 — Original image and the cropped and resized image given to the model

The code:

# Crop function: pad the ROI to a square so resizing keeps the aspect ratio
def croping(img, x, y, w, h):
    if abs(w) < abs(h):
        img2 = np.zeros([h, h])
        img2[:, h - w:h] = img[y:y + h, x:x + w]
    elif abs(h) < abs(w):
        img2 = np.zeros([w, w])
        img2[w - h:w, :] = img[y:y + h, x:x + w]
    else:
        return img
    return img2

def convert_one_channel(img):
    # Some images have 3 channels even though they are grayscale
    if len(img.shape) > 2:
        img = img[:, :, 0]
    return img

# Remove the fill area from the image and resize
def crop_resize(path, resize_shape):
    img = plt.imread(path)
    img = convert_one_channel(np.asarray(img))
    kernel = np.ones((5, 5), dtype=np.float32)
    ret, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY)
    thresh = thresh.astype(np.uint8)
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=3)
    thresh = cv2.erode(thresh, kernel, iterations=5)
    contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    c_area = np.zeros([len(contours)])
    for i in range(len(contours)):
        c_area[i] = cv2.contourArea(contours[i])
    # Keep the biggest contour, i.e. the ultrasound region
    cnts = contours[np.argmax(c_area)]
    x, y, w, h = cv2.boundingRect(cnts)
    roi = croping(img, x, y, w, h)
    roi = cv2.resize(roi, resize_shape, interpolation=cv2.INTER_LANCZOS4)
    return roi


# Stack all images into one data matrix
def to_imgmatrix(resize_shape, path, df):
    path = path + '/'
    images = crop_resize(path + df["Jpeg_Name"][0], resize_shape)
    for i in range(1, len(df["Jpeg_Name"])):
        img = crop_resize(path + df["Jpeg_Name"][i], resize_shape)
        images = np.concatenate((images, img))
    images = np.reshape(images, (len(df["Jpeg_Name"]), resize_shape[0], resize_shape[1], 1))
    return images

def prepare_data(path, resize_shape):
    df = to_dataframe(path)
    data = to_imgmatrix(resize_shape, path, df)
    return df, data

download_dataset("/content/Data")
# We want to resize to 256x256
df, data = prepare_data("/content/Data", (256, 256))

Let’s see df:

df.head()
Figure 2 — df.head() results

We need y as categorical data to give as input to the model, so the code:

# We need a numeric category
def to_categoricalmatrix(df):
    # There are only a few categories, so they are handled manually
    Y = np.zeros([len(df["Tirads"])])
    for i in range(len(df["Tirads"])):
        if df["Tirads"][i] == "2":
            Y[i] = 0
        elif df["Tirads"][i] == "3":
            Y[i] = 1
        elif df["Tirads"][i] == "4a":
            Y[i] = 2
        elif df["Tirads"][i] == "4b":
            Y[i] = 3
        elif df["Tirads"][i] == "4c":
            Y[i] = 4
        elif df["Tirads"][i] == "5":
            Y[i] = 5
    return Y

# to integer, then one-hot
y = to_categoricalmatrix(df)
y = tf.keras.utils.to_categorical(y, dtype='float32')
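As a side note, an equivalent and more compact mapping can be written with a dict lookup; a small sketch using the same labels as above:

# Equivalent, more compact mapping using a dict lookup
TIRADS_TO_INDEX = {"2": 0, "3": 1, "4a": 2, "4b": 3, "4c": 4, "5": 5}
y_alt = df["Tirads"].map(TIRADS_TO_INDEX).to_numpy()
y_alt = tf.keras.utils.to_categorical(y_alt, dtype='float32')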

Before training our model, we also need to normalize the images:

# Normalize each image to the [0, 1] range
def normalize(data):
    for i in range(len(data)):
        data[i, :, :, :] = data[i, :, :, :] * (1 / np.max(data[i, :, :, :]))
    return np.float32(data)

x = normalize(data)

Let’s see a random example from x:

random_number = random.randint(0, len(df["Tirads"]) - 1)
plt.figure(figsize=(20, 10))
tit = "Classification : " + str(df["Tirads"][random_number])
plt.title(tit, fontsize=40)
plt.imshow(x[random_number, :, :, 0], cmap="gray")
Figure 3 — Examples of cropped and resized data

Before training, we need to split the dataset. In total, we have 347 images.

# Splitting into train, validation, and test sets
x_train = np.copy(x[:300, :, :, :])
x_valid = np.copy(x[300:313, :, :, :])
x_test = np.copy(x[313:, :, :, :])

y_train = np.copy(y[:300, :])
y_valid = np.copy(y[300:313, :])
y_test = np.copy(y[313:, :])
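Note that this is a simple sequential split. Since the classes are unbalanced, a stratified split would preserve the class ratios across the sets; a sketch of that alternative (not what was used in this study):

# Alternative (not used here): stratified split preserving class ratios
from sklearn.model_selection import train_test_split

x_tr, x_te, y_tr, y_te = train_test_split(
    x, y, test_size=34, stratify=np.argmax(y, axis=1), random_state=0)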

Pre-processing is the initial stage in refining image data, such as removing distortion, so that the data can be processed more effectively. We apply several pre-processing methods in this study, including augmentation, which aims to avoid overfitting so that the model can still make accurate predictions in the presence of small variations.

The augmentation used is a random rotation with a factor of 0.2, random zooming in the range of ±0.2, a random contrast in the range of 0.1, and a random horizontal flip.

from tensorflow.keras import layers

# Data augmentation to prevent overfitting and improve accuracy
data_augmentation1 = tf.keras.Sequential([
    layers.experimental.preprocessing.RandomFlip("horizontal"),
    layers.experimental.preprocessing.RandomZoom(height_factor=(-0.2, 0.2), fill_mode="constant"),
    layers.experimental.preprocessing.RandomRotation(factor=(-0.2, 0.2), fill_mode="constant"),
    layers.experimental.preprocessing.RandomContrast(0.1)])

x_train1 = data_augmentation1(x_train)
y_train1 = np.copy(y_train)

# Augment the training set 20 more times to enlarge it.
# Only x_train is augmented, so no validation or test images leak into training.
for _ in range(20):
    x_aug = data_augmentation1(x_train)
    x_train1 = np.concatenate((x_train1, x_aug), axis=0)
    y_train1 = np.concatenate((y_train1, np.copy(y_train)))

During training, the input to our VGG-19 is a fixed-size 256 × 256 grayscale image. The image is passed through a stack of convolutional (conv.) layers with a very small receptive field: 3 × 3. The padding is 'same' for the 3 × 3 convolution layers. Max-pooling is performed over a 2 × 2 pixel window with stride 2. The stack of convolutional layers is followed by three Fully-Connected (FC) layers: the first two have 1024 channels each, and the final layer is the soft-max layer. All hidden layers are equipped with the rectification (ReLU (Krizhevsky et al., 2012)) non-linearity.

Figure 4 — The diagram of VGG-19

The Code:

def VGG19(input_shape, filters):
    inputs = tf.keras.layers.Input(shape=input_shape)

    # Block 1
    x = tf.keras.layers.Conv2D(filters//16, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(inputs)
    x = tf.keras.layers.Dropout(0.1)(x)
    x = tf.keras.layers.Conv2D(filters//16, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.BatchNormalization()(x)

    # Block 2
    x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = tf.keras.layers.Conv2D(filters//8, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    x = tf.keras.layers.Conv2D(filters//8, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.BatchNormalization()(x)

    # Block 3
    x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = tf.keras.layers.Conv2D(filters//4, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    x = tf.keras.layers.Conv2D(filters//4, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(filters//4, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.BatchNormalization()(x)

    # Block 4
    x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = tf.keras.layers.Conv2D(filters//2, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.Dropout(0.4)(x)
    x = tf.keras.layers.Conv2D(filters//2, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(filters//2, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.BatchNormalization()(x)

    # Block 5
    x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = tf.keras.layers.Conv2D(filters, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    x = tf.keras.layers.Conv2D(filters, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    last = tf.keras.layers.Conv2D(filters, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal', name='top_conv')(x)

    model = tf.keras.Model(inputs, last, name="VGG19")
    return model

In the fully connected layers, we will use an L1-L2 regularizer to prevent overfitting.

base_model = VGG19(input_shape=(256, 256, 1), filters=512)
x = base_model.output
f = tf.keras.layers.Flatten(name="flatten")(x)
# To reduce overfitting on the unbalanced data, an L1-L2 regularizer is used
d2 = tf.keras.layers.Dense(1024, activation="relu", kernel_regularizer=tf.keras.regularizers.l1_l2(0.00001))(f)
dp9 = tf.keras.layers.Dropout(0.5)(d2)
d3 = tf.keras.layers.Dense(1024, activation="relu")(dp9)
dp10 = tf.keras.layers.Dropout(0.5)(d3)

final = tf.keras.layers.Dense(6, activation="softmax")(dp10)
model = tf.keras.Model(inputs=[base_model.input], outputs=final)

We will use the categorical cross-entropy loss function. The model will be trained for 35 epochs with a batch size of 16. Every 15 epochs, we decrease the learning rate so that the model converges more smoothly, which also helps prevent overfitting.

metrics = tf.keras.metrics.AUC(
    num_thresholds=200, curve='ROC',
    summation_method='interpolation'
)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss="categorical_crossentropy", metrics=metrics)


def lr_scheduler(epoch, lr):
    decay_rate = 0.1
    decay_step = 15
    if epoch % decay_step == 0 and epoch:
        return lr * decay_rate
    return lr

# After every 15 epochs, decrease the learning rate to help the model converge
lr_call = tf.keras.callbacks.LearningRateScheduler(lr_scheduler)
epochs = 35
history = model.fit(x=[x_train1], y=[y_train1], batch_size=16, epochs=epochs,
                    callbacks=[lr_call], validation_data=(x_valid, y_valid))

Let’s see our training and validation loss.

plt.figure(figsize=(20, 10))
plt.title('Loss')
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.legend()
Figure 5 — Loss over epochs on the training and validation sets

Let’s evaluate our model:

from sklearn.metrics import roc_auc_score

predict = model.predict(x_test)
auc = roc_auc_score(y_test, predict)

Our AUC score is 0.734. The ROC curve and AUC score are important metrics for image classification.
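Since we have six classes, a per-class (one-vs-rest) AUC can also be informative. A minimal sketch using scikit-learn, added here for illustration (it was not part of the original evaluation):

from sklearn.metrics import roc_auc_score

# Per-class (one-vs-rest) AUC, computed before y_test is reshaped below
for c, name in enumerate(["2", "3", "4a", "4b", "4c", "5"]):
    try:
        print("Class %s: AUC = %.3f" % (name, roc_auc_score(y_test[:, c], predict[:, c])))
    except ValueError:
        # A class may be missing from the small test split
        print("Class %s: not present in the test set" % name)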

Next, the ROC curve:

# Flatten labels and scores to get a micro-averaged ROC over all classes
y_test = np.reshape(y_test, (34 * 6))
predict = np.reshape(predict, (34 * 6))
from sklearn.metrics import roc_curve

# keep probabilities for the positive outcome only
ns_probs = [0 for _ in range(len(y_test))]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, predict)
# summarize scores
print('No Skill: ROC AUC=%.3f' % ns_auc)
print('Model: ROC AUC=%.3f' % lr_auc)
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, predict)
# plot the roc curve for the model
plt.figure(figsize=(20, 10))
plt.rcParams["font.size"] = "15"
plt.title("ROC Curve", fontsize=40)
plt.plot(ns_fpr, ns_tpr, label='No Skill')
plt.plot(lr_fpr, lr_tpr, label='Model')
# axis labels
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
# show the legend
plt.legend()
# show the plot
plt.show()
Figure 6 — ROC curve analysis

Grad-CAM is a strict generalization of Class Activation Mapping (CAM). Unlike CAM, Grad-CAM requires no re-training and is broadly applicable to any CNN-based architecture. Grad-CAM can also be combined with existing pixel-space visualizations to create a high-resolution, class-discriminative visualization (Guided Grad-CAM) [2]. In this study, we used the Grad-CAM method to examine the results.

Now, Grad-CAM:

from tensorflow import keras

# Grad-CAM: inspect which regions of the image drive the prediction
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, classifier_layer_names):
    # First, we create a model that maps the input image to the activations
    # of the last conv layer
    last_conv_layer = model.get_layer(last_conv_layer_name)
    last_conv_layer_model = keras.Model(model.inputs, last_conv_layer.output)
    # Second, we create a model that maps the activations of the last conv
    # layer to the final class predictions
    classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
    x = classifier_input
    for layer_name in classifier_layer_names:
        x = model.get_layer(layer_name)(x)
    classifier_model = keras.Model(classifier_input, x)
    # Then, we compute the gradient of the top predicted class for our input image
    # with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        # Compute activations of the last conv layer and make the tape watch it
        last_conv_layer_output = last_conv_layer_model(img_array)
        tape.watch(last_conv_layer_output)
        # Compute class predictions
        preds = classifier_model(last_conv_layer_output)
        top_pred_index = tf.argmax(preds[0])
        top_class_channel = preds[:, top_pred_index]
    # This is the gradient of the top predicted class with regard to
    # the output feature map of the last conv layer
    grads = tape.gradient(top_class_channel, last_conv_layer_output)

    # This is a vector where each entry is the mean intensity of the gradient
    # over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    # We multiply each channel in the feature map array
    # by "how important this channel is" with regard to the top predicted class
    last_conv_layer_output = last_conv_layer_output.numpy()[0]
    pooled_grads = pooled_grads.numpy()
    for i in range(pooled_grads.shape[-1]):
        last_conv_layer_output[:, :, i] *= pooled_grads[i]

    # The channel-wise mean of the resulting feature map
    # is our heatmap of class activation
    heatmap = np.mean(last_conv_layer_output, axis=-1)

    # For visualization purposes, we also normalize the heatmap between 0 & 1
    heatmap = np.maximum(heatmap, 0) / np.max(heatmap)
    return heatmap

Create the heatmap:

img_array = x_test[0, :, :, :]
img_array = np.reshape(img_array, (1, 256, 256, 1))
preds = model.predict(img_array)
last_conv_layer_name = "top_conv"
classifier_layer_names = ["flatten"]

# Generate the class activation heatmap
heatmap = make_gradcam_heatmap(
    img_array, model, last_conv_layer_name, classifier_layer_names
)
img = keras.preprocessing.image.img_to_array(x_test[0, :, :, :])
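To produce a figure like the one below, the heatmap can be upscaled and blended with the original image; a minimal sketch (the colormap and blending weights are my own choices, not from the original code):

# Upscale the heatmap and overlay it on the original image
heatmap_resized = cv2.resize(heatmap, (256, 256))
heatmap_uint8 = np.uint8(255 * heatmap_resized)
colored = cv2.applyColorMap(heatmap_uint8, cv2.COLORMAP_JET)
gray = cv2.cvtColor(np.uint8(255 * img[:, :, 0]), cv2.COLOR_GRAY2BGR)
overlay = cv2.addWeighted(gray, 0.6, colored, 0.4, 0)

plt.figure(figsize=(10, 10))
plt.title("Grad-CAM overlay")
plt.imshow(cv2.cvtColor(overlay, cv2.COLOR_BGR2RGB))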
Figure 7 — Original image and Grad-CAM result

As can be seen in Figure 7, the model targets the thyroid tumor region when classifying, which shows that the model focuses on the tumor itself.

In this study, thyroid nodules were classified into 6 different classes: 2 (Benign), 3 (Benign), 4a (Malignant), 4b (Malignant), 4c (Malignant), and 5 (Malignant). Data augmentation is a technique used to artificially increase the amount of training data, for example by changing the ratio of width to height, changing contrast, or using horizontal flips, and it is reported to be an essential technique for deep learning algorithms to achieve good performance. We used data augmentation in this study because we have limited and unbalanced data. While most studies classify thyroid tumors only as benign or malignant, in our study thyroid tumors were classified into 6 different classes. With 6 classes, the imbalance of the dataset becomes more pronounced; the most important issues are the class imbalance and the lack of sufficient data for each class. Even so, our proposed model shows promising results with a 0.734 AUC score, which indicates that the model has some skill in classifying thyroid tumors.
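One standard mitigation for class imbalance that could complement augmentation is class weighting in the loss; a sketch of how it could be wired into training (an assumption for future work, not something done in this study):

# Sketch: weight the loss inversely to class frequency (not used in this study)
from sklearn.utils.class_weight import compute_class_weight

labels = np.argmax(y_train1, axis=1)
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(labels), y=labels)
# model.fit(..., class_weight=dict(enumerate(weights)))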

The Full Code : https://github.com/SerdarHelli/The-Classification-of-Thyroid-Tumors-on-UltraSound-Images-using-Deep-Learning-Methods

References

[1] Pedraza, Lina & Vargas, Carlos & Narváez, Fabián & Durán, Oscar & Muñoz, Emma & Romero, Eduardo. (2015). An open access thyroid ultrasound-image Database. Progress in Biomedical Optics and Imaging — Proceedings of SPIE. 9287. 10.1117/12.2073532.

[2] Selvaraju, Ramprasaath R., et al. “Grad-cam: Visual explanations from deep networks via gradient-based localization.” Proceedings of the IEEE international conference on computer vision. 2017.

[3] Abadi, Martín, Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., … others. (2016). TensorFlow: A system for large-scale machine learning. In Symposium on Operating Systems Design and Implementation (pp. 265–283).
