# Loss Functions for Computer Vision Models

Machine learning algorithms are designed to “learn” from their mistakes and “update” themselves using the training data we provide. But how do they quantify these mistakes? They use “loss functions”, which measure how far an algorithm's predictions are from the ground truth. Choosing an appropriate loss function matters because it affects how quickly, and how well, the algorithm converges to good results.

# L2 Loss

• L2 loss (squared error) is very sensitive to outliers because each error is squared, so a single large error can dominate the total.
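As a minimal sketch of why squaring makes the loss sensitive to outliers, the plain-Python snippet below compares two sets of predictions. The function name `l2_loss` and the sample values are illustrative assumptions, and averaging over samples is just one common convention (some libraries sum instead).

```python
def l2_loss(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


targets = [1.0, 2.0, 3.0, 4.0]
close_preds = [1.5, 2.5, 3.5, 4.5]    # every prediction is off by 0.5
outlier_preds = [1.0, 2.0, 3.0, 8.0]  # a single prediction is off by 4.0

loss_close = l2_loss(targets, close_preds)      # 0.25
loss_outlier = l2_loss(targets, outlier_preds)  # 4.0
```

The total absolute error only doubles between the two cases (2.0 versus 4.0), but the L2 loss grows sixteenfold because the single large error is squared.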

# Cross-Entropy Loss

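• The formula for cross-entropy (multi-class error), also called categorical cross-entropy, is $-\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$. Here c = class_id, o = observation_id, p = predicted probability, M is the number of classes, and $y_{o,c}$ is 1 if c is the correct class for observation o and 0 otherwise.
• The formula for cross-entropy (binary class), also called log loss, is $-(y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}))$. Here y ∈ {0, 1} and ŷ ∈ (0, 1).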
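To make the two definitions concrete, here is a small plain-Python sketch for a single observation. The function names, the `eps` clipping constant, and the example probabilities are assumptions made for illustration; deep learning frameworks provide their own batched, numerically stable versions.

```python
import math


def categorical_cross_entropy(y_true, probs, eps=1e-12):
    """Cross-entropy for one observation.

    y_true: one-hot list (1 for the correct class, 0 otherwise).
    probs:  predicted probability for each class.
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, probs))


def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Log loss for one observation with label y in {0, 1}."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() finite
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


# Three-class example: the true class is index 1.
confident = categorical_cross_entropy([0, 1, 0], [0.05, 0.90, 0.05])
unsure = categorical_cross_entropy([0, 1, 0], [0.40, 0.30, 0.30])

# Binary example: the true label is 1.
bce_good = binary_cross_entropy(1, 0.9)
bce_bad = binary_cross_entropy(1, 0.1)
```

In both cases the loss is small when the probability assigned to the correct class is close to 1 and grows quickly as that probability approaches 0.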

# Semantic Segmentation — PSPNet

Apart from the main branch using softmax loss to train the final classifier, another classifier is applied after the fourth stage, i.e., the res4b22 residue block.

• Here the softmax loss refers to a softmax activation function followed by the cross-entropy loss function.
• Loss Calculation (code)
• The code that defines the loss on the masks for PSPNet is split into three main sections:
• Step1: Reshaping the inputs
• Step2: Gathering the indices of interest
• Step3: Computing loss (Softmax Cross Entropy)
`raw_output_up = tf.argmax(raw_output_up, axis=3)`
• Here we calculate the class_id for each pixel by finding the mask with the max value across axis=3 (depth). Older TensorFlow 1.x code passes this argument as `dimension=3`, which is deprecated in favor of `axis`.
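The three steps listed above, plus the per-pixel argmax, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the code from the PSPNet repository: the function names, the nested-list layout (H x W x C), and the rule that labels outside `[0, num_classes - 1]` are ignored are all hypothetical choices made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def segmentation_loss(logits, labels, num_classes):
    """Mean softmax cross-entropy over valid pixels.

    logits: H x W x C nested lists of raw class scores.
    labels: H x W nested lists of integer class ids.
    """
    # Step 1: reshape, flattening the spatial dimensions.
    flat_logits = [pixel for row in logits for pixel in row]  # (H*W) x C
    flat_labels = [label for row in labels for label in row]  # (H*W)

    # Step 2: gather the indices of interest (assumption: labels outside
    # [0, num_classes - 1], such as a "void" id, are ignored).
    indices = [i for i, label in enumerate(flat_labels)
               if 0 <= label < num_classes]
    if not indices:
        return 0.0

    # Step 3: softmax followed by cross-entropy, averaged over kept pixels.
    losses = [-math.log(softmax(flat_logits[i])[flat_labels[i]])
              for i in indices]
    return sum(losses) / len(losses)


def predict_class_ids(logits):
    """Per-pixel argmax over the class (depth) axis, as done at inference."""
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
            for row in logits]


# Tiny 1 x 3 "image" with 2 classes; 255 marks a hypothetical ignored pixel.
logits = [[[2.0, 0.0], [0.0, 2.0], [5.0, 5.0]]]
labels = [[0, 1, 255]]

loss = segmentation_loss(logits, labels, num_classes=2)
class_ids = predict_class_ids(logits)
```

The third pixel has an out-of-range label, so it is dropped in step 2 and does not affect the loss, although it still receives a class id at inference.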

# Instance Semantic Segmentation — MaskRCNN

The mask branch has a $Km^2$-dimensional output for each RoI, which encodes K binary masks of resolution m × m, one for each of the K classes. To this we apply a per-pixel sigmoid, and define $L_{mask}$ as the average binary cross-entropy loss. For an RoI associated with ground-truth class k, $L_{mask}$ is only defined on the k-th mask (other mask outputs do not contribute to the loss).

• Network Architecture
• Mask R-CNN splits into three branches: classes, bounding box, and mask. Let’s focus on the mask branch, since that’s the one used to create masks for the objects of interest.
• Loss Calculation (code)
• The code that defines the loss on the masks for Mask R-CNN is split into three main sections:
• Step1: Reshaping the inputs
• Step2: Gathering the indices of interest
• Step3: Computing loss (Binary CrossEntropy Loss)
• Inference
• The class branch predicts the class id of a region of interest, and the mask for that class is then picked out from the mask branch's prediction.
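The mask loss and the inference step described above can be sketched in plain Python as follows. This is a hedged illustration rather than the code from any Mask R-CNN repository: the function names, the nested-list layout, the `eps` clipping constant, and the convention that class id 0 means background are assumptions made for the example.

```python
import math


def binary_cross_entropy(y, p, eps=1e-7):
    """Binary cross-entropy for one pixel; p is a sigmoid output."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mask_loss(target_masks, target_class_ids, pred_masks):
    """Average binary cross-entropy on the ground-truth class mask only.

    target_masks:     N x m x m nested lists of 0/1 values.
    target_class_ids: N integers; 0 is assumed to mean background.
    pred_masks:       N x K x m x m nested lists of sigmoid outputs.
    """
    # Step 1: reshape, flattening every m x m mask into m*m pixels.
    flat_targets = [[v for row in mask for v in row] for mask in target_masks]
    flat_preds = [[[v for row in mask for v in row] for mask in roi_masks]
                  for roi_masks in pred_masks]

    # Step 2: gather the indices of interest: positive RoIs only, and for
    # each of them only the predicted mask of its ground-truth class k.
    positive = [i for i, k in enumerate(target_class_ids) if k > 0]
    y_true = [v for i in positive for v in flat_targets[i]]
    y_pred = [v for i in positive for v in flat_preds[i][target_class_ids[i]]]

    # Step 3: average binary cross-entropy over the gathered pixels.
    if not y_true:
        return 0.0
    total = sum(binary_cross_entropy(t, p) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


def select_mask(pred_masks_for_roi, predicted_class_id):
    """Inference: keep the mask that matches the class branch's prediction."""
    return pred_masks_for_roi[predicted_class_id]


# Two RoIs, K = 2 classes (0 = background, 1 = object), m = 2.
target_masks = [[[1, 0], [0, 1]], [[0, 0], [0, 0]]]
target_class_ids = [1, 0]  # the second RoI is background and is skipped
pred_masks = [
    [[[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.1, 0.9]]],
    [[[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.9], [0.9, 0.9]]],
]

loss = mask_loss(target_masks, target_class_ids, pred_masks)
chosen = select_mask(pred_masks[0], predicted_class_id=1)
```

Because only the k-th mask of each positive RoI enters the loss, the masks for different classes do not compete with each other, which is the point made in the quoted passage.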

# Conclusion

Other loss functions, such as the Dice loss, have recently been used in various medical image segmentation tasks as well.

Go ahead and play around with the repositories in the links above!

Originally published on Playment Blog

About the author (Prerak Mody): I'm a PhD Candidate at Leiden University Medical Centre. My research focuses on using deep learning for contour propagation of Organs at Risk in CT images.