Leveraging Different Computer Vision Types: Examples and Applications

-Ilina Navani

AI-based image processing is a key operation for recognizing objects in an image and extracting useful information from it. It involves gathering and organizing data to build predictive models that can interpret images. Many kinds of models can be trained for computer vision tasks, but three of the most common are classification, segmentation, and object detection. Each model takes an image as input, but each produces a different output. For example, a classification model outputs a label for an object in the image, whereas a segmentation model goes one step further and also indicates where that object is located in the image. We therefore need to weigh the pros and cons of each model against the others to determine which is best suited to a particular task. The following sections describe these three image processing techniques and how they are applied to solve real-world problems.
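
To make the difference in outputs concrete, the sketch below defines one container per model type for a single input image. The class names (ClassificationOutput, Box, DetectionOutput, SegmentationOutput) and the helper describe() are hypothetical, chosen only for illustration; real libraries define their own output formats.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical output containers, for illustration only; real libraries
# define their own formats.
@dataclass
class ClassificationOutput:
    label: str  # one label, no location information


@dataclass
class Box:
    label: str
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    width: int
    height: int


@dataclass
class DetectionOutput:
    boxes: List[Box]  # one labelled rectangle per detected object


@dataclass
class SegmentationOutput:
    mask: List[List[int]]  # one class index per pixel, same size as the image


def describe(output) -> str:
    """Summarise how much information each kind of output carries."""
    if isinstance(output, ClassificationOutput):
        return f"label only: {output.label}"
    if isinstance(output, DetectionOutput):
        return f"{len(output.boxes)} labelled box(es)"
    if isinstance(output, SegmentationOutput):
        rows, cols = len(output.mask), len(output.mask[0])
        return f"per-pixel labels for a {rows}x{cols} image"
    raise TypeError("unknown output type")


print(describe(ClassificationOutput("cat")))  # label only: cat
print(describe(DetectionOutput([Box("cat", 10, 20, 50, 40)])))  # 1 labelled box(es)
print(describe(SegmentationOutput([[0, 1], [1, 1]])))  # per-pixel labels for a 2x2 image
```

The same image yields progressively richer outputs: a single label, a set of labelled rectangles, or a label for every pixel.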

Classification Model

As mentioned above, the classification model assigns a label to the objects in an image. It is trained on a dataset of pre-labeled images. It is useful for identifying which objects an image contains, but it gives no information about where those objects are. Large datasets for classification models, such as CheXpert (Irvin et al. 2019), can be built from existing databases, because labels can be extracted from the accompanying radiology reports.
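
A minimal sketch of the final step of a classifier, assuming single-label classification in which the model produces one raw score per class and a softmax turns those scores into probabilities. The class names and scores are made up; a real model would compute the scores from the image.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-class scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtracting the max avoids overflow in exp()
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify(scores: Dict[str, float]) -> str:
    """Return the single most probable label for the whole image."""
    probs = softmax(scores)
    return max(probs, key=probs.get)


# Made-up scores standing in for a trained model's output on one image.
scores = {"cat": 0.3, "dog": 2.1, "bird": 1.4}
print(classify(scores))  # dog
```

Note that the result is one label for the whole image; nothing in the output says where the dog is.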

Segmentation Model

The segmentation model classifies an image at the pixel level. Every pixel is assigned a label, and pixels with the same label share certain characteristics, such as belonging to the same object (in visualizations, for example, they may be shown in the same color). By dividing an image into labeled groups of pixels, the segmentation model simplifies the image and makes its objects easier to analyze.
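
The pixel-level labeling described above can be sketched as follows, assuming the model outputs one score map per class and each pixel takes the class with the highest score. The 3x3 score maps are invented for illustration, and segment() and class_areas() are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List

Grid = List[List[float]]


def segment(score_maps: List[Grid]) -> List[List[int]]:
    """score_maps[c][r][k] is the model's score for class c at pixel (r, k).
    Each pixel is labelled with the index of its highest-scoring class."""
    n_classes = len(score_maps)
    rows, cols = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(n_classes), key=lambda c: score_maps[c][r][k]) for k in range(cols)]
        for r in range(rows)
    ]


def class_areas(mask: List[List[int]]) -> Dict[int, int]:
    """Count how many pixels carry each class label."""
    return dict(Counter(label for row in mask for label in row))


# Made-up 3x3 score maps: class 0 = background, class 1 = object.
background = [[0.9, 0.8, 0.9], [0.7, 0.2, 0.1], [0.9, 0.3, 0.8]]
obj = [[0.1, 0.2, 0.1], [0.3, 0.8, 0.9], [0.1, 0.7, 0.2]]
mask = segment([background, obj])
print(mask)  # [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
print(class_areas(mask))  # {0: 6, 1: 3}
```

Because every pixel is labelled, the object's exact shape and area (3 pixels here) can be read directly from the mask.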

Object Detection

Like segmentation, object detection distinguishes objects in an image on the basis of their location. However, instead of labeling individual pixels, object detection outputs the coordinates of each identified object. A typical output is therefore a rectangle, known as a bounding box, defined by its position, width, and height, that encloses the identified object. A highly specific object detection model could instead produce a polygon that exactly outlines an object; however, this is less common.
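
The rectangle described above needs only four numbers. The sketch below shows one common convention (left edge x, top edge y, width, height) and its equivalent corner form; the BoundingBox class and tightest_box() helper are hypothetical. A real detector predicts boxes directly from the image; computing the tightest box around a known set of object pixels is done here only to illustrate the format.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BoundingBox:
    x: int  # column of the left edge
    y: int  # row of the top edge
    width: int
    height: int

    def corners(self) -> Tuple[int, int, int, int]:
        """Same box as (x_min, y_min, x_max, y_max), inclusive pixel indices."""
        return (self.x, self.y, self.x + self.width - 1, self.y + self.height - 1)


def tightest_box(pixels: Iterable[Tuple[int, int]]) -> BoundingBox:
    """Smallest axis-aligned rectangle enclosing an object's (row, col) pixels."""
    points = list(pixels)
    if not points:
        raise ValueError("an object needs at least one pixel")
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return BoundingBox(
        x=min(cols),
        y=min(rows),
        width=max(cols) - min(cols) + 1,
        height=max(rows) - min(rows) + 1,
    )


box = tightest_box([(1, 1), (1, 2), (2, 1)])
print(box)  # BoundingBox(x=1, y=1, width=2, height=2)
print(box.corners())  # (1, 1, 2, 2)
```

Libraries differ on whether boxes are stored as (x, y, width, height) or as corner coordinates, so it is worth checking which convention a given tool expects.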

Similarities and differences between classification, segmentation, and object detection models

Figure 1. A visual description of the differences between image classification, object detection, and image segmentation models. Image reproduced from Dickson B., 2021.

While the basic technique behind each model is fairly straightforward, comparing them to determine which one is most useful is tricky. Generally, the classification model is the easiest to use because it simply provides an identifier for an object without regard to its location. Because each training image needs only a single label, it is often possible to procure larger datasets for training classification models. Additionally, image classification is often the better choice for objects that do not have well-defined boundaries in an image (for example, in blurred images). However, segmentation and object detection models generally outperform classification models at recognizing objects that have a well-defined presence in the image. Face detection, for example, is a common form of object detection in which only the faces in an image are identified and each is enclosed in a rectangle.
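
The difference in labeling effort behind this argument can be estimated with simple arithmetic. The sketch below is a rough count under stated simplifying assumptions (one value per image for classification, a class plus four box coordinates per object for detection, one label per pixel for segmentation); labels_per_image() is a hypothetical helper, and the 224x224 image size is an arbitrary example.

```python
def labels_per_image(height: int, width: int, n_objects: int) -> dict:
    """Rough count of values an annotator must supply for one image.

    Simplifying assumptions: one label per image for classification;
    a class plus x, y, width, height per object for detection; one label
    per pixel for segmentation (annotation tools reduce the manual effort,
    but the finished mask still has this many labels).
    """
    return {
        "classification": 1,
        "object detection": n_objects * 5,
        "segmentation": height * width,
    }


print(labels_per_image(224, 224, n_objects=3))
# {'classification': 1, 'object detection': 15, 'segmentation': 50176}
```

Even under these rough assumptions, a single segmentation mask carries tens of thousands of labels where classification needs one.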

Segmentation models are widely deployed because they provide the exact outline of an object in an image, distinguishing it from other objects. Segmentation is the best choice when one needs the exact area an object covers in an image, unlike object detection, whose bounding boxes often include parts of the background. However, segmentation is not always viable because preparing training data is expensive: assigning labels pixel by pixel costs time, money, and resources. In return, it provides more precise identification of objects in terms of both classification and localization. This precision is why segmentation is commonly used in healthcare for medical imaging, where it can help identify signs of various diseases and predict possible outcomes, helping radiologists make faster and more reliable treatment decisions (Brown, R., 2019).
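
The point about bounding boxes including background can be quantified by comparing an object's exact area from a segmentation mask with the area of its tightest bounding box. This is a minimal sketch on a made-up 4x4 mask; mask_vs_box() is a hypothetical helper, and it assumes the tightest possible box, whereas a real detector's box may be looser.

```python
from typing import List, Tuple


def mask_vs_box(mask: List[List[int]], target: int) -> Tuple[int, int, float]:
    """Compare an object's exact area (from a segmentation mask) with the area
    of its tightest bounding box. Returns (mask_area, box_area, background_fraction)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == target]
    if not pixels:
        raise ValueError("target class does not appear in the mask")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    mask_area = len(pixels)
    # Pixels inside the box that do not belong to the object are background.
    return mask_area, box_area, 1 - mask_area / box_area


# Made-up 4x4 mask with an L-shaped object labelled 1.
mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
]
area, box_area, background = mask_vs_box(mask, target=1)
print(area, box_area, round(background, 2))  # 6 9 0.33
```

For this irregular shape, a third of the box is background; for long, thin, or diagonal objects the fraction can be much larger, which is where pixel-level masks pay off.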

In conclusion, it is not easy to single out any one of the three image processing techniques as the most effective. Each technique has its benefits and costs, and practitioners must weigh them against each other to make an informed decision about the model best suited to the task at hand. Different scenarios may call for the particular strengths of each model, but ultimately all three are important for object recognition in machine learning and AI development.

References

Brown, Roger. "What is the Difference Between Image Segmentation and Classification in Image Processing?" Medium, 28 Nov. 2019, medium.com/cogitotech/what-is-the-difference-between-image-segmentation-and-classification-in-image-processing-303d1f660626.

Dickson, Ben. "New Deep Learning Model Brings Image Segmentation to Edge Devices." VentureBeat, 13 May 2021, venturebeat.com/2021/05/14/new-deep-learning-model-brings-image-segmentation-to-edge-devices/.

Irvin, Jeremy, et al. "CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison." arXiv:1901.07031 [cs.CV], 2019.