Single Shot MultiBox Detector (SSD):
- Unlike a plain CNN classifier, which only assigns an image to a class, SSD also gives the exact location of each object in the image. It is an object detection technique based on the convolution operation.
Object Proposal: the image is divided into multiple regions, and the detector determines which regions are likely to contain an object.
- "Single shot" means the image is processed only once, in a single forward pass of the network.
- For training, these single-shot detectors use the ground truth: the location of each object in the image together with its width and height.
- The image is divided into segments.
- Inside each segment, multiple boxes are created so that the model can check whether an object's features are present inside each box.
- Boxes that contain the object of interest are selected for further operations.
- A box may not contain the full object, but the model can still decide whether an object is present from the likelihood (confidence score) computed from the features the box captures.
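The box-generation steps above can be sketched in Python. This is a minimal sketch, assuming the default-box scheme described in the SSD paper: every cell of a feature map gets boxes of the same area (scale squared) but different aspect ratios, plus one extra square box at an intermediate scale. The function name `default_boxes_for_cell`, the 4 × 4 feature map and the scale values are illustrative assumptions, not values taken from these notes.

```python
import math

def default_boxes_for_cell(i, j, fmap_size, scale, next_scale,
                           aspect_ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3)):
    """Return (cx, cy, w, h) default boxes, normalized to [0, 1], for cell (i, j).

    Hypothetical helper: i is the row index, j the column index of the cell.
    """
    cx = (j + 0.5) / fmap_size  # centre of the cell, x direction
    cy = (i + 0.5) / fmap_size  # centre of the cell, y direction
    boxes = []
    for a in aspect_ratios:
        # width grows and height shrinks with the ratio; area stays scale**2
        boxes.append((cx, cy, scale * math.sqrt(a), scale / math.sqrt(a)))
    # extra square box at the geometric mean of this scale and the next one
    extra = math.sqrt(scale * next_scale)
    boxes.append((cx, cy, extra, extra))
    return boxes

boxes = default_boxes_for_cell(0, 0, fmap_size=4, scale=0.2, next_scale=0.34)
for b in boxes:
    print(tuple(round(v, 3) for v in b))
```

All six boxes share the cell centre; only their shapes differ, which is what lets a single cell cover both tall and wide objects.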
How does SSD predict object positions?
- By training on images and comparing the predicted output with the ground truth.
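In the SSD paper, positions are not predicted as raw coordinates: the network predicts offsets that move a default box toward its matched ground-truth box, and training penalizes the difference with a Smooth L1 loss. A minimal sketch of that encoding, assuming boxes stored as (cx, cy, w, h) in normalized coordinates and a default box that has already been matched to the ground truth; the function names and sample numbers are illustrative.

```python
import math

def encode(gt, default):
    """Express a ground-truth box (cx, cy, w, h) as offsets relative to a default box."""
    g_cx, g_cy, g_w, g_h = gt
    d_cx, d_cy, d_w, d_h = default
    return ((g_cx - d_cx) / d_w,
            (g_cy - d_cy) / d_h,
            math.log(g_w / d_w),
            math.log(g_h / d_h))

def decode(offsets, default):
    """Invert encode(): turn predicted offsets back into a (cx, cy, w, h) box."""
    t_cx, t_cy, t_w, t_h = offsets
    d_cx, d_cy, d_w, d_h = default
    return (d_cx + t_cx * d_w,
            d_cy + t_cy * d_h,
            d_w * math.exp(t_w),
            d_h * math.exp(t_h))

def smooth_l1(pred, target):
    """Smooth L1 loss summed over the four offsets."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff ** 2 if diff < 1.0 else diff - 0.5
    return total

default_box = (0.5, 0.5, 0.2, 0.2)
ground_truth = (0.55, 0.48, 0.3, 0.2)
target = encode(ground_truth, default_box)
prediction = (0.2, -0.1, 0.4, 0.0)  # pretend network output for this default box
print(target)
print(smooth_l1(prediction, target))
print(decode(target, default_box))  # recovers ground_truth up to rounding
```

Predicting offsets instead of absolute coordinates keeps the regression targets small and comparable across default boxes of very different sizes.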
SSD also addresses the scale problem:
Problem: objects of different scales or sizes in the same image are hard to detect when features are computed at only one scale. This is called the scale problem.
- Solution: the feature maps of the image are progressively reduced in size, and detections are made at each of these sizes.
- The reduction is done by convolution operations.
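The multi-scale idea can be made concrete with a little arithmetic. This sketch assumes the evenly spaced scale formula from the SSD paper (s_min = 0.2, s_max = 0.9) and the SSD300 configuration with a VGG-16 base: feature maps of 38, 19, 10, 5, 3 and 1 cells per side with 4 or 6 default boxes per location. Some implementations adjust the first layer's scale, so treat the printed scales as illustrative.

```python
def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Scale s_k of the default boxes on each of the m feature maps (evenly spaced)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

# Assumed SSD300 layout: feature-map side lengths and boxes per location
feature_maps = [38, 19, 10, 5, 3, 1]
boxes_per_location = [4, 6, 6, 6, 4, 4]

# Smaller (more reduced) feature maps are paired with larger default boxes
for size, scale in zip(feature_maps, ssd_scales(len(feature_maps))):
    print(f"{size:>2}x{size:<2} feature map -> default box scale {scale:.2f}")

total = sum(s * s * b for s, b in zip(feature_maps, boxes_per_location))
print("total default boxes:", total)  # 8732
```

Large feature maps with small boxes catch small objects, while the heavily reduced maps with large boxes catch big ones.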
You Only Look Once (YOLO):
- Previous approach: the sliding-window approach used in DPM (Deformable Part Models).
- Reframes object detection as a single regression problem.
- Also a single-shot object detection algorithm.
- Uses a single CNN.
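- Some facts:
  - Runs at 45 FPS.
  - A faster version (Fast YOLO) runs at 155 FPS.
  - Objects in streaming video can be detected in real time with less than 25 ms of latency.
  - Outperforms DPM and R-CNN when generalizing from natural images to other domains such as artwork.
- Limitation: struggles with small objects, especially small objects that appear in groups.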
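- YOLO divides the image into an S × S grid. If the centre of an object falls into a grid cell, that grid cell is responsible for detecting the object.
- Each grid cell predicts B bounding boxes and confidence scores for those boxes, along with C conditional class probabilities $\Pr(\text{Class}_i \mid \text{Object})$.
- Each bounding box has 5 predictions: x, y, w, h and confidence. (x, y) is the box centre relative to the grid cell; w and h are relative to the whole image.
- Confidence = $\Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$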
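The grid assignment and output size described above can be sketched as follows. This assumes object centres given in normalized image coordinates in [0, 1), and that, as in the YOLO paper, each cell predicts B boxes of 5 numbers each plus C class probabilities; S = 7, B = 2, C = 20 is the PASCAL VOC setting from the paper. The helper names are hypothetical.

```python
def responsible_cell(cx, cy, S):
    """Grid cell (row, col) whose area contains the object's centre (cx, cy)."""
    col = min(int(cx * S), S - 1)  # clamp so cx == 1.0 still lands in the last column
    row = min(int(cy * S), S - 1)
    return row, col

def cell_relative_centre(cx, cy, S):
    """Centre offsets inside the responsible cell, each in [0, 1)."""
    row, col = responsible_cell(cx, cy, S)
    return cx * S - col, cy * S - row

def yolo_output_shape(S, B, C):
    """Each cell: B boxes x (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)

print(responsible_cell(0.52, 0.31, S=7))       # (2, 3)
print(cell_relative_centre(0.52, 0.31, S=7))   # roughly (0.64, 0.17)
print(yolo_output_shape(S=7, B=2, C=20))       # (7, 7, 30)
```

Because the whole S × S × (B·5 + C) tensor comes out of one network pass, detection becomes the single regression problem mentioned above.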
IOU = Intersection over Union
- It is computed between the predicted bounding box and the ground-truth box: the area of their intersection divided by the area of their union.
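A minimal IoU sketch, assuming boxes given as (x1, y1, x2, y2) corner coordinates; the function name and the example boxes are illustrative. The last line shows the confidence target for a cell that does contain an object, where Pr(Object) is taken as 1.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) corners."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0, 0, 2, 2)
truth = (1, 1, 3, 3)
print(iou(predicted, truth))  # 0.142857... (= 1/7)

# Confidence target when the cell holds an object: Pr(Object) = 1, times IOU
confidence_target = 1.0 * iou(predicted, truth)
```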
Class-specific confidence is given by:
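$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$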
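The product is computed per box: the cell's conditional class probabilities are multiplied by each box's confidence. A minimal sketch with made-up numbers for one grid cell (B = 2 boxes, C = 3 classes); the values and the function name are hypothetical.

```python
# Hypothetical outputs for a single grid cell with B = 2 boxes and C = 3 classes
class_probs = [0.7, 0.2, 0.1]   # Pr(Class_i | Object), shared by all boxes in the cell
box_confidence = [0.8, 0.3]     # Pr(Object) * IOU, one value per predicted box

def class_specific_scores(box_confidence, class_probs):
    """Return scores[b][i] = Pr(Class_i | Object) * Pr(Object) * IOU for box b."""
    return [[conf * p for p in class_probs] for conf in box_confidence]

scores = class_specific_scores(box_confidence, class_probs)
best_box, best_class = max(
    ((b, i) for b in range(len(scores)) for i in range(len(class_probs))),
    key=lambda bi: scores[bi[0]][bi[1]],
)
print(best_box, best_class)  # 0 0
```

Each score combines how likely the class is with how well the box fits the object, so a confident class in a poorly fitting box still ends up with a low score.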
SSD vs YOLO:
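- In YOLO, the aspect ratios are fixed.
- SSD allows more aspect ratios (up to 6 default boxes per feature-map location: aspect ratios 1, 2, 3, 1/2 and 1/3, plus an extra box of aspect ratio 1 at a larger scale).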
RESULT: Tighter Bounding Boxes.
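To see why extra aspect ratios can give tighter boxes, the sketch below compares the best IoU a tall object can reach with a square-only set of default boxes versus the SSD ratio set {1, 2, 3, 1/2, 1/3}. It is a purely geometric illustration under simplifying assumptions (the default boxes share the object's centre and area); the object size is made up, and it does not model how YOLO itself predicts boxes.

```python
import math

def centered_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same centre."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)

def best_iou(obj_w, obj_h, scale, aspect_ratios):
    """Best IoU between an object and default boxes of area scale**2 at its centre."""
    return max(centered_iou(obj_w, obj_h, scale * math.sqrt(a), scale / math.sqrt(a))
               for a in aspect_ratios)

obj_w, obj_h = 0.1, 0.3           # a tall object (width:height = 1:3)
scale = math.sqrt(obj_w * obj_h)  # default boxes with the same area as the object
print(round(best_iou(obj_w, obj_h, scale, [1.0]), 3))                        # 0.406
print(round(best_iou(obj_w, obj_h, scale, [1.0, 2.0, 3.0, 1 / 2, 1 / 3]), 3))  # 1.0
```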
- In YOLO, there are 2 fully connected layers.
- In SSD, extra convolution layers are added after the VGG-16 base network, and predictions are made from several of these feature maps.
RESULT: Objects at multiple scales can be detected.
Research papers for reference: