Mask R-CNN for Object Detection using Supervise.ly and AWS

Shankhanil Borthakur
Apr 11, 2021

What is Mask R-CNN?

Mask R-CNN set a new state of the art in instance segmentation when it was introduced. It is a deep neural network designed to solve the instance segmentation problem in computer vision. In other words, you give it an image and it returns the bounding box, class, and mask of each object it detects. Mask R-CNN is a conceptually simple, flexible, and general framework for object instance segmentation.

Description of the model

The Mask R-CNN framework is built on top of Faster R-CNN. For a given image, Mask R-CNN returns an object mask in addition to the class label and bounding box coordinates of each object. Here is how Mask R-CNN works:

  • Faster R-CNN uses a ConvNet backbone to extract feature maps from the image.
  • These feature maps are passed through a Region Proposal Network (RPN), which returns candidate bounding boxes.
  • An RoI pooling layer is applied to these candidates to bring them all to the same size. Mask R-CNN replaces this layer with RoIAlign, which avoids the coordinate quantization of RoI pooling and keeps the features aligned with the input pixels.
  • The pooled proposals are passed to fully connected layers that classify each proposal and refine its bounding box.
  • Once the RoIs have been selected based on their IoU values and aligned with RoIAlign, a mask branch is added to the existing architecture. The regions obtained by RoIAlign are passed through a small fully convolutional network (the mask head), which runs in parallel with the branch that predicts the class label and bounding box. It generates a mask for each RoI, segmenting the image in a pixel-to-pixel manner. The returned mask is 28 × 28 for each region in the common FPN configuration and is scaled up to the size of the region at inference.
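To make the final scaling step concrete, here is a minimal NumPy sketch, not the code of any particular Mask R-CNN implementation. The function name `paste_mask`, the `(x1, y1, x2, y2)` box convention with exclusive upper bounds, and the 0.5 threshold are assumptions made for illustration. It uses nearest-neighbour resizing to stay dependency-free; real implementations typically interpolate bilinearly.

```python
import numpy as np

def paste_mask(mask_probs, box, image_shape, threshold=0.5):
    """Resize a small per-RoI mask to its box and paste it into a full-size mask.

    mask_probs:  (m, m) float array of per-pixel probabilities, e.g. 28 x 28.
    box:         (x1, y1, x2, y2) integer pixel coordinates, x2/y2 exclusive.
    image_shape: (height, width) of the original image.
    """
    x1, y1, x2, y2 = box
    box_h, box_w = y2 - y1, x2 - x1
    m_h, m_w = mask_probs.shape

    # Nearest-neighbour resize: map every output pixel back to a source cell.
    rows = np.arange(box_h) * m_h // box_h
    cols = np.arange(box_w) * m_w // box_w
    resized = mask_probs[rows[:, None], cols[None, :]]

    # Binarize and place the result inside an otherwise empty image-sized mask.
    full = np.zeros(image_shape, dtype=bool)
    full[y1:y2, x1:x2] = resized >= threshold
    return full

# Example: a 28 x 28 mask whose left half is "object", pasted into a 56 x 56 box.
mask_probs = np.zeros((28, 28))
mask_probs[:, :14] = 0.9
full_mask = paste_mask(mask_probs, box=(10, 20, 66, 76), image_shape=(100, 100))
print(full_mask.sum())  # 1568 pixels: 56 rows x 28 columns
```

Each detected object gets its own image-sized binary mask this way, which is what makes the result an instance segmentation rather than a single class map.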

What is Supervise.ly?

Supervise.ly is a powerful platform for computer vision development, where individual researchers and large teams can annotate and experiment with datasets and neural networks and build deep learning solutions within a single environment.

Benefits of using supervise.ly

  1. Organize image annotation / data management / manipulation within a single platform at scale.
  2. Integrate custom NNs or use pre-trained models from the Model Zoo; perform, track, and reproduce experiments.
  3. Use data science workflows out of the box: upload new data and continuously improve the accuracy of your neural networks.
  4. Combine different neural networks together into a single pipeline with post-processing stages and deploy these pipelines as API.
  5. Utilize NNs to speed up the image annotation process: the platform has trainable SmartTool, supports Active Learning and Human in the Loop.

Prerequisites:

  1. Create a free account on supervise.ly and AWS cloud.
  2. Collect a dataset of images to annotate and tag with the objects you want to segment.

Let's get started…

Step-1 : Register on the Supervisely platform and create your own workspace (for example, name it “First Workspace”).

Step-2 : You will see your newly created workspace.

Step-3 : Navigate inside your workspace.

Step-4 : Next, click “IMPORT DATA” to import your image dataset into Supervisely.

Step-5 : Give a name to your dataset and click on “START IMPORT”.

Step-6 : Navigate inside your uploaded dataset and open the supervisely data annotation tool as shown below.

Step-7 : Click on the polygon icon in the left column and create a class for it as shown below. Let's name it “Brick Kiln”.

Step-8 : Annotate all the objects of interest in the image with the polygon tool.

Step-9 : Next, we need to add tags to the annotations so that our trained model can identify which class our test object belongs to.

Step-10 : After annotating all the training images, we need to run DTL (Data Transformation Language) for data augmentation. DTL lets us fully automate data manipulation, including class mappings. To do this, click on the datasets folder and then click “Run DTL from scratch”.

Step-11 : Next, we need to create a training and test set for training our instance segmentation model.
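As a rough, platform-independent illustration of what a train/test split involves, the sketch below shuffles a list of image names with a fixed seed and holds out a fraction for testing. It is not how Supervisely performs the split; the file names, the 80/20 ratio, and the helper `split_train_test` are assumptions chosen for the example.

```python
import random

def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle items reproducibly and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical image names standing in for the annotated dataset.
image_names = [f"kiln_{i:03d}.jpg" for i in range(10)]
train_names, test_names = split_train_test(image_names)
print(len(train_names), len(test_names))  # 8 2
```

Keeping the test images out of training (and out of augmentation) is what makes the later test results a fair estimate of how the model behaves on unseen data.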

DTL code for the augmentation:

[
  {
    "dst": "$data",
    "src": [
      "Console Dataset/*"
    ],
    "action": "data",
    "settings": {
      "classes_mapping": "default"
    }
  },
  {
    "dst": "$flip_vert",
    "src": [
      "$data"
    ],
    "action": "flip",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "dst": "Console Dataset_Aug",
    "src": [
      "$data",
      "$resized_result",
      "$resized_result2",
      "$noise_result",
      "$flip_vert"
    ],
    "action": "supervisely",
    "settings": {}
  },
  {
    "action": "resize",
    "src": [
      "$data"
    ],
    "dst": "$resized_result",
    "settings": {
      "width": 800,
      "height": -1,
      "aspect_ratio": {
        "keep": true
      }
    }
  },
  {
    "action": "noise",
    "src": [
      "$data"
    ],
    "dst": "$noise_result",
    "settings": {
      "mean": 10,
      "std": 60
    }
  },
  {
    "action": "resize",
    "src": [
      "$data"
    ],
    "dst": "$resized_result2",
    "settings": {
      "width": 300,
      "height": -1,
      "aspect_ratio": {
        "keep": true
      }
    }
  }
]
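Because every layer in a DTL config refers to other layers by name, a typo in a `$name` is easy to miss. The following sketch is one possible sanity check to run before pasting a config into the platform. It is not part of Supervisely; the helper `undefined_sources` is hypothetical, and it assumes that each layer has a string `dst` and a list `src`, as in the config above.

```python
import json

def undefined_sources(layers):
    """Return the '$name' sources that no layer writes to, sorted alphabetically."""
    produced = {layer["dst"] for layer in layers}
    missing = {
        src
        for layer in layers
        for src in layer["src"]
        if src.startswith("$") and src not in produced
    }
    return sorted(missing)

# A shortened config in the same shape as the one above (two layers only).
config_text = """
[
  {"action": "data", "src": ["Console Dataset/*"], "dst": "$data",
   "settings": {"classes_mapping": "default"}},
  {"action": "supervisely", "src": ["$data", "$flip_vert"],
   "dst": "Console Dataset_Aug", "settings": {}}
]
"""
layers = json.loads(config_text)
print(undefined_sources(layers))  # ['$flip_vert'], because the flip layer was left out
```

An empty list means every intermediate name used as a source is produced by some layer; it says nothing about whether the layer settings themselves are valid.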

Step-12 : Next, go to “Neural Networks”, search for “Mask RCNN model”, and add it to your workspace.

Step-13 : Next, we need to train the model. Supervise.ly does not provide GPUs for training, so you have to use a local machine or a cloud service. Supervisely runs the training on a machine that you connect to it as an agent, so go to the Cluster tab and add the agent. Here we use AWS to get those resources. You also need to request an increase of your AWS instance limit in the region you use (Mumbai in this walkthrough). The video below guides you step by step through connecting Supervisely and AWS.
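Before connecting a machine as an agent, it can help to confirm that it actually exposes a GPU. The sketch below is one way to check from Python, assuming an NVIDIA GPU with the driver's `nvidia-smi` tool installed; the function name `visible_gpus` is made up for this example, and the check is not part of the Supervisely agent setup.

```python
import shutil
import subprocess

def visible_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not installed or not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = visible_gpus()
print(gpus if gpus else "No NVIDIA GPU visible on this machine")
```

An empty result on a cloud instance usually means either a CPU-only instance type or a missing driver, both of which are worth fixing before starting a training run.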

Step-14 : Select the agent, model title, and project in the plugin, then click the RUN button to start training the model. (It is important to use version 6.0.15 of the plugin.)

Step-15 : Your trained model will be saved in your workspace. Next, upload your test dataset. Click “test” next to your trained model, select the test dataset folder, and test the model.

Step-16 : Finally, you will see your test results as shown below.

After the model has been trained, you can upload or download it to connect it to your web apps and applications, according to your needs or your client’s.

The complete workflow of the machine learning training.

Thank You for Reading!
