R-CNN object detection with Keras, TensorFlow, and Deep Learning - PyImageSearch


In this tutorial, you will learn how to build an R-CNN object detector using Keras, TensorFlow, and Deep Learning.


Today’s tutorial is the final part in our 4-part series on deep learning and object detection:

  • Part 1: Turning any CNN image classifier into an object detector with Keras, TensorFlow, and OpenCV
  • Part 2: OpenCV Selective Search for Object Detection
  • Part 3: Region proposal for object detection with OpenCV, Keras, and TensorFlow
  • Part 4: R-CNN object detection with Keras and TensorFlow (today’s tutorial)

Last week, you learned how to use region proposals and Selective Search to replace the traditional computer vision object detection pipeline of image pyramids and sliding windows:

  1. Using Selective Search, we generated candidate regions (called “proposals”) that could contain an object of interest.
  2. These proposals were passed into a pre-trained CNN to obtain the actual classifications.
  3. We then processed the results by applying confidence filtering and non-maxima suppression.

Our method worked well enough — but it raised some questions:

What if we wanted to train an object detection network on our own custom datasets?

How can we train that network using Selective Search?

And how will using Selective Search change our object detection inference script?

In fact, these are the same questions that Girshick et al. had to consider in their seminal deep learning object detection paper Rich feature hierarchies for accurate object detection and semantic segmentation.

Each of these questions will be answered in today’s tutorial — and by the time you’re done reading it, you’ll have a fully functioning R-CNN, similar (yet simplified) to the one Girshick et al. implemented!

To learn how to build an R-CNN object detector using Keras and TensorFlow, just keep reading.


R-CNN object detection with Keras, TensorFlow, and Deep Learning

Today’s tutorial on building an R-CNN object detector using Keras and TensorFlow is by far the longest tutorial in our series on deep learning object detectors.

I would suggest you budget your time accordingly — it could take you anywhere from 40 to 60 minutes to read this tutorial in its entirety. Take it slow, as there are many details and nuances in the blog post (and don’t be afraid to read the tutorial 2-3x to ensure you fully comprehend it).

We’ll start our tutorial by discussing the steps required to implement an R-CNN object detector using Keras and TensorFlow.

From there, we’ll review the example object detection datasets we’ll be using here today.

Next, we’ll implement our configuration file along with a helper utility function used to compute object detection accuracy via Intersection over Union (IoU).

We’ll then build our object detection dataset by applying Selective Search.

Selective Search, along with a bit of post-processing logic, will enable us to identify regions of an input image that do and do not contain a potential object of interest.

We’ll take these regions and use them as our training data, fine-tuning MobileNet (pre-trained on ImageNet) to classify and recognize objects from our dataset.

Finally, we’ll implement a Python script that can be used for inference/prediction by applying Selective Search to an input image, classifying the region proposals generated by Selective Search, and then displaying the output R-CNN object detection results on our screen.

Let’s get started!

Steps to implementing an R-CNN object detector with Keras and TensorFlow


Implementing an R-CNN object detector is a somewhat complex multistep process.

If you haven’t yet, make sure you’ve read the previous tutorials in this series to ensure you have the proper knowledge and prerequisites:

  1. Turning any CNN image classifier into an object detector with Keras, TensorFlow, and OpenCV
  2. OpenCV Selective Search for Object Detection
  3. Region proposal for object detection with OpenCV, Keras, and TensorFlow

I’ll be assuming you have a working knowledge of how Selective Search works, how region proposals can be utilized in an object detection pipeline, and how to fine-tune a network.

With that said, below you can see our 6-step process to implementing an R-CNN object detector:

  1. Step #1: Build an object detection dataset using Selective Search
  2. Step #2: Fine-tune a classification network (originally trained on ImageNet) for object detection
  3. Step #3: Create an object detection inference script that utilizes Selective Search to propose regions that could contain an object that we would like to detect
  4. Step #4: Use our fine-tuned network to classify each region proposed via Selective Search
  5. Step #5: Apply non-maxima suppression to suppress weak, overlapping bounding boxes
  6. Step #6: Return the final object detection results

As I’ve already mentioned earlier, this tutorial is complex and covers many nuanced details.

Therefore, don’t be too hard on yourself if you need to go over it multiple times to ensure you understand our R-CNN object detection implementation.

With that in mind, let’s move on to reviewing our R-CNN project structure.

Our object detection dataset

[Figure 2: Sample images from the raccoon dataset we will use to train our R-CNN object detector]

As Figure 2 shows, we’ll be training an R-CNN object detector to detect raccoons in input images.

This dataset contains 200 images with 217 total raccoons (some images contain more than one raccoon).

The dataset was originally curated by esteemed data scientist Dat Tran.

The GitHub repository for the raccoon dataset can be found here; however, for convenience I have included the dataset in the “Downloads” associated with this tutorial.

If you haven’t yet, make sure you use the “Downloads” section of this blog post to download the raccoon dataset and Python source code to allow you to follow along with the rest of this tutorial.

Configuring your development environment

To configure your system for this tutorial, I recommend following either of these tutorials:

  • How to install TensorFlow 2.0 on Ubuntu
  • How to install TensorFlow 2.0 on macOS

Either tutorial will help you configure your system with all the necessary software for this blog post in a convenient Python virtual environment.

Please note that PyImageSearch does not recommend or support Windows for CV/DL projects.
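
As a rough reference, the third-party packages this post relies on could be installed into that virtual environment along the following lines. Treat this as a sketch (exact versions are up to you, and the tutorials linked above remain the authoritative setup guide); note that opencv-contrib-python is required because Selective Search lives in OpenCV's contrib ximgproc module:

$ pip install tensorflow opencv-contrib-python beautifulsoup4 imutils scikit-learn matplotlib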

Project structure

If you haven’t yet, use the “Downloads” section to grab both the code and dataset for today’s tutorial.

Inside, you’ll find the following:

$ tree --dirsfirst --filelimit 10
.
├── dataset
│   ├── no_raccoon [2200 entries]
│   └── raccoon [1560 entries]
├── images
│   ├── raccoon_01.jpg
│   ├── raccoon_02.jpg
│   └── raccoon_03.jpg
├── pyimagesearch
│   ├── __init__.py
│   ├── config.py
│   ├── iou.py
│   └── nms.py
├── raccoons
│   ├── annotations [200 entries]
│   └── images [200 entries]
├── build_dataset.py
├── detect_object_rcnn.py
├── fine_tune_rcnn.py
├── label_encoder.pickle
├── plot.png
└── raccoon_detector.h5
8 directories, 13 files

As previously discussed, our raccoons/ dataset of images/ and annotations/ was curated and made available by Dat Tran. This dataset is not to be confused with the one that our build_dataset.py script produces — dataset/ — which is for the purpose of fine-tuning our MobileNet V2 model to create a raccoon classifier (raccoon_detector.h5).

The downloads include a pyimagesearch module with the following:

  • config.py: Holds our configuration settings, which will be used across our Python scripts
  • iou.py: Computes the Intersection over Union (IoU), an object detection evaluation metric
  • nms.py: Performs non-maxima suppression (NMS) to eliminate overlapping boxes around objects

The components of the pyimagesearch module will come in handy in the following three Python scripts, which represent the bulk of what we are learning in this tutorial:

  • build_dataset.py: Takes Dat Tran’s raccoon dataset and creates a separate raccoon/ no_raccoon dataset, which we will use to fine-tune a MobileNet V2 model that is pre-trained on the ImageNet dataset
  • fine_tune_rcnn.py: Trains our raccoon classifier by means of fine-tuning
  • detect_object_rcnn.py: Brings all the pieces together to perform rudimentary R-CNN object detection, the key components being Selective Search and classification (note that this script does not accomplish true end-to-end R-CNN object detection by means of a model with a built-in Selective Search region proposal portion of the network)

Note: We will not be reviewing nms.py; please refer to my tutorial on Non-Maximum Suppression for Object Detection in Python as needed.
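
That said, if you want a rough mental model of what such a routine does, below is a minimal, simplified non-maxima suppression sketch of my own. It is not necessarily identical to the nms.py included in the downloads, but it matches how the inference script uses the function later (it accepts bounding boxes plus their scores and returns the indexes of the boxes to keep):

# a simplified NMS sketch -- not necessarily identical to the nms.py
# included with the downloads
import numpy as np

def non_max_suppression(boxes, probs, overlapThresh=0.3):
	# boxes: (N, 4) array of (startX, startY, endX, endY); probs: (N,) scores
	if len(boxes) == 0:
		return []

	# grab the coordinates and compute the area of every box
	boxes = boxes.astype("float")
	(x1, y1, x2, y2) = (boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3])
	area = (x2 - x1 + 1) * (y2 - y1 + 1)

	# sort by score so the highest-scoring remaining box is always last
	idxs = np.argsort(probs)
	pick = []

	while len(idxs) > 0:
		# keep the highest-scoring remaining box
		last = len(idxs) - 1
		i = idxs[last]
		pick.append(i)

		# compute how much every other remaining box overlaps it
		xx1 = np.maximum(x1[i], x1[idxs[:last]])
		yy1 = np.maximum(y1[i], y1[idxs[:last]])
		xx2 = np.minimum(x2[i], x2[idxs[:last]])
		yy2 = np.minimum(y2[i], y2[idxs[:last]])
		w = np.maximum(0, xx2 - xx1 + 1)
		h = np.maximum(0, yy2 - yy1 + 1)
		overlap = (w * h) / area[idxs[:last]]

		# drop the picked box and any box that overlaps it too much
		idxs = np.delete(idxs, np.concatenate(([last],
			np.where(overlap > overlapThresh)[0])))

	# return the indexes of the boxes we kept
	return pick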

Implementing our object detection configuration file

Before we get too far in our project, let’s first implement a configuration file that will store key constants and settings, which we will use across multiple Python scripts.

Open up the config.py file in the pyimagesearch module, and insert the following code:

# import the necessary packages
import os

# define the base path to the *original* input dataset and then use
# the base path to derive the image and annotations directories
ORIG_BASE_PATH = "raccoons"
ORIG_IMAGES = os.path.sep.join([ORIG_BASE_PATH, "images"])
ORIG_ANNOTS = os.path.sep.join([ORIG_BASE_PATH, "annotations"])

We begin by defining paths to the original raccoon dataset images and object detection annotations (i.e., bounding box information) on Lines 6-8.

Next, we define the paths to the dataset we will soon build:

# define the base path to the *new* dataset after running our dataset
# builder scripts and then use the base path to derive the paths to
# our output class label directories
BASE_PATH = "dataset"
POSITIVE_PATH = os.path.sep.join([BASE_PATH, "raccoon"])
NEGATIVE_PATH = os.path.sep.join([BASE_PATH, "no_raccoon"])

Here, we establish the paths to our positive (i.e., there is a raccoon) and negative (i.e., no raccoon in the input image) example images (Lines 13-15). These directories will be populated when we run our build_dataset.py script.

And now, we define the maximum number of Selective Search region proposals to be utilized for training and inference, respectively:

# define the number of max proposals used when running selective
# search for (1) gathering training data and (2) performing inference
MAX_PROPOSALS = 2000
MAX_PROPOSALS_INFER = 200

Followed by setting the maximum number of positive and negative regions to use when building our dataset:

# define the maximum number of positive and negative images to be
# generated from each image
MAX_POSITIVE = 30
MAX_NEGATIVE = 10

And we wrap up with model-specific constants:

# initialize the input dimensions to the network
INPUT_DIMS = (224, 224)

# define the path to the output model and label binarizer
MODEL_PATH = "raccoon_detector.h5"
ENCODER_PATH = "label_encoder.pickle"

# define the minimum probability required for a positive prediction
# (used to filter out false-positive predictions)
MIN_PROBA = 0.99

Line 28 sets the input spatial dimensions to our classification network (MobileNet, pre-trained on ImageNet).

We then define the output file paths to our raccoon classifier and label encoder (Lines 31 and 32).

The minimum probability required for a positive prediction during inference (used to filter out false-positive detections) is set to 99% on Line 36.

Measuring object detection accuracy with Intersection over Union (IoU)

[Figure: Computing the Intersection over Union (IoU) between a predicted bounding box and a ground-truth bounding box]

In order to measure how “good” a job our object detector is doing at predicting bounding boxes, we’ll be using the Intersection over Union (IoU) metric.

The IoU method computes the ratio of the area of overlap to the area of the union between the predicted bounding box and the ground-truth bounding box:

IoU = Area of Overlap / Area of Union

Examining this equation, you can see that Intersection over Union is simply a ratio:

  • In the numerator, we compute the area of overlap between the predicted bounding box and the ground-truth bounding box.
  • The denominator is the area of union, or more simply, the area encompassed by both the predicted bounding box and the ground-truth bounding box.
  • Dividing the area of overlap by the area of union yields our final score — the Intersection over Union (hence the name).

We’ll use IoU to measure object detection accuracy, including how much a given Selective Search proposal overlaps with a ground-truth bounding box (which is useful when we go to generate positive and negative examples for our training data).

If you’re interested in learning more about IoU, be sure to refer to my tutorial, Intersection over Union (IoU) for object detection.

Otherwise, let’s briefly review our IoU implementation now — open up the iou.py file in the pyimagesearch directory, and insert the following code:

def compute_iou(boxA, boxB):
	# determine the (x, y)-coordinates of the intersection rectangle
	xA = max(boxA[0], boxB[0])
	yA = max(boxA[1], boxB[1])
	xB = min(boxA[2], boxB[2])
	yB = min(boxA[3], boxB[3])

	# compute the area of intersection rectangle
	interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)

	# compute the area of both the prediction and ground-truth
	# rectangles
	boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
	boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

	# compute the intersection over union by taking the intersection
	# area and dividing it by the sum of prediction + ground-truth
	# areas - the intersection area
	iou = interArea / float(boxAArea + boxBArea - interArea)

	# return the intersection over union value
	return iou

The compute_iou function accepts two parameters, boxA and boxB, which are the ground-truth and predicted bounding boxes for which we seek to compute the Intersection over Union (IoU). The order of the parameters does not matter for the purposes of our computation.

Inside, we begin by computing the top-left and bottom-right (x, y)-coordinates of the intersection rectangle (Lines 3-6).

Using the bounding box coordinates, we compute the intersection (overlapping area) of the bounding boxes (Line 9). This value is the numerator for the IoU formula.

To determine the denominator, we need to derive the area of both the predicted and ground-truth bounding boxes (Lines 13 and 14).

The Intersection over Union can then be calculated on Line 19 by dividing the intersection area (numerator) by the union area of the two bounding boxes (denominator), taking care to subtract out the intersection area (otherwise the intersection area would be doubly counted).

Line 22 returns the IoU result.
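
As a quick sanity check, here is how compute_iou behaves on two made-up boxes (the coordinates below are arbitrary examples, not taken from the raccoon dataset):

# sanity-check compute_iou on two arbitrary, made-up boxes
from pyimagesearch.iou import compute_iou

boxA = (39, 63, 203, 112)	# hypothetical ground-truth box (startX, startY, endX, endY)
boxB = (54, 66, 198, 114)	# hypothetical predicted box

# intersection = 145 x 47 = 6815; union = 8250 + 7105 - 6815 = 8540
print(compute_iou(boxA, boxB))	# prints ~0.798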

Implementing our object detection dataset builder script


Before we can create our R-CNN object detector, we first need to build our dataset, accomplishing Step #1 from our list of six steps for today’s tutorial.

Our build_dataset.py script will:

  • 1. Accept our input raccoons dataset
  • 2. Loop over all images in the dataset
    • 2a. Load the given input image
    • 2b. Load and parse the bounding box coordinates for any raccoons in the input image
  • 3. Run Selective Search on the input image
  • 4. Use IoU to determine which region proposals from Selective Search sufficiently overlap with the ground-truth bounding boxes and which ones do not
  • 5. Save region proposals as overlapping (contains raccoon) or not (no raccoon)

Once our dataset is built, we will be able to work on Step #2: fine-tuning an object detection network.

Now that we understand the dataset builder at a high level, let’s implement it. Open the build_dataset.py file, and follow along:

# import the necessary packages
from pyimagesearch.iou import compute_iou
from pyimagesearch import config
from bs4 import BeautifulSoup
from imutils import paths
import cv2
import os

In addition to our IoU and configuration settings (Lines 2 and 3), this script requires BeautifulSoup, imutils, and OpenCV. If you followed the “Configuring your development environment” section above, you already have all of these tools installed.

Now that our imports are taken care of, let’s create two empty directories and build a list of all the raccoon images:

# loop over the output positive and negative directories
for dirPath in (config.POSITIVE_PATH, config.NEGATIVE_PATH):
	# if the output directory does not exist yet, create it
	if not os.path.exists(dirPath):
		os.makedirs(dirPath)

# grab all image paths in the input images directory
imagePaths = list(paths.list_images(config.ORIG_IMAGES))

# initialize the total number of positive and negative images we have
# saved to disk so far
totalPositive = 0
totalNegative = 0

Our positive and negative directories will soon contain our raccoon or no raccoon images. Lines 10-13 create these directories if they don’t yet exist.

Then, Line 16 grabs all input image paths in our raccoons dataset directory, storing them in the imagePaths list.

Our totalPositive and totalNegative accumulators (Lines 20 and 21) will hold the final counts of our raccoon or no raccoon images, but more importantly, our filenames will be derived from the count as our loop progresses.

Speaking of such a loop, let’s begin looping over all of the imagePaths in our dataset:

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
	# show a progress report
	print("[INFO] processing image {}/{}...".format(i + 1,
		len(imagePaths)))

	# extract the filename from the file path and use it to derive
	# the path to the XML annotation file
	filename = imagePath.split(os.path.sep)[-1]
	filename = filename[:filename.rfind(".")]
	annotPath = os.path.sep.join([config.ORIG_ANNOTS,
		"{}.xml".format(filename)])

	# load the annotation file, build the soup, and initialize our
	# list of ground-truth bounding boxes
	contents = open(annotPath).read()
	soup = BeautifulSoup(contents, "html.parser")
	gtBoxes = []

	# extract the image dimensions
	w = int(soup.find("width").string)
	h = int(soup.find("height").string)

Inside our loop over imagePaths, Lines 31-34 derive the image path’s associated XML annotation file path (in PASCAL VOC format) — this file contains the ground-truth object detection annotations for the current image.

From there, Lines 38 and 39 load and parse the XML object.

Our gtBoxes list will soon hold our dataset’s ground-truth bounding boxes (Line 40).

The first pieces of data we extract from our PASCAL VOC XML annotation file are the image dimensions (Lines 43 and 44).

Next, we’ll grab bounding box coordinates from all the <object> elements in our annotation file:

	# loop over all 'object' elements
	for o in soup.find_all("object"):
		# extract the label and bounding box coordinates
		label = o.find("name").string
		xMin = int(o.find("xmin").string)
		yMin = int(o.find("ymin").string)
		xMax = int(o.find("xmax").string)
		yMax = int(o.find("ymax").string)

		# truncate any bounding box coordinates that may fall
		# outside the boundaries of the image
		xMin = max(0, xMin)
		yMin = max(0, yMin)
		xMax = min(w, xMax)
		yMax = min(h, yMax)

		# update our list of ground-truth bounding boxes
		gtBoxes.append((xMin, yMin, xMax, yMax))

Looping over all <object> elements from the XML file (i.e., the actual ground-truth bounding boxes), we:

  • Extract the label as well as the bounding box coordinates (Lines 49-53)
  • Ensure the bounding box coordinates do not fall outside the bounds of the image’s spatial dimensions by truncating them accordingly (Lines 57-60)
  • Update our list of ground-truth bounding boxes (Line 63)

At this point, we need to load an image and perform Selective Search:

	# load the input image from disk
	image = cv2.imread(imagePath)

	# run selective search on the image and initialize our list of
	# proposed boxes
	ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
	ss.setBaseImage(image)
	ss.switchToSelectiveSearchFast()
	rects = ss.process()
	proposedRects = []

	# loop over the rectangles generated by selective search
	for (x, y, w, h) in rects:
		# convert our bounding boxes from (x, y, w, h) to (startX,
		# startY, endX, endY)
		proposedRects.append((x, y, x + w, y + h))

Here, we load an image from the dataset (Line 66), perform Selective Search to find region proposals (Lines 70-73), and populate our proposedRects list with the results (Lines 74-80).

Now that we have (1) ground-truth bounding boxes and (2) region proposals generated by Selective Search, we will use IoU to determine which regions overlap sufficiently with the ground-truth boxes and which do not:

	# initialize counters used to count the number of positive and
	# negative ROIs saved thus far
	positiveROIs = 0
	negativeROIs = 0

	# loop over the maximum number of region proposals
	for proposedRect in proposedRects[:config.MAX_PROPOSALS]:
		# unpack the proposed rectangle bounding box
		(propStartX, propStartY, propEndX, propEndY) = proposedRect

		# loop over the ground-truth bounding boxes
		for gtBox in gtBoxes:
			# compute the intersection over union between the two
			# boxes and unpack the ground-truth bounding box
			iou = compute_iou(gtBox, proposedRect)
			(gtStartX, gtStartY, gtEndX, gtEndY) = gtBox

			# initialize the ROI and output path
			roi = None
			outputPath = None

We will refer to:

  • positiveROIs as the number of region proposals for the current image that (1) sufficiently overlap with ground-truth annotations and (2) are saved to disk in the path contained in config.POSITIVE_PATH
  • negativeROIs as the number of region proposals for the current image that (1) overlap the ground-truth boxes by less than 5% IoU (and do not fall entirely within a ground-truth box) and (2) are saved to disk in the path contained in config.NEGATIVE_PATH

We initialize both of these counters on Lines 84 and 85.

Beginning on Line 88, we loop over region proposals generated by Selective Search (up to our defined maximum proposal count). Inside, we:

  • Unpack the current bounding box generated by Selective Search (Line 90).
  • Loop over all the ground-truth bounding boxes (Line 93).
  • Compute the IoU between the region proposal bounding box and the ground-truth bounding box (Line 96). This iou value will serve as our threshold to determine if a region proposal is a positive ROI or negative ROI.
  • Initialize the roi along with its outputPath (Lines 100 and 101).

Let’s determine if this proposedRect and gtBox pair is a positive ROI:

			# check to see if the IOU is greater than 70% *and* that
			# we have not hit our positive count limit
			if iou > 0.7 and positiveROIs <= config.MAX_POSITIVE:
				# extract the ROI and then derive the output path to
				# the positive instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalPositive)
				outputPath = os.path.sep.join([config.POSITIVE_PATH,
					filename])

				# increment the positive counters
				positiveROIs += 1
				totalPositive += 1

Assuming this particular region passes the check to see if we have an IoU > 70% and we have not yet hit our limit on positive examples for the current image (Line 105), we simply:

  • Extract the positive roi via NumPy slicing (Line 108)
  • Construct the outputPath to where the ROI will be exported (Lines 109-111)
  • Increment our positive counters (Lines 114 and 115)

In order to determine if this proposedRect and gtBox pair is a negative ROI, we first need to check whether we have a full overlap:

			# determine if the proposed bounding box falls *within*
			# the ground-truth bounding box
			fullOverlap = propStartX >= gtStartX
			fullOverlap = fullOverlap and propStartY >= gtStartY
			fullOverlap = fullOverlap and propEndX <= gtEndX
			fullOverlap = fullOverlap and propEndY <= gtEndY

If the region proposal bounding box (proposedRect) falls entirely within the ground-truth bounding box (gtBox), then we have what I call a fullOverlap.

The logic on Lines 119-122 inspects the (x, y)-coordinates to determine whether we have such a fullOverlap.

We’re now ready to handle the case where our proposedRect and gtBox are considered a negative ROI:

			# check to see if there is not full overlap *and* the IoU
			# is less than 5% *and* we have not hit our negative
			# count limit
			if not fullOverlap and iou < 0.05 and \
				negativeROIs <= config.MAX_NEGATIVE:
				# extract the ROI and then derive the output path to
				# the negative instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalNegative)
				outputPath = os.path.sep.join([config.NEGATIVE_PATH,
					filename])

				# increment the negative counters
				negativeROIs += 1
				totalNegative += 1

Here, our conditional (Lines 127 and 128) checks to see if all of the following hold true:

  1. There is not full overlap
  2. The IoU is sufficiently small
  3. Our limit on the number of negative examples for the current image is not exceeded

If all checks pass, we:

  • Extract the negative roi (Line 131)
  • Construct the path to where the ROI will be stored (Lines 132-134)
  • Increment the negative counters (Lines 137 and 138)

At this point, we’ve reached our final task for building the dataset: exporting the current roi to the appropriate directory:

			# check to see if both the ROI and output path are valid
			if roi is not None and outputPath is not None:
				# resize the ROI to the input dimensions of the CNN
				# that we'll be fine-tuning, then write the ROI to
				# disk
				roi = cv2.resize(roi, config.INPUT_DIMS,
					interpolation=cv2.INTER_CUBIC)
				cv2.imwrite(outputPath, roi)

Assuming both the ROI and associated output path are not None (Line 141), we simply resize the ROI to our CNN’s input dimensions and write the ROI to disk (Lines 145-147).

Recall that each ROI’s outputPath is based on either the config.POSITIVE_PATH or config.NEGATIVE_PATH as well as the current totalPositive or totalNegative count.

Therefore, our ROIs are automatically sorted into either dataset/raccoon or dataset/no_raccoon, which is exactly the purpose of this script.

In the next section, we’ll put this script to work for us!

Preparing our image dataset for object detection

We are now ready to build our image dataset for R-CNN object detection.

If you haven’t yet, use the “Downloads” section of this tutorial to download the source code and example image datasets.

From there, open up a terminal, and execute the following command:

$ time python build_dataset.py
[INFO] processing image 1/200...
[INFO] processing image 2/200...
[INFO] processing image 3/200...
...
[INFO] processing image 198/200...
[INFO] processing image 199/200...
[INFO] processing image 200/200...

real	5m42.453s
user	6m50.769s
sys	1m23.245s

As you can see, running Selective Search on our entire dataset of 200 images took 5m42s.

If you check the contents of the raccoon and no_raccoon subdirectories of dataset/, you’ll see that we have 1,560 “raccoon” images and 2,200 “no raccoon” images:

$ ls -l dataset/raccoon/*.png | wc -l
    1560
$ ls -l dataset/no_raccoon/*.png | wc -l
    2200

A sample of both classes can be seen below:

[Figure 6: Left: sample “no raccoon” regions generated by Selective Search. Right: sample “raccoon” regions.]

As you can see from Figure 6 (left), the “No Raccoon” class has sample image patches generated by Selective Search that did not overlap significantly with any of the raccoon ground-truth bounding boxes.

Then, on Figure 6 (right), we have our “Raccoon” class images.

You’ll note that some of these images are similar to each other and in some cases are near-duplicates — that is in fact the intended behavior.

Keep in mind that Selective Search attempts to identify regions of an image that could contain a potential object.

Therefore, it’s totally feasible for Selective Search to fire multiple times on similar regions.

You could choose to keep these regions (as I’ve done) or add additional logic that can be used to filter out regions that significantly overlap (I’m leaving that as an exercise to you).
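
If you do decide to tackle that exercise, one possible approach is a small IoU-based de-duplication pass over the proposals before saving them. The helper below is purely illustrative (it is not part of the tutorial's build_dataset.py), and the 0.9 overlap threshold is an assumption you would want to tune:

# hypothetical helper for the "filter near-duplicate regions" exercise;
# not part of build_dataset.py
from pyimagesearch.iou import compute_iou

def filter_duplicates(rects, maxOverlap=0.9):
	# keep a proposal only if it overlaps every previously kept
	# proposal by less than maxOverlap IoU
	kept = []
	for rect in rects:
		if all(compute_iou(rect, k) < maxOverlap for k in kept):
			kept.append(rect)
	return kept

# example usage on the (startX, startY, endX, endY) proposals:
# proposedRects = filter_duplicates(proposedRects, maxOverlap=0.9)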

Fine-tuning a network for object detection with Keras and TensorFlow

With our dataset created via the previous two sections (Step #1), we’re now ready to fine-tune a classification CNN to recognize both of these classes (Step #2).

When we combine this classifier with Selective Search, we’ll be able to build our R-CNN object detector.

For the purposes of this tutorial, I’ve chosen to fine-tune the MobileNet V2 CNN, which is pre-trained on the 1,000-class ImageNet dataset. I recommend that you read up on the concepts of transfer learning and fine-tuning if you are not familiar with them:

  • Transfer Learning with Keras and Deep Learning (be sure to read from the beginning through the “Two types of transfer learning: feature extraction and fine tuning” section at a minimum)
  • Fine-tuning with Keras and Deep Learning (I highly recommend reading this tutorial in its entirety)

The result of fine-tuning MobileNet will be a classifier that distinguishes between our raccoon and no_raccoon classes.

When you’re ready, open the fine_tune_rcnn.py file in your project directory structure, and let’s get started:

# import the necessary packages
from pyimagesearch import config
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import os

Phew! That’s a metric ton of imports we’ll be using for this script. Let’s break them down:

  • config: Our Python configuration file consisting of paths and constants.
  • ImageDataGenerator: For the purposes of data augmentation.
  • MobileNetV2: The MobileNet CNN architecture is common, so it is built into TensorFlow/Keras. For the purposes of fine-tuning, we’ll load the network with pre-trained ImageNet weights, chop off the network’s head and replace it, and tune/train until our network is performing well.
  • tensorflow.keras.layers: A selection of CNN layer types are used to build/replace the head of MobileNet V2.
  • Adam: An optimizer alternative to Stochastic Gradient Descent (SGD).
  • LabelBinarizer and to_categorical: Used in conjunction to perform one-hot encoding of our class labels (see the short example after this list).
  • train_test_split: Conveniently helps us segment our dataset into training and testing sets.
  • classification_report: Computes a statistical summary of our model evaluation results.
  • matplotlib: Python’s de facto plotting package will be used to generate accuracy/loss curves from our training history data.
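
The pairing of LabelBinarizer and to_categorical deserves a quick illustration: for a two-class problem, LabelBinarizer produces a single 0/1 column, and to_categorical expands that column into the two-element one-hot vectors our softmax head expects (the labels below are just an example):

# illustrating LabelBinarizer + to_categorical on example labels
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.utils import to_categorical

lb = LabelBinarizer()
labels = lb.fit_transform(["raccoon", "no_raccoon", "raccoon"])
print(labels.ravel())		# [1 0 1] -- a single column for two classes
print(to_categorical(labels))	# [[0. 1.] [1. 0.] [0. 1.]] -- one-hot vectors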

With our imports ready to go, let’s parse command line arguments and set our hyperparameter constants:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

# initialize the initial learning rate, number of epochs to train for,
# and batch size
INIT_LR = 1e-4
EPOCHS = 5
BS = 32

The --plot command line argument defines the path to our accuracy/loss plot (Lines 27-30).

We then establish training hyperparameters including our initial learning rate, number of training epochs, and batch size (Lines 34-36).

Loading our dataset is straightforward, since we did all the hard work already in Step #1:

# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(config.BASE_PATH))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
	# extract the class label from the filename
	label = imagePath.split(os.path.sep)[-2]

	# load the input image (224x224) and preprocess it
	image = load_img(imagePath, target_size=config.INPUT_DIMS)
	image = img_to_array(image)
	image = preprocess_input(image)

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)

Recall that our new dataset lives in the path defined by config.BASE_PATH. Line 41 grabs all the imagePaths located in the base path and its class subdirectories.

From there, we seek to populate our data and labels lists (Lines 42 and 43). To do so, we define a loop over the imagePaths (Line 46) and proceed to:

  • Extract the particular image’s class label directly from the path (Line 48)
  • Load and pre-process the image, specifying the target_size according to the input dimensions of the MobileNet V2 CNN (Lines 51-53)
  • Append the image and label to the data and labels lists

We have a few more steps to take care of to prepare our data:

# convert the data and labels to NumPy arrays
data = np.array(data, dtype="float32")
labels = np.array(labels)

# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)

# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.20, stratify=labels, random_state=42)

# construct the training image generator for data augmentation
aug = ImageDataGenerator(
	rotation_range=20,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")

Here we:

  • Convert the data and label lists to NumPy arrays (Lines 60 and 61)
  • One-hot encode our labels (Lines 64-66)
  • Construct our training and testing data splits (Lines 70 and 71)
  • Initialize our data augmentation object with settings for random mutations of our data to improve our model’s ability to generalize (Lines 74-81)

Now that our data is ready, let’s prepare MobileNet V2 for fine-tuning:

# load the MobileNetV2 network, ensuring the head FC layer sets are
# left off
baseModel = MobileNetV2(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

To ensure our MobileNet V2 CNN is ready to be fine-tuned, we use the following approach:

  1. Load MobileNet pre-trained on the ImageNet dataset, leaving off the fully-connected (FC) head
  2. Construct a new FC head
  3. Append the new FC head to the MobileNet base resulting in our model
  4. Freeze the base layers of MobileNet (i.e., set them as not trainable)

Take a step back to consider what we’ve just accomplished in this code block. The MobileNet base of our network has pre-trained weights that are frozen. We will only train the head of the network. Notice that the head of our network has a Softmax Classifier with 2 outputs corresponding to our raccoon and no_raccoon classes.

So far, in this script, we’ve loaded our data, initialized our data augmentation object, and prepared for fine tuning. We’re now ready to fine-tune our model:

# compile our model
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the head of the network
print("[INFO] training head...")
H = model.fit(
	aug.flow(trainX, trainY, batch_size=BS),
	steps_per_epoch=len(trainX) // BS,
	validation_data=(testX, testY),
	validation_steps=len(testX) // BS,
	epochs=EPOCHS)

We compile our model with the Adam optimizer and binary crossentropy loss.

Note: If you are using this script as a basis for training with a dataset of three or more classes, ensure you do the following: (1) Use "categorical_crossentropy" loss on Lines 109 and 110, and (2) set your Softmax Classifier outputs accordingly on Line 95 (we’re using 2 in this tutorial because we have two classes).
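
To make that note concrete, here is a minimal sketch of how the head and loss might change for a hypothetical three-class problem. The class count and names are made up for illustration; everything else mirrors the code above:

# hypothetical adaptation of the model head and loss for 3+ classes
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

NUM_CLASSES = 3  # e.g., raccoon, squirrel, no_animal (made-up classes)

baseModel = MobileNetV2(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))
headModel = AveragePooling2D(pool_size=(7, 7))(baseModel.output)
headModel = Flatten()(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(NUM_CLASSES, activation="softmax")(headModel)

model = Model(inputs=baseModel.input, outputs=headModel)
model.compile(loss="categorical_crossentropy",
	optimizer=Adam(lr=1e-4), metrics=["accuracy"])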

Training launches via Lines 114-119. Since TensorFlow 2.0 was released, the fit method can handle data augmentation generators, whereas previously we relied on the fit_generator method. For more details on these two methods, be sure to read my updated tutorial: How to use Keras fit and fit_generator (a hands-on tutorial).

Once training draws to a close, our model is ready for evaluation on the test set:

# make predictions on the testing set
print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report
print(classification_report(testY.argmax(axis=1), predIdxs,
	target_names=lb.classes_))

Line 123 makes predictions on our testing set, and then Line 127 grabs all indices of the labels with the highest predicted probability.

We then print our classification_report to the terminal for statistical analysis (Lines 130 and 131).

Let’s go ahead and export both our (1) trained model and (2) label encoder:

# serialize the model to disk
print("[INFO] saving mask detector model...")
model.save(config.MODEL_PATH, save_format="h5")

# serialize the label encoder to disk
print("[INFO] saving label encoder...")
f = open(config.ENCODER_PATH, "wb")
f.write(pickle.dumps(lb))
f.close()

Line 135 serializes our model to disk. For TensorFlow 2.0+, I recommend explicitly setting the save_format="h5" (HDF5 format).

Our label encoder is serialized to disk in Python’s pickle format (Lines 139-141).

To close out, we’ll plot our accuracy/loss curves from our training history:

# plot the training loss and accuracy
N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Using matplotlib, we plot the accuracy and loss curves for inspection (Lines 144-154). We export the resulting figure to the path contained in the --plot command line argument.

Training our R-CNN object detection network with Keras and TensorFlow

We are now ready to fine-tune our MobileNet model so that we can create an R-CNN object detector!

If you haven’t yet, go to the “Downloads” section of this tutorial to download the source code and sample dataset.

From there, open up a terminal, and execute the following command:

$ time python fine_tune_rcnn.py
[INFO] loading images...
[INFO] compiling model...
[INFO] training head...
Train for 94 steps, validate on 752 samples
Epoch 1/5
94/94 [==============================] - 77s 817ms/step - loss: 0.3072 - accuracy: 0.8647 - val_loss: 0.1015 - val_accuracy: 0.9728
Epoch 2/5
94/94 [==============================] - 74s 789ms/step - loss: 0.1083 - accuracy: 0.9641 - val_loss: 0.0534 - val_accuracy: 0.9837
Epoch 3/5
94/94 [==============================] - 71s 756ms/step - loss: 0.0774 - accuracy: 0.9784 - val_loss: 0.0433 - val_accuracy: 0.9864
Epoch 4/5
94/94 [==============================] - 74s 784ms/step - loss: 0.0624 - accuracy: 0.9781 - val_loss: 0.0367 - val_accuracy: 0.9878
Epoch 5/5
94/94 [==============================] - 74s 791ms/step - loss: 0.0590 - accuracy: 0.9801 - val_loss: 0.0340 - val_accuracy: 0.9891
[INFO] evaluating network...
              precision    recall  f1-score   support

  no_raccoon       1.00      0.98      0.99       440
     raccoon       0.97      1.00      0.99       312

    accuracy                           0.99       752
   macro avg       0.99      0.99      0.99       752
weighted avg       0.99      0.99      0.99       752

[INFO] saving mask detector model...
[INFO] saving label encoder...

real	6m37.851s
user	31m43.701s
sys	33m53.058s

Fine-tuning MobileNet on my 3GHz Intel Xeon W processor took ~6m30s, and as you can see, we are obtaining ~99% accuracy.

And as our training plot shows, there is little sign of overfitting:

[Figure: Training loss and accuracy curves for fine-tuning MobileNet V2 on the raccoon/no_raccoon dataset]

With our MobileNet model fine-tuned for raccoon prediction, we’re ready to put all the pieces together and create our R-CNN object detection pipeline!

Putting the pieces together: Implementing our R-CNN object detection inference script


So far, we’ve accomplished:

  • Step #1: Build an object detection dataset using Selective Search
  • Step #2: Fine-tune a classification network (originally trained on ImageNet) for object detection

At this point, we’re going to put our trained model to work to perform object detection inference on new images.

Accomplishing our object detection inference script accounts for Step #3 – Step #6. Let’s review those steps now:

  • Step #3: Create an object detection inference script that utilizes Selective Search to propose regions that could contain an object that we would like to detect
  • Step #4: Use our fine-tuned network to classify each region proposed via Selective Search
  • Step #5: Apply non-maxima suppression to suppress weak, overlapping bounding boxes
  • Step #6: Return the final object detection results

We will take Step #6 a bit further and display the results so we can visually verify that our system is working.

Let’s implement the R-CNN object detection pipeline now — open up a new file, name it detect_object_rcnn.py, and insert the following code:

# import the necessary packages
from pyimagesearch.nms import non_max_suppression
from pyimagesearch import config
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import pickle
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
args = vars(ap.parse_args())

Most of this script’s imports should look familiar by this point if you’ve been following along. The one that sticks out is non_max_suppression (Line 2). Be sure to read my tutorial on Non-Maximum Suppression for Object Detection in Python if you want to study what NMS entails.

Our script accepts the --image command line argument, which points to our input image path (Lines 14-17).

From here, let’s (1) load our model, (2) load our image, and (3) perform Selective Search:

# load our fine-tuned model and label binarizer from disk
print("[INFO] loading model and label binarizer...")
model = load_model(config.MODEL_PATH)
lb = pickle.loads(open(config.ENCODER_PATH, "rb").read())

# load the input image from disk
image = cv2.imread(args["image"])
image = imutils.resize(image, width=500)

# run selective search on the image to generate bounding box proposal
# regions
print("[INFO] running selective search...")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()

Lines 21 and 22 load our fine-tuned raccoon model and associated label binarizer.

We then load our input --image and resize it to a known width (Lines 25 and 26).

Next, we perform Selective Search on our image to generate our region proposals (Lines 31-34).

At this point, we’re going to extract each of our proposal ROIs and pre-process them:

# initialize the list of region proposals that we'll be classifying
# along with their associated bounding boxes
proposals = []
boxes = []

# loop over the region proposal bounding box coordinates generated by
# running selective search
for (x, y, w, h) in rects[:config.MAX_PROPOSALS_INFER]:
	# extract the region from the input image, convert it from BGR to
	# RGB channel ordering, and then resize it to the required input
	# dimensions of our trained CNN
	roi = image[y:y + h, x:x + w]
	roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
	roi = cv2.resize(roi, config.INPUT_DIMS,
		interpolation=cv2.INTER_CUBIC)

	# further preprocess the ROI
	roi = img_to_array(roi)
	roi = preprocess_input(roi)

	# update our proposals and bounding boxes lists
	proposals.append(roi)
	boxes.append((x, y, x + w, y + h))

First, we initialize a list to hold our ROI proposals and another to hold the (x, y)-coordinates of our bounding boxes (Lines 38 and 39).

We define a loop over the region proposal bounding boxes generated by Selective Search (Line 43). Inside the loop, we extract the roi via NumPy slicing and pre-process using the same steps as in our build_dataset.py script (Lines 47-54).

Both the roi and (x, y)-coordinates are then added to the proposals and boxes lists (Lines 57 and 58).

Next, we’ll classify all of our proposals:

# convert the proposals and bounding boxes into NumPy arrays
proposals = np.array(proposals, dtype="float32")
boxes = np.array(boxes, dtype="int32")
print("[INFO] proposal shape: {}".format(proposals.shape))

# classify each of the proposal ROIs using fine-tuned model
print("[INFO] classifying proposals...")
proba = model.predict(proposals)

Lines 61 and 62 convert our proposals and boxes into NumPy arrays with the specified datatype.

Calling the predict method on our batch of proposals performs inference and returns the predictions (Line 67).

Keep in mind that we have applied a classifier to our Selective Search region proposals here; we are conducting object detection through the combination of Selective Search and classification. Our boxes list contains the coordinates in the original input --image where our objects (either raccoon or no_raccoon) reside. The remaining code blocks localize and annotate our raccoon predictions.

Let’s go ahead and filter for all the raccoon predictions, dropping the no_raccoon results:

# find the index of all predictions that are positive for the
# "raccoon" class
print("[INFO] applying NMS...")
labels = lb.classes_[np.argmax(proba, axis=1)]
idxs = np.where(labels == "raccoon")[0]

# use the indexes to extract all bounding boxes and associated class
# label probabilities associated with the "raccoon" class
boxes = boxes[idxs]
proba = proba[idxs][:, 1]

# further filter indexes by enforcing a minimum prediction
# probability be met
idxs = np.where(proba >= config.MIN_PROBA)
boxes = boxes[idxs]
proba = proba[idxs]

To filter for raccoon results, we:

  • Extract all predictions that are positive for raccoon (Lines 72 and 73)
  • Use indices to extract all bounding boxes and class label probabilities associated with the raccoon class (Lines 77 and 78)
  • Further filter indexes by enforcing a minimum probability (Lines 82-84)

We’re now going to visualize the results without applying NMS:

# clone the original image so that we can draw on it
clone = image.copy()

# loop over the bounding boxes and associated probabilities
for (box, prob) in zip(boxes, proba):
	# draw the bounding box, label, and probability on the image
	(startX, startY, endX, endY) = box
	cv2.rectangle(clone, (startX, startY), (endX, endY),
		(0, 255, 0), 2)
	y = startY - 10 if startY - 10 > 10 else startY + 10
	text = "Raccoon: {:.2f}%".format(prob * 100)
	cv2.putText(clone, text, (startX, y),
		cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

# show the output *before* running NMS
cv2.imshow("Before NMS", clone)

Looping over bounding boxes and probabilities that are predicted to contain raccoons (Line 90), we:

  • Extract the bounding box coordinates (Line 92)
  • Draw the bounding box rectangle (Lines 93 and 94)
  • Draw the label and probability text at the top-left corner of the bounding box (Lines 95-98)

From there, we display the before NMS visualization (Line 101).

Let’s apply NMS and see how the result compares:

# run non-maxima suppression on the bounding boxes
boxIdxs = non_max_suppression(boxes, proba)

# loop over the bounding box indexes
for i in boxIdxs:
	# draw the bounding box, label, and probability on the image
	(startX, startY, endX, endY) = boxes[i]
	cv2.rectangle(image, (startX, startY), (endX, endY),
		(0, 255, 0), 2)
	y = startY - 10 if startY - 10 > 10 else startY + 10
	text = "Raccoon: {:.2f}%".format(proba[i] * 100)
	cv2.putText(image, text, (startX, y),
		cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

# show the output image *after* running NMS
cv2.imshow("After NMS", image)
cv2.waitKey(0)

We apply non-maxima suppression (NMS) via Line 104, effectively eliminating overlapping rectangles around objects.

From there, Lines 107-119 draw the bounding boxes, labels, and probabilities and display the after NMS results until a key is pressed.

Great job implementing your elementary R-CNN object detection script using TensorFlow/Keras, OpenCV, and Python.

R-CNN object detection results using Keras and TensorFlow

At this point, we have fully implemented a bare-bones R-CNN object detection pipeline using Keras, TensorFlow, and OpenCV.

Are you ready to see it in action?

Start by using the “Downloads” section of this tutorial to download the source code, example dataset, and pre-trained R-CNN detector.

From there, you can execute the following command:

$ python detect_object_rcnn.py --image images/raccoon_01.jpg
[INFO] loading model and label binarizer...
[INFO] running selective search...
[INFO] proposal shape: (200, 224, 224, 3)
[INFO] classifying proposals...
[INFO] applying NMS...

Here, you can see that two raccoon bounding boxes were found after applying our R-CNN object detector:

[Figure: Two raccoon detections before applying non-maxima suppression]

By applying non-maxima suppression, we can suppress the weaker one, leaving us with the single correct bounding box:

[Figure: The single correct detection after applying non-maxima suppression]

Let’s try another image:

$ python detect_object_rcnn.py --image images/raccoon_02.jpg
[INFO] loading model and label binarizer...
[INFO] running selective search...
[INFO] proposal shape: (200, 224, 224, 3)
[INFO] classifying proposals...
[INFO] applying NMS...

Again, here we have two bounding boxes:

[Figure: Detections on the second example image before NMS]

Applying non-maxima suppression to our R-CNN object detection output leaves us with the final object detection:

[Figure: The final detection on the second example image after NMS]

Let’s look at one final example:

$ python detect_object_rcnn.py --image images/raccoon_03.jpg
[INFO] loading model and label binarizer...
[INFO] running selective search...
[INFO] proposal shape: (200, 224, 224, 3)
[INFO] classifying proposals...
[INFO] applying NMS...
[Figure: Detection result for the third example image (identical before and after NMS)]

As you can see, only one bounding box was detected, so the output of the before/after NMS is identical.

So there you have it, building a simple R-CNN object detector isn’t as hard as it may seem!

We were able to build a simplified R-CNN object detection pipeline using Keras, TensorFlow, and OpenCV in only 427 lines of code, including comments!

I hope that you can use this pipeline when you start to build basic object detectors of your own.

What's next? We recommend PyImageSearch University.


Course information:
84 total classes • 114+ hours of on-demand code walkthrough videos • Last updated: February 2024
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled

I strongly believe that if you had the right teacher you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That’s not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you'll find:

  • ✓ 84 courses on essential computer vision, deep learning, and OpenCV topics
  • ✓ 84 Certificates of Completion
  • ✓ 114+ hours of on-demand video
  • ✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
  • ✓ Pre-configured Jupyter Notebooks in Google Colab
  • ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
  • ✓ Access to centralized code repos for all 536+ tutorials on PyImageSearch
  • ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
  • ✓ Access on mobile, laptop, desktop, etc.

Click here to join PyImageSearch University

Summary

In this tutorial, you learned how to implement a basic R-CNN object detector using Keras, TensorFlow, and deep learning.

Our R-CNN object detector was a stripped-down, bare-bones version of what Girshick et al. may have created during the initial experiments for their seminal object detection paper Rich feature hierarchies for accurate object detection and semantic segmentation.

The R-CNN object detection pipeline we implemented was a 6-step process, including:

  1. Step #1: Building an object detection dataset using Selective Search
  2. Step #2: Fine-tuning a classification network (originally trained on ImageNet) for object detection
  3. Step #3: Creating an object detection inference script that utilizes Selective Search to propose regions that could contain an object that we would like to detect
  4. Step #4: Using our fine-tuned network to classify each region proposed via Selective Search
  5. Step #5: Applying non-maxima suppression to suppress weak, overlapping bounding boxes
  6. Step #6: Returning the final object detection results

Overall, our R-CNN object detector performed quite well!

I hope you can use this implementation as a starting point for your own object detection projects.

And if you would like to learn more about implementing your own custom deep learning object detectors, be sure to refer to my book, Deep Learning for Computer Vision with Python, where I cover object detection in detail.

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!


Download the Source Code and FREE 17-page Resource Guide

Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
