
Neural Networks: the best open-source library for object detection & classification

Intspirit Ltd
5 min read · Apr 6, 2022

In the previous article, we showed how to start diving into the amazing world of neural networks. But what if, for some reason, you have neither the time nor the resources to build and train your own neural network? Well, neural networks are a rapidly developing field, and with enough googling you can find wonderful open-source solutions for almost any task. In this article, let’s pull back the neural curtain and solve such a task using a good open-source solution.


ImageAI is an open-source Python library by Moses Olafenwa for finding and classifying objects in videos and images. But what exactly makes it so cool? The library ships with already trained models that can detect 80 different kinds of objects and classify over 1,000.

As you will soon see, in order to recognize or classify objects in a video or image, you need no more than 10 lines of code. This library also allows you to train your own model to recognize the specific objects you need.

Object Detection

Fig.1 shows the objects that you can recognize with ImageAI according to its documentation:

Objects that can be detected on image or video using ImageAI library: person, dog, boat, car, umbrella, etc.
Fig.1 Objects that can be detected on image or video using ImageAI library.

For Object Recognition, you can use one of the three pre-trained models that the library can work with: RetinaNet, YOLOv3, and TinyYOLOv3. These models are already trained to recognize certain objects using large training datasets. Each of these models has its own pros and cons. You can choose the model that best suits your application, or use your own model. We will cover training your own model in the next article, but for now let’s take a closer look at the standard models that ImageAI can work with:

  1. RetinaNet is one of the best one-stage Object Detection models that has proven to work well with dense and small scale objects. For this reason, it has become a popular object detection model to be used with aerial and satellite imagery.
  2. YOLO (You Only Look Once) is one of the most popular series of Object Detection models. Its advantage has been in providing real-time detections while approaching the accuracy of state-of-the-art object detection models. For more information on how it works under the hood, click here.
  3. TinyYOLOv3 is a lightweight version of the YOLOv3 model. It has a much higher detection speed (about 442% faster) but lower detection accuracy.

How to detect objects using ImageAI

First, you need to decide which model you want to use and download it; we will reference its path in our program. The models can be downloaded using the links below:

Download RetinaNet Model — resnet50_coco_best_v2.1.0.h5
Download YOLOv3 Model — yolo.h5
Download TinyYOLOv3 Model — yolo-tiny.h5

To detect objects in an image using pre-trained models, ImageAI provides the ObjectDetection class; for video files, it provides the VideoObjectDetection class.

Image Object Detection

To find objects in the image, we only need 7 lines of code:

Image object detection using ImageAI library.

After executing this code, the detections variable will contain an array of objects, each containing:

  • name (string)
  • percentage_probability (float)
  • box_points (list of x1,y1,x2 and y2 coordinates)

The method also saves an output image in which the found objects are highlighted with bounding boxes and labeled:

Result of image object detection: the source image with the found objects highlighted with bounding boxes and labeled with names and coordinates
Fig. 2 — Result of Object Detection using ImageAI

Video Object Detection

To detect objects in a video, create an instance of the VideoObjectDetection class and select a model in the same way as for image detection. Then use the detectObjectsFromVideo method:

Video object detection using ImageAI library.

Besides per_second_function and video_complete_function, the method can accept other callback functions that are triggered after a certain portion of the video has been processed. Here is the complete list:

- per_frame_function — triggers once a frame is processed
- per_second_function — triggers once a second is processed
- per_minute_function — triggers once a minute is processed
- video_complete_function — triggers once the entire video is processed

The functions above can be useful for getting parts of the result in stages before the processing is finished:

Print per second result.
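For example, a per_second_function might look like the sketch below. The parameter names follow the ImageAI documentation; the printing logic is purely illustrative:

```python
def for_second(second_number, output_arrays, count_arrays, average_count):
    # second_number: index of the second just processed
    # output_arrays: list of per-frame detection arrays for that second
    # count_arrays: list of per-frame object-count dictionaries
    # average_count: average count of each detected object over that second
    print("Second:", second_number)
    print("Average object counts:", average_count)

# Pass it to the detector:
#   detector.detectObjectsFromVideo(..., per_second_function=for_second)
```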

It is recommended to pass the correct frames_per_second value. You can get it using the cv2 library:

Calculate frames per second using cv2 library.

The fps variable will contain the frames per second of your video file.

Be careful when trying to recognize objects in a longer video, as the library does not allow you to programmatically interrupt this process. An issue regarding this has been opened, but not answered yet.

Object Classification

To classify objects, you can use one of the four pre-trained models that the library can work with: MobileNetV2, ResNet50, InceptionV3 and DenseNet121. These models have been trained on the ImageNet-1000 dataset and can recognize up to 1000 different objects in an image. Let’s take a closer look at each of them.

  1. MobileNetV2 is a convolutional neural network architecture that seeks to perform well on mobile devices. MobileNets support any input size greater than 32x32, with larger image sizes offering better performance.
  2. ResNet-50 model is a convolutional neural network (CNN) that is 50 layers deep. In a typical convolutional neural network, several layers are stacked on top of each other, with each layer learning features from the output of the preceding layer. However, the ResNet50 model uses residual learning, where instead of the features from the input being learnt, the residuals (the features learned subtracted from the input), are learnt, taking care of the issue of degrading accuracy.
  3. InceptionV3 is a widely-used image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset. For maximum accuracy, images should be 299x299x3 pixels in size.
  4. DenseNet121 — Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. DenseNets have several advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. The required minimum input size of the model is 32x32.

How to confuse Machine Learning: dog or muffin?
Fig. 3 Dog or muffin?

How to classify objects using ImageAI

Once again, you first need to decide which model you want to use and download it; we will reference its path in our program. The models can be downloaded using the links below:

Download MobileNetV2 Model
Download ResNet50 Model
Download InceptionV3 Model
Download DenseNet121 Model

To classify objects in an image, use the ImageClassification class, as shown in this example:

Image classification using ImageAI.

Not that hard, right?

In the next article, we’ll describe how to train your own model using ImageAI and write an Angular interface for preparing data for training.

Subscribe, so you don’t miss out!