8 APIs to Detect Objects From Video

Improve your video intelligence with these robust AI-driven services

Machine learning has made profound leaps in recent years, offering up new capabilities like business AI, emotion recognition, image detection, and more. In conjunction with AI, APIs are becoming smarter — they can now perform incredibly difficult and resource-intensive tasks.

One of these newly accelerated domains is object detection. A wide range of APIs and libraries, some open-source, some commercial, promise to detect objects in both video and images with impressive precision. These video object recognition APIs draw on pre-trained models, custom training on your own data, and real-time stream analysis to produce accurate video intelligence for a variety of applications.

Today, we’re going to look at eight offerings in this space. They are not ranked in any particular order, nor are they “one size fits all.” That said, all of these solutions present impressive offerings and represent a promising frontier for the API industry.

What is Machine Learning Anyway?

Before we discuss the specific services, let’s review what machine learning is. Artificial Intelligence (AI) solutions carry out certain tasks with human-like intelligence. But what powers these offerings?

Simply put, machine learning. Machine learning is the process of taking data sets and “testing” hundreds, thousands, sometimes millions of permutations. The service is trained against known entities and told to look for qualities that may be replicated elsewhere. After testing against many labeled examples, the resulting model can detect those objects in content it has not seen before.

These systems are only as good as their data sets. For this reason, solutions backed by large data sets are often favored, with juggernauts like Amazon and Google all but dominating the space. Solutions like those from Amazon and Google are necessarily broad-spectrum, however, and this is where custom data set training comes into play. Other solutions, especially those that focus on training with custom data sets and custom labels, can often perform just as well as, if not better than, the bigger competitors in very specific, scope-limited endeavors.

All of this is to say: when considering the following list, consider both the trust you are placing in the system (after all, these systems will touch an incredible amount of content wherever you implement them) and your confidence in the efficacy of the algorithms being used.

8 Detection Services

Video Intelligence API

The Video Intelligence API exposes a REST API that generates a diverse set of metadata and annotations for video and images using Google’s AI and machine learning systems. There’s a wide range of things the API can do, including detecting logos, deciphering text within video, analyzing objects within the video, and more.

While the Video Intelligence API can, in theory, be used with models trained on custom data sets, most users are likely to rely on the pre-trained machine learning models, which are designed to be fairly comprehensive. The API is used across many core Google products and, as such, is iterated on consistently. For this reason, it may be a strong choice for long-term, low-resource-cost usage, as the models should continue to improve over time.
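As a rough illustration of the REST interface, the sketch below builds (but does not send) a `videos:annotate` request that asks for object tracking. The bucket path and access token are placeholders, and the helper name `build_annotate_request` is invented for this example, so treat it as a starting point and check Google’s current documentation before relying on the details.

```python
import json
import urllib.request

ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_annotate_request(input_uri, access_token, features=("OBJECT_TRACKING",)):
    """Build a POST request asking the API to annotate one video."""
    body = {"inputUri": input_uri, "features": list(features)}
    return urllib.request.Request(
        ANNOTATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values: substitute your own Cloud Storage path and OAuth token.
request = build_annotate_request("gs://your-bucket/your-video.mp4", "YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request (for example with `urllib.request.urlopen(request)`) should return the name of a long-running operation rather than the annotations themselves; the results are fetched by polling that operation until it completes.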


Rekognition

Rekognition is a powerful offering from Amazon that builds on the company’s recent machine learning efforts to deliver an extensible, scalable service. Its current users include the NFL, National Geographic, CBS, and others, demonstrating that a diverse set of use cases can be deployed via the AWS system.

Rekognition uses two distinct systems as of 2020. The first consists of pre-trained algorithms, which cover tasks such as detecting celebrities in photos and generating pathing for people in video. The second is based upon training the SearchFaces and face-based user verification systems on custom data sets, which may work well for security applications, such as verification for accessing a server room.
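For detecting objects in video specifically, the relevant Rekognition operation is asynchronous label detection. The sketch below assumes the `boto3` SDK, configured AWS credentials, and a video already stored in S3; the helper names and the sample response are made up for illustration, with the sample only mimicking the general shape of a `GetLabelDetection` result.

```python
def start_label_job(bucket, key, min_confidence=80.0):
    """Start an asynchronous label-detection job for a video stored in S3."""
    import boto3  # imported lazily so the helper below runs without AWS set up

    client = boto3.client("rekognition")
    response = client.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return response["JobId"]


def labels_by_name(response):
    """Group timestamps (in milliseconds) by label name."""
    grouped = {}
    for item in response.get("Labels", []):
        grouped.setdefault(item["Label"]["Name"], []).append(item["Timestamp"])
    return grouped


# Hand-written sample shaped like a GetLabelDetection response (not real output).
sample = {
    "JobStatus": "SUCCEEDED",
    "Labels": [
        {"Timestamp": 0, "Label": {"Name": "Person", "Confidence": 98.2}},
        {"Timestamp": 500, "Label": {"Name": "Person", "Confidence": 97.5}},
        {"Timestamp": 500, "Label": {"Name": "Car", "Confidence": 91.0}},
    ],
}
print(labels_by_name(sample))
```

In real use, the job ID returned by `start_label_job` would be passed to `get_label_detection` once the job has finished, and longer videos may need several paginated calls to collect every label.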

TensorFlow Object Detection API

  • Vendor: Google Brain Team

The TensorFlow Object Detection API is an excellent open-source framework designed for object detection systems. Interestingly, this implementation is focused more on enabling the creation of object detection models than providing the perfect model “out of the box”. For this reason, it is perhaps one of the strongest contenders for custom solutions in this list.

Not every implementation will utilize the same data sets or ask for the same things. For this reason, custom-built, custom-trained, custom-scope implementations may, at times, be the better option. For instance, what if we wanted to build a detection API that looks for both rock climbers and the ropes they’re using for tracking during a sports event? This is a very particular use case. While a general solution may not be best in this case, a framework like the TensorFlow Object Detection API might be the perfect implementation.
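To make the climbing example a little more concrete, here is a small sketch of post-processing the output of a custom-trained detector. Models exported with the TensorFlow Object Detection API typically return `detection_classes`, `detection_scores`, and `detection_boxes`; the label map, the threshold, and the sample values below are all invented for illustration, and no model is actually loaded.

```python
def keep_confident(detections, min_score=0.5):
    """Return (class_id, score, box) tuples whose score meets the threshold.

    `detections` mimics a detection model's output for a single frame,
    already converted to plain Python lists.
    """
    kept = []
    for class_id, score, box in zip(
        detections["detection_classes"],
        detections["detection_scores"],
        detections["detection_boxes"],
    ):
        if score >= min_score:
            kept.append((int(class_id), float(score), tuple(box)))
    return kept


# Hypothetical label map for the climbing example: 1 = climber, 2 = rope.
LABELS = {1: "climber", 2: "rope"}

# Made-up detections; boxes are normalized [ymin, xmin, ymax, xmax].
frame_output = {
    "detection_classes": [1, 2, 2],
    "detection_scores": [0.92, 0.71, 0.18],
    "detection_boxes": [
        [0.10, 0.40, 0.60, 0.55],
        [0.00, 0.45, 0.95, 0.50],
        [0.30, 0.10, 0.40, 0.20],
    ],
}

for class_id, score, box in keep_confident(frame_output):
    print(LABELS[class_id], round(score, 2), box)
```

The score threshold is the main knob here: a lower value catches more ropes at the cost of more false positives, so it is worth tuning against footage from the actual event.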

Ximilar Visual Automation

Ximilar Visual Automation is an interesting offering, as it targets a specific niche while building on a more general platform. While Ximilar can create custom solutions, it’s quite clear that its focus is on detecting qualities of objects in a business context. For instance, its marketing leans heavily on the idea of detecting specific outfits, different types of architectural attributes (such as finding many houses that are all the same style), etc.

Of course, as with any machine learning system, a custom implementation is not impossible. That said, with such a specific niche, Ximilar looks to be a great offering for developers within an industry with very specific elements that need processing (such as manufacturing, shipping, etc.).

Video Recognition API

The Video Recognition API offering from Valossa, while it can be utilized for a broader range of machine learning purposes, was mainly developed for usage on cloud, on-premises, and live camera systems. These focuses tend to be based around live video production, marketing, brand awareness, etc. The core offering seems to be widely spread within that space, covering as much ground as possible.

Valossa offers custom solutions too. That considered, developers must decide between a very specifically built function and a generalist approach. If your use case is tied very closely to what Valossa has been designed for, it can be an excellent offering. If it’s not, other offerings on this list may prove more useful out of the box.

Watson Visual Recognition

Watson is most famous for its performance on Jeopardy! The machine learning technology that powered those performances can also be leveraged for visual recognition. Watson was trained on highly detailed models and has added support for custom modeling. It roughly groups its model data into three domains: the “general model,” which allows for general classification; the “explicit model,” which determines whether an image is appropriate; and the “food model,” which detects food items.

Watson is very focused upon the generation of additional modeling, and as such, this is perhaps a stronger offering for those coming into this with unique data sets. From a security perspective, IBM is a trusted entity, so trusting IBM with your data may sit better than trusting other partners.


ImageAI

  • Vendor: ImageAI

ImageAI is a machine learning library for Python with a highly active development community on GitHub, made up of thousands of contributors and users. ImageAI was designed to be simple, and because of this, it is still a somewhat specific implementation as of 2020. Currently, it offers image prediction, object detection and tracking, and video detection through the use of several datasets (notably, ImageNet-1000 and COCO).

ImageAI does have plans for expansion, though they are limited. While it currently offers a custom data set option, its developers intend to add both more general and more specialized training data sets to capture a wider variety of use cases and users.
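As a sketch of what video detection with ImageAI can look like, the example below wraps the library’s `VideoObjectDetection` class and pairs it with a small per-frame callback. It assumes the `imageai` package is installed and a YOLOv3 model file has been downloaded separately; the `PeakCounter` helper and all file paths are hypothetical, and argument names may differ between ImageAI releases.

```python
def detect_in_video(input_path, output_path, model_path, on_frame=None):
    """Run ImageAI video detection; needs the imageai package and a model file."""
    from imageai.Detection import VideoObjectDetection  # lazy: optional dependency

    detector = VideoObjectDetection()
    detector.setModelTypeAsYOLOv3()
    detector.setModelPath(model_path)
    detector.loadModel()
    return detector.detectObjectsFromVideo(
        input_file_path=input_path,
        output_file_path=output_path,
        frames_per_second=20,
        per_frame_function=on_frame,
        minimum_percentage_probability=40,
    )


class PeakCounter:
    """Track the highest per-frame count seen for each object name."""

    def __init__(self):
        self.peaks = {}

    def __call__(self, frame_number, detections, counts):
        # `counts` maps an object name to how many were found in this frame.
        for name, count in counts.items():
            self.peaks[name] = max(count, self.peaks.get(name, 0))


# Simulated callback invocations, so the example runs without a video or model.
counter = PeakCounter()
counter(1, [], {"person": 2})
counter(2, [], {"person": 1, "car": 3})
print(counter.peaks)
```

With a real video, the same counter could be passed as `on_frame` to `detect_in_video`, giving a quick summary of what appeared without having to watch the annotated output.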


TorchVision

TorchVision is part of PyTorch, an open-source machine learning framework. It offers a variety of datasets, including Cityscapes, Flickr images, and more. PyTorch is highly scalable, with a distributed backend that uses multiple concurrent training systems to optimize training and produce a more accurate model.

As with any framework, however, there is a certain amount of cost in implementing PyTorch compared to other offerings. Additionally, such a wide range of options may be inappropriate for specific use cases. It’s possibly a better choice to be more specific in such situations and choose a machine learning offering that has trained more robustly on a distinct type of content. PyTorch is, nonetheless, a highly powerful option.
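To give a sense of that implementation cost, here is a minimal sketch of running one of TorchVision’s pre-trained detection models over sampled video frames. It assumes reasonably recent `torch` and `torchvision` releases (the `weights="DEFAULT"` and `output_format` arguments are not available in older versions, and `read_video` may be deprecated in newer ones); the function names, sampling rate, and threshold are choices made for this example.

```python
def frame_indices(total_frames, every_nth=30):
    """Pick every Nth frame so the detector is not run on the whole video."""
    return list(range(0, total_frames, every_nth))


def detect_in_frames(video_path, every_nth=30, min_score=0.5):
    """Return {frame_index: [label_id, ...]} for the sampled frames."""
    import torch  # lazy imports: only needed when detection actually runs
    from torchvision.io import read_video
    from torchvision.models.detection import fasterrcnn_resnet50_fpn

    # Loads the whole video into memory as a (T, C, H, W) uint8 tensor.
    frames, _, _ = read_video(video_path, pts_unit="sec", output_format="TCHW")
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

    results = {}
    with torch.no_grad():
        for index in frame_indices(len(frames), every_nth):
            image = frames[index].float() / 255.0  # model expects values in [0, 1]
            output = model([image])[0]
            keep = output["scores"] >= min_score
            results[index] = output["labels"][keep].tolist()
    return results


print(frame_indices(100, 30))
```

The label IDs returned here refer to the categories the pre-trained weights were trained on, so mapping them to names, or fine-tuning on your own classes, is the part where most of the real effort tends to go.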

Your Datasets Should Determine AI Solutions

With machine learning, your implementation will only be as good as its fit for your problem and the quality of the data you give it. For this reason, this list is presented merely as a collection of offerings, without much statement as to relative quality. All of these options are good in their own use cases and should be considered within the context of your specific data.

What do you think about our selections? Let us know in the comments below!