10 Optical Character Recognition (OCR) APIs

APIs are driving business workflows across the globe, enabling more efficient processes and reducing friction in the market. One place where APIs and machine learning have seen long-term interest is optical character recognition (OCR). OCR is a type of artificial intelligence that analyzes media to digitize text and create structured data.

Perfecting OCR has captured the imaginations of generations of developers. While error-free OCR isn’t here yet, it’s awfully close. We’ve reviewed ten OCR APIs to showcase the technology’s current state.

1. Taggun

Taggun’s solution is fairly industry-specific: an OCR API backed by machine learning for receipt scanning. That said, Taggun is an excellent example of the use of APIs in this industry segment. By hyper-specializing, Taggun delivers a fast, efficient solution for specific use cases. It also bills itself as a solid anti-fraud system, thanks to the insights its structured data approach generates.

Best For: Receipt and Transaction OCR

Example Request

curl --request POST \
     --url https://api.taggun.io/api/receipt/v1/verbose/url \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "headers": {
    "x-custom-key": "string"
  },
  "refresh": false,
  "incognito": false,
  "extractTime": false
}'

2. ABBYY FineReader Engine

ABBYY FineReader Engine is an OCR platform with an SDK and API that provides scalable, AI-driven OCR. Notably, the platform converts documents into searchable and editable formats. Its substantial document conversion options let consumers extract, structure, and convert documents in a single workflow. ABBYY is a strong offering for enterprise solutions, but those looking for a single-purpose tool rather than a platform might find it a bit heavy.

Best For: High Conversion Portability

Example Request

curl --request POST https://<PROCESSING_LOCATION_ID>.ocrsdk.com/v2/processImage

3. SentiSight

SentiSight is a relatively powerful OCR API offering with quick response times. It really shines in its support for Asian characters. Many OCR software solutions struggle once you start processing languages that use non-Latin characters, so SentiSight’s AI and ML-driven support stands out in the market. Notably, SentiSight also allows integrators to train their own models in addition to using pre-trained ones. That extra flexibility could make it a standout offering in some specific use cases, especially those involving Asian-language content in non-standard formatting.

Best For: Non-Western Language Standards Support

Example Request

TOKEN="your_token"
PROJECT_ID="your_project_id"
MODEL="your_model_name"
IMAGE_FILENAME="your_image_path"
curl -H "X-Auth-token: $TOKEN" --data-binary @"$IMAGE_FILENAME" \
  -H "Content-Type: application/octet-stream" \
  -X POST "https://platform.sentisight.ai/api/predict/$PROJECT_ID/$MODEL"

4. Amazon Textract

Amazon Textract is a tool from Amazon focused on extracting text and converting it into structured data. It’s billed as a “fully managed” service due to its bundling with AWS. Its use of AI and LLMs gives it quite a bit of power, with AWS serving as a strong backbone that delivers efficiency and accuracy. The additional security and compliance features of the AWS environment make it a strong contender for data that must adhere to legal frameworks such as HIPAA and GDPR.

Best For: SaaS Preference and Legal Compliance in OCR Processes

Example Request

{
   "AdaptersConfig":{
      "Adapters":[
         {
            "AdapterId":"string",
            "Pages":[
               "string"
            ],
            "Version":"string"
         }
      ]
   },
   "Document":{
      "Bytes":"blob",
      "S3Object":{
         "Bucket":"string",
         "Name":"string",
         "Version":"string"
      }
   },
   "FeatureTypes":[
      "string"
   ],
   "HumanLoopConfig":{
      "DataAttributes":{
         "ContentClassifiers":[
            "string"
         ]
      },
      "FlowDefinitionArn":"string",
      "HumanLoopName":"string"
   },
   "QueriesConfig":{
      "Queries":[
         {
            "Alias":"string",
            "Pages":[
               "string"
            ],
            "Text":"string"
         }
      ]
   }
}
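
The request syntax above lists every field the operation accepts, and most of them are optional. As a rough sketch, assuming this is Textract's AnalyzeDocument request, a minimal body needs only Document and FeatureTypes. The helper name textract_request_body and the bucket and file names are made up for illustration, and the sketch assumes those names contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print a minimal request body for a document stored in S3.
# Usage: textract_request_body BUCKET KEY FEATURE...
# Assumes BUCKET and KEY contain no quotes or backslashes (no JSON escaping is done).
textract_request_body() {
  bucket=$1
  key=$2
  shift 2

  # Join the remaining arguments into a comma-separated list of JSON strings.
  features=""
  for feature in "$@"; do
    features="${features:+$features,}\"$feature\""
  done

  printf '{"Document":{"S3Object":{"Bucket":"%s","Name":"%s"}},"FeatureTypes":[%s]}\n' \
    "$bucket" "$key" "$features"
}

# Example with placeholder names: request table and form extraction.
textract_request_body my-bucket receipt.png TABLES FORMS
```

S3Object and Bytes are alternatives, so a request carries one or the other rather than both.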

5. Hive’s Optical Character Recognition API

Hive’s OCR API offers much of the same functionality as other solutions on this list but with a few key differences. First and foremost, Hive’s solution supports more than 15 languages, allowing for an international data processing solution. One of the big issues with traditional OCR solutions has been their focus on English and other Latin-script languages, so this is important. Second, Hive offers extraction from images and video, including complex content such as emojis, handwritten text, and text rotated in multiple directions on the same document. It’s a well-trained API with a strong machine learning backend that enables fairly complex data flows.

Best For: Multi-Language Support

Example Request

# Submit a task with media referenced by URL.
# This is a sync example; see the API reference for async.
curl --request POST \
  --url https://api.thehive.ai/api/v2/task/sync \
  --header 'accept: application/json' \
  --header 'authorization: token <API_KEY>' \
  --form 'url=http://hive-public.s3.amazonaws.com/demo_request/gun1.jpg'

# Submit a task with a local media file (also a sync example).
curl --request POST \
  --url https://api.thehive.ai/api/v2/task/sync \
  --header 'Authorization: Token <token>' \
  --form 'media=@"<absolute/path/to/file>"'

6. Klippa OCR API

Klippa OCR API is a solution focused on business OCR and document extraction. Paired with Klippa’s other business offerings, including conversion, it is a solid choice for complex data sets requiring high accuracy. Klippa boasts 99% extraction accuracy, leveraging a machine learning backend for quick response times and iterative detection.

Best For: High Accuracy

Example Request

curl -X POST \
  -H "x-api-key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{"documents": [{"data": "document data encoded as base64"}]}' \
  https://dochorizon.klippa.com/api/services/document_capturing/v1/generic
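
The data field in the request above is a placeholder for the file contents. A minimal sketch of producing that body from a local file might look like the following. The function name build_klippa_body is made up for illustration, and the sketch assumes the API expects plain base64 without line breaks.

```shell
# Hypothetical helper: print the JSON body for a single local document.
# Usage: build_klippa_body PATH
build_klippa_body() {
  # base64 may wrap its output across several lines; strip the newlines
  # so the value fits in a single JSON string.
  encoded=$(base64 < "$1" | tr -d '\n')
  printf '{"documents": [{"data": "%s"}]}\n' "$encoded"
}
```

The output can be piped into the curl command by replacing the inline -d argument with -d @-, which tells curl to read the request body from standard input.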

7. Base64.ai

Base64.ai is a highly capable, AI-driven OCR API. Base64’s main value proposition is speed and ease of implementation. It boasts hundreds of integrations that can be connected in less than an hour and an average processing time of around three seconds once integrated. Base64 also covers a wide range of inputs: it supports both typed and handwritten documents, a value proposition made stronger by the human verification services on offer. Notably, Base64 also offers redaction, which makes it a strong option for organizations looking to take OCR a step further with security and privacy controls.

Best For: Quick Iteration and Deployment

Example Request

curl --location 'https://base64.ai/api/scan' \
--header 'Content-Type: application/json' \
--header 'Authorization: ApiKey email:secret' \
--data '{
    "document": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...",
    "settings": {
        "redactions": {
            "fields": ["name", "issueDate"],
            "faces": true,
            "signatures": true
        }
    }
}'

8. Clarifai’s Computer Vision VTR

Clarifai’s Visual Text Recognition solution extracts information from different types of unstructured data. What sets it apart from others on this list is that it can do this across multiple languages and can process video in addition to images. This makes it suitable for a wide variety of circumstances and applications. Clarifai also offers other LLM and AI-driven systems as part of a collective offering. Notably, it allows you to import your own models, offering users the best of both worlds.

Best For: Desire for a Platform Solution

Example Request

curl -X POST "https://api.clarifai.com/v2/users/clarifai/apps/main/models/general-image-recognition/versions/aa7f35c01e0642fda5cf400f543e7c40/outputs" \
    -H "Authorization: Key YOUR_PAT_HERE" \
    -H "Content-Type: application/json" \
    -d '{
      "inputs": [
        {
          "data": {
            "image": {
              "url": "https://samples.clarifai.com/metro-north.jpg"
            }
          }
        }
      ]
    }'

9. FormX.ai

FormX.ai is primarily concerned with structured data. In essence, FormX’s approach is to utilize data models (extractors) to generate JSON outputs for various document types, including invoices, bills, receipts, and more. FormX is an excellent option for those without their own models, as the preconfigured models allow for rapid deployment without the need to bring your own model or training set. The good news in adopting a JSON standardized system is that it will look very familiar to many users, and the web portal on offer helps to reduce friction for those who prefer low-or-no-code solutions or those who lack such familiarity.

Best For: Low-Code or No-Code Solutions

Example Request

curl --request POST \
     --url https://worker.formextractorai.com/detect-documents \
     --header 'X-WORKER-ENCODING: raw' \
     --header 'X-WORKER-PDF-DPI: 150' \
     --header 'accept: application/json' \
     --header 'content-type: image/*'

10. Mindee

Mindee is quite an interesting solution. While it provides much of the same functionality as other “document to machine-readable” solutions, its main selling point is the variety of integration options provided to end users. Mindee provides relatively robust SDKs for both Vue.js and React for UI integration into native apps, and the base-level API is framework- and language-agnostic, which supports broad adoption. Notably, Mindee provides an “API store” of plug-and-play APIs that can be deployed for specific functions, including receipt scanning, passport OCR, and more.

Best For: Solutions Requiring High Integration and Control

Example Request

curl -X POST \
  https://api.mindee.net/v1/products/Mindee/bank_account_details/v2/predict \
  -H 'Authorization: Token my-api-key-here' \
  -H 'content-type: multipart/form-data' \
  -F document=@/path/to/your/file.png

Other Options For OCR APIs

There are many unique options that you may find useful. These include the following, but the world of OCR is diverse and extensive, so feel free to dig in deeper!

  • Tesseract: This is an OCR engine that offers both a neural network engine (introduced in Tesseract 4) and the older pattern-based engine (Tesseract 3). It was initially developed at Hewlett-Packard Laboratories Bristol, UK, and the Hewlett-Packard Co offices in Colorado and is now entirely open source. This is a strong solution for organizations desiring full control over the entire OCR process.
  • Google Cloud OCR API: Offered by Google as part of their Google Cloud offering, this OCR API leverages Google Vision to deliver accurate and fast OCR extraction. This is a strong offering for organizations already using Google Vision or Google Cloud in their business processes.
  • OCR API by OCRSpace: This OCR API provides accurate and quick OCR extraction via a freemium model, promising quicker responses and guaranteed uptime via “pro” plans. This can provide business flexibility to organizations looking to integrate OCR without significant upfront cost.

Conclusion

While this list is not exhaustive — many more offerings are constantly coming onto the market — these services represent a good snapshot of the current OCR API offerings with AI and machine learning backends. Any of these solutions would be a great addition to your workflow. Have we missed any excellent examples in the market? Let us know, and we’ll take a look.
