How to Process Handwritten Text Using Python and Cloud Vision

With the size and scope of collected data expanding with each passing day, it is very important to continually analyze available data to drive better-informed business and policy decisions. To a large extent, handwritten data remains unexplored and unanalyzed. If we can analyze handwritten text data, we can minimize the hurdles and save the manpower involved in digitizing handwritten data.

In this blog, we cover how handwritten text data can be processed using Python and Google Cloud Vision. Cloud Vision offers powerful pre-trained ML models, so no model training is required on our part.

Below is an example of converting simple handwritten text into digital words, which can be easily ingested into a CSV file or database.

[Image: handwritten "45789"] should be converted to 45789

[Image: handwritten "casino"] should be converted to casino

Using the applications listed in the instructions below, we are able to convert scanned images or PDF files into digital text. Our approach first converts the PDF documents to an image format. Because we are processing images, it is best to convert all of them to the same size. We then need to define the regions of the image from which data will be extracted, which requires knowing the coordinates of each field.
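Because field positions are expressed as pixel coordinates, they only line up if every page is rendered at the same size. A minimal sketch of the underlying arithmetic, scaling a box measured on a template image to a page of a different size (the function name and `(x, y, w, h)` box convention are our own, not from the original code):

```python
def scale_box(box, src_size, dst_size):
    """Scale an (x, y, w, h) box from a template image size to another size.

    box:      (x, y, w, h) in pixels, measured on the template image
    src_size: (width, height) of the template image
    dst_size: (width, height) of the image actually being processed
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x, y, w, h = box
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))

# A label box measured on a 1000x2000 template, mapped onto a 500x1000 scan:
print(scale_box((100, 200, 50, 20), (1000, 2000), (500, 1000)))  # (50, 100, 25, 10)
```

Resizing every page to one fixed size (as done below) avoids this bookkeeping entirely; coordinate scaling is useful when resizing would degrade the handwriting too much for recognition.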

Steps to Convert Handwritten Text into Digital Data

  1. The prerequisites for this exercise are to install Python 3, Google Cloud Vision, Handprint, Keras, NumPy, pandas, pdf2image and OpenCV (cv2).
  2. Convert the PDF document to an image using the following Python command.
    from pdf2image import convert_from_path

    images = convert_from_path(file)
  3. Save the images to a directory so they can be used for data extraction. Including the page index in the filename prevents multi-page PDFs from overwriting the same file.
    for i, img in enumerate(images):
       img.save(path + "\\" + fileName + '_' + str(i) + '.jpg', 'JPEG')
  4. Read the image using the following command.
    import cv2

    image = cv2.imread(filename)

    If the document contained typed (digital) text, we could use Tesseract via pytesseract to extract the fields directly. Since we are dealing with handwritten text, Tesseract will not reliably extract the values, but we can still use it to locate the printed field labels and get their coordinates. Let us get the coordinates for the fields using Python commands instead of a UI application.

  5. Pass the image to pytesseract to convert it into a dictionary of recognized tokens and their coordinates.

    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(image, output_type=Output.DICT)

  6. To view the keys, run the below command.

    keys = list(data.keys())

    print(keys)

  7. Check the extracted text data.

    print(data['text'])

  8. Extract the coordinates of the required label by matching it against a regular expression.

    import re

    accountIdReg = r'account'  # hypothetical pattern for the field label
    for i in range(len(data['text'])):
       if re.match(accountIdReg, data['text'][i].lower()):
           (x, y, w, h) = (
               data['left'][i], data['top'][i], data['width'][i], data['height'][i])

  9. Repeat the same process for all the required text fields.
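The per-field lookups above can be folded into a single pass over the pytesseract output. A sketch, assuming each field label is a single token; the `fieldRegs` patterns and field names are illustrative, not taken from the original form:

```python
import re

# Hypothetical label patterns; adapt them to the labels printed on your form.
fieldRegs = {
    'id': re.compile(r'^id[:.]?$', re.IGNORECASE),
    'business': re.compile(r'^business', re.IGNORECASE),
}

def find_field_boxes(data, fieldRegs):
    """Return {field: (x, y, w, h)} for the first token matching each pattern.

    `data` is the dictionary returned by pytesseract.image_to_data(...),
    whose 'text', 'left', 'top', 'width', and 'height' lists are aligned.
    """
    boxes = {}
    for i, word in enumerate(data['text']):
        for field, reg in fieldRegs.items():
            if field not in boxes and reg.match(word.lower()):
                boxes[field] = (data['left'][i], data['top'][i],
                                data['width'][i], data['height'][i])
    return boxes
```

This keeps a single source of truth for the label patterns and makes adding a new field a one-line change.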
  10. Use the extracted coordinates to crop the image for the required field and save the cropped image to a specific location.

    # Crop box order is (left, upper, right, lower); the offsets pad the
    # region so the handwritten value next to the label is captured.
    idImageBox = (x, y - 20, x + w + 65, y + h + 5)
    idImage = img.crop(idImageBox)  # img is the PIL page image from pdf2image
    idImage.save(cropped_dir + '\\id_Extracted_' + fileName + '.png')

    At this point, we are going to use Handprint, a Python package that converts handwritten images into digital data, which can later be saved to a CSV file or database. It also produces annotated images showing the recognized text. Install Handprint on your server; you will also need to set up Google Cloud Vision and create credentials to access the API.

  11. Download the credentials file after setting up the Cloud Vision account on Google.

    handprint -a SERVICENAME CREDENTIALSFILE.json

    os.system('handprint /a google C:\\Users\\ocr-keras\\cred.json')

  12. Process these cropped images using Google’s Cloud Vision to extract handwritten values.

    images = glob.glob(cropped_dir + "\\" + "*.png")
    for image in images:
       os.system('handprint /s google /e ' + image)

    Let us review the annotated images with the extracted information. 

    [Annotated image: handwritten "45789" recognized]   [Annotated image: handwritten "casino" recognized]

    With this, we can confirm that the extracted data is accurate. The process above also generates JSON files with the extracted data.

  13. We then extract the required data from the JSON files.

    import json

    fields = ['id', 'business']  # field names used in the cropped-image filenames

    dictFields = {}
    for field in fields:
       jsonFiles = glob.glob(cropped_dir + "\\" + field + "*.json")
       for jsonFile in jsonFiles:
           with open(jsonFile) as f:
               # The file stores the response as a JSON-encoded string,
               # so it is decoded a second time after loading.
               temp = json.load(f)
               distros_dict = json.loads(temp)
               print(distros_dict['text_annotations'][0]['description'])
               dictFields[field] = distros_dict['text_annotations'][0]['description']

  14. We have created a Python dictionary to store the extracted information. Convert the dictionary to a pandas DataFrame for easier processing.

    dataFrame = pd.DataFrame([dictFields])
    print(dataFrame)

    Below are the dataframe contents.

           Id business
    0   45789   casino

    Save this dataframe to CSV format using the following.

    dataFrame.to_csv(path + '\\' + fileName + '.csv', index=False)

As this example deals with only two images, the instructions did not include many validations. For a real-world application, analysts should validate the extracted data against the confidence levels reported by the service and filter out low-confidence results.
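A minimal sketch of such a check, assuming you have collected a per-field confidence score alongside each extracted value (Cloud Vision reports word-level confidence in its full response); the function name and threshold are illustrative:

```python
def split_by_confidence(extracted, min_conf=0.8):
    """Partition extracted fields into accepted values and ones needing review.

    `extracted` maps field name -> (text, confidence in [0, 1]).
    Fields below `min_conf` are routed to manual review instead of the CSV.
    """
    accepted, review = {}, {}
    for field, (text, conf) in extracted.items():
        (accepted if conf >= min_conf else review)[field] = text
    return accepted, review

results = {'id': ('45789', 0.97), 'business': ('casino', 0.55)}
accepted, review = split_by_confidence(results)
print(accepted)  # {'id': '45789'}
print(review)    # {'business': 'casino'} -- send to a human for verification
```

Tuning `min_conf` trades automation rate against error rate; a form with legal or financial consequences warrants a higher threshold.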

In Conclusion

Handwritten image and text processing is very useful across many industries, including healthcare, insurance, and utilities. Use cases include invoice processing, processing of inspection forms, employee onboarding, analyzing reviews and survey data, and so on. One promising application for utilities and other organizations relying on distributed field services is the digitization of handwritten notes taken by field technicians when completing repairs or regularly scheduled maintenance. By digitizing these notes, and even incorporating process automation to do so, utilities are better situated to retain critical handwritten information regarding asset health, maintenance history, and more.

About the Author

Manikanth Koora

Manikanth Koora is a Senior Software Developer for Advanced Analytics at HEXstream. He has many years of experience in various programming languages, big data, machine learning, blockchain, and real-time data analytics.  He completed his Master of Science in Information Technology at Southern New Hampshire University and has completed many certifications for big data and cloud technologies.  Manikanth enjoys learning new technologies and watching TV shows.
