Need Help Integrating Python Script with Document AI for Data Extraction Tool

dgl
New Member

Hi everyone,

I am working on an application aimed at extracting data from multiple PDF files and exporting it to an Excel sheet using Python. I have successfully created the script and have been training a processor in Document AI. However, I'm facing challenges connecting the two components, and I suspect there might be an issue with my API setup.

Here's a redacted version of my script where I'm trying to connect and process PDFs using Document AI and then export the extracted data to Excel:


from tkinter import *
from tkinter import filedialog, messagebox, simpledialog
import customtkinter
from PIL import Image
import platform
import glob
import os
import pandas as pd
from google.oauth2 import service_account
from google.cloud import documentai_v1beta3 as documentai
from typing import Optional
from google.api_core.client_options import ClientOptions
import base64

# Credentials and IDs are redacted for security
credentials_path = "path_to_credentials.json"
credentials = service_account.Credentials.from_service_account_file(credentials_path)
client = documentai.DocumentProcessorServiceClient(credentials=credentials)
project_id = "your_project_id"
processor_id = "your_processor_id"

# Main function to load, process documents and handle exceptions
def extract_and_process_document(project_id: str, file_path: str, processor_id: str):
    # Function body...
    pass

# GUI setup and other functions
# Functions for selecting source folder, handling exports, etc.
# Full tkinter GUI code and event handling

root.mainloop()


I would appreciate any advice or suggestions on ensuring proper connectivity between my Python script and Document AI API, specifically around authenticating and making API calls. Are there best practices I should follow, or common pitfalls to avoid?

Thanks in advance for your help!

1 REPLY

Hi @dgl

Thank you for joining our community.

The code snippet you shared looks like a good starting point for integrating your Python script with Document AI. This article explains how to authenticate to Document AI through Application Default Credentials.
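
To make the authentication and API call concrete, here is a minimal sketch rather than a definitive implementation. It assumes the `google-cloud-documentai` package is installed and that you know the processor's location (for example "us" or "eu"). The `location` parameter and the helper names `processor_endpoint` and `process_pdf` are my own, not from your script. One thing worth checking: the script in the question imports `ClientOptions` but builds the client without it, and processors outside the default region generally need the matching regional endpoint.

```python
from typing import Optional


def processor_endpoint(location: str) -> str:
    """Return the regional API endpoint for a processor location such as "us" or "eu"."""
    return f"{location}-documentai.googleapis.com"


def process_pdf(project_id: str, location: str, processor_id: str,
                file_path: str, credentials_path: Optional[str] = None):
    """Send one local PDF to a Document AI processor and return the parsed Document."""
    # Imported lazily so processor_endpoint() works without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    options = ClientOptions(api_endpoint=processor_endpoint(location))
    if credentials_path:
        # Explicit service account key file, as in the original script.
        client = documentai.DocumentProcessorServiceClient.from_service_account_file(
            credentials_path, client_options=options
        )
    else:
        # Application Default Credentials: picked up from the
        # GOOGLE_APPLICATION_CREDENTIALS variable or from
        # `gcloud auth application-default login`.
        client = documentai.DocumentProcessorServiceClient(client_options=options)

    # Full resource name: projects/{project}/locations/{location}/processors/{id}
    name = client.processor_path(project_id, location, processor_id)

    with open(file_path, "rb") as f:
        raw_document = documentai.RawDocument(
            content=f.read(), mime_type="application/pdf"
        )

    request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    result = client.process_document(request=request)
    return result.document
```

If the call fails with a permission or "not found" error, the usual suspects are a location that does not match the processor, a service account without a Document AI role on the project, or the Document AI API not being enabled for that project.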

Adding a couple more references to further your research.
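
For the Excel half of the question, a rough sketch of turning the processor output into rows might look like the following. It assumes your trained processor returns entities on the document, each exposing `type_`, `mention_text`, and `confidence`. The function names, the column names, and the sample values are illustrative, and writing `.xlsx` with pandas also needs an engine such as `openpyxl` installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def entities_to_rows(file_name: str, entities: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten extracted entities into one dict per entity, ready for a DataFrame."""
    rows = []
    for entity in entities:
        rows.append({
            "file": file_name,
            "field": entity.type_,
            "value": entity.mention_text,
            "confidence": round(entity.confidence, 3),
        })
    return rows


def export_rows(rows: List[Dict[str, Any]], output_path: str) -> None:
    """Write the collected rows to an Excel sheet."""
    # Imported lazily so entities_to_rows() can be tried without pandas.
    import pandas as pd

    pd.DataFrame(rows).to_excel(output_path, index=False)


# Stand-in for document.entities, so the flattening can be checked offline.
sample_entities = [
    SimpleNamespace(type_="invoice_id", mention_text="INV-001", confidence=0.98),
]
sample_rows = entities_to_rows("a.pdf", sample_entities)
```

In the real script you would call `entities_to_rows(file_path, document.entities)` for each processed PDF, extend one shared list, and call `export_rows` once at the end.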