Document AI, Document type classifier


I'm training a custom classifier in Google Cloud's Document AI to identify the type of each document that users upload. The classification is needed to route documents to the appropriate processors (e.g., payslips or bank statements). My main question is whether training Document AI is actually necessary for this purpose.

  1. Existing Solution: Does Document AI offer a pre-trained processor for document classification, or is training a custom classifier necessary?

  2. Efficiency in Training: Assuming a custom classifier is required, how can I label the training documents efficiently? Currently, I label each document individually in the Document AI Workbench. This becomes laborious with many document types (up to 40 labels), each requiring its own training and test samples (e.g., 10 for training, 2 for testing). Is there an API that lets me batch-upload documents to Cloud Storage and then apply a single document type (label) to all documents in the batch at once? That would significantly reduce the manual effort in the labelling process.
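To make the workflow in question 2 concrete, here is a minimal sketch of the bookkeeping I have in mind. It assumes a hypothetical Cloud Storage layout with one folder per document type (gs://example-bucket/<label>/<file>), derives the label from the folder name, and holds out two documents per label for testing. The bucket name, the folder layout, and the build_manifest helper are all illustrative assumptions. The sketch does not call any Document AI API; whether Document AI can consume a mapping like this is exactly what I'm asking.

```python
from collections import defaultdict


def build_manifest(uris, test_per_label=2):
    """Assign each document a label (its parent folder) and a train/test split.

    Assumes the hypothetical layout gs://<bucket>/<label>/<file>.
    """
    by_label = defaultdict(list)
    for uri in sorted(uris):
        # "gs://bucket/payslip/doc.pdf" -> ["gs://bucket", "payslip", "doc.pdf"]
        label = uri.rsplit("/", 2)[-2]
        by_label[label].append(uri)

    manifest = []
    for label, docs in by_label.items():
        # The last `test_per_label` documents of each label go to the test split;
        # if a label has fewer documents than that, all of them land in test.
        cut = max(len(docs) - test_per_label, 0)
        for i, uri in enumerate(docs):
            split = "train" if i < cut else "test"
            manifest.append({"uri": uri, "label": label, "split": split})
    return manifest


# Example: 12 documents for each of two (hypothetical) document types.
uris = [
    f"gs://example-bucket/{label}/doc_{i:02d}.pdf"
    for label in ("payslip", "bank_statement")
    for i in range(12)
]
manifest = build_manifest(uris)
```

With 12 files per folder, this gives each label the 10 training and 2 test documents mentioned above, without opening any document individually.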
