Document Classifier Usage Guide
The Document Classifier checks the content of an input PDF, JPG, PNG, or TIFF file through the /pdf/classifier endpoint. It then uses AI to automatically determine the class of the document (for example, finance, invoice, etc.) and returns it to the user. It can also classify documents using custom-defined rules.
Use the Document Classifier to quickly build a workflow for sorting out input documents and PDF files.
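As a rough illustration of the call described above, the following Python sketch builds (but does not send) a POST request to the /pdf/classifier endpoint. Only the endpoint path comes from this guide. The base URL, the `x-api-key` header, and the `url` and `rulescsv` field names are assumptions made for the example, so check them against the API reference before use.

```python
import json
import urllib.request

# Hypothetical values: this guide only names the /pdf/classifier path.
BASE_URL = "https://api.example.com/v1"  # assumption: replace with the real API host
API_KEY = "YOUR_API_KEY"                 # assumption: key-based authentication


def build_classifier_request(file_url, rules_csv=None):
    """Build a POST request for the classifier endpoint without sending it."""
    payload = {"url": file_url}          # assumption: the input file is passed by URL
    if rules_csv is not None:
        payload["rulescsv"] = rules_csv  # assumption: field name for inline custom rules
    return urllib.request.Request(
        BASE_URL + "/pdf/classifier",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )


req = build_classifier_request("https://example.com/sample-invoice.pdf")
print(req.get_method(), req.full_url)
# To send the request, pass it to urllib.request.urlopen(req) and
# JSON-decode the response body.
```

Building the request separately from sending it makes it easy to inspect the payload while experimenting with custom rules.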
How to Create and Test Custom Classification Rules
Classification rules are stored in CSV format, with one line per class, in the following format:
className, logicType, keyword1, keyword2, keyword3 ..
where:
className - the name of the class. It is returned if the rules for this class match the document.
logicType - (optional) the logic used to combine the keywords. Can be OR (default) or AND. OR means that at least one keyword from the list must be found to identify the class. AND means that all keywords must be found. If the logic column is not set, OR is used.
keyword1 (also keyword2, keyword3, etc.) - a keyword or phrase to check. Can be a regular expression, for example /\d+/ or /Medical Report|Med Report/i.
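To try rules locally before uploading them, the format above can be approximated with a short Python sketch. It is not the service's actual matching engine. The function names are hypothetical, plain keywords are assumed to be case-sensitive substrings, and only the `i` regular expression flag is handled.

```python
import csv
import io
import re


def parse_rules(rules_csv):
    """Parse rules CSV text into (class_name, logic, keywords) tuples."""
    rules = []
    for row in csv.reader(io.StringIO(rules_csv), skipinitialspace=True):
        cells = [c.strip() for c in row]
        if not cells or not cells[0]:
            continue  # skip blank lines
        class_name, rest = cells[0], cells[1:]
        logic = "OR"  # the logic column is optional and defaults to OR
        if rest and rest[0].upper() in ("OR", "AND"):
            logic, rest = rest[0].upper(), rest[1:]
        keywords = [k for k in rest if k]  # drop empty padding cells
        rules.append((class_name, logic, keywords))
    return rules


def keyword_matches(keyword, text):
    """Treat /pattern/flags as a regular expression, anything else as a substring."""
    m = re.fullmatch(r"/(.*)/([a-z]*)", keyword, re.DOTALL)
    if m:
        flags = re.IGNORECASE if "i" in m.group(2) else 0
        return re.search(m.group(1), text, flags) is not None
    return keyword in text  # assumption: plain keywords are case-sensitive


def classify(text, rules):
    """Return the names of all classes whose rules match the text."""
    matched = []
    for class_name, logic, keywords in rules:
        hits = [keyword_matches(k, text) for k in keywords]
        if keywords and (all(hits) if logic == "AND" else any(hits)):
            matched.append(class_name)
    return matched


rules = parse_rules(
    "Invoice,OR,Invoice Number,Invoice #\n"
    "Medical Report,AND,/Medical Report|Med Report/i\n"
)
print(classify("Tax invoice. Invoice Number: 1234", rules))  # ['Invoice']
```

Returning every matching class, rather than only the first, is a choice made for this sketch. It makes overlapping rules easy to spot while testing.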
Sample Rules
Invoice,OR,Invoice Number,Invoice #,Invoice No,Tax Invoice,,
Purchase Order,OR,PO Number,Order Number,Order No,,,
Bill,OR,Bill Date,Billing Period,Bill Number,,,
Bank Statement,OR,/Account Statement/i,/Statement of Account/i,Business Checking,Accounts Payable,/Statement No/i,
Income Statement,OR,/Income Statement/i,,,,,
Has US Number,OR,"/\b-?(\d+,?)+(\.\d\d)\b/",,,,,
Medical Report,AND,/Medical Report|Med Report/i