Document Classifier#
Use the Document Classifier endpoint (see below) to automatically detect the class of a document based on keyword-based rules. For example, you can define rules that identify which vendor provided a document, and then choose which template to apply accordingly.
Auto Classification of Incoming Documents#
Note
To quickly create and test classification rules, download and install ByteScout PDF Multitool. Run it and select PDF Classifier in the left sidebar. Test your rules and export them as a JSON request for the PDF.co PDF Classifier.
Available Methods#
/pdf/classifier#
Document Classifier can automatically determine the class of an input PDF, JPG, or PNG document by analyzing its content using the built-in AI or custom-defined classification rules.
The best way to develop, test, and maintain classification rules is to use the Classifier Tester Tool from the PDF.co Document Classifier UI. Use this tool to quickly edit and test rules on single PDFs and on folders.
Method: POST
Endpoint: /v1/pdf/classifier
Attributes#
Note
Attributes are case-sensitive and must be passed inside the JSON body of the POST request, for example:
{
  "url": "https://example.com/file1.pdf"
}
| Attribute | Description | Required |
|---|---|---|
| url | URL to the source file. 1 | yes |
| | HTTP auth user name if required to access source | no |
| | HTTP auth password if required to access source | no |
| | Define custom classification rules in CSV format. Rules are in CSV format where each row contains: | no |
| | Instead of inline CSV you can use this parameter and set the URL to a CSV file with classification rules. This is useful if you have a separate developer working on CSV rules. | no |
| | Defines if keywords in rules are case-sensitive or not. (default | no |
| | Set to | no |
| password | Password of the PDF file. The input must be in string format. | no |
| | Set | no |
| | File name for the generated output. The input must be in string format. | no |
| | Set the expiration time for the output link in minutes (default is | no |
| profiles | Use this parameter to set additional configurations for fine-tuning and extra options. Explore the Profiles section for more. | no |
Query parameters#
No query parameters accepted.
Payload#
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
  "async": false,
  "inline": "true",
  "password": "",
  "profiles": ""
}
Response 2#
{
  "body": {
    "classes": [
      {
        "class": "invoice"
      },
      {
        "class": "finance"
      },
      {
        "class": "documents"
      }
    ]
  },
  "pageCount": 1,
  "error": false,
  "status": 200,
  "credits": 42,
  "duration": 353,
  "remainingCredits": 98019328
}
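As a minimal sketch of how a client might read this response, the Python snippet below pulls the class names out of the `body.classes` array. The function name `extract_classes` is hypothetical (it is not part of any PDF.co SDK), and the snippet assumes every response follows the shape of the sample shown above.

```python
import json
from typing import List

# Sample response copied from the "Response" section above.
SAMPLE_RESPONSE = """
{
  "body": {"classes": [{"class": "invoice"}, {"class": "finance"}, {"class": "documents"}]},
  "pageCount": 1,
  "error": false,
  "status": 200,
  "credits": 42,
  "duration": 353,
  "remainingCredits": 98019328
}
"""


def extract_classes(response: dict) -> List[str]:
    """Return the detected class names from a classifier response.

    Hypothetical helper: assumes the response shape of the sample above.
    """
    if response.get("error"):
        # The sample only shows a successful response; the layout of an
        # error response is an assumption, so only the status is reported.
        raise ValueError("Classifier returned an error, status=%s" % response.get("status"))
    classes = response.get("body", {}).get("classes", [])
    return [item["class"] for item in classes]


if __name__ == "__main__":
    print(extract_classes(json.loads(SAMPLE_RESPONSE)))
```

Checking the `error` flag before reading `body` keeps a failed request from being mistaken for a document with no classes.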
CURL#
curl --location --request POST 'https://api.pdf.co/v1/pdf/classifier' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "async": false,
    "inline": "true",
    "password": "",
    "profiles": ""
  }'
Code samples#
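The CURL request above can be sketched in Python using only the standard library. This is an illustrative sketch, not an official PDF.co sample: the function names `build_classifier_request` and `classify` are hypothetical, the payload values mirror the Payload section, and you must supply your own API key.

```python
import json
import urllib.request

# Endpoint taken from the CURL example above.
API_URL = "https://api.pdf.co/v1/pdf/classifier"


def build_classifier_request(api_key: str, file_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the classifier endpoint."""
    # Payload values mirror the Payload section of this page.
    payload = {
        "url": file_url,
        "async": False,
        "inline": "true",
        "password": "",
        "profiles": "",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def classify(api_key: str, file_url: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response.

    urllib raises urllib.error.HTTPError for non-2xx HTTP responses,
    so callers may want to catch it.
    """
    request = build_classifier_request(api_key, file_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect or log the request without making a network call or spending credits.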
Footnotes
- 1
Supports links from Google Drive, Dropbox, and PDF.co Built-In Files Storage. To upload files via the API, check out the File Upload section. Note: If you experience intermittent Too Many Requests or Access Denied errors, try adding the cache: prefix to the file URL to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
- 2
Response codes are as follows:
| Code | Description |
|---|---|
| 200 | The request has succeeded |
| 400 | Bad input parameters |
| 401 | Unauthorized |
| 403 | Not enough credits |
| 405 | Timeout error. To process large documents or files, use asynchronous mode (set the async parameter to true) and then check the status using the /job/check endpoint. If a file contains many pages, specify a page range using the pages parameter. The number of pages in the document can be obtained using the /pdf/info endpoint. |
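To illustrate how a client might act on the response codes listed above, here is a small Python sketch. The helper names `describe_status` and `retry_payload` are hypothetical, and the code table simply repeats the values from this page. Polling the /job/check endpoint is deliberately left out because its request and response format is not described here.

```python
from typing import Optional

# Response codes as listed in the footnote above.
RESPONSE_CODES = {
    200: "The request has succeeded",
    400: "Bad input parameters",
    401: "Unauthorized",
    403: "Not enough credits",
    405: "Timeout error",
}
TIMEOUT_CODE = 405


def describe_status(status: int) -> str:
    """Map a response code to its description from the table above."""
    return RESPONSE_CODES.get(status, "Unknown status code")


def retry_payload(payload: dict, status: int) -> Optional[dict]:
    """Suggest a payload for a retry, or None when no retry is suggested.

    On a timeout, the footnote recommends asynchronous mode, so this
    returns a copy of the payload with "async" set to True.
    """
    if status == TIMEOUT_CODE and not payload.get("async"):
        return {**payload, "async": True}
    return None


if __name__ == "__main__":
    original = {"url": "https://example.com/file1.pdf", "async": False}
    print(describe_status(405))
    print(retry_payload(original, 405))
```

Returning a new dictionary instead of mutating the original keeps the first request available for logging or comparison.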