Document Parser#

Document Parser can automatically parse PDF, JPG, and PNG documents to extract fields, tables, values, and barcodes from invoices, statements, orders, and other PDF and scanned documents.

Built-in document parser templates#

General Invoice Template can parse invoices (English only) to invoice id, invoice date, extract total, tax, and line items. Set the templateId parameter to 1 to use this template.

How to classify incoming documents before parsing them?#

Use the /pdf/classifier endpoint (see below) to automatically sort/detect the class of the document based on AI or on custom keywords-based rules.

For example, you can easily define rules to find which vendor provided the document to find which template to apply accordingly. See Document Classifier for more details.

Additional Information and Tools#

/pdf/documentparser#

This API method extracts data from documents based on a document parser extraction template. With this API method, you can extract data from custom areas by searching form fields, tables, multiple pages, and more.

Method: POST
Endpoint: /v1/pdf/documentparser

Attributes#

Note

Attributes are case-sensitive and should be inside JSON for POST request, for example:

{
    "url": "https://example.com/file1.pdf"
}

Attribute	Description	Required
`url`	URL to the source file. 1	yes
`httpusername`	HTTP auth user name if required to access source `url`.	no
`httppassword`	HTTP auth password if required to access source `url`.	no
`templateId`	Set ID of document parser template to be used. View and manage your templates at Document Parser.	no
`template`	You can pass the code of the document parser template to be used directly.	no
`inline`	Set to `true` to return results inside the response. Otherwise, the endpoint will return a link to the output file generated. Note: only applies if `async` mode is `true`.	no
`outputFormat`	Default is `JSON`. You can override the default output format to `CSV` or `XML` to generate CSV or XML output accordingly.	no
`password`	Password of PDF file, the input must be in string format.	no
`async`	Set `async` to `true` for long processes to run in the background, API will then return a `jobId` which you can use with the Background Job Check endpoint to check the status of the process and retrieve the output while you can proceed with other tasks.	no
`pages`	Specify page indices as comma-separated values or ranges to process (e.g. `"0, 1, 2-"` or `"1, 2, 3-7"`). The first-page index is `0`, Use `"!"` before a number for inverted page numbers (e.g. `"!0"` for the last page). If not specified or defined in the template, the default configuration processes all `pages`. The input must be in string format.	no
`name`	File name for the generated output, the input must be in string format.	no
`expiration`	Set the expiration time for the output link in minutes (default is `60` i.e 60 minutes or 1 hour), After this specified duration, any generated output file(s) will be automatically deleted from PDF.co Temporary Files Storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. re-usable images, pdf templates, documents) consider using PDF.co Built-In Files Storage.	no
`profiles`	Use this parameter to set additional configurations for fine-tuning and extra options. Explore the Profiles section for more.	no

Query parameters#

No query parameters accepted.

Payload 3 #

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "outputFormat": "JSON",
    "templateId": "1",
    "async": false,
    "inline": "true",
    "password": "",
    "profiles": ""
}

Response 2 #

{
    "body": {
        "objects": [
            {
                "name": "companyName",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "companyName2",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "invoiceId",
                "objectType": "field",
                "value": "123456789",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateIssued",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateDue",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "bankAccount",
                "objectType": "field",
                "value": "123456789012",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "total",
                "objectType": "field",
                "value": 6.58,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "subTotal",
                "objectType": "field",
                "value": ""
            },
            {
                "name": "tax",
                "objectType": "field",
                "value": 1.01,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "objectType": "table",
                "name": "table",
                "rows": []
            }
        ],
        "templateName": "Generic Invoice [en]",
        "templateVersion": "4",
        "timestamp": "2020-08-21T19:23:31"
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.json",
    "remainingCredits": 60803
}

CURL#

curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: {{x-api-key}}' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "outputFormat": "JSON",
    "templateId": "1",
    "async": false,
    "inline": "true",
    "password": "",
    "profiles": ""
}'

/pdf/documentparser/templates#

Returns all Document Parser data extraction templates for the current user.

Use the PDF.co dashbaord to manage your Document Parser Templates.

Method: GET
Endpoint: /v1/pdf/documentparser/templates

Query parameters#

No query parameters accepted.

Body payload#

No body parameters accepted.

Response 2 #

{
    "templates": [
        {
            "id": 40,
            "type": "user",
            "title": "Untitled",
            "description": "Untitled"
        },
        {
            "id": 1,
            "type": "system",
            "title": "Invoice Parser",
            "description": "Parses invoices and extracts invoice number, company name, due date, amount, tax"
        }
    ],
    "remainingCredits": 94229
}

CURL#

curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates' \
--header 'Content-Type: application/json' \
--header 'x-api-key: {{x-api-key}}'

/pdf/documentparser/templates/:id#

Returns detailed information for document parser template by template’s id.

Use the PDF.co dashbaord to manage your Document Parser Templates.

Method: GET
Endpoint: /v1/pdf/documentparser/templates/:id

Query parameters#

No query parameters accepted.

Body payload#

No body parameters accepted.

CURL#

curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates/1' \
--header 'Content-Type: application/json' \
--header 'x-api-key: {{x-api-key}}' \
--data-raw ''

Code samples#

JavaScript / Node.js

/*jshint esversion: 6 */
  var https = require("https");
  var path = require("path");
  var fs = require("fs");

  // `request` module is required for file upload.
  // Use "npm install request" command to install.
  var request = require("request");

  // The authentication key (API Key).
  // Get your own by registering at https://app.pdf.co
  const API_KEY = "***********************************";

  // Source PDF file
  const SourceFile = "./MultiPageTable.pdf";
  // PDF document password. Leave empty for unprotected documents.
  const Password = "";
  // Destination PDF file name
  const DestinationFile = "./result.json";

  // 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.
  getPresignedUrl(API_KEY, SourceFile)
      .then(([uploadUrl, uploadedFileUrl]) => {
          // 2. UPLOAD THE FILE TO CLOUD.
          uploadFile(API_KEY, SourceFile, uploadUrl)
              .then(() => {
                  // 3. OPTIMIZE UPLOADED PDF FILE
                  parsePdf(API_KEY, uploadedFileUrl, Password, DestinationFile);
              })
              .catch(e => {
                  console.log(e);
              });
      })
      .catch(e => {
          console.log(e);
      });


  function getPresignedUrl(apiKey, localFile) {
      return new Promise(resolve => {
          // Prepare request to `Get Presigned URL` API endpoint
          let queryPath = `/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=${path.basename(SourceFile)}`;
          let reqOptions = {
              host: "api.pdf.co",
              path: encodeURI(queryPath),
              headers: { "x-api-key": API_KEY }
          };
          // Send request
          https.get(reqOptions, (response) => {
              response.on("data", (d) => {
                  let data = JSON.parse(d);
                  if (data.error == false) {
                      // Return presigned url we received
                      resolve([data.presignedUrl, data.url]);
                  }
                  else {
                      // Service reported error
                      console.log("getPresignedUrl(): " + data.message);
                  }
              });
          })
              .on("error", (e) => {
                  // Request error
                  console.log("getPresignedUrl(): " + e);
              });
      });
  }

  function uploadFile(apiKey, localFile, uploadUrl) {
      return new Promise(resolve => {
          fs.readFile(SourceFile, (err, data) => {
              request({
                  method: "PUT",
                  url: uploadUrl,
                  body: data,
                  headers: {
                      "Content-Type": "application/octet-stream"
                  }
              }, (err, res, body) => {
                  if (!err) {
                      resolve();
                  }
                  else {
                      console.log("uploadFile() request error: " + e);
                  }
              });
          });
      });
  }

  function parsePdf(apiKey, uploadedFileUrl, password, destinationFile) {

      // Template text. Use Document Parser (https://pdf.co/document-parser, https://app.pdf.co/document-parser)
      // to create templates.
      // Read template from file:
      var templateText = fs.readFileSync("./MultiPageTable-template1.yml", "utf-8");

      // URL for `Document Parser` API call
      var query = `https://api.pdf.co/v1/pdf/documentparser`;
      var jsonRequestObject = {
          url: uploadedFileUrl,
          template: templateText
      };

      request(
          {
              url: query,
              headers: { "x-api-key": API_KEY },
              method: "POST",
              json: true,
              body: jsonRequestObject
          },
          function (error, response, body) {

              if (error) {
                  return console.error("Error: ", error);
              }

              // Parse JSON response
              let data = JSON.parse(JSON.stringify(body));

              if (data.error == false) {
                  //Download generated file
                  var file = fs.createWriteStream(destinationFile);
                  https.get(data.url, (response2) => {
                      response2.pipe(file)
                          .on("close", () => {
                              console.log(`Generated result file saved as "${destinationFile}" file.`);
                          });
                  });
              }
              else {
                  // Service reported error
                  console.log("Error: " + data.message);
              }
          }
      );
  }

Python

import os
import requests # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "*************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Source PDF file
SourceFile = ".\\MultiPageTable.pdf"

# Destination JSON file name
DestinationFile = ".\\result.json"

// Template text. Use Document Parser (https://pdf.co/document-parser, https://app.pdf.co/document-parser)
# to create templates.
# Read template from file:
file_read = open(".\\MultiPageTable-template1.yml", mode='r', encoding="utf-8",errors="ignore")
Template = file_read.read()
file_read.close()

def main(args = None):
    uploadedFileUrl = uploadFile(SourceFile)

    if (uploadedFileUrl != None):
        PerformDocumentParser(uploadedFileUrl, Template, DestinationFile)

def PerformDocumentParser(uploadedFileUrl, template, destinationFile):

    # Content
    data = {
        'url': uploadedFileUrl,
        'template': template
    }

    # Prepare URL for 'Document Parser' API request
    url = "{}/pdf/documentparser".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, data= data, headers={ "x-api-key": API_KEY })

    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            #  Get URL of result file
            resultFileUrl = json["url"]
            # Download result file
            r = requests.get(resultFileUrl, stream=True)
            if (r.status_code == 200):
                with open(destinationFile, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                print(f"Request error: {response.status_code} {response.reason}")
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


def uploadFile(fileName):
    """Uploads file to the cloud"""

    # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.

    # Prepare URL for 'Get Presigned URL' API request
    url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format(
        BASE_URL, os.path.basename(fileName))

    # Execute request and get response as JSON
    response = requests.get(url, headers={"x-api-key": API_KEY})
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            # URL to use for file upload
            uploadUrl = json["presignedUrl"]
            # URL for future reference
            uploadedFileUrl = json["url"]

            # 2. UPLOAD FILE TO CLOUD.
            with open(fileName, 'rb') as file:
                requests.put(uploadUrl, data=file,
                            headers={"x-api-key": API_KEY, "content-type": "application/octet-stream"})

            return uploadedFileUrl
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")

    return None


if __name__ == '__main__':
    main()

using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

namespace PDFcoApiExample
{
    class Program
    {
        // The authentication key (API Key).
        // Get your own by registering at https://app.pdf.co
        const String API_KEY = "***********************************";

        // Source PDF file
        const string SourceFile = @".\MultiPageTable.pdf";
        // PDF document password. Leave empty for unprotected documents.
        const string Password = "";
        // Destination TXT file name
        const string DestinationFile = @".\result.json";

        static void Main(string[] args)
        {
            // Template text. Use Document Parser (https://pdf.co/document-parser, https://app.pdf.co/document-parser)
            // to create templates.
            // Read template from file:
            String templateText = File.ReadAllText(@".\MultiPageTable-template1.yml");

            // Create standard .NET web client instance
            WebClient webClient = new WebClient();

            // Set API Key
            webClient.Headers.Add("x-api-key", API_KEY);

            // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
            // * If you already have a direct file URL, skip to the step 3.

            // Prepare URL for `Get Presigned URL` API call
            string query = Uri.EscapeUriString(string.Format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name={0}",
                Path.GetFileName(SourceFile)));

            try
            {
                // Execute request
                string response = webClient.DownloadString(query);

                // Parse JSON response
                JObject json = JObject.Parse(response);

                if (json["error"].ToObject<bool>() == false)
                {
                    // Get URL to use for the file upload
                    string uploadUrl = json["presignedUrl"].ToString();
                    string uploadedFileUrl = json["url"].ToString();

                    // 2. UPLOAD THE FILE TO CLOUD.

                    webClient.Headers.Add("content-type", "application/octet-stream");
                    webClient.UploadFile(uploadUrl, "PUT", SourceFile); // You can use UploadData() instead if your file is byte[] or Stream
                    webClient.Headers.Remove("content-type");

                    // 3. PARSE UPLOADED PDF DOCUMENT

                    // URL of `Document Parser` API call
                    string url = "https://api.pdf.co/v1/pdf/documentparser";

                    Dictionary<string, string> requestBody = new Dictionary<string, string>();
                    requestBody.Add("template", templateText);
                    requestBody.Add("name", Path.GetFileName(DestinationFile));
                    requestBody.Add("url", uploadedFileUrl);

                    // Convert dictionary of params to JSON
                    string jsonPayload = JsonConvert.SerializeObject(requestBody);

                    // Execute request
                    response = webClient.UploadString(url, "POST", jsonPayload);

                    // Parse response
                    json = JObject.Parse(response);

                    if (json["error"].ToObject<bool>() == false)
                    {
                        // Get URL of generated JSON file
                        string resultFileUrl = json["url"].ToString();

                        // Download JSON file
                        webClient.DownloadFile(resultFileUrl, DestinationFile);

                        Console.WriteLine("Generated JSON file saved as \"{0}\" file.", DestinationFile);
                    }
                    else
                    {
                        Console.WriteLine(json["message"].ToString());
                    }
                }
                else
                {
                    Console.WriteLine(json["message"].ToString());
                }
            }
            catch (WebException e)
            {
                Console.WriteLine(e.ToString());
            }

            webClient.Dispose();

            Console.WriteLine();
            Console.WriteLine("Press any key...");
            Console.ReadKey();
        }
    }
}

Java

package com.company;

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import com.google.gson.JsonPrimitive;
import okhttp3.*;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main
{
    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    final static String API_KEY = "***********************************";

    public static void main(String[] args) throws IOException
    {
        // Source PDF file
        final Path SourceFile = Paths.get(".\\MultiPageTable.pdf");
        // PDF document password. Leave empty for unprotected documents.
        final String Password = "";
        // Destination JSON file name
        final Path DestinationFile = Paths.get(".\\result.json");

        // Template text. Use Document Parser (https://pdf.co/document-parser, https://app.pdf.co/document-parser)
        // to create templates.
        // Read template from file:
        String templateText = new String(Files.readAllBytes(Paths.get(".\\MultiPageTable-template1.yml")), StandardCharsets.UTF_8);

        // Create HTTP client instance
        OkHttpClient webClient = new OkHttpClient();

        // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
        // * If you already have a direct file URL, skip to the step 3.

        // Prepare URL for `Get Presigned URL` API call
        String query = String.format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=%s",
                SourceFile.getFileName());

        // Prepare request
        Request request = new Request.Builder()
                .url(query)
                .addHeader("x-api-key", API_KEY) // (!) Set API Key
                .build();
        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL to use for the file upload
                String uploadUrl = json.get("presignedUrl").getAsString();
                // Get URL of uploaded file to use with later API calls
                String uploadedFileUrl = json.get("url").getAsString();

                // 2. UPLOAD THE FILE TO CLOUD.

                if (uploadFile(webClient, API_KEY, uploadUrl, SourceFile))
                {
                    // 3. PARSE UPLOADED PDF DOCUMENT

                    ParseDocument(webClient, API_KEY, DestinationFile, Password, uploadedFileUrl, templateText);
                }
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static void ParseDocument(OkHttpClient webClient, String apiKey, Path destinationFile,
        String password, String uploadedFileUrl, String templateText) throws IOException
    {
        // Prepare POST request body in JSON format
        JsonObject jsonBody = new JsonObject();
        jsonBody.add("url", new JsonPrimitive(uploadedFileUrl));
        jsonBody.add("template", new JsonPrimitive(templateText));

        RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonBody.toString());

        // Prepare request to `Document Parser` API
        Request request = new Request.Builder()
                .url("https://api.pdf.co/v1/pdf/documentparser")
                .addHeader("x-api-key", API_KEY) // (!) Set API Key
                .addHeader("Content-Type", "application/json")
                .post(body)
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL of generated JSON file
                String resultFileUrl = json.get("url").getAsString();

                // Download JSON file
                downloadFile(webClient, resultFileUrl, destinationFile.toFile());

                System.out.printf("Generated JSON file saved as \"%s\" file.", destinationFile.toString());
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static boolean uploadFile(OkHttpClient webClient, String apiKey, String url, Path sourceFile) throws IOException
    {
        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/octet-stream"), sourceFile.toFile());

        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .addHeader("x-api-key", apiKey) // (!) Set API Key
                .addHeader("content-type", "application/octet-stream")
                .put(body)
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        return (response.code() == 200);
    }

    public static void downloadFile(OkHttpClient webClient, String url, File destinationFile) throws IOException
    {
        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .build();
        // Execute request
        Response response = webClient.newCall(request).execute();

        byte[] fileBytes = response.body().bytes();

        // Save downloaded bytes to file
        OutputStream output = new FileOutputStream(destinationFile);
        output.write(fileBytes);
        output.flush();
        output.close();

        response.close();
    }
}

PHP

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document Parse Results</title>
</head>
<body>

<?php


// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co


// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have the direct PDF file link, go to the step 3.

// Create URL
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.

        $localFile = $_FILES["fileInput"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request
        curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // Read all template texts
                $templateText = file_get_contents($_FILES["fileTemplate"]["tmp_name"]);

                // 3. PARSE UPLOADED PDF DOCUMENT
                ParseDocument($apiKey, $uploadedFileUrl, $templateText);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function ParseDocument($apiKey, $uploadedFileUrl, $templateText)
{
    // (!) Make asynchronous job
    $async = TRUE;

    // Prepare URL for Document parser API call.
    // See documentation: https://developer.pdf.co/api/document-parser/index.html
    $url = "https://api.pdf.co/v1/pdf/documentparser";

    // Prepare requests params
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["template"] = $templateText;
    $parameters["async"] = $async;

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);
    echo $result . "<br/>";

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
                // URL of generated JSON file that will available after the job completion
                $resultFileUrl = $json["url"];
                // Asynchronous job ID
                $jobId = $json["jobId"];

                // Check the job status in a loop
                do
                {
                    $status = CheckJobStatus($jobId, $apiKey); // Possible statuses: "working", "failed", "aborted", "success".

                    // Display timestamp and status (for demo purposes)
                    echo "<p>" . date(DATE_RFC2822) . ": " . $status . "</p>";

                    if ($status == "success")
                    {
                        // Display link to JSON file with information about parsed fields
                        echo "<div><h2>Parsing Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
                        break;
                    }
                    else if ($status == "working")
                    {
                        // Pause for a few seconds
                        sleep(3);
                    }
                    else
                    {
                        echo $status . "<br/>";
                        break;
                    }
                }
                while (true);
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }
}

function CheckJobStatus($jobId, $apiKey)
{
    $status = null;

    // Create URL
    $url = "https://api.pdf.co/v1/job/check";

    // Prepare requests params
    $parameters = array();
    $parameters["jobid"] = $jobId;

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
                $status = $json["status"];
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);

    return $status;
}

?>

</body>
</html>

On Github#

Template samples#

Find templates to use with Document Parser here:

Template samples

Footnotes

1

Supports publicly accessible links from any source, including Google Drive, Dropbox, and PDF.co Built-In Files Storage. To upload files via the API, check out the File Upload section. Note: If you experience intermittent Access Denied or Too Many Requests errors, please try adding cache: to enable built-in URL caching (e.g., cache:https://example.com/file1.pdf). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.

2(1,2)

Main response codes as follows:

Code	Description
`200`	Success
`400`	Bad request. Typically happens because of bad input parameters, or because the input URLs can’t be reached, possibly due to access restrictions like needing a login or password.
`401`	Unauthorized
`402`	Not enough credits
`445`	Timeout error. To process large documents or files please use asynchronous mode (set the `async` parameter to `true`) and then check status using the /job/check endpoint. If a file contains many pages then specify a page range using the `pages` parameter. The number of pages of the document can be obtained using the /pdf/info endpoint.

Note

For more see the complete list of available response codes.

3

PDF.co Request size: API requests do not support request sizes of more than 4 megabytes in size. Please ensure that request sizes do not exceed this limit.

Was this page helpful?

Document Parser#

Built-in document parser templates#

How to classify incoming documents before parsing them?#

Additional Information and Tools#

Available Methods#

/pdf/documentparser#

Attributes#

Query parameters#

Payload 3#

Response 2#

CURL#

/pdf/documentparser/templates#

Query parameters#

Body payload#

Response 2#

CURL#

/pdf/documentparser/templates/:id#

Query parameters#

Body payload#

CURL#

Code samples#

On Github#

Template samples#

Are you a human?

Payload 3 #

Response 2 #

Response 2 #