AI Invoice Parser#
Process invoices faster than ever by extracting data and structuring it automatically with our advanced AI. Get quick and accurate data from any invoice, no matter the layout.
The AI Invoice Parser detects invoice layouts automatically, so you no longer need to supply document parsing templates for reference.
Note
This method extracts data from your PDF invoices and returns it as well-structured JSON.
Available Methods#
/ai-invoice-parser#
Method: POST
Endpoint: /v1/ai-invoice-parser
Attributes#
Note
Attributes are case-sensitive and must be sent inside the JSON body of the POST request, for example:
{
"url": "https://example.com/file1.pdf",
"callback": "https://example.com/callback/url/you/provided"
}
| Attribute | Description | Required |
|---|---|---|
| url | URL to the source file. 1 | yes |
| callback | The callback URL (or webhook) used to receive the parsing result. | no |
Payload 3#
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
"callback": "https://example.com/callback/url/you/provided"
}
Response 2#
Note
You can use jobId to identify the corresponding callback response. Use the /job/check API to poll the job status.
{
"error":false,
"status":"created",
"jobId":"7830deca-2e66-11ef-9ad3-8eff830e7461",
"credits":100,
"remainingCredits":106674,
"duration":33
}
Callback Response#
{
"status": "success",
"message": "Success",
"pageCount": 1,
"body": {
"vendor": {
"name": "ACME Inc.",
"address": {
"streetAddress": "1540 Long Street",
"city": "Jacksonville",
"state": "FL",
"postalCode": "32099",
"country": "US"
},
"contactInformation": {
"phone": "352-200-0371",
"fax": "904-787-9468"
}
},
"customer": {
"billTo": {
"name": "Lanny Lane Ltd.",
"address": {
"streetAddress": "82 Gorby Lane",
"city": "Columbia",
"state": "IN",
"postalCode": "39429",
"country": "US"
}
},
"shipTo": {
"name": "Same as recipient"
}
},
"invoice": {
"invoiceNo": "67893566",
"invoiceDate": "JAN 5, 2025"
},
"paymentDetails": {
"total": "$1,272.35",
"subtotal": "$1,262.35",
"tax": "$10.00",
"shipping": "$0.00"
},
"lineItems": [
[
{
"quantity": "2",
"description": "Item 1",
"unit_price": "9.95",
"total": "19.90"
},
{
"quantity": "5",
"description": "Item 2",
"unit_price": "20.00",
"total": "100.00"
},
{
"quantity": "1",
"description": "Item 3",
"unit_price": "19.95",
"total": "19.95"
},
{
"quantity": "1",
"description": "Item 4",
"unit_price": "123.00",
"total": "123.00"
},
{
"quantity": "10",
"description": "Item 5",
"unit_price": "99.95",
"total": "999.50"
}
]
]
},
"jobId": "7830deca-2e66-11ef-9ad3-8eff830e7461",
"credits": 100,
"remainingCredits": 106472,
"duration": 33
}
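Every amount in the sample callback response is a string, so a consumer has to convert the values before doing arithmetic. The following is a minimal Python sketch of one way to do that. The helper names `parse_amount` and `totals_are_consistent` are hypothetical, and the code assumes amounts are formatted like the sample (`$` sign, `,` as thousands separator) and that no discount applies; other currencies or formats would need extra handling.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a string such as "$1,272.35" into Decimal("1272.35")."""
    # Assumption: only "$" and "," need to be stripped, as in the sample response
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned)


def totals_are_consistent(payment: dict) -> bool:
    """Return True when subtotal + tax + shipping equals total."""
    parts = sum(
        (parse_amount(payment.get(key, "0")) for key in ("subtotal", "tax", "shipping")),
        Decimal("0"),
    )
    return parts == parse_amount(payment["total"])


# paymentDetails copied from the sample callback response
payment_details = {
    "total": "$1,272.35",
    "subtotal": "$1,262.35",
    "tax": "$10.00",
    "shipping": "$0.00",
}

print(totals_are_consistent(payment_details))  # True
```

`Decimal` is used instead of `float` so that cent values compare exactly.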
CURL#
curl -X POST \
-H "content-type: application/json" \
-H "x-api-key: <your_api_key>" \
-d '{ "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf", "callback": "https://example.com/callback/url/you/provided" }' \
https://api.pdf.co/v1/ai-invoice-parser
Invoice Schema#
The body object contains all the metadata you need to understand your invoice content and contains the following attributes:
"body": {
"vendor": { .... },
"customer": { .... },
"invoice": { .... },
"paymentDetails": { .... },
"others": { .... },
"lineItems": { .... }
}
The vendor object#
An object containing the vendor details.
Attribute |
Type |
Description |
---|---|---|
|
string |
A |
|
object |
An |
|
object |
An |
|
object |
A |
The customer object#
An object containing the customer details.
Attribute |
Type |
Description |
---|---|---|
|
object |
An |
|
object |
An |
customer.billTo#
Attribute |
Type |
Description |
---|---|---|
|
string |
The customer name |
|
object |
See the address object |
|
object |
See the contactInformation object |
|
string |
A |
customer.shipTo#
| Attribute | Type | Description |
|---|---|---|
| name | string | The customer name |
| address | object | See the address object |
The invoice object#
An object containing the invoice details.
Attribute |
Type |
Description |
---|---|---|
|
string |
Invoice number |
|
string |
Date of invoice |
|
string |
Purchase order number |
|
string |
Sales order number |
The paymentDetails object#
An object containing the payment details.
Attribute |
Type |
Description |
---|---|---|
|
string |
Terms of payment |
|
string |
Payment due date |
|
string |
Total amount due |
|
string |
Subtotal amount |
|
string |
Tax amount |
|
string |
Discount amount |
|
string |
Shipping amount |
|
object |
An |
paymentDetails.bankingInformation#
Attribute |
Type |
Description |
---|---|---|
|
string |
Name of the bank |
|
string |
Name of the account holder |
|
string |
Bank account number |
|
string |
International Bank Account Number (IBAN) |
|
string |
SWIFT/BIC code of the bank |
|
object |
See the address object |
|
string |
Routing code for domestic payments, such as the US routing number, UK’s sort code, and Australia’s BSB |
|
string |
Institution number within the Canadian banking network |
|
string |
Branch-specific code, such as Canada’s Transit Number, Brazil’s Branch code, and Israel’s Branch Code |
|
string |
Specifies the transaction’s intent |
|
string |
Payment instructions or other notes |
The others object#
An object containing the additional notes.
Attribute |
Type |
Description |
---|---|---|
|
string |
Additional notes such as delivery instructions or other notes |
The lineItems object#
An object detailing the line items in an invoice.
Important
Note: there is no common structure due to significant variability between invoices!
A typical invoice might list purchase items with details such as name, quantity or price of each individual item.
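Because the structure of lineItems varies, code that reads it is safer when it avoids assuming a fixed shape. In the sample callback response the items sit in a list nested inside another list, so the Python sketch below walks any level of nesting and reads fields with `.get()`. The function name `iter_line_items` is hypothetical, and the keys used (`description`, `quantity`, `total`) are only those that happen to appear in the sample; a real invoice may use different ones.

```python
def iter_line_items(line_items):
    """Yield each line-item dict, descending into nested lists."""
    if isinstance(line_items, dict):
        yield line_items
    elif isinstance(line_items, list):
        for entry in line_items:
            yield from iter_line_items(entry)
    # Anything else (None, strings, numbers) is skipped silently


# Shortened copy of the lineItems value from the sample callback response
sample = [[
    {"quantity": "2", "description": "Item 1", "unit_price": "9.95", "total": "19.90"},
    {"quantity": "5", "description": "Item 2", "unit_price": "20.00", "total": "100.00"},
]]

for item in iter_line_items(sample):
    # .get() avoids a KeyError when an invoice omits a field
    print(item.get("description", "?"), item.get("quantity", "?"), item.get("total", "?"))
```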
Common objects#
A couple of objects are used in several places in the schema. They are as follows.
address object#
| Attribute | Type | Description |
|---|---|---|
| streetAddress | string | Street address |
| city | string | City name |
| state | string | State/county name |
| postalCode | string | Postal/ZIP code |
| country | string | Country code/name |
contactInformation object#
Attribute |
Type |
Description |
---|---|---|
|
string |
Phone number of the vendor |
|
string |
Fax number of the vendor |
|
string |
Email address of the vendor |
Setting up the Callback URL#
The callback URL should point to a webhook that listens for the parsing results. You can set up your own webhook or use one from a provider.
Note
If you are unsure about webhooks or callbacks, please read this Wikipedia article to get started.
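As a rough illustration of what "your own webhook" could look like, here is a minimal Python sketch built on the standard library's `http.server`. It assumes the result is delivered as a JSON body in a POST request shaped like the Callback Response shown earlier; the delivery method is an assumption, not something this page states. The names `record_callback`, `CallbackHandler`, `make_server`, and `received_jobs`, as well as port 8080, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# jobId -> parsed callback payload (in-memory store, for illustration only)
received_jobs = {}


def record_callback(raw_body: bytes) -> int:
    """Store a callback payload by jobId and return the HTTP status to send."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # malformed JSON or undecodable bytes
        return 400
    if not isinstance(payload, dict) or "jobId" not in payload:
        return 400
    received_jobs[payload["jobId"]] = payload
    return 200


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the result arrives as a JSON body in a POST request
        length = int(self.headers.get("Content-Length", 0))
        status = record_callback(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Build (but do not start) the webhook server."""
    return HTTPServer((host, port), CallbackHandler)
```

Calling `make_server().serve_forever()` starts listening. The server must be reachable from the public internet, and its address is what you would pass as the callback attribute. Keying the store by jobId lets you match each result to the request that created it.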
Code samples#
var https = require("https");
// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "YOUR_API_KEY_HERE";
// Direct URL of the source PDF file
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
const SourceFileUrl = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf";
// Prepare request to `AI Invoice Parser` API endpoint
var queryPath = `/v1/ai-invoice-parser`;
// JSON payload for api request
var jsonPayload = JSON.stringify({
url: SourceFileUrl
});
var reqOptions = {
host: "api.pdf.co",
method: "POST",
path: queryPath,
headers: {
"x-api-key": API_KEY,
"Content-Type": "application/json",
"Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
}
};
var postRequest = https.request(reqOptions, (response) => {
let responseData = '';
response.on("data", (chunk) => {
responseData += chunk;
});
response.on("end", () => {
try {
// Parse JSON response
var data = JSON.parse(responseData);
if (data.error == false) {
console.log(`Job #${data.jobId} has been created!`);
checkIfJobIsCompleted(data.jobId, data.url);
}
else {
// Service reported error
console.log(data.message);
}
} catch (error) {
console.error("Error parsing JSON response:", error);
}
});
}).on("error", (e) => {
// Request error
console.log(e);
});
// Write request data
postRequest.write(jsonPayload);
postRequest.end();
function checkIfJobIsCompleted(jobId, resultFileUrl) {
let queryPath = `/v1/job/check`;
// JSON payload for api request
let jsonPayload = JSON.stringify({
jobid: jobId
});
let reqOptions = {
host: "api.pdf.co",
path: queryPath,
method: "POST",
headers: {
"x-api-key": API_KEY,
"Content-Type": "application/json",
"Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
}
};
// Send request
var postRequest = https.request(reqOptions, (response) => {
let responseData = '';
response.setEncoding("utf8");
response.on("data", (chunk) => {
responseData += chunk;
});
response.on("end", () => {
try {
// Parse JSON response
let data = JSON.parse(responseData);
console.log(`Checking Job #${jobId}, Status: ${data.status}, Time: ${new Date().toLocaleString()}`);
if (data.status == "working") {
// Check again after 3 seconds
setTimeout(function(){ checkIfJobIsCompleted(jobId, resultFileUrl);}, 3000);
}
else if (data.status == "success") {
console.log("** Response **")
console.log(data);
}
else {
console.log(`Operation ended with status: "${data.status}".`);
}
} catch (error) {
console.error("Error parsing JSON response:", error);
}
});
}).on("error", (e) => {
// Request error
console.log(e);
});
// Write request data
postRequest.write(jsonPayload);
postRequest.end();
}
import datetime
import time
import requests # pip install requests
# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "******************************************"
# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"
# Direct URL of source PDF file.
# You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
SourceFileURL = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf"
def main(args = None):
    getParsedInvoice(SourceFileURL)
def getParsedInvoice(uploadedFileUrl):
    """AI Invoice Parser using PDF.co Web API"""
    # Prepare request params as JSON
    # See documentation: https://apidocs.pdf.co
    parameters = {}
    parameters["url"] = uploadedFileUrl
    # Prepare URL for 'AI Invoice Parser' API request
    url = "{}/ai-invoice-parser".format(BASE_URL)
    # Execute request (json= sends the params as a JSON body) and get response as JSON
    response = requests.post(url, json=parameters, headers={ "x-api-key": API_KEY })
    if response.status_code == 200:
        result = response.json()
        if result["error"] == False:
            # Asynchronous job ID
            jobId = result["jobId"]
            # Check the job status in a loop.
            # If you don't want to pause the main thread you can rework the code
            # to use a separate thread for the status checking and completion.
            while True:
                status = checkJobStatus(jobId) # Possible statuses: "working", "failed", "aborted", "success".
                # Display timestamp and status (for demo purposes)
                print(datetime.datetime.now().strftime("%H:%M:%S") + ": " + str(status))
                if status == "success":
                    break
                elif status == "working":
                    # Pause for a few seconds
                    time.sleep(3)
                else:
                    # "failed", "aborted", or None after a request error
                    break
        else:
            # Show service reported error
            print(result["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")
def checkJobStatus(jobId):
    """Checks server job status"""
    url = f"{BASE_URL}/job/check?jobid={jobId}"
    response = requests.get(url, headers={ "x-api-key": API_KEY })
    if response.status_code == 200:
        result = response.json()
        if result["status"]:
            print("** Response **")
            print(result)
            return result["status"]
    else:
        print(f"Request error: {response.status_code} {response.reason}")
    return None
if __name__ == '__main__':
    main()
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
namespace PDFcoApiExample
{
class Program
{
// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const String API_KEY = "***********************************";
// Direct URL of Source PDF file
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
const string SourceFileURL = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf";
static void Main(string[] args)
{
// Create standard .NET web client instance
WebClient webClient = new WebClient();
// Set API Key
webClient.Headers.Add("x-api-key", API_KEY);
// Declare the request body as JSON
webClient.Headers.Add("Content-Type", "application/json");
// URL for `AI Invoice Parser` API call
string url = "https://api.pdf.co/v1/ai-invoice-parser";
// Prepare requests params as JSON
Dictionary<string, object> parameters = new Dictionary<string, object>();
parameters.Add("url", SourceFileURL);
// Convert dictionary of params to JSON
string jsonPayload = JsonConvert.SerializeObject(parameters);
try
{
// Execute POST request with JSON payload
string response = webClient.UploadString(url, jsonPayload);
// Parse JSON response
JObject json = JObject.Parse(response);
if (json["error"].ToObject<bool>() == false)
{
// Asynchronous job ID
string jobId = json["jobId"].ToString();
// Check the job status in a loop.
// If you don't want to pause the main thread you can rework the code
// to use a separate thread for the status checking and completion.
do
{
string job_response = "";
string status = CheckJobStatus(jobId, out job_response); // Possible statuses: "working", "failed", "aborted", "success".
// Display timestamp and status (for demo purposes)
Console.WriteLine(DateTime.Now.ToLongTimeString() + ": " + status);
if (status == "success")
{
Console.WriteLine("** Final Response **");
Console.WriteLine(job_response);
break;
}
else if (status == "working")
{
// Pause for a few seconds
Thread.Sleep(3000);
}
else
{
Console.WriteLine(status);
break;
}
}
while (true);
}
else
{
Console.WriteLine(json["message"].ToString());
}
}
catch (WebException e)
{
Console.WriteLine(e.ToString());
}
webClient.Dispose();
Console.WriteLine();
Console.WriteLine("Press any key...");
Console.ReadKey();
}
static string CheckJobStatus(string jobId, out string response)
{
using (WebClient webClient = new WebClient())
{
// Set API Key
webClient.Headers.Add("x-api-key", API_KEY);
string url = "https://api.pdf.co/v1/job/check?jobid=" + jobId;
response = webClient.DownloadString(url);
JObject json = JObject.Parse(response);
return Convert.ToString(json["status"]);
}
}
}
}
package com.company;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import com.google.gson.JsonPrimitive;
import okhttp3.*;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
public class Main {
// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
final static String API_KEY = "********************************";
// (!) Make asynchronous job
final static boolean Async = true;
public static void main(String[] args) throws IOException {
// Source PDF file
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
final String SourceFileUrl = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf";
// Create HTTP client instance
OkHttpClient webClient = new OkHttpClient();
// AI PARSE INVOICE
ParseInvoice(webClient, SourceFileUrl);
}
public static void ParseInvoice(OkHttpClient webClient, String uploadedFileUrl) throws IOException {
// Prepare POST request body in JSON format
JsonObject jsonBody = new JsonObject();
jsonBody.add("url", new JsonPrimitive(uploadedFileUrl));
RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonBody.toString());
// Prepare URL for AI Invoice Parser API call.
// See documentation: https://developer.pdf.co/api/ai-invoice-parser/index.html
String query = "https://api.pdf.co/v1/ai-invoice-parser";
DateTimeFormatter dtf = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");
// Prepare request to `AI Invoice Parser` API
Request request = new Request.Builder()
.url(query)
.addHeader("x-api-key", API_KEY) // (!) Set API Key
.addHeader("Content-Type", "application/json")
.post(body)
.build();
// Execute request
Response response = webClient.newCall(request).execute();
if (response.code() == 200) {
// Parse JSON response
JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();
boolean error = json.get("error").getAsBoolean();
if (!error) {
// Asynchronous job ID
String jobId = json.get("jobId").getAsString();
System.out.println("Job#" + jobId + ": has been created. - " + dtf.format(LocalDateTime.now()));
// Check the job status in a loop.
// If you don't want to pause the main thread you can rework the code
// to use a separate thread for the status checking and completion.
do {
String status = CheckJobStatus(webClient, jobId); // Possible statuses: "working", "failed", "aborted", "success"
System.out.println("Job#" + jobId + ": " + status + " - " + dtf.format(LocalDateTime.now()));
if (status.compareToIgnoreCase("success") == 0) {
break;
} else if (status.compareToIgnoreCase("working") == 0) {
// Pause for a few seconds
try {
Thread.sleep(3000);
} catch (InterruptedException ex) {
Thread.currentThread().interrupt(); // restore interrupted status
}
} else {
System.out.println(status);
break;
}
} while (true);
} else {
// Display service reported error
System.out.println(json.get("message").getAsString());
}
} else {
// Display request error
System.out.println(response.code() + " " + response.message());
}
}
// Check Job Status
private static String CheckJobStatus(OkHttpClient webClient, String jobId) throws IOException {
String url = "https://api.pdf.co/v1/job/check?jobid=" + jobId;
String status = "";
// Prepare request
Request request = new Request.Builder()
.url(url)
.addHeader("x-api-key", API_KEY) // (!) Set API Key
.build();
// Execute request
Response response = webClient.newCall(request).execute();
if (response.code() == 200) {
// Parse JSON response
JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();
status = json.get("status").getAsString();
if(status.equals("success")){
System.out.println(json);
}
return status;
} else {
// Display request error
System.out.println(response.code() + " " + response.message());
}
return "Failed";
}
}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>AI Invoice Parser example.</title>
</head>
<body>
<?php
// PDF.co "AI Invoice Parser" code snippet.
// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
$apiKey = "YOUR_API_KEY";
// Direct URL of Source PDF file
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
$sourceFileUrl = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf";
// Prepare URL for `AI Invoice Parser` API call
$url = "https://api.pdf.co/v1/ai-invoice-parser";
// Prepare requests params
$parameters = array();
$parameters["url"] = $sourceFileUrl;
// Create Json payload
$data = json_encode($parameters);
// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
// Execute request
$result = curl_exec($curl);
echo $result . "<br/>";
if (curl_errno($curl) == 0)
{
$status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if ($status_code == 200)
{
$json = json_decode($result, true);
if (!isset($json["error"]) || $json["error"] == false)
{
// Asynchronous job ID
$jobId = $json["jobId"];
// Check the job status in a loop
do
{
$status = CheckJobStatus($jobId, $apiKey); // Possible statuses: "working", "failed", "aborted", "success".
// Display timestamp and status (for demo purposes)
echo "<p>" . date(DATE_RFC2822) . ": " . $status . "</p>";
if ($status == "success")
{
break;
}
else if ($status == "working")
{
// Pause for a few seconds
sleep(3);
}
else
{
echo $status . "<br/>";
break;
}
}
while (true);
}
else
{
// Display service reported error
echo "<p>Error: " . $json["message"] . "</p>";
}
}
else
{
// Display request error
echo "<p>Status code: " . $status_code . "</p>";
echo "<p>" . $result . "</p>";
}
}
else
{
// Display CURL error
echo "Error: " . curl_error($curl);
}
// Cleanup
curl_close($curl);
function CheckJobStatus($jobId, $apiKey)
{
$status = null;
// Create URL
$url = "https://api.pdf.co/v1/job/check";
// Prepare requests params
$parameters = array();
$parameters["jobid"] = $jobId;
// Create Json payload
$data = json_encode($parameters);
// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
// Execute request
$result = curl_exec($curl);
if (curl_errno($curl) == 0)
{
$status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if ($status_code == 200)
{
$json = json_decode($result, true);
if (!isset($json["error"]) || $json["error"] == false)
{
$status = $json["status"];
// Print the full response only once the job has finished
if ($status == "success")
{
echo "<br/><br/><p>== Final Response ==</p>";
echo $result;
}
}
else
{
// Display service reported error
echo "<p>Error: " . $json["message"] . "</p>";
}
}
else
{
// Display request error
echo "<p>Status code: " . $status_code . "</p>";
echo "<p>" . $result . "</p>";
}
}
else
{
// Display CURL error
echo "Error: " . curl_error($curl);
}
// Cleanup
curl_close($curl);
return $status;
}
?>
</body>
</html>
Footnotes
- 1
Supports publicly accessible links from any source, including Google Drive, Dropbox, and PDF.co Built-In Files Storage. To upload files via the API, check out the File Upload section. Note: If you experience intermittent Access Denied or Too Many Requests errors, please try adding the cache: prefix to enable built-in URL caching (e.g., cache:https://example.com/file1.pdf). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
- 2
Main response codes are as follows:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad request. Typically happens because of bad input parameters, or because the input URLs can’t be reached, possibly due to access restrictions like needing a login or password. |
| 401 | Unauthorized |
| 402 | Not enough credits |
| 445 | Timeout error. To process large documents or files please use asynchronous mode (set the async parameter to true) and then check status using the /job/check endpoint. If a file contains many pages then specify a page range using the pages parameter. The number of pages of the document can be obtained using the /pdf/info endpoint. |
Note
For more see the complete list of available response codes.
- 3
PDF.co Request size: API requests larger than 4 megabytes are not supported. Please ensure that request sizes do not exceed this limit.