PDF to XML

`POST /v1/pdf/convert/to/xml`

Attributes

Attributes are case-sensitive and must be sent as JSON in the body of the POST request, for example: { "url": "https://example.com/file1.pdf" }

Attribute	Type	Required	Default	Description
`url`	string	Yes	-	URL of the source PDF file.
`callback`	string	No	-	The callback URL (or webhook) that receives the result as POST data. See Webhooks & Callbacks. This is only applicable when `async` is set to `true`.
`httpusername`	string	No	-	HTTP auth username, if required to access the source URL.
`httppassword`	string	No	-	HTTP auth password, if required to access the source URL.
`pages`	string	No	all pages	Page indices to process, as comma-separated values or ranges (e.g. "0, 1, 2-" or "1, 2, 3-7"). The first page has index 0. Prefix a number with "!" to count from the end (e.g. "!0" for the last page). If not specified, all pages are processed. The value must be a string.
`unwrap`	boolean	No	`false`	Unwrap multi-line text within table cells into a single line. This is only applicable when `lineGrouping` is set to `1`.
`rect`	string	No	-	Defines the coordinates of the area to extract. Use `PDF Edit Add Helper` to get or measure PDF coordinates. The format is `{x} {y} {width} {height}`.
`lang`	string	No	`eng`	Sets the OCR (text from image) language used when extracting text from scanned PDF, PNG, and JPG inputs. See Language Support. You can also combine two languages, for example `eng+deu` (any combination is allowed).
`inline`	boolean	No	`false`	Set to true to return results inside the response. Otherwise, the endpoint will return a URL to the output file generated.
`lineGrouping`	string	No	-	Controls how lines of text within table cells are grouped when extracting data from a PDF. The available modes are `1`, `2`, and `3`. For more information, see Line Grouping.
`password`	string	No	-	Password for the PDF file.
`async`	boolean	No	`false`	Set to `true` to run long processes in the background. The API then returns a `jobId` that you can use with the Background Job Check endpoint. See also Webhooks & Callbacks.
`name`	string	No	-	File name for the generated output. The value must be a string.
`expiration`	integer	No	`60`	Set the expiration time for the output link in minutes. After this specified duration, any generated output file(s) will be automatically deleted from PDF.co Temporary Files Storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. re-usable images, pdf templates, documents) consider using PDF.co Built-In Files Storage.
`profiles`	object	No	-	See Profiles for more information.
`OCRMode`	string	No	`Auto`	Specifies how OCR (Optical Character Recognition) should process input content, offering various modes to tailor text extraction based on content type such as images, fonts, and vector graphics. For more information, see OCR Extraction Modes.
`OCRResolution`	integer	No	`300`	Use this parameter to change the OCR resolution from the default 300 dpi. The range is from `72` to `1200` dpi.
`RotationAngle`	integer	No	-	Manually sets the page rotation to handle PDFs with vertically drawn text. OCR normally detects page rotation automatically and extracts text accurately. However, in some PDFs the page itself is not rotated; instead, the text is drawn vertically, and auto-detection may fail. In such cases, use this parameter to set the rotation manually. The available values are `0`, `1`, `2`, and `3`.
`LineGroupingMode`	string	No	`None`	Controls line grouping in PDF text extraction. Modes: `None` (no grouping), `GroupByRows` (merge rows if all cells align), `GroupByColumns` (merge cells by column), `JoinOrphanedRows` (merge single-cell rows to above if no separator).
`ConsiderFontColors`	boolean	No	`false`	Controls whether font colors should be considered when detecting table structure and merging text objects during PDF extraction. Set to true to consider font colors.
`DetectNewColumnBySpacesRatio`	string	No	`1.2`	Controls how spaces between words are interpreted for column detection in PDF text extraction. It defines the ratio of space width that determines when text should be treated as being in separate columns.
`AutoAlignColumnsToHeader`	boolean	No	`true`	Controls how columns are detected and aligned during table extraction from PDF documents. It affects both table structure detection and text extraction with formatting preservation. When `true` (default), the row with the most columns is used as the header and all other rows are aligned to its structure, which suits well-structured tables. When `false`, columns are analyzed independently across all rows to build the structure, which works better for inconsistent or irregular tables.
`OCRImagePreprocessingFilters.AddGammaCorrection()`	array[string (float format)]	No	`["1.4"]`	Adds a gamma correction filter to the image preprocessing pipeline used during OCR (Optical Character Recognition). This filter adjusts the brightness and contrast of an image by applying a non-linear gamma correction to improve text recognition quality.
`OCRImagePreprocessingFilters.AddGrayscale()`	boolean	No	`false`	Set to true to add a preprocessing filter that converts a colored document or image to grayscale before OCR is performed.
`SaveVectors`	boolean	No	`false`	Controls whether to save vector graphics during PDF to HTML conversion. Set to true to save vector graphics.
`SaveImages`	string	No	`None`	Controls how images are saved during PDF to HTML conversion. Modes: `None` (no images), `OuterFile` (save to sub-folder), `Embed` (embed as Base64 data:URI).
`ConsiderFontSizes`	boolean	No	`false`	Set to true to make the converter consider font size differences in document text when detecting and parsing table structures. This can help when tables use different font sizes to distinguish headers, data cells, or other structural elements.
`ExtractionArea`	array[number]	No	-	Extracts text from a specific area, defined in points in the format [x, y, width, height].
`ExtractShadowLikeText`	boolean	No	`true`	Controls whether to extract invisible text from a PDF document. Set to false to skip over invisible text during extraction. This is particularly useful when dealing with PDFs that contain hidden text layers or when you only want to extract visible content. When this value is set to false, OCRMode must be set to `Auto` to properly apply the shadow text filtering effect.
`DataEncryptionAlgorithm`	string	No	-	Controls the encryption algorithm used for data encryption. See User-Controlled Encryption for more information. The available algorithms are: `AES128`, `AES192`, `AES256`.
`DataEncryptionKey`	string	No	-	Controls the encryption key used for data encryption. See User-Controlled Encryption for more information.
`DataEncryptionIV`	string	No	-	Controls the encryption IV used for data encryption. See User-Controlled Encryption for more information.
`DataDecryptionAlgorithm`	string	No	-	Controls the decryption algorithm used for data decryption. See User-Controlled Encryption for more information. The available algorithms are: `AES128`, `AES192`, `AES256`.
`DataDecryptionKey`	string	No	-	Controls the decryption key used for data decryption. See User-Controlled Encryption for more information.
`DataDecryptionIV`	string	No	-	Controls the decryption IV used for data decryption. See User-Controlled Encryption for more information.

You can use profiles to control the conversion process and the output XML file.
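The `pages` syntax in the table above can be hard to picture. The sketch below expands a `pages` string into zero-based page indices for a document with a known page count. It is a client-side illustration only, assuming that ranges are inclusive, that an open range such as `2-` runs to the last page, and that `!N` counts back from the end (`!0` being the last page). `expand_pages` is a hypothetical helper, not part of the PDF.co API, and the service's own parser may treat edge cases differently.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a `pages` string into zero-based page indices.

    Hypothetical helper for illustration. Assumptions (not confirmed by the API):
    ranges are inclusive, "N-" runs to the last page, and "!N" counts from the end.
    """
    if not spec.strip():
        # Empty value: the documented default is to process all pages.
        return list(range(page_count))

    def resolve(token: str) -> int:
        token = token.strip()
        if token.startswith("!"):
            # "!0" is the last page, "!1" the one before it, and so on.
            return page_count - 1 - int(token[1:])
        return int(token)

    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = resolve(start_text)
            # An open-ended range such as "2-" continues to the last page.
            end = resolve(end_text) if end_text.strip() else page_count - 1
            indices.extend(range(start, end + 1))
        else:
            indices.append(resolve(part))

    # Keep only pages that exist, preserving order and dropping duplicates.
    unique: list[int] = []
    for index in indices:
        if 0 <= index < page_count and index not in unique:
            unique.append(index)
    return unique
```

For example, `expand_pages("0, 1, 2-", 5)` returns `[0, 1, 2, 3, 4]`, and `expand_pages("!0", 4)` returns `[3]`.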

Line Grouping Options

"1": GroupByRows – Each row is checked against the next row to see if they can be grouped together. Rows will only be grouped if all cells in the current row can be grouped with all cells in the next row. Useful when merging related content that spans multiple lines but belongs to the same logical row.
"2": GroupByColumns – Each cell is checked against the cell below it in the next row to determine if they can be grouped. Cells are grouped within the same column even if others can’t be grouped. Useful for columnar data where content in each column might span multiple lines.
"3": JoinOrphanedRows – Joins a row with a single cell to the previous row if there is no separator between them. Useful for handling cases with orphaned or misaligned content.
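The numeric `lineGrouping` codes above correspond to the named modes also listed under `LineGroupingMode`, and `unwrap` only takes effect with code `1`. A small Python sketch that builds this part of the payload and rejects combinations the attribute table calls inapplicable; `line_grouping_options` is a hypothetical helper, and rejecting rather than silently ignoring such combinations is a design assumption, not documented service behavior.

```python
# Documented `lineGrouping` codes and the modes they select.
LINE_GROUPING_MODES = {
    "1": "GroupByRows",
    "2": "GroupByColumns",
    "3": "JoinOrphanedRows",
}


def line_grouping_options(code: str, unwrap: bool = False) -> dict:
    """Return the `lineGrouping`/`unwrap` part of a request payload.

    Hypothetical client-side check: the docs state that `unwrap` only applies
    when `lineGrouping` is "1", so other combinations are rejected here.
    """
    if code not in LINE_GROUPING_MODES:
        raise ValueError(f"lineGrouping must be one of {sorted(LINE_GROUPING_MODES)}")
    if unwrap and code != "1":
        raise ValueError("unwrap is only applicable when lineGrouping is '1'")
    options = {"lineGrouping": code}
    if unwrap:
        options["unwrap"] = True
    return options
```

Note that the codes are strings (`"1"`, not `1`), matching the `string` type given for `lineGrouping`.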

Query parameters

No query parameters accepted.

Responses

Parameter	Type	Description
`url`	string	Direct URL to the generated XML file stored in S3.
`outputLinkValidTill`	string	Timestamp indicating when the output link will expire.
`pageCount`	integer	Number of pages in the PDF document.
`error`	boolean	Indicates whether an error occurred (`false` means success)
`status`	string	Status code of the request (200, 404, 500, etc.). For more information, see Response Codes.
`name`	string	Name of the output file
`credits`	integer	Number of credits consumed by the request
`remainingCredits`	integer	Number of credits remaining in the account
`duration`	integer	Time taken for the operation in milliseconds
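A sketch of reading these fields in Python, assuming the response has already been decoded from JSON into a dict. Failed requests report `error` as `true`; the code samples on this page read the error text from a `message` field, which is not listed in the table above. When the result is returned inline, the example response on this page carries the XML in a `body` field instead of a `url`. `read_conversion_result` is a hypothetical helper that handles both cases.

```python
def read_conversion_result(response: dict) -> tuple[str, str]:
    """Return ("body", xml_text) for inline results or ("url", link) otherwise.

    Hypothetical helper: field names follow the response table and the example
    response on this page; `message` is assumed to hold the error description.
    """
    # Treat a missing `error` flag as a failure rather than a success.
    if response.get("error", True):
        raise RuntimeError(
            f"PDF.co error {response.get('status')}: {response.get('message', 'unknown error')}"
        )
    if "body" in response:
        return "body", response["body"]
    return "url", response["url"]
```

When the helper returns a `url`, download the file before the time given in `outputLinkValidTill`.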

Example Payload

To see the request size limits, please refer to the Request Size Limits.

{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
  "async": false
}
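The same payload can be sent from Python with only the standard library. This is a minimal sketch: `build_payload` and `convert_pdf_to_xml` are hypothetical helpers, and the checks they perform (`callback` only together with `async`, `OCRResolution` between 72 and 1200 dpi) mirror constraints stated in the attributes table rather than validation the service is known to perform. Replace `YOUR_API_KEY` with your own key.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_payload(url: str, **options) -> dict:
    """Assemble a JSON body for the PDF to XML endpoint (hypothetical helper).

    Attribute names are case-sensitive, so they are passed through unchanged.
    """
    payload = {"url": url, **options}
    # Client-side checks based on the attributes table, not on server behavior.
    if "callback" in payload and not payload.get("async", False):
        raise ValueError("callback is only applicable when async is true")
    resolution = payload.get("OCRResolution")
    if resolution is not None and not 72 <= resolution <= 1200:
        raise ValueError("OCRResolution must be between 72 and 1200 dpi")
    return payload


def convert_pdf_to_xml(api_key: str, payload: dict) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because `async` is a Python keyword, pass it through a dict, e.g. `build_payload(source_url, **{"async": True, "callback": hook_url})`, then call `convert_pdf_to_xml("YOUR_API_KEY", payload)`. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx status codes.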

Example Response

To see the main response codes, please refer to the Response Codes page.

{
  "body": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<document>\r\n <page index=\"0\">\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"24.0\" fontStyle=\"Bold\" color=\"#538DD3\" x=\"36.00\" y=\"34.44\" width=\"242.81\" height=\"24.00\">Your Company Name</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"76.94\" width=\"66.62\" height=\"11.04\">Your Address</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"91.46\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"461.02\" y=\"115.94\" width=\"98.42\" height=\"11.04\">Invoice No. 123456</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"436.54\" y=\"130.46\" width=\"122.90\" height=\"11.04\">Invoice Date 01/01/2016</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"154.94\" width=\"63.62\" height=\"11.04\">Client Name</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"169.70\" width=\"40.34\" height=\"11.04\">Address</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"184.22\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"233.30\" width=\"28.70\" height=\"11.04\">Notes</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"316.25\" width=\"22.58\" height=\"11.04\">Item</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"247.61\" y=\"316.25\" width=\"44.64\" height=\"11.04\">Quantity</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"398.95\" y=\"316.25\" width=\"26.91\" height=\"11.04\">Price</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"533.14\" y=\"316.25\" width=\"26.30\" height=\"11.04\">Total</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"341.33\" width=\"30.62\" height=\"11.04\">Item 1</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"341.33\" width=\"6.12\" height=\"11.04\">1</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"341.33\" width=\"27.51\" height=\"11.04\">40.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"341.33\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"362.45\" width=\"30.62\" height=\"11.04\">Item 2</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"362.45\" width=\"6.12\" height=\"11.04\">2</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"362.45\" width=\"27.51\" height=\"11.04\">30.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"362.45\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"383.57\" width=\"30.62\" height=\"11.04\">Item 3</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"383.57\" width=\"6.12\" height=\"11.04\">3</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"383.57\" width=\"27.51\" height=\"11.04\">20.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"383.57\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"404.93\" width=\"30.62\" height=\"11.04\">Item 4</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"404.93\" width=\"6.12\" height=\"11.04\">4</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"404.93\" width=\"27.51\" height=\"11.04\">10.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"404.93\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"389.11\" y=\"425.83\" width=\"36.75\" height=\"11.04\">TOTAL</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"525.82\" y=\"425.83\" width=\"33.62\" height=\"11.04\">200.00</text>\r\n </column>\r\n </row>\r\n </page>\r\n</document>",
  "pageCount": 1,
  "error": false,
  "status": 200,
  "name": "sample.xml",
  "remainingCredits": 60563
}
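The `body` string in this example is XML organized as `document` > `page` > `row` > `column` > `text`, with empty cells holding only whitespace. A minimal sketch that turns it into a list of rows per page with Python's `xml.etree.ElementTree`, assuming that structure holds for your documents; `xml_to_rows` is a hypothetical helper, and it discards the font and coordinate attributes.

```python
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text: str) -> dict[int, list[list[str]]]:
    """Map each page index to its rows, each row being a list of cell strings.

    Assumes the document > page > row > column > text layout shown in the
    example response; attributes such as fontName and x/y are ignored.
    """
    root = ET.fromstring(xml_text)
    pages: dict[int, list[list[str]]] = {}
    for page in root.iter("page"):
        rows: list[list[str]] = []
        for row in page.findall("row"):
            cells: list[str] = []
            for column in row.findall("column"):
                # Whitespace-only <text> elements (empty cells) become "".
                texts = ((t.text or "").strip() for t in column.findall("text"))
                cells.append(" ".join(s for s in texts if s))
            rows.append(cells)
        pages[int(page.get("index", "0"))] = rows
    return pages
```

Applied to the example above, the first row of page `0` comes out as `['Your Company Name', '', '', '']`.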

Code Samples

curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/xml' \
--header 'x-api-key: *******************' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
"async": false
}'

var https = require("https");
var path = require("path");
var fs = require("fs");

// `request` module is required for file upload.
// Use "npm install request" command to install.
var request = require("request");

// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "***********************************";


// Source PDF file
const SourceFile = "./sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const Pages = "";
// PDF document password. Leave empty for unprotected documents.
const Password = "";
// Destination XML file name
const DestinationFile = "./result.xml";


// 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.
getPresignedUrl(API_KEY, SourceFile)
    .then(([uploadUrl, uploadedFileUrl]) => {
        // 2. UPLOAD THE FILE TO CLOUD.
        uploadFile(API_KEY, SourceFile, uploadUrl)
            .then(() => {
                // 3. CONVERT UPLOADED PDF FILE TO XML
                convertPdfToXml(API_KEY, uploadedFileUrl, Password, Pages, DestinationFile);
            })
            .catch(e => {
                console.log(e);
            });
    })
    .catch(e => {
        console.log(e);
    });


function getPresignedUrl(apiKey, localFile) {
    return new Promise((resolve, reject) => {
        // Prepare request to `Get Presigned URL` API endpoint
        let queryPath = `/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=${path.basename(localFile)}`;
        let reqOptions = {
            host: "api.pdf.co",
            path: encodeURI(queryPath),
            headers: { "x-api-key": apiKey }
        };
        // Send request
        https.get(reqOptions, (response) => {
            // Collect the whole response body; it may arrive in several chunks
            let body = "";
            response.setEncoding("utf8");
            response.on("data", (chunk) => { body += chunk; });
            response.on("end", () => {
                let data;
                try {
                    data = JSON.parse(body);
                }
                catch (e) {
                    return reject("getPresignedUrl(): invalid JSON response: " + e);
                }
                if (data.error == false) {
                    // Return presigned url we received
                    resolve([data.presignedUrl, data.url]);
                }
                else {
                    // Service reported error
                    reject("getPresignedUrl(): " + data.message);
                }
            });
        })
            .on("error", (e) => {
                // Request error
                reject("getPresignedUrl(): " + e);
            });
    });
}

function uploadFile(apiKey, localFile, uploadUrl) {
    return new Promise((resolve, reject) => {
        fs.readFile(localFile, (readErr, data) => {
            if (readErr) {
                return reject("uploadFile() read error: " + readErr);
            }
            request({
                method: "PUT",
                url: uploadUrl,
                body: data,
                headers: {
                    "Content-Type": "application/octet-stream"
                }
            }, (err, res, body) => {
                if (!err) {
                    resolve();
                }
                else {
                    reject("uploadFile() request error: " + err);
                }
            });
        });
    });
}

function convertPdfToXml(apiKey, uploadedFileUrl, password, pages, destinationFile) {
    // Prepare request to `PDF To XML` API endpoint
    var queryPath = `/v1/pdf/convert/to/xml`;

    // JSON payload for api request
    var jsonPayload = JSON.stringify({
        name: path.basename(destinationFile), password: password, pages: pages, url: uploadedFileUrl
    });

    var reqOptions = {
        host: "api.pdf.co",
        method: "POST",
        path: queryPath,
        headers: {
            "x-api-key": apiKey,
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
        }
    };
    // Send request
    var postRequest = https.request(reqOptions, (response) => {
        // Collect the whole response body before parsing it as JSON
        let body = "";
        response.setEncoding("utf8");
        response.on("data", (chunk) => { body += chunk; });
        response.on("end", () => {
            // Parse JSON response
            let data = JSON.parse(body);
            if (data.error == false) {
                // Download XML file
                var file = fs.createWriteStream(destinationFile);
                https.get(data.url, (response2) => {
                    response2.pipe(file)
                        .on("close", () => {
                            console.log(`Generated XML file saved as "${destinationFile}" file.`);
                        });
                });
            }
            else {
                // Service reported error
                console.log("convertPdfToXml(): " + data.message);
            }
        });
    })
        .on("error", (e) => {
            // Request error
            console.log("convertPdfToXml(): " + e);
        });

    // Write request data
    postRequest.write(jsonPayload);
    postRequest.end();

}

import os
import requests # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "******************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Source PDF file
SourceFile = "./sample.pdf"
# Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
Pages = ""
# PDF document password. Leave empty for unprotected documents.
Password = ""
# Destination XML file name
DestinationFile = "./result.xml"


def main(args = None):
    uploadedFileUrl = uploadFile(SourceFile)
    if uploadedFileUrl is not None:
        convertPdfToXml(uploadedFileUrl, DestinationFile)


def convertPdfToXml(uploadedFileUrl, destinationFile):
    """Converts PDF To XML using PDF.co Web API"""

    # Prepare requests params as JSON
    # See documentation: https://developer.pdf.co/api/pdf-to-xml
    parameters = {}
    parameters["name"] = os.path.basename(destinationFile)
    parameters["password"] = Password
    parameters["pages"] = Pages
    parameters["url"] = uploadedFileUrl

    # Prepare URL for 'PDF To XML' API request
    url = "{}/pdf/convert/to/xml".format(BASE_URL)

    # Execute request with a JSON body and get response as JSON
    response = requests.post(url, json=parameters, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            #  Get URL of result file
            resultFileUrl = json["url"]
            # Download result file
            r = requests.get(resultFileUrl, stream=True)
            if (r.status_code == 200):
                with open(destinationFile, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


def uploadFile(fileName):
    """Uploads file to the cloud"""

    # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.

    # Prepare URL for 'Get Presigned URL' API request
    url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format(
        BASE_URL, os.path.basename(fileName))

    # Execute request and get response as JSON
    response = requests.get(url, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            # URL to use for file upload
            uploadUrl = json["presignedUrl"]
            # URL for future reference
            uploadedFileUrl = json["url"]

            # 2. UPLOAD FILE TO CLOUD.
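            # The presigned URL carries its own authorization, so no API key is sent here
            with open(fileName, 'rb') as file:
                uploadResponse = requests.put(uploadUrl, data=file, headers={ "content-type": "application/octet-stream" })
            if uploadResponse.status_code != 200:
                print(f"Upload error: {uploadResponse.status_code} {uploadResponse.reason}")
                return None

            return uploadedFileUrl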
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")

    return None


if __name__ == '__main__':
    main()

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace PDFcoApiExample
{
  class Program
  {
    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    const String API_KEY = "***********************************";

    // Source PDF file
    const string SourceFile = @".\sample.pdf";
    // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    const string Pages = "";
    // PDF document password. Leave empty for unprotected documents.
    const string Password = "";
    // Destination XML file name
    const string DestinationFile = @".\result.xml";

    static void Main(string[] args)
    {
      // Create standard .NET web client instance
      WebClient webClient = new WebClient();

      // Set API Key
      webClient.Headers.Add("x-api-key", API_KEY);

      // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
      // * If you already have a direct file URL, skip to the step 3.

      // Prepare URL for `Get Presigned URL` API call
      string query = Uri.EscapeUriString(string.Format(
        "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name={0}",
        Path.GetFileName(SourceFile)));

      try
      {
        // Execute request
        string response = webClient.DownloadString(query);

        // Parse JSON response
        JObject json = JObject.Parse(response);

        if (json["error"].ToObject<bool>() == false)
        {
          // Get URL to use for the file upload
          string uploadUrl = json["presignedUrl"].ToString();
          string uploadedFileUrl = json["url"].ToString();

          // 2. UPLOAD THE FILE TO CLOUD.

          webClient.Headers.Add("content-type", "application/octet-stream");
          webClient.UploadFile(uploadUrl, "PUT", SourceFile); // You can use UploadData() instead if your file is byte[] or Stream
          webClient.Headers.Remove("content-type");

          // 3. CONVERT UPLOADED PDF FILE TO XML

          // URL for `PDF To XML` API call
          var url = "https://api.pdf.co/v1/pdf/convert/to/xml";

          // Prepare requests params as JSON
          Dictionary<string, object> parameters = new Dictionary<string, object>();
          parameters.Add("name", Path.GetFileName(DestinationFile));
          parameters.Add("password", Password);
          parameters.Add("pages", Pages);
          parameters.Add("url", uploadedFileUrl);

          // Convert dictionary of params to JSON
          string jsonPayload = JsonConvert.SerializeObject(parameters);

          // Execute POST request with JSON payload
          webClient.Headers.Add("Content-Type", "application/json");
          response = webClient.UploadString(url, jsonPayload);

          // Parse JSON response
          json = JObject.Parse(response);

          if (json["error"].ToObject<bool>() == false)
          {
            // Get URL of generated XML file
            string resultFileUrl = json["url"].ToString();

            // Download XML file
            webClient.DownloadFile(resultFileUrl, DestinationFile);

            Console.WriteLine("Generated XML file saved as \"{0}\" file.", DestinationFile);
          }
          else
          {
            Console.WriteLine(json["message"].ToString());
          }
        }
        else
        {
          Console.WriteLine(json["message"].ToString());
        }
      }
      catch (WebException e)
      {
        Console.WriteLine(e.ToString());
      }

      webClient.Dispose();

      Console.WriteLine();
      Console.WriteLine("Press any key...");
      Console.ReadKey();
    }
  }
}

package com.company;

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import okhttp3.*;

import java.io.*;
import java.net.*;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main
{
    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    final static String API_KEY = "***********************************";

    // Source PDF file
    final static Path SourceFile = Paths.get(".\\sample.pdf");
    // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    final static String Pages = "";
    // PDF document password. Leave empty for unprotected documents.
    final static String Password = "";
    // Destination XML file name
    final static Path DestinationFile = Paths.get(".\\result.xml");


    public static void main(String[] args) throws IOException
    {
        // Create HTTP client instance
        OkHttpClient webClient = new OkHttpClient();

        // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
        // * If you already have a direct file URL, skip to the step 3.

        // Prepare URL for `Get Presigned URL` API call
        String query = String.format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=%s",
                SourceFile.getFileName());

        // Prepare request
        Request request = new Request.Builder()
                .url(query)
                .addHeader("x-api-key", API_KEY) // (!) Set API Key
                .build();
        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL to use for the file upload
                String uploadUrl = json.get("presignedUrl").getAsString();
                // Get URL of uploaded file to use with later API calls
                String uploadedFileUrl = json.get("url").getAsString();

                // 2. UPLOAD THE FILE TO CLOUD.

                if (uploadFile(webClient, API_KEY, uploadUrl, SourceFile))
                {
                    // 3. CONVERT UPLOADED PDF FILE TO XML

                    PdfToXml(webClient, API_KEY, DestinationFile, Password, Pages, uploadedFileUrl);
                }
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static void PdfToXml(OkHttpClient webClient, String apiKey, Path destinationFile,
        String password, String pages, String uploadedFileUrl) throws IOException
    {
        // Prepare URL for `PDF To XML` API call
        String query = "https://api.pdf.co/v1/pdf/convert/to/xml";

        // Make correctly escaped (encoded) URL
        URL url = null;
        try
        {
            url = new URI(null, query, null).toURL();
        }
        catch (URISyntaxException e)
        {
            e.printStackTrace();
        }

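        // Create JSON payload; Gson escapes quotes and other special characters in the values
        JsonObject payload = new JsonObject();
        payload.addProperty("name", destinationFile.getFileName().toString());
        payload.addProperty("password", password);
        payload.addProperty("pages", pages);
        payload.addProperty("url", uploadedFileUrl);
        String jsonPayload = payload.toString();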

        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonPayload);

        // Prepare request
        Request request = new Request.Builder()
            .url(url)
            .addHeader("x-api-key", apiKey) // (!) Set API Key
            .addHeader("Content-Type", "application/json")
            .post(body)
            .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL of generated XML file
                String resultFileUrl = json.get("url").getAsString();

                // Download XML file
                downloadFile(webClient, resultFileUrl, destinationFile.toFile());

                System.out.printf("Generated XML file saved as \"%s\" file.%n", destinationFile);
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static boolean uploadFile(OkHttpClient webClient, String apiKey, String url, Path sourceFile) throws IOException
    {
        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/octet-stream"), sourceFile.toFile());

        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .addHeader("x-api-key", apiKey) // (!) Set API Key
                .addHeader("content-type", "application/octet-stream")
                .put(body)
                .build();

        // Execute request; try-with-resources closes the response and releases the connection
        try (Response response = webClient.newCall(request).execute())
        {
            return response.code() == 200;
        }
    }

    public static void downloadFile(OkHttpClient webClient, String url, File destinationFile) throws IOException
    {
        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .build();
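        // Execute request; try-with-resources closes the response when done
        try (Response response = webClient.newCall(request).execute())
        {
            if (!response.isSuccessful())
            {
                throw new IOException("Download failed: " + response.code() + " " + response.message());
            }
            byte[] fileBytes = response.body().bytes();

            // Save downloaded bytes to file (the stream is closed even if writing fails)
            try (OutputStream output = new FileOutputStream(destinationFile))
            {
                output.write(fileBytes);
            }
        }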
    }
}
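For reference, you can make the same `PDF To XML` call with only the JDK. The sketch below is a minimal illustration that assumes Java 11+ (`java.net.http`) and no JSON library. The class name `PdfToXmlPayload` and its helpers `jsonEscape`, `buildPayload` and `convert` are hypothetical and not part of any PDF.co SDK. The sketch shows two things. First, string values such as `password` must be escaped before they go into the JSON body. Second, boolean attributes like `inline` and `async` are sent as JSON booleans, not strings.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

public class PdfToXmlPayload
{
    // Escape a string for use inside a JSON string literal (quotes, backslashes, control characters)
    static String jsonEscape(String s)
    {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray())
        {
            switch (c)
            {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build the JSON body: Boolean values are written as true/false, everything else as a quoted string
    static String buildPayload(Map<String, Object> params)
    {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : params.entrySet())
        {
            if (sb.length() > 1) sb.append(", ");
            sb.append('"').append(jsonEscape(e.getKey())).append("\": ");
            Object v = e.getValue();
            if (v instanceof Boolean) sb.append(v);
            else sb.append('"').append(jsonEscape(String.valueOf(v))).append('"');
        }
        return sb.append('}').toString();
    }

    // POST the payload to the PDF To XML endpoint and return the raw JSON response text
    static String convert(String apiKey, Map<String, Object> params) throws Exception
    {
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://api.pdf.co/v1/pdf/convert/to/xml"))
                .header("x-api-key", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildPayload(params)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args)
    {
        // LinkedHashMap keeps the attributes in insertion order
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("url", "https://example.com/file1.pdf");
        params.put("pages", "0, 2-");
        params.put("password", "pa\"ss");
        params.put("inline", true);
        // prints {"url": "https://example.com/file1.pdf", "pages": "0, 2-", "password": "pa\"ss", "inline": true}
        System.out.println(buildPayload(params));
    }
}
```

Running `main` only prints the payload. It never calls `convert`, so it makes no network request. In production code, a JSON library such as Gson (already used by the sample above) is a safer choice than a hand-written escaper.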

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF To XML Extraction Results</title>
</head>
<body>

<?php
// Note: If your input files are larger than 200 KB, we highly recommend checking the "async" mode example.

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co
$pages = $_POST["pages"];


// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have a direct link to the PDF file, skip to step 3.

// Create URL
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.

        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request and keep the response body for error reporting
        $result = curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // 3. CONVERT UPLOADED PDF FILE TO XML

                ExtractXML($apiKey, $uploadedFileUrl, $pages);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display request error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

// Cleanup
curl_close($curl);

function ExtractXML($apiKey, $uploadedFileUrl, $pages)
{
    // Create URL
    $url = "https://api.pdf.co/v1/pdf/convert/to/xml";

    // Prepare requests params
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
                $resultFileUrl = $json["url"];

                // Display link to the file with conversion results
                echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

?>

</body>
</html>
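The first two steps of the PHP sample request a presigned URL and then upload the file with `PUT`. These steps can also be sketched in Java. This is a minimal sketch that assumes Java 11+. `PresignedUpload` and its methods are hypothetical names. The file name is URL-encoded so that spaces become `+`, as with PHP's `urlencode`. Reading `presignedUrl` and `url` out of the JSON response is left to whichever JSON library you use.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class PresignedUpload
{
    // Build the get-presigned-url query with the file name URL-encoded
    static String presignedUrlQuery(String fileName)
    {
        return "https://api.pdf.co/v1/file/upload/get-presigned-url"
                + "?name=" + URLEncoder.encode(fileName, StandardCharsets.UTF_8)
                + "&contenttype=application/octet-stream";
    }

    // Step 1: request a presigned upload URL; the JSON response carries "presignedUrl" and "url"
    static String requestPresignedUrl(HttpClient client, String apiKey, String fileName) throws Exception
    {
        HttpRequest request = HttpRequest.newBuilder(URI.create(presignedUrlQuery(fileName)))
                .header("x-api-key", apiKey)
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 2: PUT the raw file bytes to the presigned URL; returns true on HTTP 200
    static boolean upload(HttpClient client, String presignedUrl, Path file) throws Exception
    {
        HttpRequest request = HttpRequest.newBuilder(URI.create(presignedUrl))
                .header("Content-Type", "application/octet-stream")
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode() == 200;
    }

    public static void main(String[] args)
    {
        // Only builds the query string; no request is sent
        System.out.println(presignedUrlQuery("my file.pdf"));
    }
}
```

After a successful upload, the `url` value from step 1 is what goes into the `url` attribute of the `PDF To XML` request.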

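The PHP sample notes that files larger than 200 KB are better handled in `async` mode. In that mode the endpoint returns a `jobId`, which is then checked with the Background Job Check endpoint. The polling loop below is a hedged sketch of the client side only. The status check is passed in as a `Supplier<String>`, so the sketch assumes no endpoint URL or response format. The status strings `"working"`, `"success"` and `"failed"` are assumptions; adjust them to what the job check endpoint actually returns. `JobPoller` and `waitForJob` are hypothetical names.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class JobPoller
{
    // Poll until the status is no longer "working" (assumed value) or the attempts run out
    static String waitForJob(Supplier<String> checkStatus, int maxAttempts, long delayMillis)
    {
        String status = "working";
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            status = checkStatus.get();
            if (!"working".equals(status))
            {
                return status; // finished, e.g. "success" or "failed" (assumed values)
            }
            if (attempt < maxAttempts - 1)
            {
                try
                {
                    Thread.sleep(delayMillis);
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                    return status;
                }
            }
        }
        return status; // still "working" after maxAttempts checks
    }

    public static void main(String[] args)
    {
        // Fake status source standing in for the Background Job Check endpoint
        Iterator<String> fake = List.of("working", "working", "success").iterator();
        System.out.println(waitForJob(fake::next, 10, 0)); // prints success
    }
}
```

Capping `maxAttempts` keeps a stuck job from blocking the caller forever. In a real client, the supplier would call the job check endpoint with the `jobId` and extract its status field. Alternatively, the `callback` attribute lets the service notify you instead of being polled.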