PDF to Excel#

Convert PDF to spreadsheet with layout and fonts preserved.

Available Methods#

/pdf/convert/to/xls
/pdf/convert/to/xlsx

/pdf/convert/to/xls#

Method: POST
Endpoint: /v1/pdf/convert/to/xls

/pdf/convert/to/xlsx#

Method: POST
Endpoint: /v1/pdf/convert/to/xlsx

Attributes#

Note

Attributes are case-sensitive and must be passed in the JSON body of the POST request, for example:

{
    "url": "https://example.com/file1.pdf"
}
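Because attribute names are case-sensitive, a client-side check before sending the request can catch a mis-cased key such as `URL` or `linegrouping`. The following is a minimal sketch, not part of the API: `KNOWN_ATTRIBUTES` and `check_attribute_names` are hypothetical names, and the set simply mirrors the attribute names documented in this section.

```python
# Attribute names as documented in this section (the spelling is case-sensitive).
KNOWN_ATTRIBUTES = {
    "url", "httpusername", "httppassword", "pages", "unwrap", "rect",
    "lang", "inline", "lineGrouping", "password", "async", "name",
    "expiration", "profiles",
}


def check_attribute_names(payload):
    """Return a list of problems found in the payload's attribute names.

    Hypothetical helper for illustration; it only inspects the keys.
    """
    by_lower = {name.lower(): name for name in KNOWN_ATTRIBUTES}
    problems = []
    for key in payload:
        if key in KNOWN_ATTRIBUTES:
            continue
        suggestion = by_lower.get(key.lower())
        if suggestion:
            # Same letters, different case: most likely a casing mistake.
            problems.append(f"'{key}' should be spelled '{suggestion}'")
        else:
            problems.append(f"'{key}' is not a documented attribute")
    return problems


if __name__ == "__main__":
    print(check_attribute_names({"URL": "https://example.com/file1.pdf"}))
```

An empty list means every key matches a documented attribute name exactly.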

Attribute	Description	Required
`url`	URL to the source file.	yes
`httpusername`	HTTP auth user name if required to access source `url`.	no
`httppassword`	HTTP auth password if required to access source `url`.	no
`pages`	Comma-separated page indices or ranges to process (e.g. `"0, 1, 2-"` or `"1, 2, 3-7"`). The first page has index `0`. Prefix a number with `"!"` to count from the end (e.g. `"!0"` for the last page). If not specified, all pages are processed. The input must be in string format.	no
`unwrap`	Unwrap lines into a single line within table cells when `lineGrouping` is enabled. Must be `true` or `false`.	no
`rect`	Defines coordinates for extraction, e.g. `51.8, 114.8, 235.5, 204.0`. Use PDF Edit Add Helper to get or measure PDF coordinates. The input must be in string format.	no
`lang`	Language used by OCR (text from image) when extracting text from scanned PDF, PNG, and JPG input. The default is `eng`. Other languages are also supported, including `deu`, `spa`, `chi_sim`, `jpn`, and many others; see Language Support. You can combine two languages, for example `eng+deu` or `jpn+kor` (any combination).	no
`inline`	Set to `true` to return results inside the response. Otherwise, the endpoint will return a link to the output file generated.	no
`lineGrouping`	Line grouping within table cells. Set to `1` to enable the grouping. The input must be in string format.	no
`password`	Password of the PDF file. The input must be in string format.	no
`async`	Set `async` to `true` to run long processes in the background. The API then returns a `jobId`, which you can pass to the Background Job Check endpoint to check the status of the process and retrieve the output while you proceed with other tasks.	no
`name`	File name for the generated output. The input must be in string format.	no
`expiration`	Expiration time for the output link in minutes (default is `60`, i.e. 60 minutes or 1 hour). After this duration, any generated output files are automatically deleted from PDF.co Temporary Files Storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co Built-In Files Storage.	no
`profiles`	Use this parameter to set additional configurations for fine-tuning and extra options. Explore the Profiles section for more.	no
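To make the `pages` syntax concrete, here is a small client-side sketch that expands a `pages` string into zero-based indices for a document of known length. It is an illustration only, not the service's parser: `expand_pages` is a hypothetical helper, and it assumes that closed ranges such as `3-7` include both ends and that `!N` counts back from the last page (so `!0` is the last page and `!1` the one before it).

```python
def expand_pages(spec, page_count):
    """Expand a `pages` string into a sorted list of zero-based page indices.

    Hypothetical helper; the range and "!" semantics are assumptions
    based on the attribute description, not the service's own parser.
    """
    if not spec.strip():
        # No specification: all pages are processed.
        return list(range(page_count))

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("!"):
            # Assumed: "!N" counts back from the end, so "!0" is the last page.
            selected.add(page_count - 1 - int(part[1:]))
        elif part.endswith("-"):
            # Open-ended range such as "2-": from index 2 to the last page.
            selected.update(range(int(part[:-1]), page_count))
        elif "-" in part:
            # Closed range such as "3-7", assumed to include both ends.
            start, end = part.split("-")
            selected.update(range(int(start), int(end) + 1))
        else:
            selected.add(int(part))

    # Drop indices that fall outside the document.
    return sorted(i for i in selected if 0 <= i < page_count)


if __name__ == "__main__":
    print(expand_pages("0, 1, 2-", 5))
```

Note that the API itself expects the unexpanded string (for example `"pages": "0, 1, 2-"`); the helper is only a way to reason about which pages a given string selects.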

Query parameters#

No query parameters accepted.

Payload#

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-excel/sample.pdf",
    "async": false
}

Response#

{
    "url": "https://pdf-temp-files.s3.amazonaws.com/60c6b9f50280495a9567f73a0a394252/sample.xlsx",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample.xlsx",
    "remainingCredits": 60568
}
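A response like the one above can be handled by checking `error` before reading `url`. The sketch below assumes the response has already been parsed into a dictionary and that a failed call carries a `message` field, as the code samples in this section do; `ConversionError` and `result_url` are hypothetical names, and the sketch does not cover the `inline` mode.

```python
class ConversionError(Exception):
    """Raised when the service reports an error in its JSON response."""


def result_url(response_json):
    """Return the output file URL from a parsed conversion response.

    Hypothetical helper; assumes `inline` was not set, so the
    response carries a link to the generated file in `url`.
    """
    if response_json.get("error"):
        # The code samples in this section read the error text from `message`.
        raise ConversionError(response_json.get("message", "unknown error"))
    return response_json["url"]


if __name__ == "__main__":
    sample = {
        "url": "https://pdf-temp-files.s3.amazonaws.com/60c6b9f50280495a9567f73a0a394252/sample.xlsx",
        "pageCount": 1,
        "error": False,
        "status": 200,
        "name": "sample.xlsx",
        "remainingCredits": 60568,
    }
    print(result_url(sample))
```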

CURL#

curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/xlsx' \
--header 'x-api-key: *******************' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-excel/sample.pdf",
    "async": false
}'
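When `async` is set to `true`, the call returns a `jobId` instead of waiting for the conversion to finish. A polling loop along the following lines could then wait for the result. This is a sketch under assumptions: the `/v1/job/check` endpoint, the `jobid` key, and the `working` status are taken from the PHP sample in this section, while `wait_for_job`, its `post` callback, and the polling limits are hypothetical.

```python
import time

JOB_CHECK_URL = "https://api.pdf.co/v1/job/check"


def wait_for_job(job_id, post, poll_seconds=3, max_polls=100):
    """Poll the job status until it is no longer "working".

    `post(url, payload)` is a caller-supplied function that sends a JSON
    POST request and returns the parsed JSON reply as a dictionary.
    Returns the last status seen (for example "success" or "failed").
    """
    status = "working"
    for _ in range(max_polls):
        reply = post(JOB_CHECK_URL, {"jobid": job_id})
        status = reply.get("status")
        if status != "working":
            # Finished, failed, or aborted: stop polling.
            break
        time.sleep(poll_seconds)
    return status


if __name__ == "__main__":
    # Stand-in for a real HTTP call: reports "working" twice, then "success".
    replies = iter(["working", "working", "success"])
    fake_post = lambda url, payload: {"status": next(replies)}
    print(wait_for_job("example-job-id", fake_post, poll_seconds=0))
```

With the `requests` library, `post` could be something like `lambda url, payload: requests.post(url, json=payload, headers={"x-api-key": API_KEY}).json()`. Passing the HTTP call in as a function keeps the loop easy to test without network access.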

Code samples (PDF to XLS)#

JavaScript / Node.js

var https = require("https");
var path = require("path");
var fs = require("fs");


// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "***********************************";


// Direct URL of source PDF file.
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
const SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-excel/sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const Pages = "";
// PDF document password. Leave empty for unprotected documents.
const Password = "";
// Destination XLS file name
const DestinationFile = "./result.xls";


// Prepare request to `PDF To XLS` API endpoint
var queryPath = `/v1/pdf/convert/to/xls`;

// JSON payload for api request
var jsonPayload = JSON.stringify({
    name: path.basename(DestinationFile), password: Password, pages: Pages, url: SourceFileUrl
});

var reqOptions = {
    host: "api.pdf.co",
    method: "POST",
    path: queryPath,
    headers: {
        "x-api-key": API_KEY,
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
    }
};
// Send request
var postRequest = https.request(reqOptions, (response) => {
    // Collect the response body; it may arrive in several chunks
    var body = "";
    response.on("data", (chunk) => {
        body += chunk;
    });
    response.on("end", () => {
        // Parse JSON response
        var data = JSON.parse(body);
        if (data.error == false) {
            // Download XLS file
            var file = fs.createWriteStream(DestinationFile);
            https.get(data.url, (response2) => {
                response2.pipe(file)
                    .on("close", () => {
                        console.log(`Generated XLS file saved as "${DestinationFile}" file.`);
                    });
            });
        }
        else {
            // Service reported error
            console.log(data.message);
        }
    });
}).on("error", (e) => {
    // Request error
    console.log(e);
});

// Write request data
postRequest.write(jsonPayload);
postRequest.end();

C#

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace PDFcoApiExample
{
    class Program
    {
        // The authentication key (API Key).
        // Get your own by registering at https://app.pdf.co
        const String API_KEY = "***********************************";

        // Direct URL of source PDF file.
        // You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
        const string SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-excel/sample.pdf";
        // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
        const string Pages = "";
        // PDF document password. Leave empty for unprotected documents.
        const string Password = "";
        // Destination XLS file name
        const string DestinationFile = @".\result.xls";

        static void Main(string[] args)
        {
            // Create standard .NET web client instance
            WebClient webClient = new WebClient();

            // Set API Key
            webClient.Headers.Add("x-api-key", API_KEY);

            // URL for `PDF To XLS` API call
            string url = "https://api.pdf.co/v1/pdf/convert/to/xls";

            // Prepare requests params as JSON
            Dictionary<string, object> parameters = new Dictionary<string, object>();
            parameters.Add("name", Path.GetFileName(DestinationFile));
            parameters.Add("password", Password);
            parameters.Add("pages", Pages);
            parameters.Add("url", SourceFileUrl);

            // Convert dictionary of params to JSON
            string jsonPayload = JsonConvert.SerializeObject(parameters);

            try
            {
                // Execute POST request with JSON payload
                string response = webClient.UploadString(url, jsonPayload);

                // Parse JSON response
                JObject json = JObject.Parse(response);

                if (json["error"].ToObject<bool>() == false)
                {
                    // Get URL of generated XLS file
                    string resultFileUrl = json["url"].ToString();

                    // Download XLS file
                    webClient.DownloadFile(resultFileUrl, DestinationFile);

                    Console.WriteLine("Generated XLS file saved as \"{0}\" file.", DestinationFile);
                }
                else
                {
                    Console.WriteLine(json["message"].ToString());
                }
            }
            catch (WebException e)
            {
                Console.WriteLine(e.ToString());
            }

            webClient.Dispose();

            Console.WriteLine();
            Console.WriteLine("Press any key...");
            Console.ReadKey();
        }
    }
}

Java

  package com.company;

  import com.google.gson.JsonObject;
  import com.google.gson.JsonParser;
  import okhttp3.*;

  import java.io.*;
  import java.net.*;
  import java.nio.file.Path;
  import java.nio.file.Paths;

  public class Main
  {
      // The authentication key (API Key).
      // Get your own by registering at https://app.pdf.co
      final static String API_KEY = "***********************************";

      // Direct URL of source PDF file.
      // You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
      final static String SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-excel/sample.pdf";
      // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
      final static String Pages = "";
      // PDF document password. Leave empty for unprotected documents.
      final static String Password = "";
      // Destination XLS file name
      final static Path DestinationFile = Paths.get(".\\result.xls");


      public static void main(String[] args) throws IOException
      {
          // Create HTTP client instance
          OkHttpClient webClient = new OkHttpClient();

          // Prepare URL for `PDF To XLS` API call
          String query = "https://api.pdf.co/v1/pdf/convert/to/xls";

          // Make correctly escaped (encoded) URL
          URL url = null;
          try
          {
              url = new URI(null, query, null).toURL();
          }
          catch (URISyntaxException e)
          {
              e.printStackTrace();
          }

          // Create JSON payload
          String jsonPayload = String.format("{\"name\": \"%s\", \"password\": \"%s\", \"pages\": \"%s\", \"url\": \"%s\"}",
                  DestinationFile.getFileName(),
                  Password,
                  Pages,
                  SourceFileUrl);

          // Prepare request body
          RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonPayload);

          // Prepare request
          Request request = new Request.Builder()
              .url(url)
              .addHeader("x-api-key", API_KEY) // (!) Set API Key
              .addHeader("Content-Type", "application/json")
              .post(body)
              .build();

          // Execute request
          Response response = webClient.newCall(request).execute();

          if (response.code() == 200)
          {
              // Parse JSON response
              JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

              boolean error = json.get("error").getAsBoolean();
              if (!error)
              {
                  // Get URL of generated XLS file
                  String resultFileUrl = json.get("url").getAsString();

                  // Download XLS file
                  downloadFile(webClient, resultFileUrl, DestinationFile.toFile());

                  System.out.printf("Generated XLS file saved as \"%s\" file.", DestinationFile.toString());
              }
              else
              {
                  // Display service reported error
                  System.out.println(json.get("message").getAsString());
              }
          }
          else
          {
              // Display request error
              System.out.println(response.code() + " " + response.message());
          }
      }

      public static void downloadFile(OkHttpClient webClient, String url, File destinationFile) throws IOException
      {
          // Prepare request
          Request request = new Request.Builder()
                  .url(url)
                  .build();
          // Execute request
          Response response = webClient.newCall(request).execute();

          byte[] fileBytes = response.body().bytes();

          // Save downloaded bytes to file
          OutputStream output = new FileOutputStream(destinationFile);
          output.write(fileBytes);
          output.flush();
          output.close();

          response.close();
      }
  }



PHP

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF To Excel Extraction Results</title>
</head>
<body>

<?php
// Note: If you have input files larger than 200 KB, we highly recommend checking the "async" mode example.

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co
$pages = $_POST["pages"];


// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have a direct PDF file link, skip to step 3.

// Create URL
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.
        //echo json_encode($_FILES["file"]);
        //print_r($_FILES["file"]);
        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request
        $result = curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // 3. CONVERT UPLOADED PDF FILE TO Excel

                ExtractExcel($apiKey, $uploadedFileUrl, $pages);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function ExtractExcel($apiKey, $uploadedFileUrl, $pages)
{
    // Create URL
    $url = "https://api.pdf.co/v1/pdf/convert/to/xlsx";
    // (!) If you need the old XLS format, use the `https://api.pdf.co/v1/pdf/convert/to/xls` endpoint.

    // Prepare requests params
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;
    $parameters["async"] = true;  // (!) Make asynchronous job

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
              // URL of the generated XLSX file, which will be available after the job completes
              $resultFileUrl = $json["url"];
              // Asynchronous job ID
              $jobId = $json["jobId"];

              // Check the job status in a loop
              do
              {
                  $status = CheckJobStatus($jobId, $apiKey); // Possible statuses: "working", "failed", "aborted", "success".

                  // Display timestamp and status (for demo purposes)
                  echo "<p>" . date(DATE_RFC2822) . ": " . $status . "</p>";

                  if ($status == "success")
                  {
                      // Display link to the file with conversion results
                      echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
                      break;
                  }
                  else if ($status == "working")
                  {
                      // Pause for a few seconds
                      sleep(3);
                  }
                  else
                  {
                      echo $status . "<br/>";
                      break;
                  }
              }
              while (true);
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

function CheckJobStatus($jobId, $apiKey)
{
    $status = null;

    // Create URL
    $url = "https://api.pdf.co/v1/job/check";

    // Prepare requests params
    $parameters = array();
    $parameters["jobid"] = $jobId;

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
                $status = $json["status"];
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);

    return $status;
}

?>

</body>
</html>

Code samples (PDF to XLSX)#

JavaScript / Node.js

var https = require("https");
var path = require("path");
var fs = require("fs");


// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "***********************************";


// Direct URL of source PDF file.
// You can also upload your own file into PDF.co and use it as url. Check "Upload File" samples for code snippets: https://github.com/bytescout/pdf-co-api-samples/tree/master/File%20Upload/
const SourceFileUrl = "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-excel/sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const Pages = "";
// PDF document password. Leave empty for unprotected documents.
const Password = "";
// Destination XLSX file name
const DestinationFile = "./result.xlsx";


// Prepare request to `PDF To XLSX` API endpoint
var queryPath = `/v1/pdf/convert/to/xlsx`;

// JSON payload for api request
var jsonPayload = JSON.stringify({
    name: path.basename(DestinationFile), password: Password, pages: Pages, url: SourceFileUrl
});

var reqOptions = {
    host: "api.pdf.co",
    method: "POST",
    path: queryPath,
    headers: {
        "x-api-key": API_KEY,
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
    }
};
// Send request
var postRequest = https.request(reqOptions, (response) => {
    // Collect the response body; it may arrive in several chunks
    var body = "";
    response.on("data", (chunk) => {
        body += chunk;
    });
    response.on("end", () => {
        // Parse JSON response
        var data = JSON.parse(body);
        if (data.error == false) {
            // Download XLSX file
            var file = fs.createWriteStream(DestinationFile);
            https.get(data.url, (response2) => {
                response2.pipe(file)
                .on("close", () => {
                    console.log(`Generated XLSX file saved as "${DestinationFile}" file.`);
                });
            });
        }
        else {
            // Service reported error
            console.log(data.message);
        }
    });
}).on("error", (e) => {
    // Request error
    console.log(e);
});

// Write request data
postRequest.write(jsonPayload);
postRequest.end();

Python

import os
import requests # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "***************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Source PDF file
SourceFile = ".\\sample.pdf"
# Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
Pages = ""
# PDF document password. Leave empty for unprotected documents.
Password = ""
# Destination Excel file name
DestinationFile = ".\\result.xlsx"


def main(args = None):
    uploadedFileUrl = uploadFile(SourceFile)
    if uploadedFileUrl is not None:
        convertPdfToExcel(uploadedFileUrl, DestinationFile)


def convertPdfToExcel(uploadedFileUrl, destinationFile):
    """Converts PDF To Excel using PDF.co Web API"""

    # Prepare requests params as JSON
    # See documentation: https://developer.pdf.co/
    parameters = {}
    parameters["name"] = os.path.basename(destinationFile)
    parameters["password"] = Password
    parameters["pages"] = Pages
    parameters["url"] = uploadedFileUrl

    # Prepare URL for 'PDF To Xlsx' API request
    url = "{}/pdf/convert/to/xlsx".format(BASE_URL)

    # Execute request with a JSON body and get response as JSON
    response = requests.post(url, json=parameters, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            #  Get URL of result file
            resultFileUrl = json["url"]
            # Download result file
            r = requests.get(resultFileUrl, stream=True)
            if (r.status_code == 200):
                with open(destinationFile, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                # Report the status of the download request, not the conversion request
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


def uploadFile(fileName):
    """Uploads file to the cloud"""

    # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.

    # Prepare URL for 'Get Presigned URL' API request
    url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format(
        BASE_URL, os.path.basename(fileName))

    # Execute request and get response as JSON
    response = requests.get(url, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            # URL to use for file upload
            uploadUrl = json["presignedUrl"]
            # URL for future reference
            uploadedFileUrl = json["url"]

            # 2. UPLOAD FILE TO CLOUD.
            with open(fileName, 'rb') as file:
                requests.put(uploadUrl, data=file, headers={ "x-api-key": API_KEY, "content-type": "application/octet-stream" })

            return uploadedFileUrl
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")

    return None


if __name__ == '__main__':
    main()

C#

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace PDFcoApiExample
{
  class Program
  {
    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    const String API_KEY = "***********************************";

    // Source PDF file
    const string SourceFile = @".\sample.pdf";
    // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    const string Pages = "";
    // PDF document password. Leave empty for unprotected documents.
    const string Password = "";
    // Destination XLSX file name
    const string DestinationFile = @".\result.xlsx";

    static void Main(string[] args)
    {
      // Create standard .NET web client instance
      WebClient webClient = new WebClient();

      // Set API Key
      webClient.Headers.Add("x-api-key", API_KEY);

      // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
      // * If you already have a direct file URL, skip to step 3.

      // Prepare URL for `Get Presigned URL` API call
      string query = Uri.EscapeUriString(string.Format(
        "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name={0}",
        Path.GetFileName(SourceFile)));

      try
      {
        // Execute request
        string response = webClient.DownloadString(query);

        // Parse JSON response
        JObject json = JObject.Parse(response);

        if (json["error"].ToObject<bool>() == false)
        {
          // Get URL to use for the file upload
          string uploadUrl = json["presignedUrl"].ToString();
          string uploadedFileUrl = json["url"].ToString();

          // 2. UPLOAD THE FILE TO CLOUD.

          webClient.Headers.Add("content-type", "application/octet-stream");
          webClient.UploadFile(uploadUrl, "PUT", SourceFile); // You can use UploadData() instead if your file is byte[] or Stream
          webClient.Headers.Remove("content-type");

          // 3. CONVERT UPLOADED PDF FILE TO XLSX

          // URL for `PDF To XLSX` API call
          var url = "https://api.pdf.co/v1/pdf/convert/to/xlsx";

          // Prepare requests params as JSON
          Dictionary<string, object> parameters = new Dictionary<string, object>();
          parameters.Add("name", Path.GetFileName(DestinationFile));
          parameters.Add("password", Password);
          parameters.Add("pages", Pages);
          parameters.Add("url", uploadedFileUrl);

          // Convert dictionary of params to JSON
          string jsonPayload = JsonConvert.SerializeObject(parameters);

          // Execute POST request with JSON payload
          response = webClient.UploadString(url, jsonPayload);

          // Parse JSON response
          json = JObject.Parse(response);

          if (json["error"].ToObject<bool>() == false)
          {
            // Get URL of generated XLSX file
            string resultFileUrl = json["url"].ToString();

            // Download XLSX file
            webClient.DownloadFile(resultFileUrl, DestinationFile);

            Console.WriteLine("Generated XLSX file saved as \"{0}\" file.", DestinationFile);
          }
          else
          {
            Console.WriteLine(json["message"].ToString());
          }
        }
        else
        {
          Console.WriteLine(json["message"].ToString());
        }
      }
      catch (WebException e)
      {
        Console.WriteLine(e.ToString());
      }

      webClient.Dispose();


      Console.WriteLine();
      Console.WriteLine("Press any key...");
      Console.ReadKey();
    }
  }
}

Java

package com.company;

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import okhttp3.*;

import java.io.*;
import java.net.*;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main
{
    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    final static String API_KEY = "***********************************";

    // Source PDF file
    final static Path SourceFile = Paths.get(".\\sample.pdf");
    // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    final static String Pages = "";
    // PDF document password. Leave empty for unprotected documents.
    final static String Password = "";
    // Destination XLSX file name
    final static Path DestinationFile = Paths.get(".\\result.xlsx");


    public static void main(String[] args) throws IOException
    {
        // Create HTTP client instance
        OkHttpClient webClient = new OkHttpClient();

        // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
        // * If you already have a direct file URL, skip to step 3.

        // Prepare URL for `Get Presigned URL` API call
        String query = String.format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=%s",
                SourceFile.getFileName());

        // Prepare request
        Request request = new Request.Builder()
                .url(query)
                .addHeader("x-api-key", API_KEY) // (!) Set API Key
                .build();
        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL to use for the file upload
                String uploadUrl = json.get("presignedUrl").getAsString();
                // Get URL of uploaded file to use with later API calls
                String uploadedFileUrl = json.get("url").getAsString();

                // 2. UPLOAD THE FILE TO CLOUD.

                if (uploadFile(webClient, API_KEY, uploadUrl, SourceFile))
                {
                    // 3. CONVERT UPLOADED PDF FILE TO XLSX

                    PdfToXlsx(webClient, API_KEY, DestinationFile, Password, Pages, uploadedFileUrl);
                }
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static void PdfToXlsx(OkHttpClient webClient, String apiKey, Path destinationFile,
        String password, String pages, String uploadedFileUrl) throws IOException
    {
        // Prepare URL for `PDF To XLSX` API call
        String query = "https://api.pdf.co/v1/pdf/convert/to/xlsx";

        // Make correctly escaped (encoded) URL
        URL url = null;
        try
        {
            url = new URI(null, query, null).toURL();
        }
        catch (URISyntaxException e)
        {
            e.printStackTrace();
        }

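        // Create JSON payload (built with Gson so that quotes and backslashes in values are escaped)
        JsonObject payload = new JsonObject();
        payload.addProperty("name", destinationFile.getFileName().toString());
        payload.addProperty("password", password);
        payload.addProperty("pages", pages);
        payload.addProperty("url", uploadedFileUrl);
        String jsonPayload = payload.toString();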

        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonPayload);

        // Prepare request
        Request request = new Request.Builder()
            .url(url)
            .addHeader("x-api-key", apiKey) // (!) Set API Key
            .addHeader("Content-Type", "application/json")
            .post(body)
            .build();

        // Execute request
        Response response = webClient.newCall(request).execute();


        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL of generated XLSX file
                String resultFileUrl = json.get("url").getAsString();

                // Download XLSX file
                downloadFile(webClient, resultFileUrl, destinationFile.toFile());

                System.out.printf("Generated XLSX file saved as \"%s\" file.", destinationFile.toString());
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static boolean uploadFile(OkHttpClient webClient, String apiKey, String url, Path sourceFile) throws IOException
    {
        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/octet-stream"), sourceFile.toFile());

        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .addHeader("x-api-key", apiKey) // (!) Set API Key
                .addHeader("content-type", "application/octet-stream")
                .put(body)
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        return (response.code() == 200);
    }

    public static void downloadFile(OkHttpClient webClient, String url, File destinationFile) throws IOException
    {
        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .build();
        // Execute request
        Response response = webClient.newCall(request).execute();

        byte[] fileBytes = response.body().bytes();

        // Save downloaded bytes to file
        OutputStream output = new FileOutputStream(destinationFile);
        output.write(fileBytes);
        output.flush();
        output.close();

        response.close();
    }
}

PHP

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF To Excel Extraction Results</title>
</head>
<body>

<?php
// Note: For input files larger than 200 KB, we highly recommend using "async" mode.

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co
$pages = $_POST["pages"];


// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have a direct PDF file link, go to step 3.

// Create URL
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.
        //echo json_encode($_FILES["file"]);
        //print_r($_FILES["file"]);
        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request
        // Keep the upload response so that it can be displayed if the upload fails
        $result = curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // 3. CONVERT UPLOADED PDF FILE TO Excel

                ExtractExcel($apiKey, $uploadedFileUrl, $pages);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function ExtractExcel($apiKey, $uploadedFileUrl, $pages)
{
    // Create URL
    $url = "https://api.pdf.co/v1/pdf/convert/to/xlsx";
    // (!) If you need the old XLS format, use the `https://api.pdf.co/v1/pdf/convert/to/xls` endpoint.

    // Prepare requests params
    $parameters = array();
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;
    $parameters["async"] = true;  // (!) Make asynchronous job

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
              // URL of the generated XLSX file, which will be available after the job completes
              $resultFileUrl = $json["url"];
              // Asynchronous job ID
              $jobId = $json["jobId"];

              // Check the job status in a loop
              do
              {
                  $status = CheckJobStatus($jobId, $apiKey); // Possible statuses: "working", "failed", "aborted", "success".

                  // Display timestamp and status (for demo purposes)
                  echo "<p>" . date(DATE_RFC2822) . ": " . $status . "</p>";

                  if ($status == "success")
                  {
                      // Display link to the file with conversion results
                      echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
                      break;
                  }
                  else if ($status == "working")
                  {
                      // Pause for a few seconds
                      sleep(3);
                  }
                  else
                  {
                      echo $status . "<br/>";
                      break;
                  }
              }
              while (true);
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

function CheckJobStatus($jobId, $apiKey)
{
    $status = null;

    // Create URL
    $url = "https://api.pdf.co/v1/job/check";

    // Prepare requests params
    $parameters = array();
    $parameters["jobid"] = $jobId;

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
                $status = $json["status"];
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);

    return $status;
}

?>

</body>
</html>
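
The PHP sample above polls the job status in a loop that ends only when the status is no longer `working`. The following is a minimal Java sketch of the same loop with an upper bound on the number of attempts. `JobPoller`, `waitForJob`, and the `statusSource` supplier (standing in for a call to /v1/job/check) are illustrative assumptions, and the `timeout` and `interrupted` return values are local markers rather than API statuses.

```java
import java.util.function.Supplier;

final class JobPoller {
    /**
     * Polls until the status is no longer "working" or maxAttempts is reached.
     * statusSource is a hypothetical stand-in for a call to /v1/job/check.
     */
    static String waitForJob(Supplier<String> statusSource, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = statusSource.get();
            if (!"working".equals(status)) {
                // "success", "failed", "aborted", or null if the check itself failed
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return "interrupted"; // local marker, not an API status
                }
            }
        }
        return "timeout"; // local marker, not an API status
    }
}
```

For example, `JobPoller.waitForJob(() -> checkJobStatus(jobId), 40, 3000L)` would poll every 3 seconds for roughly two minutes, where `checkJobStatus` is a hypothetical method that returns the `status` field of the job check response.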

Footnotes

1

Supports publicly accessible links from any source, including Google Drive, Dropbox, and PDF.co Built-In Files Storage. To upload files via the API, check out the File Upload section. Note: If you experience intermittent Access Denied or Too Many Requests errors, try adding the `cache:` prefix to the URL to enable built-in URL caching (e.g., `cache:https://example.com/file1.pdf`). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
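
As a small illustration of the caching note, the sketch below adds the `cache:` prefix to a source URL before it is placed in the `url` attribute. The class and method names (`SourceUrl`, `withCache`) are hypothetical. Only the prefix itself comes from the note above.

```java
final class SourceUrl {
    // Prefix described in the footnote for enabling built-in URL caching
    static final String CACHE_PREFIX = "cache:";

    /** Returns the URL with the cache: prefix, leaving it unchanged if already prefixed. */
    static String withCache(String url) {
        return url.startsWith(CACHE_PREFIX) ? url : CACHE_PREFIX + url;
    }
}
```

The helper is idempotent, so it can be applied to a URL that may or may not already carry the prefix.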

2

The main response codes are as follows:

Code	Description
`200`	Success
`400`	Bad request. Typically happens because of bad input parameters, or because the input URLs can’t be reached, possibly due to access restrictions like needing a login or password.
`401`	Unauthorized
`402`	Not enough credits
`445`	Timeout error. To process large documents or files please use asynchronous mode (set the `async` parameter to `true`) and then check status using the /job/check endpoint. If a file contains many pages then specify a page range using the `pages` parameter. The number of pages of the document can be obtained using the /pdf/info endpoint.

Note

For more, see the complete list of available response codes.
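
As a rough illustration of handling the codes in the table above, the following Java sketch maps each listed code to a short description and flags the timeout case for a retry in asynchronous mode. `ResponseCodes`, `describe`, and `shouldRetryAsync` are hypothetical names. Only the codes and their meanings come from the table.

```java
final class ResponseCodes {
    /** Returns a short description for the response codes listed in the table. */
    static String describe(int code) {
        switch (code) {
            case 200: return "Success";
            case 400: return "Bad request: check the input parameters and that the source URL is reachable";
            case 401: return "Unauthorized";
            case 402: return "Not enough credits";
            case 445: return "Timeout: retry with async set to true, or limit the page range";
            default:  return "Other: see the complete list of response codes";
        }
    }

    /** A timeout (445) is the case where the table suggests switching to asynchronous mode. */
    static boolean shouldRetryAsync(int code) {
        return code == 445;
    }
}
```

A caller could check `shouldRetryAsync(response.code())` after a failed synchronous request and resubmit the same payload with `async` set to `true`.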

3

PDF.co request size: API requests larger than 4 megabytes are not supported. Please ensure that your requests do not exceed this limit.
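
A client can check the payload size before sending a request. The sketch below is one possible way to do so in Java. It assumes that "4 megabytes" means 4 * 1024 * 1024 bytes and that only the UTF-8 encoded JSON body counts toward the limit; the source confirms neither, and `RequestSize` and `fitsLimit` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class RequestSize {
    // Assumption: 4 megabytes is taken as 4 * 1024 * 1024 bytes
    static final int MAX_REQUEST_BYTES = 4 * 1024 * 1024;

    /** Returns true if the UTF-8 encoded JSON payload is within the assumed limit. */
    static boolean fitsLimit(String jsonPayload) {
        return jsonPayload.getBytes(StandardCharsets.UTF_8).length <= MAX_REQUEST_BYTES;
    }
}
```

Passing a file by `url`, as the samples above do, keeps the request body small regardless of the size of the source document.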
