PDF Make Text Searchable or Unsearchable#

/pdf/makesearchable#

This method converts scanned PDF documents (where pages are fully or partially made from scanned images) or image files into a text-searchable PDF. It runs OCR and adds an invisible text layer on top of your document that can be used for text search, text indexing, etc.

  • Method: POST

  • Endpoint: /v1/pdf/makesearchable

Attributes#

Note

Attributes are case-sensitive and must be sent in the JSON body of the POST request, for example:

{
    "url": "https://example.com/file1.pdf"
}

| Attribute | Description | Required |
|---|---|---|
| url | URL to the source file (see footnote 1). If the URL points to an image file (jpg, png, tif), the result is converted into a text-searchable PDF. | yes |
| httpusername | HTTP auth user name, if required to access the source URL. | no |
| httppassword | HTTP auth password, if required to access the source URL. | no |
| lang | Language for OCR (text from image) to use when extracting text from scanned PDF, PNG, and JPG input. The default is eng. Many other languages are supported, such as deu, spa, chi_sim, and jpn; see Language Support. You can also use 2 languages simultaneously, for example eng+deu or jpn+kor (any combination). | no |
| pages | Comma-separated page indices or ranges to process (e.g. "0, 1, 2-" or "1, 2, 3-7"). The first page has index 0. Use "!" before a number for inverted page numbers (e.g. "!0" for the last page). If not specified, all pages are processed. The value must be a string. | no |
| password | Password of the PDF file. The value must be a string. | no |
| async | Set async to true to run long processes in the background. The API then returns a jobId, which you can use with the Background Job Check endpoint to check the status of the process and retrieve the output while you proceed with other tasks. | no |
| name | File name for the generated output. The value must be a string. | no |
| expiration | Expiration time for the output link, in minutes (default is 60, i.e. 1 hour). After this duration, any generated output files are automatically deleted from PDF.co Temporary Files Storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. re-usable images, PDF templates, documents), consider using PDF.co Built-In Files Storage. | no |
| profiles | Additional configurations for fine-tuning and extra options. Explore the Profiles section for more. | no |
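
To make the `pages` syntax more concrete, the following minimal sketch expands a specification string into zero-based page indices for a document of known length. The helper name `expand_pages` is hypothetical, and its handling of open-ended ranges and "!" indices is an interpretation of the attribute description above, not the service's actual implementation.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages string such as "0, 2-4, 7-, !0" into sorted zero-based indices.

    Hypothetical helper: the rules follow the attribute description
    (first page is 0, "N-" means "from N to the end", "!N" counts from the end).
    """
    if not spec.strip():
        # An empty value means "all pages".
        return list(range(page_count))

    def resolve(token: str) -> int:
        token = token.strip()
        if token.startswith("!"):
            # "!0" is the last page, "!1" the one before it, and so on.
            return page_count - 1 - int(token[1:])
        return int(token)

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = resolve(start_text)
            # An open-ended range ("3-") runs to the last page.
            end = resolve(end_text) if end_text.strip() else page_count - 1
            selected.update(range(start, end + 1))
        else:
            selected.add(resolve(part))
    # Keep only indices that exist in the document.
    return sorted(i for i in selected if 0 <= i < page_count)
```

A helper like this can be useful for checking a `pages` value locally before sending a request; the API itself only needs the original string.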

Query parameters#

No query parameters accepted.

Payload#

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-make-searchable/sample.pdf",
    "lang": "eng",
    "pages": "",
    "name": "result.pdf",
    "password": "",
    "async": "false",
    "profiles": ""
}

Response 2#

{
    "url": "https://pdf-temp-files.s3.amazonaws.com/a0d52f35504e47148d1771fce875db7b/result.pdf",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "result.pdf",
    "remainingCredits": 99033681,
    "credits": 35
}
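
As a rough illustration of how a client might consume this response, the sketch below parses the JSON body and returns the output URL, raising if the service reports an error. The function name `result_url` is hypothetical, and the assumption that error responses carry a `message` field is taken from the code samples on this page rather than from a formal schema.

```python
import json


def result_url(response_text: str) -> str:
    """Return the output file URL from a response body, or raise on a reported error."""
    data = json.loads(response_text)
    if data.get("error"):
        # Assumption: error responses include a "message" field.
        raise RuntimeError(data.get("message", "unknown error"))
    return data["url"]


# The sample response shown above, used as test input.
sample = """{
    "url": "https://pdf-temp-files.s3.amazonaws.com/a0d52f35504e47148d1771fce875db7b/result.pdf",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "result.pdf",
    "remainingCredits": 99033681,
    "credits": 35
}"""

print(result_url(sample))
```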

CURL#

curl --location --request POST 'https://api.pdf.co/v1/pdf/makesearchable' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-make-searchable/sample.pdf",
    "lang": "eng",
    "pages": "",
    "name": "result.pdf",
    "password": "",
    "async": "false",
    "profiles": ""
}'
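
For readers who prefer Python's standard library over curl, the request above could be assembled roughly as follows. This is a sketch under assumptions: `build_request` and the `YOUR_API_KEY` placeholder are illustrative names, only a few of the optional attributes are included, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/makesearchable"


def build_request(api_key: str, source_url: str, lang: str = "eng") -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body and the x-api-key header."""
    payload = {"url": source_url, "lang": lang, "name": "result.pdf"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",  # placeholder, replace with a real key
    "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-make-searchable/sample.pdf",
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     body = resp.read().decode("utf-8")
```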


Language Support#

| Code | Description |
|---|---|
| afr | Afrikaans |
| amh | Amharic |
| ara | Arabic |
| asm | Assamese |
| aze | Azerbaijani |
| aze_cyrl | Azerbaijani - Cyrillic |
| bel | Belarusian |
| ben | Bengali |
| bod | Tibetan |
| bos | Bosnian |
| bul | Bulgarian |
| cat | Catalan; Valencian |
| ceb | Cebuano |
| ces | Czech |
| chi_sim | Chinese - Simplified |
| chi_tra | Chinese - Traditional |
| chr | Cherokee |
| cym | Welsh |
| dan | Danish |
| deu | German |
| dzo | Dzongkha |
| ell | Greek, Modern (1453-) |
| eng | English |
| enm | English, Middle (1100-1500) |
| epo | Esperanto |
| est | Estonian |
| eus | Basque |
| fas | Persian |
| fin | Finnish |
| fra | French |
| frk | Frankish |
| frm | French, Middle (ca. 1400-1600) |
| gle | Irish |
| glg | Galician |
| grc | Greek, Ancient (-1453) |
| guj | Gujarati |
| hat | Haitian; Haitian Creole |
| heb | Hebrew |
| hin | Hindi |
| hrv | Croatian |
| hun | Hungarian |
| iku | Inuktitut |
| ind | Indonesian |
| isl | Icelandic |
| ita | Italian |
| ita_old | Italian - Old |
| jav | Javanese |
| jpn | Japanese |
| kan | Kannada |
| kat | Georgian |
| kat_old | Georgian - Old |
| kaz | Kazakh |
| khm | Central Khmer |
| kir | Kirghiz; Kyrgyz |
| kor | Korean |
| kur | Kurdish |
| lao | Lao |
| lat | Latin |
| lav | Latvian |
| lit | Lithuanian |
| mal | Malayalam |
| mar | Marathi |
| mkd | Macedonian |
| mlt | Maltese |
| msa | Malay |
| mya | Burmese |
| nep | Nepali |
| nld | Dutch; Flemish |
| nor | Norwegian |
| ori | Oriya |
| pan | Panjabi; Punjabi |
| pol | Polish |
| por | Portuguese |
| pus | Pushto; Pashto |
| ron | Romanian; Moldavian; Moldovan |
| rus | Russian |
| san | Sanskrit |
| sin | Sinhala; Sinhalese |
| slk | Slovak |
| slv | Slovenian |
| spa | Spanish; Castilian |
| spa_old | Spanish; Castilian - Old |
| sqi | Albanian |
| srp | Serbian |
| srp_latn | Serbian - Latin |
| swa | Swahili |
| swe | Swedish |
| syr | Syriac |
| tam | Tamil |
| tel | Telugu |
| tgk | Tajik |
| tgl | Tagalog |
| tha | Thai |
| tir | Tigrinya |
| tur | Turkish |
| uig | Uighur; Uyghur |
| ukr | Ukrainian |
| urd | Urdu |
| uzb | Uzbek |
| uzb_cyrl | Uzbek - Cyrillic |
| vie | Vietnamese |
| yid | Yiddish |
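
Since the `lang` attribute accepts codes joined with "+", a small client-side check can catch typos before a request is sent. The sketch below is illustrative only: `build_lang` and `KNOWN_CODES` are hypothetical names, and the set holds just a handful of the codes from the table above.

```python
# A small subset of the codes listed above; extend it as needed.
KNOWN_CODES = {"eng", "deu", "spa", "fra", "jpn", "kor", "chi_sim", "chi_tra"}


def build_lang(*codes: str) -> str:
    """Join OCR language codes with "+" after checking them against KNOWN_CODES."""
    if not codes:
        return "eng"  # the documented default
    unknown = [code for code in codes if code not in KNOWN_CODES]
    if unknown:
        raise ValueError(f"unsupported language code(s): {', '.join(unknown)}")
    return "+".join(codes)


print(build_lang("eng", "deu"))
print(build_lang("jpn", "kor"))
```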


Code samples#

JavaScript

var https = require("https");
var path = require("path");
var fs = require("fs");

// `request` module is required for file upload.
// Use "npm install request" command to install.
var request = require("request");

// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "***********************************";


// Source PDF file
const SourceFile = "./sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const Pages = "";
// PDF document password. Leave empty for unprotected documents.
const Password = "";
// OCR language code, e.g. "eng", "fra", "deu", "spa". See Language Support for the full list.
const Language = "eng";
// Destination PDF file name
const DestinationFile = "./result.pdf";


// 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.
getPresignedUrl(API_KEY, SourceFile)
    .then(([uploadUrl, uploadedFileUrl]) => {
        // 2. UPLOAD THE FILE TO CLOUD.
        uploadFile(API_KEY, SourceFile, uploadUrl)
            .then(() => {
                // 3. MAKE UPLOADED PDF FILE SEARCHABLE
                makePdfSearchable(API_KEY, uploadedFileUrl, Password, Pages, Language, DestinationFile);
            })
            .catch(e => {
                console.log(e);
            });
    })
    .catch(e => {
        console.log(e);
    });


function getPresignedUrl(apiKey, localFile) {
    return new Promise((resolve, reject) => {
        // Prepare request to `Get Presigned URL` API endpoint
        let queryPath = `/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=${path.basename(localFile)}`;
        let reqOptions = {
            host: "api.pdf.co",
            path: encodeURI(queryPath),
            headers: { "x-api-key": apiKey }
        };
        // Send request
        https.get(reqOptions, (response) => {
            // Collect the whole body before parsing; it may arrive in several chunks
            let body = "";
            response.setEncoding("utf8");
            response.on("data", (chunk) => { body += chunk; });
            response.on("end", () => {
                let data = JSON.parse(body);
                if (data.error == false) {
                    // Return presigned url we received
                    resolve([data.presignedUrl, data.url]);
                }
                else {
                    // Service reported error
                    reject("getPresignedUrl(): " + data.message);
                }
            });
        })
            .on("error", (e) => {
                // Request error
                reject("getPresignedUrl(): " + e);
            });
    });
}

function uploadFile(apiKey, localFile, uploadUrl) {
    return new Promise((resolve, reject) => {
        fs.readFile(localFile, (readErr, data) => {
            if (readErr) {
                // Local file could not be read
                reject("uploadFile() read error: " + readErr);
                return;
            }
            request({
                method: "PUT",
                url: uploadUrl,
                body: data,
                headers: {
                    "Content-Type": "application/octet-stream"
                }
            }, (err, res, body) => {
                if (!err) {
                    resolve();
                }
                else {
                    // Upload request failed
                    reject("uploadFile() request error: " + err);
                }
            });
        });
    });
}

function makePdfSearchable(apiKey, uploadedFileUrl, password, pages, language, destinationFile) {
    // Prepare request to `Make Searchable PDF` API endpoint
    var queryPath = `/v1/pdf/makesearchable`;

    // JSON payload for api request
    var jsonPayload = JSON.stringify({
        name: path.basename(destinationFile), password: password, pages: pages, lang: language, url: uploadedFileUrl, async: true
    });

    var reqOptions = {
        host: "api.pdf.co",
        method: "POST",
        path: queryPath,
        headers: {
            "x-api-key": apiKey,
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
        }
    };
    // Send request
    var postRequest = https.request(reqOptions, (response) => {
        response.on("data", (d) => {
            response.setEncoding("utf8");
            // Parse JSON response
            let data = JSON.parse(d);
            if (data.error == false) {
                console.log(`Job #${data.jobId} has been created!`);
                checkIfJobIsCompleted(data.jobId, data.url, destinationFile);
            }
            else {
                // Service reported error
                console.log("makePdfSearchable(): " + data.message);
            }
        });
    })
        .on("error", (e) => {
            // Request error
            console.log("makePdfSearchable(): " + e);
        });


    // Write request data
    postRequest.write(jsonPayload);
    postRequest.end();
}


function checkIfJobIsCompleted(jobId, resultFileUrl, destinationFile) {
    let queryPath = `/v1/job/check`;

    // JSON payload for api request
    let jsonPayload = JSON.stringify({
        jobid: jobId
    });

    let reqOptions = {
        host: "api.pdf.co",
        path: queryPath,
        method: "POST",
        headers: {
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
        }
    };

    // Send request
    var postRequest = https.request(reqOptions, (response) => {
        response.on("data", (d) => {
            response.setEncoding("utf8");

            // Parse JSON response
            let data = JSON.parse(d);
            console.log(`Checking Job #${jobId}, Status: ${data.status}, Time: ${new Date().toLocaleString()}`);

            if (data.status == "working") {
                // Check again after 3 seconds
                setTimeout(function () { checkIfJobIsCompleted(jobId, resultFileUrl, destinationFile); }, 3000);
            }
            else if (data.status == "success") {
                // Download PDF file
                var file = fs.createWriteStream(destinationFile);
                https.get(resultFileUrl, (response2) => {
                    response2.pipe(file)
                        .on("close", () => {
                            console.log(`Generated PDF file saved as "${destinationFile}" file.`);
                        });
                });
            }
            else {
                console.log(`Operation ended with status: "${data.status}".`);
            }
        })
    });

    // Write request data
    postRequest.write(jsonPayload);
    postRequest.end();
}

Python

import os
import requests # pip install requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "******************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

# Source PDF file
SourceFile = ".\\sample.pdf"
# Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
Pages = ""
# PDF document password. Leave empty for unprotected documents.
Password = ""
# OCR language code, e.g. "eng", "fra", "deu", "spa". See Language Support for the full list.
Language = "eng"
# Destination PDF file name
DestinationFile = ".\\result.pdf"


def main(args = None):
    uploadedFileUrl = uploadFile(SourceFile)
    if (uploadedFileUrl != None):
        makeSearchablePDF(uploadedFileUrl, DestinationFile)


def makeSearchablePDF(uploadedFileUrl, destinationFile):
    """Make Uploaded PDF file Searchable using PDF.co Web API"""

    # Prepare requests params as JSON
    # See documentation: https://developer.pdf.co/
    parameters = {}
    parameters["name"] = os.path.basename(destinationFile)
    parameters["password"] = Password
    parameters["pages"] = Pages
    parameters["lang"] = Language
    parameters["url"] = uploadedFileUrl

    # Prepare URL for 'Make Searchable PDF' API request
    url = "{}/pdf/makesearchable".format(BASE_URL)

    # Execute request and get response as JSON
    response = requests.post(url, json=parameters, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            #  Get URL of result file
            resultFileUrl = json["url"]
            # Download result file
            r = requests.get(resultFileUrl, stream=True)
            if (r.status_code == 200):
                with open(destinationFile, 'wb') as file:
                    for chunk in r:
                        file.write(chunk)
                print(f"Result file saved as \"{destinationFile}\" file.")
            else:
                print(f"Request error: {r.status_code} {r.reason}")
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")


def uploadFile(fileName):
    """Uploads file to the cloud"""

    # 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.

    # Prepare URL for 'Get Presigned URL' API request
    url = "{}/file/upload/get-presigned-url?contenttype=application/octet-stream&name={}".format(
        BASE_URL, os.path.basename(fileName))

    # Execute request and get response as JSON
    response = requests.get(url, headers={ "x-api-key": API_KEY })
    if (response.status_code == 200):
        json = response.json()

        if json["error"] == False:
            # URL to use for file upload
            uploadUrl = json["presignedUrl"]
            # URL for future reference
            uploadedFileUrl = json["url"]

            # 2. UPLOAD FILE TO CLOUD.
            with open(fileName, 'rb') as file:
                requests.put(uploadUrl, data=file, headers={ "x-api-key": API_KEY, "content-type": "application/octet-stream" })

            return uploadedFileUrl
        else:
            # Show service reported error
            print(json["message"])
    else:
        print(f"Request error: {response.status_code} {response.reason}")

    return None


if __name__ == '__main__':
    main()

C#

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace ByteScoutWebApiExample
{
  class Program
  {
    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    const String API_KEY = "***********************************";

    // Source PDF file
    const string SourceFile = @".\sample.pdf";
    // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    const string Pages = "";
    // PDF document password. Leave empty for unprotected documents.
    const string Password = "";
    // OCR language code, e.g. "eng", "fra", "deu", "spa". See Language Support for the full list.
    const string Language = "eng";
    // Destination PDF file name
    const string DestinationFile = @".\result.pdf";

    static void Main(string[] args)
    {
      // Create standard .NET web client instance
      WebClient webClient = new WebClient();

      // Set API Key
      webClient.Headers.Add("x-api-key", API_KEY);

      // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
      // * If you already have a direct file URL, skip to step 3.

      // Prepare URL for `Get Presigned URL` API call
      string query = Uri.EscapeUriString(string.Format(
        "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name={0}",
        Path.GetFileName(SourceFile)));

      try
      {
        // Execute request
        string response = webClient.DownloadString(query);

        // Parse JSON response
        JObject json = JObject.Parse(response);

        if (json["error"].ToObject<bool>() == false)
        {
          // Get URL to use for the file upload
          string uploadUrl = json["presignedUrl"].ToString();
          string uploadedFileUrl = json["url"].ToString();

          // 2. UPLOAD THE FILE TO CLOUD.

          webClient.Headers.Add("content-type", "application/octet-stream");
          webClient.UploadFile(uploadUrl, "PUT", SourceFile); // You can use UploadData() instead if your file is byte[] or Stream

          // 3. MAKE UPLOADED PDF FILE SEARCHABLE

          // URL for `Make Searchable PDF` API call
          var url = "https://api.pdf.co/v1/pdf/makesearchable";

          // Prepare requests params as JSON
          Dictionary<string, object> parameters = new Dictionary<string, object>();
          parameters.Add("name", Path.GetFileName(DestinationFile));
          parameters.Add("password", Password);
          parameters.Add("url", uploadedFileUrl);
          parameters.Add("pages", Pages);
          parameters.Add("lang", Language);

          // Convert dictionary of params to JSON
          string jsonPayload = JsonConvert.SerializeObject(parameters);

          // Execute POST request with JSON payload
          response = webClient.UploadString(url, jsonPayload);

          // Parse JSON response
          json = JObject.Parse(response);

          if (json["error"].ToObject<bool>() == false)
          {
            // Get URL of generated PDF file
            string resultFileUrl = json["url"].ToString();

            // Download PDF file
            webClient.DownloadFile(resultFileUrl, DestinationFile);

            Console.WriteLine("Generated PDF file saved as \"{0}\" file.", DestinationFile);
          }
          else
          {
            Console.WriteLine(json["message"].ToString());
          }
        }
        else
        {
          Console.WriteLine(json["message"].ToString());
        }
      }
      catch (WebException e)
      {
        Console.WriteLine(e.ToString());
      }

      webClient.Dispose();

      Console.WriteLine();
      Console.WriteLine("Press any key...");
      Console.ReadKey();
    }
  }
}

Java

package com.company;

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import okhttp3.*;

import java.io.*;
import java.net.*;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main
{
    // The authentication key (API Key).
    // Get your own by registering at https://app.pdf.co
    final static String API_KEY = "***********************************";

    // Source PDF file
    final static Path SourceFile = Paths.get(".\\sample.pdf");
    // Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
    final static String Pages = "";
    // PDF document password. Leave empty for unprotected documents.
    final static String Password = "";
    // OCR language code, e.g. "eng", "fra", "deu", "spa". See Language Support for the full list.
    final static String Language = "eng";
    // Destination PDF file name
    final static Path DestinationFile = Paths.get(".\\result.pdf");


    public static void main(String[] args) throws IOException
    {
        // Create HTTP client instance
        OkHttpClient webClient = new OkHttpClient();

        // 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
        // * If you already have a direct file URL, skip to step 3.

        // Prepare URL for `Get Presigned URL` API call
        String query = String.format(
                "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=%s",
                SourceFile.getFileName());

        // Prepare request
        Request request = new Request.Builder()
                .url(query)
                .addHeader("x-api-key", API_KEY) // (!) Set API Key
                .build();
        // Execute request
        Response response = webClient.newCall(request).execute();

        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL to use for the file upload
                String uploadUrl = json.get("presignedUrl").getAsString();
                // Get URL of uploaded file to use with later API calls
                String uploadedFileUrl = json.get("url").getAsString();

                // 2. UPLOAD THE FILE TO CLOUD.

                if (uploadFile(webClient, API_KEY, uploadUrl, SourceFile))
                {
                    // 3. MAKE UPLOADED PDF FILE SEARCHABLE

                    MakePdfSearchable(webClient, API_KEY, DestinationFile, Password, Pages, Language, uploadedFileUrl);
                }
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static void MakePdfSearchable(OkHttpClient webClient, String apiKey, Path destinationFile,
        String password, String pages, String language, String uploadedFileUrl) throws IOException
    {
        // Prepare URL for `Make Searchable PDF` API call
        String query = "https://api.pdf.co/v1/pdf/makesearchable";

        // Make correctly escaped (encoded) URL
        URL url = null;
        try
        {
            url = new URI(null, query, null).toURL();
        }
        catch (URISyntaxException e)
        {
            e.printStackTrace();
        }

        // Create JSON payload
        String jsonPayload = String.format("{\"name\": \"%s\", \"password\": \"%s\", \"pages\": \"%s\", \"lang\": \"%s\", \"url\": \"%s\"}",
                destinationFile.getFileName(),
                password,
                pages,
                language,
                uploadedFileUrl);

        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/json"), jsonPayload);

        // Prepare request
        Request request = new Request.Builder()
            .url(url)
            .addHeader("x-api-key", API_KEY) // (!) Set API Key
            .addHeader("Content-Type", "application/json")
            .post(body)
            .build();

        // Execute request
        Response response = webClient.newCall(request).execute();


        if (response.code() == 200)
        {
            // Parse JSON response
            JsonObject json = new JsonParser().parse(response.body().string()).getAsJsonObject();

            boolean error = json.get("error").getAsBoolean();
            if (!error)
            {
                // Get URL of generated PDF file
                String resultFileUrl = json.get("url").getAsString();

                // Download PDF file
                downloadFile(webClient, resultFileUrl, destinationFile.toFile());

                System.out.printf("Generated PDF file saved as \"%s\" file.", destinationFile.toString());
            }
            else
            {
                // Display service reported error
                System.out.println(json.get("message").getAsString());
            }
        }
        else
        {
            // Display request error
            System.out.println(response.code() + " " + response.message());
        }
    }

    public static boolean uploadFile(OkHttpClient webClient, String apiKey, String url, Path sourceFile) throws IOException
    {
        // Prepare request body
        RequestBody body = RequestBody.create(MediaType.parse("application/octet-stream"), sourceFile.toFile());

        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .addHeader("x-api-key", apiKey) // (!) Set API Key
                .addHeader("content-type", "application/octet-stream")
                .put(body)
                .build();

        // Execute request
        Response response = webClient.newCall(request).execute();

        return (response.code() == 200);
    }

    public static void downloadFile(OkHttpClient webClient, String url, File destinationFile) throws IOException
    {
        // Prepare request
        Request request = new Request.Builder()
                .url(url)
                .build();
        // Execute request
        Response response = webClient.newCall(request).execute();

        byte[] fileBytes = response.body().bytes();

        // Save downloaded bytes to file
        OutputStream output = new FileOutputStream(destinationFile);
        output.write(fileBytes);
        output.flush();
        output.close();

        response.close();
    }
}

PHP

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Make PDF Searchable Results</title>
</head>
<body>

<?php

// Note: If your input files are larger than 200 KB, we highly recommend checking the "async" mode example.

// Get submitted form data
$apiKey = $_POST["apiKey"]; // The authentication key (API Key). Get your own by registering at https://app.pdf.co
$pages = $_POST["pages"];
$ocrLanguage = $_POST["ocrLanguage"];


// 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
// * If you already have a direct PDF file link, go to step 3.

// Create URL
$url = "https://api.pdf.co/v1/file/upload/get-presigned-url" .
    "?name=" . urlencode($_FILES["file"]["name"]) .
    "&contenttype=application/octet-stream";

// Create request
$curl = curl_init();
curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey));
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute request
$result = curl_exec($curl);

if (curl_errno($curl) == 0)
{
    $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($status_code == 200)
    {
        $json = json_decode($result, true);

        // Get URL to use for the file upload
        $uploadFileUrl = $json["presignedUrl"];
        // Get URL of uploaded file to use with later API calls
        $uploadedFileUrl = $json["url"];

        // 2. UPLOAD THE FILE TO CLOUD.

        $localFile = $_FILES["file"]["tmp_name"];
        $fileHandle = fopen($localFile, "r");

        curl_setopt($curl, CURLOPT_URL, $uploadFileUrl);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("content-type: application/octet-stream"));
        curl_setopt($curl, CURLOPT_PUT, true);
        curl_setopt($curl, CURLOPT_INFILE, $fileHandle);
        curl_setopt($curl, CURLOPT_INFILESIZE, filesize($localFile));

        // Execute request
        curl_exec($curl);

        fclose($fileHandle);

        if (curl_errno($curl) == 0)
        {
            $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            if ($status_code == 200)
            {
                // 3. MAKE UPLOADED PDF FILE SEARCHABLE

                MakePdfSearchable($apiKey, $uploadedFileUrl, $pages, $ocrLanguage);
            }
            else
            {
                // Display request error
                echo "<p>Status code: " . $status_code . "</p>";
                echo "<p>" . $result . "</p>";
            }
        }
        else
        {
            // Display CURL error
            echo "Error: " . curl_error($curl);
        }
    }
    else
    {
        // Display service reported error
        echo "<p>Status code: " . $status_code . "</p>";
        echo "<p>" . $result . "</p>";
    }

    curl_close($curl);
}
else
{
    // Display CURL error
    echo "Error: " . curl_error($curl);
}

function MakePdfSearchable($apiKey, $uploadedFileUrl, $pages, $ocrLanguage)
{
    // Prepare URL for `Make Searchable PDF` API call
    $url = "https://api.pdf.co/v1/pdf/makesearchable";

    // Prepare requests params
    $parameters = array();
    $parameters["name"] = "result.pdf";
    $parameters["url"] = $uploadedFileUrl;
    $parameters["pages"] = $pages;
    $parameters["lang"] = $ocrLanguage;

    // Create Json payload
    $data = json_encode($parameters);

    // Create request
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-type: application/json"));
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $data);

    // Execute request
    $result = curl_exec($curl);

    if (curl_errno($curl) == 0)
    {
        $status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

        if ($status_code == 200)
        {
            $json = json_decode($result, true);

            if (!isset($json["error"]) || $json["error"] == false)
            {
                $resultFileUrl = $json["url"];

                // Display link to the result file
                echo "<div><h2>Conversion Result:</h2><a href='" . $resultFileUrl . "' target='_blank'>" . $resultFileUrl . "</a></div>";
            }
            else
            {
                // Display service reported error
                echo "<p>Error: " . $json["message"] . "</p>";
            }
        }
        else
        {
            // Display request error
            echo "<p>Status code: " . $status_code . "</p>";
            echo "<p>" . $result . "</p>";
        }
    }
    else
    {
        // Display CURL error
        echo "Error: " . curl_error($curl);
    }

    // Cleanup
    curl_close($curl);
}

?>

</body>
</html>

/pdf/makeunsearchable#

This method makes a PDF “text unsearchable” by converting it into a “scanned” PDF file, which is effectively a flat image.

  • Method: POST

  • Endpoint: /v1/pdf/makeunsearchable

Attributes#

Note

Attributes are case-sensitive and must be sent in the JSON body of the POST request, for example:

{
    "url": "https://example.com/file1.pdf"
}

Attribute

Description

Required

url

URL to the source file. 1

yes

httpusername

HTTP auth user name if required to access source url.

no

httppassword

HTTP auth password if required to access source url.

no

pages

Specify page indices as comma-separated values or ranges to process (e.g. "0, 1, 2-" or "1, 2, 3-7"). The first-page index is 0, Use "!" before a number for inverted page numbers (e.g. "!0" for the last page). If not specified, the default configuration processes all pages. The input must be in string format.

no

password

Password of PDF file, the input must be in string format.

no

async

Set async to true for long processes to run in the background, API will then return a jobId which you can use with the Background Job Check endpoint to check the status of the process and retrieve the output while you can proceed with other tasks.

no

name

File name for the generated output, the input must be in string format.

no

expiration

Set the expiration time for the output link in minutes (default is 60 i.e 60 minutes or 1 hour), After this specified duration, any generated output file(s) will be automatically deleted from PDF.co Temporary Files Storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. re-usable images, pdf templates, documents) consider using PDF.co Built-In Files Storage.

no

profiles

Use this parameter to set additional configurations for fine-tuning and extra options. Explore the Profiles section for more.

no
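
The attribute table above can be turned into a small payload builder. The following is a minimal Python sketch, not an official client: the function name build_payload and its argument names are hypothetical, sending async as a JSON boolean and expiration as an integer are assumptions (the table does not state their types), and only pages, password, and name are documented as string inputs.

```python
import json


def build_payload(url, pages=None, password=None, name=None,
                  run_async=False, expiration=None):
    """Assemble a JSON-ready dict for /v1/pdf/makeunsearchable (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    # Assumption: async is sent as a JSON boolean.
    payload = {"url": url, "async": bool(run_async)}
    # pages, password, and name are documented as string inputs.
    for key, value in (("pages", pages), ("password", password), ("name", name)):
        if value is not None:
            payload[key] = str(value)
    if expiration is not None:
        # Assumption: expiration is sent as an integer number of minutes.
        payload["expiration"] = int(expiration)
    return payload


payload = build_payload("https://example.com/file1.pdf", pages="0,2-", name="result.pdf")
print(json.dumps(payload, indent=4))
```

Optional attributes that are not supplied are left out of the payload, so the service defaults described in the table apply.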

Query parameters#

No query parameters accepted.

Payload#

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "pages": "",
    "name": "result.pdf",
    "password": "",
    "async": "false",
    "profiles": ""
}

Response 2#

{
    "url": "https://pdf-temp-files.s3.amazonaws.com/6b755238963a472abf67fd5e7ffafd79/result.pdf",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "result.pdf",
    "remainingCredits": 327244,
    "credits": 35
}
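
A caller usually needs only the output link from this response. The sketch below shows one way to read it in Python, assuming the body has already been decoded into a dict. The helper name output_url is hypothetical, and the message field is taken from what the Node.js code sample in this section reads on failure rather than from the response shown above.

```python
def output_url(response_body):
    """Return the output file URL, or raise if the API flagged an error."""
    if response_body.get("error"):
        # "message" is the field the Node.js sample reads when error is true.
        raise RuntimeError(
            "API error {}: {}".format(
                response_body.get("status"), response_body.get("message")
            )
        )
    return response_body["url"]


# The sample response from this section, as a Python dict.
sample = {
    "url": "https://pdf-temp-files.s3.amazonaws.com/6b755238963a472abf67fd5e7ffafd79/result.pdf",
    "pageCount": 1,
    "error": False,
    "status": 200,
    "name": "result.pdf",
    "remainingCredits": 327244,
    "credits": 35,
}
print(output_url(sample))
```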

CURL#

curl --location --request POST 'https://api.pdf.co/v1/pdf/makeunsearchable' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "pages": "",
    "name": "result.pdf",
    "password": "",
    "async": "false",
    "profiles": ""
}'


Code samples#

var https = require("https");
var path = require("path");
var fs = require("fs");

// `request` module is required for file upload.
// Use "npm install request" command to install.
var request = require("request");

// The authentication key (API Key).
// Get your own by registering at https://app.pdf.co
const API_KEY = "***********************************";


// Source PDF file
const SourceFile = "./sample.pdf";
// Comma-separated list of page indices (or ranges) to process. Leave empty for all pages. Example: '0,2-5,7-'.
const Pages = "";
// PDF document password. Leave empty for unprotected documents.
const Password = "";
// Destination PDF file name
const DestinationFile = "./result.pdf";


// 1. RETRIEVE PRESIGNED URL TO UPLOAD FILE.
getPresignedUrl(API_KEY, SourceFile)
    .then(([uploadUrl, uploadedFileUrl]) => {
        // 2. UPLOAD THE FILE TO CLOUD.
        uploadFile(API_KEY, SourceFile, uploadUrl)
            .then(() => {
                // 3. MAKE UPLOADED PDF FILE UNSEARCHABLE
                makePdfUnSearchable(API_KEY, uploadedFileUrl, Password, Pages, DestinationFile);
            })
            .catch(e => {
                console.log(e);
            });
    })
    .catch(e => {
        console.log(e);
    });


function getPresignedUrl(apiKey, localFile) {
    return new Promise((resolve, reject) => {
        // Prepare request to `Get Presigned URL` API endpoint
        let queryPath = `/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=${path.basename(localFile)}`;
        let reqOptions = {
            host: "api.pdf.co",
            path: encodeURI(queryPath),
            headers: { "x-api-key": apiKey }
        };
        // Send request
        https.get(reqOptions, (response) => {
            response.on("data", (d) => {
                let data = JSON.parse(d);
                if (data.error == false) {
                    // Return presigned url we received
                    resolve([data.presignedUrl, data.url]);
                }
                else {
                    // Service reported error; reject so the caller's catch() runs
                    reject("getPresignedUrl(): " + data.message);
                }
            });
        })
            .on("error", (e) => {
                // Request error
                reject("getPresignedUrl(): " + e);
            });
    });
}

function uploadFile(apiKey, localFile, uploadUrl) {
    return new Promise((resolve, reject) => {
        fs.readFile(localFile, (readErr, data) => {
            if (readErr) {
                // Local file could not be read
                reject("uploadFile() read error: " + readErr);
                return;
            }
            request({
                method: "PUT",
                url: uploadUrl,
                body: data,
                headers: {
                    "Content-Type": "application/octet-stream"
                }
            }, (err, res, body) => {
                if (!err) {
                    resolve();
                }
                else {
                    // Reject so the caller's catch() reports the failure
                    reject("uploadFile() request error: " + err);
                }
            });
        });
    });
}

function makePdfUnSearchable(apiKey, uploadedFileUrl, password, pages, destinationFile) {
    // Prepare request to `Make UnSearchable PDF` API endpoint
    var queryPath = `/v1/pdf/makeunsearchable`;

    // JSON payload for api request
    var jsonPayload = JSON.stringify({
        name: path.basename(destinationFile), password: password, pages: pages, url: uploadedFileUrl, async: true
    });

    var reqOptions = {
        host: "api.pdf.co",
        method: "POST",
        path: queryPath,
        headers: {
            "x-api-key": apiKey,
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
        }
    };
    // Send request
    var postRequest = https.request(reqOptions, (response) => {
        response.on("data", (d) => {
            response.setEncoding("utf8");
            // Parse JSON response
            let data = JSON.parse(d);
            if (data.error == false) {
                console.log(`Job #${data.jobId} has been created!`);
                checkIfJobIsCompleted(data.jobId, data.url, destinationFile);
            }
            else {
                // Service reported error
                console.log("makePdfUnSearchable(): " + data.message);
            }
        });
    })
        .on("error", (e) => {
            // Request error
            console.log("makePdfUnSearchable(): " + e);
        });


    // Write request data
    postRequest.write(jsonPayload);
    postRequest.end();
}


function checkIfJobIsCompleted(jobId, resultFileUrl, destinationFile) {
    let queryPath = `/v1/job/check`;

    // JSON payload for api request
    let jsonPayload = JSON.stringify({
        jobid: jobId
    });

    let reqOptions = {
        host: "api.pdf.co",
        path: queryPath,
        method: "POST",
        headers: {
            "x-api-key": API_KEY,
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(jsonPayload, 'utf8')
        }
    };

    // Send request
    var postRequest = https.request(reqOptions, (response) => {
        response.on("data", (d) => {
            response.setEncoding("utf8");

            // Parse JSON response
            let data = JSON.parse(d);
            console.log(`Checking Job #${jobId}, Status: ${data.status}, Time: ${new Date().toLocaleString()}`);

            if (data.status == "working") {
                // Check again after 3 seconds
                setTimeout(function () { checkIfJobIsCompleted(jobId, resultFileUrl, destinationFile); }, 3000);
            }
            else if (data.status == "success") {
                // Download PDF file
                var file = fs.createWriteStream(destinationFile);
                https.get(resultFileUrl, (response2) => {
                    response2.pipe(file)
                        .on("close", () => {
                            console.log(`Generated PDF file saved as "${destinationFile}" file.`);
                        });
                });
            }
            else {
                console.log(`Operation ended with status: "${data.status}".`);
            }
        })
    });

    // Write request data
    postRequest.write(jsonPayload);
    postRequest.end();
}

import requests

# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
API_KEY = "*****************************************"

# Base URL for PDF.co Web API requests
BASE_URL = "https://api.pdf.co/v1"

fileName = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf"

url = "{}/pdf/makeunsearchable".format(BASE_URL)

# The endpoint expects a POST request with attributes in the JSON body
payload = {"url": fileName}

# Execute request and get response as JSON
response = requests.post(url, json=payload, headers={"x-api-key": API_KEY})
if response.status_code == 200:
    result = response.json()

    if result["error"] == False:
        # URL of unsearchable PDF
        unsearchableFile = result["url"]
        print(unsearchableFile)
    else:
        # Service reported error
        print(result["message"])
else:
    print("Request error: {} {}".format(response.status_code, response.reason))

<?php
  $apiKey = "***************";
  $fileName = "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf";
  $url = "https://api.pdf.co/v1/pdf/makeunsearchable";

  // The endpoint expects a POST request with attributes in the JSON body
  $payload = json_encode(array("url" => $fileName));

  // Create request
  $curl = curl_init();
  curl_setopt($curl, CURLOPT_HTTPHEADER, array("x-api-key: " . $apiKey, "Content-Type: application/json"));
  curl_setopt($curl, CURLOPT_URL, $url);
  curl_setopt($curl, CURLOPT_POST, true);
  curl_setopt($curl, CURLOPT_POSTFIELDS, $payload);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
  // Execute request
  $result = curl_exec($curl);

  // Cleanup
  curl_close($curl);
?>

Footnotes

1

Supports links from Google Drive, Dropbox, and PDF.co Built-In Files Storage. To upload files via the API, check out the File Upload section. Note: If you experience intermittent Access Denied or Too Many Requests errors, try adding the cache: prefix to the URL to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
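
The cache: prefix mentioned in this footnote is plain string concatenation on the url attribute. A minimal Python sketch, where the helper name with_cache_prefix is hypothetical:

```python
def with_cache_prefix(url):
    """Prepend "cache:" unless the URL already carries the prefix."""
    return url if url.startswith("cache:") else "cache:" + url


print(with_cache_prefix("https://example.com/file1.pdf"))
# prints "cache:https://example.com/file1.pdf"
```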

2

The main response codes are as follows:

Code

Description

200

Success

400

Bad request. Typically happens because of bad input parameters, or because the input URLs can’t be reached, possibly due to access restrictions like needing a login or password.

401

Unauthorized

402

Not enough credits

445

Timeout error. To process large documents or files please use asynchronous mode (set the async parameter to true) and then check status using the /job/check endpoint. If a file contains many pages then specify a page range using the pages parameter. The number of pages of the document can be obtained using the /pdf/info endpoint.
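
The asynchronous mode recommended for code 445 amounts to a polling loop around /job/check. The Python sketch below shows only that loop and makes several assumptions: wait_for_job and check_status are hypothetical names, check_status stands in for a real POST to /v1/job/check that returns the status string, and the "working" and "success" values are taken from the Node.js code sample in this section. Other status values are not listed here, so any non-"working" value is treated as final.

```python
import time


def wait_for_job(check_status, job_id, interval=3.0, max_checks=100, sleep=time.sleep):
    """Poll until the job leaves the "working" state, then return the final status."""
    for _ in range(max_checks):
        status = check_status(job_id)
        if status != "working":
            return status
        # Wait before asking again (the Node.js sample waits 3 seconds).
        sleep(interval)
    raise TimeoutError(
        "job {} still working after {} checks".format(job_id, max_checks)
    )


# Stand-in for the real /v1/job/check call: reports "working" twice, then "success".
_fake_statuses = iter(["working", "working", "success"])
print(wait_for_job(lambda job_id: next(_fake_statuses), "demo-job", sleep=lambda s: None))
# prints "success"
```

Passing the status check in as a function keeps the loop independent of any HTTP library and makes the retry limit easy to exercise without network access.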