PDF Make Text Searchable or Unsearchable#
Available Methods#
/pdf/makesearchable#
This method converts scanned PDF documents (where pages consist fully or partially of scanned images) or image files into a text-searchable PDF. It runs OCR and adds an invisible text layer on top of the document, which can then be used for text search and text indexing.
Method: POST
Endpoint: /v1/pdf/makesearchable
Attributes#
Note
Attributes are case-sensitive and must be sent in the JSON body of the POST request, for example:
{
"url": "https://example.com/file1.pdf"
}
| Attribute | Description | Required |
|---|---|---|
| | URL to the source file. 1 | yes |
| | HTTP auth user name if required to access source | no |
| | HTTP auth password if required to access source | no |
| | Set the language for OCR (text from image) to use for scanned PDF, PNG, and JPG input documents when extracting text. The default is | no |
| | Comma-separated indices of pages (or page ranges) that you want to use. The first page index is always 0. For example, to split a 7-page document into 3 separate PDFs with different page counts, use 0, 1, 2- or 1, 2, 3-7, which results in one PDF with page one, one PDF with page two, and one PDF with the rest of the pages. You can also use inverted page numbers adding | no |
| | Password of the PDF file. The input must be in string format. | no |
| | Set | no |
| | File name for the generated output. The input must be in string format. | no |
| | Set the expiration time for the output link in minutes (default is | no |
| | Use this parameter to set additional configurations for fine-tuning and extra options. Explore the Profiles section for more. | no |
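The page-selection attribute described in the table takes a comma-separated string of zero-based indices and ranges, where an open-ended range such as 2- covers everything from that page to the end. As a minimal sketch, a helper like the one below could build that string from Python values. The name format_pages and its input shape are hypothetical and not part of the API, and joining with a bare comma (no spaces) is an assumption.

```python
from typing import Iterable, Optional, Tuple, Union

# A single page index, or a (start, end) pair; end=None means "to the last page".
PageSpec = Union[int, Tuple[int, Optional[int]]]


def format_pages(specs: Iterable[PageSpec]) -> str:
    """Build a comma-separated page string such as "0,1,2-" (hypothetical helper)."""
    parts = []
    for spec in specs:
        # Normalize a bare index into a one-page range.
        start, end = (spec, spec) if isinstance(spec, int) else spec
        if start < 0 or (end is not None and end < start):
            raise ValueError(f"invalid page range: {spec!r}")
        if end is None:
            parts.append(f"{start}-")        # open-ended: start to last page
        elif end == start:
            parts.append(str(start))         # single page
        else:
            parts.append(f"{start}-{end}")   # closed range
    return ",".join(parts)
```

For the 7-page example in the table, format_pages([0, 1, (2, None)]) produces the string for page one, page two, and the remaining pages.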
Query parameters#
No query parameters accepted.
Payload#
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-make-searchable/sample.pdf",
"lang": "eng",
"pages": "",
"name": "result.pdf",
"password": "",
"async": "false",
"profiles": ""
}
Response 2#
{
"url": "https://pdf-temp-files.s3.amazonaws.com/a0d52f35504e47148d1771fce875db7b/result.pdf",
"pageCount": 1,
"error": false,
"status": 200,
"name": "result.pdf",
"remainingCredits": 99033681,
"credits": 35
}
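A caller will usually want the output link from this response, but only when the request succeeded. The sketch below checks the error and status fields shown in the sample response and returns url. The function name extract_output_url and the exception class are hypothetical, and treating any status other than 200 as a failure is an assumption.

```python
import json


class MakeSearchableError(Exception):
    """Raised when the response reports a failure (hypothetical class)."""


def extract_output_url(response_body: str) -> str:
    """Return the output file URL from a JSON response body, or raise on failure."""
    data = json.loads(response_body)
    # The sample response carries both an "error" flag and a numeric "status".
    if data.get("error") or data.get("status") != 200:
        raise MakeSearchableError(f"request failed with status {data.get('status')}")
    return data["url"]
```

The same check applies to the /pdf/makeunsearchable response, which has the same fields.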
CURL#
curl --location --request POST 'https://api.pdf.co/v1/pdf/makesearchable' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-make-searchable/sample.pdf",
"lang": "eng",
"pages": "",
"name": "result.pdf",
"password": "",
"async": "false",
"profiles": ""
}'
Language Support#
| Code | Description |
|---|---|
| | Afrikaans |
| | Amharic |
| | Arabic |
| | Assamese |
| | Azerbaijani |
| | Azerbaijani - Cyrillic |
| | Belarusian |
| | Bengali |
| | Tibetan |
| | Bosnian |
| | Bulgarian |
| | Catalan; Valencian |
| | Cebuano |
| | Czech |
| | Chinese - Simplified |
| | Chinese - Traditional |
| | Cherokee |
| | Welsh |
| | Danish |
| | German |
| | Dzongkha |
| | Greek, Modern (1453-) |
| | English |
| | English, Middle (1100-1500) |
| | Esperanto |
| | Estonian |
| | Basque |
| | Persian |
| | Finnish |
| | French |
| | Frankish |
| | French, Middle (ca. 1400-1600) |
| | Irish |
| | Galician |
| | Greek, Ancient (-1453) |
| | Gujarati |
| | Haitian; Haitian Creole |
| | Hebrew |
| | Hindi |
| | Croatian |
| | Hungarian |
| | Inuktitut |
| | Indonesian |
| | Icelandic |
| | Italian |
| | Italian - Old |
| | Javanese |
| | Japanese |
| | Kannada |
| | Georgian |
| | Georgian - Old |
| | Kazakh |
| | Central Khmer |
| | Kirghiz; Kyrgyz |
| | Korean |
| | Kurdish |
| | Lao |
| | Latin |
| | Latvian |
| | Lithuanian |
| | Malayalam |
| | Marathi |
| | Macedonian |
| | Maltese |
| | Malay |
| | Burmese |
| | Nepali |
| | Dutch; Flemish |
| | Norwegian |
| | Oriya |
| | Panjabi; Punjabi |
| | Polish |
| | Portuguese |
| | Pushto; Pashto |
| | Romanian; Moldavian; Moldovan |
| | Russian |
| | Sanskrit |
| | Sinhala; Sinhalese |
| | Slovak |
| | Slovenian |
| | Spanish; Castilian |
| | Spanish; Castilian - Old |
| | Albanian |
| | Serbian |
| | Serbian - Latin |
| | Swahili |
| | Swedish |
| | Syriac |
| | Tamil |
| | Telugu |
| | Tajik |
| | Tagalog |
| | Thai |
| | Tigrinya |
| | Turkish |
| | Uighur; Uyghur |
| | Ukrainian |
| | Urdu |
| | Uzbek |
| | Uzbek - Cyrillic |
| | Vietnamese |
| | Yiddish |
/pdf/makeunsearchable#
This method makes a PDF file non-searchable by converting it into a “scanned” PDF, in which each page is effectively a flat image.
Method: POST
Endpoint: /v1/pdf/makeunsearchable
Attributes#
Note
Attributes are case-sensitive and must be sent in the JSON body of the POST request, for example:
{
"url": "https://example.com/file1.pdf"
}
| Attribute | Description | Required |
|---|---|---|
| | URL to the source file. 1 | yes |
| | HTTP auth user name if required to access source | no |
| | HTTP auth password if required to access source | no |
| | Comma-separated indices of pages (or page ranges) that you want to use. The first page index is always 0. For example, to split a 7-page document into 3 separate PDFs with different page counts, use 0, 1, 2- or 1, 2, 3-7, which results in one PDF with page one, one PDF with page two, and one PDF with the rest of the pages. You can also use inverted page numbers adding | no |
| | Password of the PDF file. The input must be in string format. | no |
| | Set | no |
| | File name for the generated output. The input must be in string format. | no |
| | Set the expiration time for the output link in minutes (default is | no |
| | Use this parameter to set additional configurations for fine-tuning and extra options. Explore the Profiles section for more. | no |
Query parameters#
No query parameters accepted.
Payload#
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
"pages": "",
"name": "result.pdf",
"password": "",
"async": "false",
"profiles": ""
}
Response 2#
{
"url": "https://pdf-temp-files.s3.amazonaws.com/6b755238963a472abf67fd5e7ffafd79/result.pdf",
"pageCount": 1,
"error": false,
"status": 200,
"name": "result.pdf",
"remainingCredits": 327244,
"credits": 35
}
CURL#
curl --location --request POST 'https://api.pdf.co/v1/pdf/makeunsearchable' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
"pages": "",
"name": "result.pdf",
"password": "",
"async": "false",
"profiles": ""
}'
Code samples#
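As a rough Python counterpart to the CURL commands above, the sketch below builds the POST request for either endpoint using only the standard library. The helper names build_request and send are hypothetical, and dropping optional attributes that are empty (instead of sending empty strings as the sample payloads do) is an assumption based on those attributes being marked as not required.

```python
import json
import urllib.request

API_BASE = "https://api.pdf.co/v1"
ENDPOINTS = ("/pdf/makesearchable", "/pdf/makeunsearchable")


def build_request(endpoint: str, api_key: str, source_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the two endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint}")
    payload = {"url": source_url}
    # Assumption: optional attributes with empty values can simply be omitted.
    payload.update({k: v for k, v in options.items() if v not in ("", None)})
    return urllib.request.Request(
        API_BASE + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_request("/pdf/makesearchable", api_key, source_url, lang="eng", name="result.pdf")) would perform the actual conversion and requires a valid API key.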
Footnotes
- 1
Supports links from Google Drive, Dropbox, and PDF.co Built-In Files Storage. To upload files via the API, check out the File Upload section. Note: If you experience intermittent Too Many Requests or Access Denied errors, try adding the cache: prefix to the URL to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
- 2
The main response codes are as follows:
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad request. Typically happens because of bad input parameters, or because the input URLs can’t be reached, possibly due to access restrictions like needing a login or password. |
| 401 | Unauthorized |
| 402 | Not enough credits |
| 445 | Timeout error. To process large documents or files, use asynchronous mode (set the async parameter to true) and then check the status using the /job/check endpoint. If a file contains many pages, specify a page range using the pages parameter. The number of pages in the document can be obtained using the /pdf/info endpoint. |
Note
For more, see the complete list of available response codes.
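The asynchronous mode mentioned for code 445 implies a polling loop: submit with async enabled, then call /job/check until the job is no longer running. The sketch below shows only the loop and takes the status check as a callable, so it makes no claims about how /job/check is invoked. The function name wait_for_job is hypothetical, and the assumption that a running job reports "status": "working" is not stated in this document and should be verified against the /job/check reference.

```python
import time
from typing import Callable


def wait_for_job(
    check_job: Callable[[], dict],
    interval_seconds: float = 3.0,
    max_checks: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll check_job() until the job stops reporting a running state."""
    for _ in range(max_checks):
        result = check_job()
        # Assumption: "working" marks a job that is still in progress.
        if result.get("status") != "working":
            return result
        sleep(interval_seconds)
    raise TimeoutError(f"job still running after {max_checks} checks")
```

Passing the check as a callable keeps the loop independent of the HTTP client and makes it easy to exercise with canned responses.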