PDF Find Text
Find text in PDF and get coordinates. Supports regular expressions.
POST /v1/pdf/find
Attributes
Attributes are case-sensitive and must be placed inside the JSON body of the POST request. For example:
{ "url": "https://example.com/file1.pdf" }
When using regular expressions in JSON payloads, ensure that backslashes are properly escaped. For example, a single backslash `\` should be written as `\\`.
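
As a minimal illustration of the escaping rule, the Python sketch below lets the standard `json` module do the escaping instead of doubling backslashes by hand. The regular expression and the file URL are made-up examples; only the attribute names are taken from this page.

```python
import json

# Hypothetical pattern matching a date such as 12/31/2024.
# The raw string holds single backslashes.
pattern = r"\d{2}/\d{2}/\d{4}"

payload = {
    "url": "https://example.com/file1.pdf",
    "searchString": pattern,
    "regexSearch": True,
}

# json.dumps doubles each backslash while serializing,
# so the JSON text contains \\d where the pattern has \d.
body = json.dumps(payload)
print(body)
```

Decoding `body` with `json.loads` gives back the original single-backslash pattern, which is a quick way to check that a hand-written payload is escaped correctly.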
Attribute | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Yes | - | URL of the source file. |
callback | string | No | - | The callback URL (or webhook) used to receive the POST data. See Webhooks & Callbacks. This is only applicable when async is set to true. |
httpusername | string | No | - | HTTP auth user name if required to access source URL. |
httppassword | string | No | - | HTTP auth password if required to access source URL. |
pages | string | No | all pages | Page indices to process, given as comma-separated values or ranges (e.g. “0, 1, 2-” or “1, 2, 3-7”). The first page has index 0. Use “!” before a number for inverted page numbers (e.g. “!0” for the last page). If not specified, all pages are processed. The value must be a string. |
inline | boolean | No | false | Set to true to return results inside the response. Otherwise, the endpoint will return a URL to the output file generated. |
password | string | No | - | Password for the PDF file. |
async | boolean | No | false | Set async to true to run long processes in the background. The API then returns a jobId that you can use with the Background Job Check endpoint. See also Webhooks & Callbacks. |
searchString | string | Yes | - | Text to search for. Regular expressions are supported when the regexSearch parameter is set to true. |
wordMatchingMode | string | No | None | WordMatchingMode defines how search terms match PDF text. Modes: None (exact string match only), SmartMatch (default; flexible word boundary match, includes letters/digits/punctuation), ExactMatch (strict word boundaries, whole-word match only). |
regexSearch | boolean | No | false | Set to true to treat searchString as a regular expression. |
profiles | object | No | - | See Profiles for more information. |
ColumnDetectionMode | string | No | ContentGroupsAndBorders | Controls column detection/alignment in PDF table extraction. Modes: ContentGroupsAndBorders (default; text + lines), ContentGroups (text grouping only), Borders (lines only), BorderedTables (OCR-based for bordered tables), ContentGroupsAI (AI for dense/complex layouts). |
DetectionMinNumberOfRows | integer | No | 1 | Minimum number of rows to detect in a table. |
DetectionMinNumberOfColumns | integer | No | 1 | Minimum number of columns to detect in a table. |
DetectionMaxNumberOfInvalidSubsequentRowsAllowed | integer | No | 0 | Maximum number of invalid subsequent rows allowed in a table. |
DetectionMinNumberOfLineBreaksBetweenTables | integer | No | 0 | Minimum number of line breaks between tables. |
EnhanceTableBorders | boolean | No | true | Whether to enhance table borders. |
OCRDetectPageRotation | boolean | No | false | Controls whether page rotation is detected when OCR is applied. Set to true to detect page rotation. See Support page rotation for more information. |
DataEncryptionAlgorithm | string | No | - | Controls the encryption algorithm used for data encryption. See User-Controlled Encryption for more information. The available algorithms are: AES128 , AES192 , AES256 . |
DataEncryptionKey | string | No | - | Controls the encryption key used for data encryption. See User-Controlled Encryption for more information. |
DataEncryptionIV | string | No | - | Controls the encryption IV used for data encryption. See User-Controlled Encryption for more information. |
DataDecryptionAlgorithm | string | No | - | Controls the decryption algorithm used for data decryption. See User-Controlled Encryption for more information. The available algorithms are: AES128 , AES192 , AES256 . |
DataDecryptionKey | string | No | - | Controls the decryption key used for data decryption. See User-Controlled Encryption for more information. |
DataDecryptionIV | string | No | - | Controls the decryption IV used for data decryption. See User-Controlled Encryption for more information. |
requestParametersDocument | string | No | - | |
responseParameters | object | No | - | - |
error | boolean | No | - | Indicates whether an error occurred (false means success) |
status | string | No | - | Status code of the request (200, 404, 500, etc.). For more information, see Response Codes. |
message | string | No | - | Message of the request |
credits | integer | No | - | Number of credits consumed by the request |
remainingCredits | integer | No | - | Number of credits remaining in the account |
duration | integer | No | - | Time taken for the operation in milliseconds |
errorCode | integer | No | - | Error code of the request (400, 401, 402, 403, 404, 500, etc.) |
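
The `pages` syntax described in the table can be easier to follow with a worked example. The sketch below is a client-side illustration only, not the service's own parser. It assumes that ranges are inclusive, that “!n” counts back from the last page for any n (the table only confirms “!0”), and that out-of-range indices are ignored. The function name `resolve_pages` is hypothetical.

```python
def resolve_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages string into a sorted list of 0-based page indices."""
    last = page_count - 1

    def index(token: str) -> int:
        token = token.strip()
        # Assumption: "!n" counts from the end, so "!0" is the last page.
        if token.startswith("!"):
            return last - int(token[1:])
        return int(token)

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = index(start_text)
            # An open-ended range such as "2-" runs to the last page.
            end = index(end_text) if end_text.strip() else last
            selected.update(range(start, end + 1))
        else:
            selected.add(index(part))
    # Assumption: indices outside the document are dropped silently.
    return sorted(i for i in selected if 0 <= i <= last)


print(resolve_pages("0, 1, 2-", 5))
print(resolve_pages("!0", 5))
```

For a five-page document, “0, 1, 2-” selects every page and “!0” selects only index 4, the last page.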
Support page rotation
This endpoint supports PDF page rotation as follows:
Find only bordered tables
You can limit the search to bordered tables only by enabling the legacy table search mode with the following profiles config:
Example Payload
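
A minimal sketch of preparing a request for this endpoint with the Python standard library follows. The host `https://api.pdf.co` and the `x-api-key` header are assumptions not stated on this page, and `build_find_request` is a hypothetical helper. The payload uses only attributes from the table above, with made-up values. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions, not stated in this section: the API host and the
# header used to pass the API key.
API_BASE = "https://api.pdf.co"
API_KEY_HEADER = "x-api-key"


def build_find_request(api_key: str, file_url: str, search: str,
                       regex: bool = False,
                       pages: str = "") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for /v1/pdf/find."""
    payload = {
        "url": file_url,
        "searchString": search,
        "regexSearch": regex,
        "inline": True,   # return results inside the response
        "async": False,
    }
    if pages:
        payload["pages"] = pages  # must be a string, e.g. "0-2"
    return urllib.request.Request(
        API_BASE + "/v1/pdf/find",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 API_KEY_HEADER: api_key},
        method="POST",
    )


request = build_find_request("YOUR_API_KEY",
                             "https://example.com/file1.pdf",
                             r"Invoice\s+\d+", regex=True, pages="0")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared object to `urllib.request.urlopen` would send it. Building the body with `json.dumps` also takes care of the backslash escaping required for regular expressions.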
For request size limits, please refer to the Request Size Limits page.
Example Response
To see the main response codes, please refer to the Response Codes page.
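
The response fields listed in the attributes table (`error`, `message`, `credits`, `remainingCredits`, `errorCode`) and the `jobId` returned for async requests could be handled along the following lines. This is a sketch under assumptions: the sample dictionaries are invented for illustration and are not real API output, and `summarize_response` is a hypothetical helper.

```python
def summarize_response(response: dict) -> str:
    """Turn a decoded JSON response into a one-line summary."""
    if response.get("error"):
        # error is true on failure; message carries the details.
        return (f"failed ({response.get('errorCode')}): "
                f"{response.get('message')}")
    if "jobId" in response:
        # A jobId is returned when async was set to true.
        return f"queued as job {response['jobId']}"
    return (f"ok, {response.get('credits')} credits used, "
            f"{response.get('remainingCredits')} remaining")


# Hypothetical response values, for illustration only.
sample_ok = {"error": False, "credits": 2,
             "remainingCredits": 98, "duration": 350}
sample_failed = {"error": True, "errorCode": 401,
                 "message": "Unauthorized"}

print(summarize_response(sample_ok))
print(summarize_response(sample_failed))
```

Checking `error` before reading any other field keeps the failure path separate from the async and success paths.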