POST /v1/pdf/find

Attributes

Attributes are case-sensitive and must be passed inside the JSON body of the POST request, for example: { "url": "https://example.com/file1.pdf" }
When using regular expressions in JSON payloads, ensure that backslashes are properly escaped. For example, a single backslash \ should be written as \\.
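One way to avoid escaping backslashes by hand is to let a JSON serializer do it. The Python sketch below writes the pattern as a raw string and serializes it with the standard json module; the variable names are illustrative only.

```python
import json

# The regex as you would type it in a regex tester: single backslashes.
pattern = r"Invoice Date \d+/\d+/\d+"

# json.dumps doubles each backslash, which is the form required inside JSON.
body = json.dumps({"searchString": pattern, "regexSearch": True})
print(body)  # {"searchString": "Invoice Date \\d+/\\d+/\\d+", "regexSearch": true}

# Decoding the JSON gives back the original single-backslash pattern.
assert json.loads(body)["searchString"] == pattern
```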
Attribute | Type | Required | Default | Description
url | string | Yes | - | URL of the source file.
callback | string | No | - | The callback URL (or webhook) used to receive the POST data. See Webhooks & Callbacks. Only applicable when async is set to true.
httpusername | string | No | - | HTTP auth user name, if required to access the source URL.
httppassword | string | No | - | HTTP auth password, if required to access the source URL.
pages | string | No | all pages | Page indices to process, as comma-separated values or ranges (e.g. "0, 1, 2-" or "1, 2, 3-7"). The first page has index 0. Put "!" before a number for inverted page numbers (e.g. "!0" for the last page). If not specified, all pages are processed. The value must be a string.
inline | boolean | No | false | Set to true to return results inside the response. Otherwise, the endpoint returns a URL to the generated output file.
password | string | No | - | Password for the PDF file.
async | boolean | No | false | Set to true to run long processes in the background. The API then returns a jobId that you can use with the Background Job Check endpoint. See also Webhooks & Callbacks.
searchString | string | Yes | - | Text to search for. Regular expressions are supported when regexSearch is set to true.
wordMatchingMode | string | No | None | Defines how search terms match PDF text. Modes: None (exact string match only), SmartMatch (default; flexible word boundary match, includes letters/digits/punctuation), ExactMatch (strict word boundaries, whole-word match only).
regexSearch | boolean | No | false | Set to true to enable regular expression search for the searchString parameter.
profiles | object | No | - | See Profiles for more information.
    ColumnDetectionMode | string | No | ContentGroupsAndBorders | Controls column detection/alignment in PDF table extraction. Modes: ContentGroupsAndBorders (default; text + lines), ContentGroups (text grouping only), Borders (lines only), BorderedTables (OCR-based for bordered tables), ContentGroupsAI (AI for dense/complex layouts).
    DetectionMinNumberOfRows | integer | No | 1 | Minimum number of rows to detect in a table.
    DetectionMinNumberOfColumns | integer | No | 1 | Minimum number of columns to detect in a table.
    DetectionMaxNumberOfInvalidSubsequentRowsAllowed | integer | No | 0 | Maximum number of invalid subsequent rows allowed in a table.
    DetectionMinNumberOfLineBreaksBetweenTables | integer | No | 0 | Minimum number of line breaks between tables.
    EnhanceTableBorders | boolean | No | true | Whether to enhance table borders.
    OCRDetectPageRotation | boolean | No | false | Controls whether to detect page rotation in the PDF document when OCR is applied. Set to true to detect page rotation. See Support page rotation for more information.
    DataEncryptionAlgorithm | string | No | - | Encryption algorithm used for data encryption. Available algorithms: AES128, AES192, AES256. See User-Controlled Encryption for more information.
    DataEncryptionKey | string | No | - | Encryption key used for data encryption. See User-Controlled Encryption for more information.
    DataEncryptionIV | string | No | - | Encryption IV used for data encryption. See User-Controlled Encryption for more information.
    DataDecryptionAlgorithm | string | No | - | Decryption algorithm used for data decryption. Available algorithms: AES128, AES192, AES256. See User-Controlled Encryption for more information.
    DataDecryptionKey | string | No | - | Decryption key used for data decryption. See User-Controlled Encryption for more information.
    DataDecryptionIV | string | No | - | Decryption IV used for data decryption. See User-Controlled Encryption for more information.
requestParametersDocument | string | No | - |
responseParameters | object | No | - | -
    error | boolean | No | - | Indicates whether an error occurred (false means success).
    status | string | No | - | Status code of the request (200, 404, 500, etc.). For more information, see Response Codes.
    message | string | No | - | Message of the request.
    credits | integer | No | - | Number of credits consumed by the request.
    remainingCredits | integer | No | - | Number of credits remaining in the account.
    duration | integer | No | - | Time taken for the operation, in milliseconds.
    errorCode | integer | No | - | Error code of the request (400, 401, 402, 403, 404, 500, etc.).
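
To make the pages syntax concrete, the Python sketch below expands a pages string into zero-based indices for a document of known length. It is a client-side illustration only, not the service's own parser: the function name expand_pages is hypothetical, and it assumes that closed ranges such as 3-7 are inclusive and that !N counts back from the last page (so !1 would be the second-to-last page).

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages string such as "0, 1, 2-" into zero-based page indices."""

    def resolve(token: str) -> int:
        # Assumption: "!N" counts back from the last page ("!0" is the last page).
        token = token.strip()
        if token.startswith("!"):
            return page_count - 1 - int(token[1:])
        return int(token)

    if not spec.strip():
        # An empty spec falls back to the documented default: all pages.
        return list(range(page_count))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = resolve(start_text)
            # An open-ended range like "2-" runs to the last page.
            end = resolve(end_text) if end_text.strip() else page_count - 1
            pages.extend(range(start, end + 1))
        else:
            pages.append(resolve(part))
    # Drop indices that fall outside the document.
    return [p for p in pages if 0 <= p < page_count]


print(expand_pages("0, 1, 2-", 5))    # [0, 1, 2, 3, 4]
print(expand_pages("1, 2, 3-7", 10))  # [1, 2, 3, 4, 5, 6, 7]
print(expand_pages("!0", 5))          # [4]
```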

Support page rotation

This endpoint can detect PDF page rotation when OCR is applied. Enable it with the following profiles config:

{
 "profiles": "{ 'OCRDetectPageRotation': true }"
}

Find only bordered tables

You can limit search to bordered tables only by enabling the legacy table search mode with the following profiles config:

{
 "profiles": "{ 'Mode': 'Legacy',
 'ColumnDetectionMode': 'BorderedTables',
 'DetectionMinNumberOfRows': 1,
 'DetectionMinNumberOfColumns': 1,
 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0,
 'DetectionMinNumberOfLineBreaksBetweenTables': 0,
 'EnhanceTableBorders': false
 }"
}
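
Both profiles examples above pass the settings as a string that holds a single-quoted, JSON-like object. A minimal Python sketch of producing that string from a dictionary follows. build_profiles is a hypothetical helper, and the quote swap it performs is only safe when no key or string value contains a quote character, so it rejects such input.

```python
import json


def build_profiles(options: dict) -> str:
    """Serialize profile options into the single-quoted string form shown above."""
    text = json.dumps(options)
    if "'" in text or '\\"' in text:
        # Refuse input where a blind quote swap would corrupt the value.
        raise ValueError("profile keys and values must not contain quotes")
    return text.replace('"', "'")


payload = {
    "url": "https://example.com/file1.pdf",
    "searchString": "Invoice",
    "profiles": build_profiles({"OCRDetectPageRotation": True}),
}
print(payload["profiles"])  # {'OCRDetectPageRotation': true}
```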

Example Payload

For request size limits, please refer to the Request Size Limits page.
{
  "async": "false",
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
  "searchString": "Invoice Date \\d+/\\d+/\\d+",
  "regexSearch": "true",
  "name": "output",
  "pages": "0-",
  "inline": "true",
  "wordMatchingMode": "",
  "password": ""
}

Example Response

To see the main response codes, please refer to the Response Codes page.
{
  "body": [
    {
      "text": "Invoice Date 01/01/2016",
      "left": 436.5400085449219,
      "top": 130.4599995137751,
      "width": 122.85311957550027,
      "height": 11.040000486224898,
      "pageIndex": 0,
      "bounds": {
        "location": {
          "isEmpty": false,
          "x": 436.54,
          "y": 130.46
        },
        "size": "122.853119, 11.0400009",
        "x": 436.54,
        "y": 130.46,
        "width": 122.853119,
        "height": 11.0400009,
        "left": 436.54,
        "top": 130.46,
        "right": 559.3931,
        "bottom": 141.5,
        "isEmpty": false
      },
      "elementCount": 1,
      "elements": [
        {
          "index": 0,
          "left": 436.5400085449219,
          "top": 130.4599995137751,
          "width": 122.85311957550027,
          "height": 11.040000486224898,
          "angle": 0,
          "text": "Invoice Date 01/01/2016",
          "isNewLine": true,
          "fontIsBold": true,
          "fontIsItalic": false,
          "fontName": "Helvetica-Bold",
          "fontSize": 11,
          "fontColor": "0, 0, 0",
          "fontColorAsOleColor": 0,
          "fontColorAsHtmlColor": "#000000",
          "bounds": {
            "location": {
              "isEmpty": false,
              "x": 436.54,
              "y": 130.46
            },
            "size": "122.853119, 11.0400009",
            "x": 436.54,
            "y": 130.46,
            "width": 122.853119,
            "height": 11.0400009,
            "left": 436.54,
            "top": 130.46,
            "right": 559.3931,
            "bottom": 141.5,
            "isEmpty": false
          }
        }
      ]
    }
  ],
  "pageCount": 1,
  "error": false,
  "status": 200,
  "name": "output",
  "remainingCredits": 59970
}
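
As an illustration of consuming this response, the Python sketch below pulls the matched text, page index, and bounding box out of each entry in body. The function name extract_matches is hypothetical, the sample dictionary is a trimmed copy of the response above, and the code assumes the request was sent with inline set to true so that body holds the matches directly.

```python
def extract_matches(response: dict) -> list[dict]:
    """Return text, page, and bounding box for every match in a find response."""
    if response.get("error"):
        # On failure, surface the message field instead of looking for matches.
        raise RuntimeError(response.get("message", "request failed"))
    return [
        {
            "text": item["text"],
            "page": item["pageIndex"],
            # (left, top, width, height) as returned by the API.
            "box": (item["left"], item["top"], item["width"], item["height"]),
        }
        for item in response.get("body", [])
    ]


sample = {
    "body": [
        {
            "text": "Invoice Date 01/01/2016",
            "left": 436.5400085449219,
            "top": 130.4599995137751,
            "width": 122.85311957550027,
            "height": 11.040000486224898,
            "pageIndex": 0,
        }
    ],
    "pageCount": 1,
    "error": False,
    "status": 200,
}

for match in extract_matches(sample):
    print(match["page"], match["text"])  # 0 Invoice Date 01/01/2016
```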

Code Samples

curl --location --request POST 'https://api.pdf.co/v1/pdf/find' \
--header 'x-api-key: *******************' \
--header 'Content-Type: application/json' \
--data-raw '{
"async": "false",
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
"searchString": "Invoice Date \\d+/\\d+/\\d+",
"regexSearch": "true",
"name": "output",
"pages": "0-",
"inline": "true",
"wordMatchingMode": "",
"password": ""
}'
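
For comparison, here is a rough Python equivalent of the curl call, using only the standard library. It is a sketch: build_find_request is a hypothetical helper, the API key is a placeholder, and the request is only constructed here. Pass it to urllib.request.urlopen to actually send it.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/find"


def build_find_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for /v1/pdf/find without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_find_request(
    "YOUR_API_KEY",  # placeholder, replace with a real key
    {
        "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
        # Raw string: json.dumps adds the JSON backslash escaping.
        "searchString": r"Invoice Date \d+/\d+/\d+",
        "regexSearch": True,
        "pages": "0-",
        "inline": True,
    },
)

# To send it:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```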