Split PDF by Text Search
Split a PDF into multiple PDF files using text search or barcode search.
POST /v1/pdf/split2
The pages parameter specifies the pages to split out into individual documents. The page limit should not exceed the number of pages in the document: for example, you cannot split a 100-page document into 200 individual documents, but you can split it into 100 individual documents.
The pages parameter is 1-based, meaning the first page is 1 and not 0.
Attributes
{ "url": "https://example.com/file1.pdf" }
Attribute | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Yes | - | URL to the source file. See url attribute. |
callback | string | No | - | The callback URL (or Webhook) used to receive the POST data. See Webhooks & Callbacks. This is only applicable when async is set to true. |
httpusername | string | No | - | HTTP auth user name if required to access source URL. |
httppassword | string | No | - | HTTP auth password if required to access source URL. |
pages | string | No | all pages | Specify page indices as comma-separated values or ranges to process (e.g. “0, 1, 2-” or “1, 2, 3-7”). The first-page index is 0. Use “!” before a number for inverted page numbers (e.g. “!0” for the last page). If not specified, the default configuration processes all pages. The input must be in string format. |
inline | boolean | No | false | Set to true to return results inside the response. Otherwise, the endpoint will return a URL to the output file generated. |
async | boolean | No | false | Set async to true to run long processes in the background. The API then returns a jobId, which you can use with the Background Job Check endpoint. See also Webhooks & Callbacks. |
name | string | No | - | File name for the generated output. Must be a string. |
expiration | integer | No | 60 | Set the expiration time for the output link in minutes. After this specified duration, any generated output file(s) will be automatically deleted from PDF.co Temporary Files Storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. re-usable images, PDF templates, documents), consider using PDF.co Built-In Files Storage. |
searchString | string | Yes | - | Text to search for on pages. Must be a string. |
regexSearch | boolean | No | false | Set to true to enable regular expression search for the searchString(s) parameter. |
caseSensitive | boolean | No | true | Set to false to make the search case-insensitive. |
lang | string | No | eng | Language used by OCR (text from image) when extracting text from scanned PDF, PNG, and JPG input documents. See Language Support. You can also use 2 languages simultaneously, like this: eng+deu (any combination). |
excludeKeyPages | boolean | No | false | Set to true to exclude pages where the searchString text was found. |
profiles | object | No | - | See Profiles for more information. |
outputDataFormat | string | No | - | If you require the output in base64 format, set this to base64. |
DataEncryptionAlgorithm | string | No | - | Controls the encryption algorithm used for data encryption. See User-Controlled Encryption for more information. The available algorithms are: AES128, AES192, AES256. |
DataEncryptionKey | string | No | - | Controls the encryption key used for data encryption. See User-Controlled Encryption for more information. |
DataEncryptionIV | string | No | - | Controls the encryption IV used for data encryption. See User-Controlled Encryption for more information. |
DataDecryptionAlgorithm | string | No | - | Controls the decryption algorithm used for data decryption. See User-Controlled Encryption for more information. The available algorithms are: AES128, AES192, AES256. |
DataDecryptionKey | string | No | - | Controls the decryption key used for data decryption. See User-Controlled Encryption for more information. |
DataDecryptionIV | string | No | - | Controls the decryption IV used for data decryption. See User-Controlled Encryption for more information. |
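As a rough illustration of how the attributes above fit into a request body, the sketch below assembles the JSON payload and an unsent request object. Only the /v1/pdf/split2 path and the attribute names come from this page. The host `https://api.pdf.co`, the `x-api-key` authentication header, and the search text are assumptions, and `build_split2_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: the API host is not stated on this page; adjust as needed.
API_BASE = "https://api.pdf.co"
ENDPOINT = "/v1/pdf/split2"


def build_split2_request(api_key, url, search_string, **optional):
    """Build (but do not send) a POST request for the split2 endpoint.

    `url` and `searchString` are the two required attributes; any other
    attribute from the table can be passed as a keyword argument.
    """
    payload = {"url": url, "searchString": search_string}
    # Drop attributes left as None so the API applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_BASE + ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: authentication via an x-api-key header.
            "x-api-key": api_key,
        },
    )


request = build_split2_request(
    "YOUR_API_KEY",
    "https://example.com/file1.pdf",
    "Invoice",  # hypothetical search text
    caseSensitive=False,
    excludeKeyPages=True,
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Under these assumptions, the request could then be sent with `urllib.request.urlopen(request)`. Keeping construction separate from sending makes the payload easy to inspect first.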
Query parameters
No query parameters accepted.
Responses
Parameter | Type | Description |
---|---|---|
urls | array[string] | List of URLs to the output PDF files stored in S3. |
outputLinkValidTill | string | Timestamp indicating when the output link will expire |
pageCount | integer | Number of pages in the PDF document. |
error | boolean | Indicates whether an error occurred (false means success) |
status | string | Status code of the request (200, 404, 500, etc.). For more information, see Response Codes. |
name | string | Name of the output file |
credits | integer | Number of credits consumed by the request |
remainingCredits | integer | Number of credits remaining in the account |
duration | integer | Time taken for the operation in milliseconds |
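The fields above suggest a simple handling pattern: check error first, then read urls. The following is a minimal sketch, assuming the response body is a JSON object with exactly the fields listed in the table. The sample values are invented for illustration, and `extract_output_urls` is a hypothetical helper.

```python
import json


def extract_output_urls(response_text):
    """Return the list of output URLs, or raise if the API reported an error."""
    data = json.loads(response_text)
    if data.get("error"):
        # `status` carries the response code described in the table.
        raise RuntimeError(f"split2 request failed with status {data.get('status')}")
    return data.get("urls", [])


# Invented sample response, shaped after the table above.
sample = json.dumps({
    "urls": [
        "https://example.com/output/part1.pdf",
        "https://example.com/output/part2.pdf",
    ],
    "pageCount": 4,
    "error": False,
    "status": "200",
    "name": "result.pdf",
    "credits": 10,
    "remainingCredits": 990,
    "duration": 350,
})

for link in extract_output_urls(sample):
    print(link)
```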
searchString
Text to search for on pages. Must be a string.
To search for a barcode, use the following macro string: [[barcode:<barcodeTypesSeparatedByComma> <barcodeValue>]].
To search for a barcode type without analyzing its value, use this notation instead: [[barcode:<barcodeTypesSeparatedByComma>]].
Example #1, split by QR code: "searchString": "[[barcode:qrcode]]".
Example #2, split by QR code with value: "searchString": "[[barcode:qrcode pdfco]]".
Example #3, split by QR code with value search with regex: "searchString": "[[barcode:qrcode /pdf.co/]]".
Example #4, split by QR code or datamatrix with value search with regex: "searchString": "[[barcode:qrcode,datamatrix /pdf.co/]]".
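The macro notation in the four examples above can be composed programmatically. This is a small sketch, not part of the API: `barcode_search` is a hypothetical helper that only reproduces the string format shown in the examples.

```python
def barcode_search(types, value=None, regex=False):
    """Compose a [[barcode:...]] macro for the searchString attribute.

    types: list of barcode type names, e.g. ["qrcode", "datamatrix"]
    value: optional barcode value to match
    regex: wrap the value in slashes, as in the regex examples above
    """
    if not types:
        raise ValueError("at least one barcode type is required")
    macro = "barcode:" + ",".join(types)
    if value is not None:
        macro += " " + (f"/{value}/" if regex else value)
    return f"[[{macro}]]"


print(barcode_search(["qrcode"]))                                    # [[barcode:qrcode]]
print(barcode_search(["qrcode"], "pdfco"))                           # [[barcode:qrcode pdfco]]
print(barcode_search(["qrcode"], "pdf.co", regex=True))              # [[barcode:qrcode /pdf.co/]]
print(barcode_search(["qrcode", "datamatrix"], "pdf.co", regex=True))  # [[barcode:qrcode,datamatrix /pdf.co/]]
```

The returned string is what goes into the searchString attribute of the request body.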