Profiles are used to set extra options for common API calls and are sometimes specific to a particular API.

Profiles are written in a JSON-like notation and passed in the profiles field of your API calls.

Please note that the value of the profiles field in the code snippets must be enclosed in double quotes ("), making it a complete string. For example: { "profiles": "{'TrimSpaces':true, 'PreserveFormattingOnTextExtraction': true}"}

Sample Code

{
 "profiles": "{'TrimSpaces':true, 'PreserveFormattingOnTextExtraction': true}"
}
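
The quoting rule above is easy to get wrong when a request body is assembled by hand. The following Python sketch builds the profiles string from a dictionary. The helper name build_profiles is hypothetical, and the sketch assumes that no option name or value contains a quote character.

```python
import json


def build_profiles(options: dict) -> str:
    """Serialize profile options into the single-quoted string form shown above."""
    text = json.dumps(options)
    # Assumption: names and values contain no quote characters, so swapping
    # double quotes for single quotes cannot corrupt the content.
    if "'" in text or '\\"' in text:
        raise ValueError("option names and values must not contain quote characters")
    return text.replace('"', "'")


payload = {
    "profiles": build_profiles(
        {"TrimSpaces": True, "PreserveFormattingOnTextExtraction": True}
    )
}

# The profiles value is a plain string inside the outer JSON document.
print(json.dumps(payload))
```

Building the string from a dictionary keeps the booleans in JSON form (true and false) instead of Python's True and False.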

Generic Profile Options

The following profiles options are not specific to any one particular endpoint.

Standard Parameters

The std_params object within the profiles parameter lets you define regular API parameters in JSON format. It is designed to simplify passing standard parameters together with additional options in the profiles parameter of PDF.co API requests.

When using Standard Parameters, webhooks can be used by setting the callback object to the URL of your choice. However, it is simpler to set the callback object directly - see Webhooks & Callbacks for more.

When std_params is used in the profiles parameter and a parameter is defined both inside std_params and outside profiles, the value specified in std_params overwrites the duplicate value. Therefore, if you define a callback object in std_params, it will overwrite any value you may have defined via the basic callback object!

std_params Structure

  • Description: Contains key-value pairs of standard parameters that will be used across PDF.co API requests.

  • Type: JSON Object (passed as a string)

  • Example:

{
 "profiles": "{'std_params': {'callback': 'webhook_url'}}"
}
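
The overwrite rule for duplicated parameters can be illustrated locally. This Python sketch only models the rule as described in this section and is not the service's implementation; the function name effective_parameters and the example URLs are hypothetical.

```python
def effective_parameters(top_level: dict, std_params: dict) -> dict:
    """Model the documented rule: std_params values win over duplicates."""
    merged = dict(top_level)   # start from the parameters set outside profiles
    merged.update(std_params)  # duplicates are overwritten by std_params
    return merged


top_level = {"callback": "https://example.com/hook-a", "async": False}
std_params = {"callback": "https://example.com/hook-b"}

result = effective_parameters(top_level, std_params)
print(result["callback"])
```

Under this model the callback from std_params replaces the top-level one, while async, which is not duplicated, keeps its top-level value.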

Practical Application

Using the std_params profile, you can define a set of standard parameters and configurations that will be consistently applied across your PDF.co API requests. This approach is particularly beneficial when using automation platforms like Zapier, Make, and others, where the number of parameters you can pass directly is limited.

Complete Request Example

Here is a complete example illustrating the use of the std_params profile with other parameters:

/pdf/convert/to/text

{
 "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
 "inline": true,
 "profiles": "{'std_params': {'callback': 'webhook_url', 'async': true}, 'ExtractShadowLikeText': false, 'ExtractColumnByColumn': true, 'OCRMode': 'Auto'}",
 "TrimSpaces": true,
 "PreserveFormattingOnTextExtraction": true
}
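
As a rough illustration, the request above could be assembled in Python with the standard library. The base URL https://api.pdf.co/v1, the x-api-key header name, and the API_KEY placeholder are assumptions that do not come from this section, and build_request is a hypothetical helper. The sketch prepares the request but does not send it.

```python
import json
import urllib.request

API_BASE = "https://api.pdf.co/v1"  # assumption: base URL is not stated in this section
API_KEY = "YOUR_API_KEY"            # placeholder, replace with a real key


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for the given endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "x-api-key": API_KEY,  # assumption: header name is not stated in this section
    }
    return urllib.request.Request(
        API_BASE + endpoint, data=body, headers=headers, method="POST"
    )


profiles = (
    "{'std_params': {'callback': 'webhook_url', 'async': true}, "
    "'ExtractShadowLikeText': false, 'ExtractColumnByColumn': true, "
    "'OCRMode': 'Auto'}"
)
payload = {
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "inline": True,
    "profiles": profiles,
    "TrimSpaces": True,
    "PreserveFormattingOnTextExtraction": True,
}
request = build_request("/pdf/convert/to/text", payload)
# Passing the request to urllib.request.urlopen would send it; that step is omitted here.
```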

Output as Base64

If you require your output as base64, use the following:

{
 "profiles": "{ 'outputDataFormat': 'base64' }"
}
This output data format is supported by endpoints that generate binary files - PDF and images. The output is accessible via a generated link and the file under the link is in a base64-encoded text format.
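
Once the base64 text has been downloaded from the generated link, it has to be decoded back into binary data. A minimal Python sketch, assuming the text is already available as a string (the download step is omitted and decode_output is a hypothetical helper):

```python
import base64


def decode_output(base64_text: str) -> bytes:
    """Decode base64 text into the original binary file content."""
    # Remove line breaks and other whitespace that may wrap the encoded text.
    cleaned = "".join(base64_text.split())
    # validate=True raises an error on characters outside the base64 alphabet.
    return base64.b64decode(cleaned, validate=True)


# Stand-in for the downloaded text: a few bytes encoded locally.
sample_text = base64.b64encode(b"%PDF-1.7 sample").decode("ascii")
binary = decode_output(sample_text)
print(len(binary))
```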

Converting PDFs

There are a variety of profiles options which can be set when converting from PDF to other documents. These profiles control how to extract the information from the source PDF file.

These options apply to the following endpoints:

  • /pdf/convert/to/csv
  • /pdf/convert/to/xml
  • /pdf/convert/to/json
  • /pdf/convert/to/json2
  • /pdf/convert/to/xls
  • /pdf/convert/to/xlsx

Convert Vectors

You can choose whether the conversion process should convert vectors or not as follows:

{
 "profiles": "{ 'SaveVectors': true }"
}

Save Images

This profiles parameter includes the SaveImages property that extracts individual images in a regular PDF.

{
 "profiles": "{ 'SaveImages': 'Embed' }"
}

Consider Font Size

This profiles parameter allows you to separate header and body text based on font size.

{
 "profiles": "{ 'ConsiderFontSizes': true }"
}

Set the Extraction Area

Extract text in a specific area by defining the extraction area - set with points in the format [x, y, width, height].

{
 "profiles": "{ 'ExtractionArea': [171.0,69.0,249.75,71.25] }"
}
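
If a region is known by two opposite corners rather than by width and height, the [x, y, width, height] value can be computed. This Python sketch assumes that x and y refer to the corner with the smallest coordinates, which this section does not state; extraction_area_profile is a hypothetical helper.

```python
import json


def extraction_area_profile(x1: float, y1: float, x2: float, y2: float) -> str:
    """Build an ExtractionArea profiles string from two opposite corners."""
    width = abs(x2 - x1)
    height = abs(y2 - y1)
    if width == 0 or height == 0:
        raise ValueError("the extraction area must have a non-zero width and height")
    # Assumption: x and y are the smallest coordinates of the rectangle.
    area = [float(min(x1, x2)), float(min(y1, y2)), float(width), float(height)]
    return json.dumps({"ExtractionArea": area}).replace('"', "'")


payload = {"profiles": extraction_area_profile(171.0, 69.0, 420.75, 140.25)}
print(payload["profiles"])
```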

Extracting Invisible Text

When dealing with PDF documents, sometimes there may be unwanted invisible text that makes it difficult to extract the desired content accurately. This could be due to various reasons such as the original document being scanned or saved with a low-quality setting. In such cases, it is important to remove the unwanted invisible text to ensure accurate extraction of the desired content.

{
 "profiles": "{ 'ExtractInvisibleText': false, 'ExtractShadowLikeText': false, 'OCRMode': 'Auto' }"
}

OCR (Optical Character Recognition) Mode Options

The following values can be configured for OCR mode:

  • Auto (default): Automatically determines the optimal OCR settings based on the input.
  • AutoRepairFonts: Automatically repairs fonts in text extracted from images or other documents.
  • TextFromImagesAndFonts: Extracts text from images and fonts from documents.
  • TextFromImagesAndRepairedFonts: Extracts text from images and repaired fonts from documents.
  • TextFromImagesAndVectorsAndFonts: Extracts text, vectors, and fonts from images and documents.
  • TextFromImagesAndVectorsAndRepairedFonts: Extracts text, vectors, and repaired fonts from images and documents.
  • TextFromImagesAndVectorsOnly: Extracts text and vectors from images only.
  • TextFromImagesOnly: Extracts text from images only.
  • TextFromRepairedFontsOnly: Extracts text from documents with repaired fonts only.
  • TextFromVectorsAndFonts: Extracts text and fonts from documents with vectors.
  • TextFromVectorsAndRepairedFonts: Extracts text and repaired fonts from documents with vectors.
  • TextFromVectorsOnly: Extracts text from documents with vectors only.

{
 "profiles": "{ 'OCRMode': 'TextFromImagesAndVectorsAndRepairedFonts' }"
}
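
Because the mode names are long and differ only slightly, it can help to check a name locally before sending a request. A small Python sketch that uses the names listed above; ocr_mode_profile is a hypothetical helper, and the check is purely client-side.

```python
OCR_MODES = frozenset({
    "Auto",
    "AutoRepairFonts",
    "TextFromImagesAndFonts",
    "TextFromImagesAndRepairedFonts",
    "TextFromImagesAndVectorsAndFonts",
    "TextFromImagesAndVectorsAndRepairedFonts",
    "TextFromImagesAndVectorsOnly",
    "TextFromImagesOnly",
    "TextFromRepairedFontsOnly",
    "TextFromVectorsAndFonts",
    "TextFromVectorsAndRepairedFonts",
    "TextFromVectorsOnly",
})


def ocr_mode_profile(mode: str = "Auto") -> str:
    """Return a profiles string for a known OCR mode, or raise on a typo."""
    if mode not in OCR_MODES:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    return "{ 'OCRMode': '%s' }" % mode


payload = {"profiles": ocr_mode_profile("TextFromImagesAndVectorsAndRepairedFonts")}
print(payload["profiles"])
```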

OCR (Optical Character Recognition) Resolution

OCR resolution can be set from 72 to 1200 DPI. The default value is 300 DPI. The higher the resolution, the better the OCR results. However, higher resolution also means longer processing times.

{
 "profiles": "{ 'OCRResolution': 300 }"
}
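
The documented range can be enforced before a request is built. A minimal Python sketch, with ocr_resolution_profile as a hypothetical helper name:

```python
def ocr_resolution_profile(dpi: int = 300) -> str:
    """Return a profiles string for an OCR resolution within 72 to 1200 DPI."""
    if not 72 <= dpi <= 1200:
        raise ValueError("OCRResolution must be between 72 and 1200 DPI")
    return "{ 'OCRResolution': %d }" % dpi


payload = {"profiles": ocr_resolution_profile(600)}
print(payload["profiles"])
```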

Extracting Text from Colored Background

If you can’t extract text with a colored background, please add the Grayscale filter to the profiles as follows:

{
 "profiles": "{ 'OCRImagePreprocessingFilters.AddGrayscale()': [] }"
}

Considering the Font Color on Tables

Sometimes the data which OCR must extract from a table might have colored text which is difficult to extract. OCR results can be improved with the following:

{
 "profiles": "{ 'LineGroupingMode': 'JoinOrphanedRows', 'ConsiderFontColors': true, 'DetectNewColumnBySpacesRatio': '1.1', 'AutoAlignColumnsToHeader': false, 'OCRImagePreprocessingFilters.AddGammaCorrection()': [ '1.4' ] }"
}

Setting the Rotation Angle

Normally, OCR detects PDF rotation and extracts text properly. In some cases, however, a PDF is constructed so that the page itself is not rotated and the text is instead drawn vertically, and OCR does not detect the rotation automatically. In such scenarios, you can use the following profile setting:

{
 "profiles": "{ 'RotationAngle': 2 }"
}

The value maps to a rotation as follows:

  • 0: no rotation
  • 1: 90 degrees
  • 2: 180 degrees
  • 3: 270 degrees
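
Since RotationAngle takes a code rather than an angle, a small lookup avoids mixing the two up. This Python sketch converts an angle in degrees to the code listed above; rotation_profile is a hypothetical helper, and the direction of rotation is not stated in this section, so the sketch does not assume one.

```python
ROTATION_CODES = {0: 0, 90: 1, 180: 2, 270: 3}


def rotation_profile(degrees: int) -> str:
    """Return a profiles string with the RotationAngle code for an angle."""
    normalized = degrees % 360  # for example, 450 becomes 90
    if normalized not in ROTATION_CODES:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return "{ 'RotationAngle': %d }" % ROTATION_CODES[normalized]


payload = {"profiles": rotation_profile(180)}
print(payload["profiles"])
```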