Profiles
This page describes the profiles parameter that can be used with your API calls.
Profiles set extra options for common API calls, and some options are specific to a particular API.
Profiles are written in a JSON-style notation and passed in the profiles field of your API call. The profiles field must be enclosed in quotes ("), making it a complete string. For example: { "profiles": "{'TrimSpaces':true, 'PreserveFormattingOnTextExtraction': true}" }
Sample Code
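As a minimal sketch of the quoting rule above, the snippet below builds a request body in which the profiles value is a serialized string rather than a nested object. The url parameter, the sample file address, and the helper name build_request_body are assumptions for illustration. The sketch also assumes the API accepts standard double-quoted JSON inside the string, whereas the example on this page uses single quotes.

```python
import json


def build_request_body(url, profile_options):
    """Return a request body whose "profiles" field is a JSON string."""
    return {
        "url": url,  # assumed name of the input-file parameter
        # profiles must be a complete string, so the options are serialized here
        "profiles": json.dumps(profile_options),
    }


body = build_request_body(
    "https://example.com/sample.pdf",  # hypothetical input file
    {"TrimSpaces": True, "PreserveFormattingOnTextExtraction": True},
)
print(json.dumps(body, indent=2))
```

Sending the body is left out on purpose; only the shape of the profiles field is shown.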
Generic Profile Options
The following profiles options are not specific to any one particular endpoint.
Standard Parameters
The std_params object within the profiles parameter lets you define regular API parameters in JSON format. It is designed to simplify passing standard parameters and additional options in the profiles parameter for PDF.co API requests.
When using Standard Parameters, webhooks can be used by setting the callback object to the URL of your choice. However, it is simpler to set the callback object directly - see Webhooks & Callbacks for more.
Because std_params is used inside the profiles parameter, a parameter may appear both in std_params and outside profiles. In that case the value specified in std_params overwrites the duplicate. For example, a callback object defined in std_params overwrites any value defined via the basic callback object.
std_params Structure
- Description: Contains key-value pairs of standard parameters that will be used across PDF.co API requests.
- Type: JSON Object (passed as a string)
- Example:
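A std_params value of the shape described in this list could be assembled as in the sketch below. Only callback is named on this page; the name parameter and both values are hypothetical placeholders.

```python
import json

# Standard parameters to carry inside profiles.
# "callback" is mentioned on this page; "name" is an assumed parameter.
std_params = {
    "callback": "https://example.com/pdfco-callback",  # hypothetical URL
    "name": "result.txt",  # hypothetical output name
}

# std_params is a JSON object, but the enclosing profiles value is a string.
profiles = json.dumps({"std_params": std_params})
print(profiles)
```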
Practical Application
Using the std_params profile, you can define a set of standard parameters and configurations that will be consistently applied across your PDF.co API requests. This approach is particularly beneficial when using automation platforms like Zapier, Make, and others, where the number of parameters you can pass directly is limited.
Complete Request Example
Here is a complete example illustrating the use of the std_params profile with other parameters:
/pdf/convert/to/text
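The sketch below shows what a request body for this endpoint might look like when callback is set both outside profiles and inside std_params. The helper effective_params is a hypothetical client-side model of the override rule described under Standard Parameters, not part of the API. The url parameter and all URLs are placeholders.

```python
import json


def effective_params(body):
    """Model the documented rule: std_params values win over duplicates outside profiles."""
    profiles = json.loads(body.get("profiles", "{}"))
    # Start from the parameters given outside profiles...
    merged = {key: value for key, value in body.items() if key != "profiles"}
    # ...then let std_params overwrite any duplicates.
    merged.update(profiles.get("std_params", {}))
    return merged


body = {
    "url": "https://example.com/sample.pdf",  # hypothetical input file
    "callback": "https://example.com/basic-callback",  # set outside profiles
    "profiles": json.dumps({
        "TrimSpaces": True,
        "std_params": {"callback": "https://example.com/std-callback"},
    }),
}

endpoint = "/pdf/convert/to/text"
print(endpoint, effective_params(body)["callback"])
```

Running the helper before sending a request is one way to spot a std_params entry that silently replaces a value set elsewhere.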
Output as Base64
If you require your output as base64, use the following:
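A hedged sketch of such a request body follows. The key outputDataFormat and its value are assumptions that this page does not confirm, so check them against the API reference before relying on them. The decode_output helper shows one way to turn a base64 string back into bytes with the Python standard library.

```python
import base64
import json

body = {
    "url": "https://example.com/sample.pdf",  # hypothetical input file
    # Assumed key and value; not confirmed by this page.
    "profiles": json.dumps({"outputDataFormat": "base64"}),
}


def decode_output(encoded):
    """Decode a base64 string into raw bytes."""
    return base64.b64decode(encoded)


# Stand-in for a base64 result, since no request is sent in this sketch.
sample = base64.b64encode(b"Hello PDF").decode("ascii")
print(decode_output(sample))
```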
Converting PDFs
There are a variety of profiles options which can be set when converting from PDF to other documents. These profiles control how information is extracted from the source PDF file.
These options apply to the following endpoints:
- /pdf/convert/to/csv
- /pdf/convert/to/xml
- /pdf/convert/to/json
- /pdf/convert/to/json2
- /pdf/convert/to/xls
- /pdf/convert/to/xlsx
Convert Vectors
You can choose whether the conversion process should convert vectors or not as follows:
Save Images
This profiles parameter includes the SaveImages property that extracts individual images in a regular PDF.
Consider Font Size
This profiles parameter allows you to separate header and body text based on font size.
Set the Extraction Area
Extract text in a specific area by defining the extraction area - set with points in the format [x, y, width, height].
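The [x, y, width, height] format can be sketched as a small helper that checks the rectangle and serializes it into a profiles string. The key name ExtractionArea is an assumption, as this page gives only the value format. The function name and sample coordinates are illustrative.

```python
import json


def extraction_area_profile(x, y, width, height):
    """Build a profiles string for an extraction area given in points."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # "ExtractionArea" is an assumed key name.
    return json.dumps({"ExtractionArea": [x, y, width, height]})


profiles = extraction_area_profile(0, 0, 200, 100)
print(profiles)
```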
Extracting Invisible Text
When dealing with PDF documents, sometimes there may be unwanted invisible text that makes it difficult to extract the desired content accurately. This could be due to various reasons such as the original document being scanned or saved with a low-quality setting. In such cases, it is important to remove the unwanted invisible text to ensure accurate extraction of the desired content.
OCR (Optical Character Recognition) Mode Options
The following values can be configured for OCR mode:
OCR Mode | Description |
---|---|
Auto (default) | Automatically determines the optimal OCR settings based on the input. |
AutoRepairFonts | Automatically repairs fonts in text extracted from images or other documents. |
TextFromImagesAndFonts | Extracts text from images and fonts from documents. |
TextFromImagesAndRepairedFonts | Extracts text from images and repaired fonts from documents. |
TextFromImagesAndVectorsAndFonts | Extracts text, vectors, and fonts from images and documents. |
TextFromImagesAndVectorsAndRepairedFonts | Extracts text, vectors, and repaired fonts from images and documents. |
TextFromImagesAndVectorsOnly | Extracts text and vectors from images only. |
TextFromImagesOnly | Extracts text from images only. |
TextFromRepairedFontsOnly | Extracts text from documents with repaired fonts only. |
TextFromVectorsAndFonts | Extracts text and fonts from documents with vectors. |
TextFromVectorsAndRepairedFonts | Extracts text and repaired fonts from documents with vectors. |
TextFromVectorsOnly | Extracts text from documents with vectors only. |
OCR (Optical Character Recognition) Resolution
OCR resolution can be set from 72 to 1200 DPI. The default value is 300 DPI. Higher resolution generally gives better OCR results but also means longer processing times.
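The mode names from the OCR mode table and the 72 to 1200 DPI range can be checked on the client before a request is built, as in the sketch below. The key names OCRMode and OCRResolution are assumptions that this page does not state. The helper name ocr_profile is illustrative.

```python
import json

# Mode names copied from the OCR mode table.
OCR_MODES = {
    "Auto", "AutoRepairFonts", "TextFromImagesAndFonts",
    "TextFromImagesAndRepairedFonts", "TextFromImagesAndVectorsAndFonts",
    "TextFromImagesAndVectorsAndRepairedFonts", "TextFromImagesAndVectorsOnly",
    "TextFromImagesOnly", "TextFromRepairedFontsOnly",
    "TextFromVectorsAndFonts", "TextFromVectorsAndRepairedFonts",
    "TextFromVectorsOnly",
}
MIN_DPI, MAX_DPI, DEFAULT_DPI = 72, 1200, 300


def ocr_profile(mode="Auto", dpi=DEFAULT_DPI):
    """Validate OCR settings and return them as a profiles string."""
    if mode not in OCR_MODES:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    if not MIN_DPI <= dpi <= MAX_DPI:
        raise ValueError(f"dpi must be between {MIN_DPI} and {MAX_DPI}")
    # "OCRMode" and "OCRResolution" are assumed key names.
    return json.dumps({"OCRMode": mode, "OCRResolution": dpi})


profiles = ocr_profile("TextFromImagesOnly", 600)
print(profiles)
```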
Extracting Text from Colored Background
If you can't extract text with a colored background, add the Grayscale filter to the profiles as follows:
Considering the Font Color on Tables
Tables sometimes contain colored text that is difficult for OCR to extract. OCR results can be improved with the following:
Setting the Rotation Angle
Normally OCR detects PDF rotation and extracts text properly. In some PDFs, however, the page itself is not rotated and the text is instead drawn vertically, so OCR does not detect the rotation automatically. In such cases, use the following profile setting.
Value | Rotation |
---|---|
0 | No rotation |
1 | 90 degrees |
2 | 180 degrees |
3 | 270 degrees |
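The value-to-angle mapping listed above can be wrapped in a small lookup so that callers work in degrees. The key name RotationAngle is an assumption, as this page lists only the values 0 to 3. The helper name is illustrative.

```python
import json

# Degrees mapped to the setting values 0-3 listed above.
ROTATION_CODES = {0: 0, 90: 1, 180: 2, 270: 3}


def rotation_profile(degrees):
    """Return a profiles string for a rotation given in degrees."""
    if degrees not in ROTATION_CODES:
        raise ValueError("degrees must be one of 0, 90, 180, 270")
    # "RotationAngle" is an assumed key name.
    return json.dumps({"RotationAngle": ROTATION_CODES[degrees]})


profiles = rotation_profile(90)
print(profiles)
```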