General#
Artifex Software and ByteScout/PDF.co Merger FAQ#
We understand that you may have some questions following the recent announcement about the merger of ByteScout PDF.co and Artifex Software. This FAQ is designed to address your concerns and provide information about the transition.
General Information
- Q1: Who is Artifex Software?
Artifex Software Inc. is a seasoned player in the realm of PDF solutions. As a subsidiary of ePapyrus, Inc., it has delivered robust PDF technologies such as the open-source Ghostscript and MuPDF for over 30 years. Its flagship product, Ghostscript, was the first non-Adobe PDF solution and ships with almost all Linux distributions. Artifex Software has provided its services to a range of notable companies, including Google, Oracle, HP, Kyocera, and Intuit, to name a few.
- Q2: What is the aim behind the ByteScout and Artifex Software merger?
This merger combines the strengths of both companies: Artifex Software’s in-depth expertise in PDF solutions, and ByteScout’s innovative technologies. Our joint aim is to provide even more comprehensive and enhanced solutions to our clients.
Account and Services
- Q3: Will my ByteScout.com and PDF.co services be affected?
Rest assured, your services will continue as they are now. We aim to ensure a seamless transition where the tools, resources, and services you use daily continue without interruption.
- Q4: Will the brand names ByteScout.com and PDF.co change?
No. We recognize the trust and value you have in the ByteScout.com and PDF.co brands, and we will continue to maintain these names.
- Q5: Do I need to take any action due to this merger?
No action is required on your part. Your services will continue as before.
- Q6: Will this merger impact my subscription?
No, your subscription and its terms will remain the same.
- Q7: Will I need to make any changes to my current setup because of this merger?
No changes are necessary on your part. Everything will continue to function as it currently does.
Billing
- Q8: Will there be any changes to my billing?
While your services will remain unchanged, your invoices will now be issued under the name of Artifex Software Inc.
Support
- Q9: Who should I contact if I have further questions or concerns?
If you have any more questions or concerns, don’t hesitate to contact our customer support team at support@bytescout.com.
We appreciate your understanding and continued support during this exciting transition, and look forward to serving you under this new and promising partnership.
Fonts available for PDF Filling and Adding Text to PDF with pdf/edit/add#
PDF.co Font List
Arial
Arial Black
Bahnschrift
Calibri
Cambria
Cambria Math
Candara
Comic Sans MS
Consolas
Constantia
Corbel
Courier New
Ebrima
Franklin Gothic Medium
Gabriola
Gadugi
Georgia
HoloLens MDL2 Assets
Impact
Ink Free
Javanese Text
Leelawadee UI
Lucida Console
Lucida Sans Unicode
Malgun Gothic
Marlett
Microsoft Himalaya
Microsoft JhengHei
Microsoft New Tai Lue
Microsoft PhagsPa
Microsoft Sans Serif
Microsoft Tai Le
Microsoft YaHei
Microsoft Yi Baiti
MingLiU-ExtB
Mongolian Baiti
MS Gothic
MV Boli
Myanmar Text
Nirmala UI
Palatino Linotype
Segoe MDL2 Assets
Segoe Print
Segoe Script
Segoe UI
Segoe UI Historic
Segoe UI Emoji
Segoe UI Symbol
SimSun
Sitka
Sylfaen
Symbol
Tahoma
Times New Roman
Trebuchet MS
Verdana
Webdings
Wingdings
Yu Gothic
Japanese Fonts
MS Gothic
MS Mincho
Yu Gothic
Chinese Fonts
SimSun
MingLiU
Microsoft YaHei
Korean Fonts
Malgun Gothic
Hebrew Fonts
Miriam
Arabic Fonts
Aldhabi
Andalus
Arabic Typesetting
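As a minimal sketch of how a font from the lists above might be selected, the Python snippet below builds a request body for pdf/edit/add. The `annotations` array, the field names `text`, `x`, `y`, `size`, `pages`, and `fontName`, and the sample URL are all assumptions made for illustration; check the API reference for the exact schema before relying on them.

```python
import json

# A few names copied from the font lists above; extend as needed.
KNOWN_FONTS = {
    "Arial", "Calibri", "Courier New", "Times New Roman",
    "MS Gothic", "SimSun", "Malgun Gothic", "Miriam", "Andalus",
}

def build_add_text_payload(pdf_url, text, font_name, size=12, x=100, y=100):
    """Return a request body (dict) that adds one text annotation.

    The field names are assumptions, not a confirmed schema.
    """
    if font_name not in KNOWN_FONTS:
        raise ValueError(f"{font_name!r} is not in the local font list")
    annotation = {
        "text": text,
        "x": x,
        "y": y,
        "size": size,
        "pages": "0",           # assumed: first page only
        "fontName": font_name,  # assumed parameter name
    }
    return {"url": pdf_url, "annotations": [annotation]}

payload = build_add_text_payload("https://example.com/sample.pdf", "Hello", "Arial")
print(json.dumps(payload, indent=2))
```

Checking the name locally before sending the request catches typos such as "Ariel" early, without depending on how the server treats an unknown font.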
How to create and test configurations for PDF extraction and image-to-text functions locally#
If you are working with scanned PDFs and the extracted text (text, csv, json, xml) is incomplete or inaccurate, consider using our desktop app, ByteScout PDF Multitool (compatible with Windows 7/10/11 and higher). This app emulates most of the major functions of the PDF.co API and, more importantly, allows you to create and test configurations for PDF extraction and image-to-text functions locally.
ByteScout PDF Multitool includes the OCR Analyzer tool, which helps you quickly find the best combination of OCR filters and parameters to enhance the quality of PDF text extraction results.
PDF Multitool and its OCR Analyzer provide JSON code for profiles that can be used with PDF.co cloud and on-premises versions. Simply pass this JSON config as the profiles parameter of the PDF To Text/CSV/XML/JSON API methods.
Step-by-step guide on how to start using the PDF Multitool free app:
- First, download the free version of PDF Multitool here.
- Next, load your PDF/JPG/PNG document into the multitool.
- Then, in the left navigation menu, select OCR Analyzer.
- Choose the OCR Language and OCR Resolution and click Go.
- Click the Copy To button and select Send to CSV.. or similar to copy this configuration into the appropriate extractor.
- This will open the PDF Extractor config for PDF to CSV/Text/XML/JSON accordingly.
- Try the new configuration by clicking Preview.
- If you’re satisfied with the outcome, go to the Profile for PDF.co and API Server tab.
- Click on Copy as payload for PDF.co or API Server.
- Finally, paste this as the value of the profiles parameter in your script/code or in the Zapier/Make plugin accordingly.
- If you are not satisfied with the results, try adjusting parameters and filters on the All Options tab (see Tips and Tricks below).
For a demo on how to use this tool, watch this video: https://youtu.be/NSyyohNNe6E.
Tips and Tricks On Finding Best OCR Settings Using PDF Multitool
- For fuzzy or blurred scans: try increasing OCR Resolution from the default 300 dpi (dots per inch) to 600 or even 800 or 1200 dpi and try again. Note: higher resolution means more time to process the document.
- For dark scans: try adding the Gamma Correction filter with a value of 1.4 (the default) or 1.5 and try again. Note: this filter will make the dark images lighter automatically.
- To get text printed near borders or lines: try adding a filter that removes lines before extraction. For tables with borders or lines, if the layout is reproduced incorrectly or some words/letters are lost, try adding the Horizontal Line Removal and Vertical Line Removal filters in the All Options - OCRImageProcessingFilters section. Make sure to put these filters first in the list (use the Up and Down buttons to move filters up and down in the list).
- For non-English documents, set the proper recognition language: set OCR Language to the language you see in the document. The default is eng (English). If you have a document in German, set it to deu (German). If you have multiple languages in the same document, select two languages (for example, eng and deu).
- If you don’t need a whole page, try limiting extraction to a specific area on the page. This will increase the quality of text extraction as well as processing speed. To set the extraction area, click on the Select tool on the main toolbar in PDF Multitool and use your mouse to select the area with the source text. Then run extraction and preview again.
- If extracted text is missing some important text snippets, try setting an extraction area to extract from. Limiting extraction to a specific area on a page may dramatically increase the quality of the text recognition.
- If extracting from the whole page produces broken results: try running a few extractions from the same page, each limited to a selected area. For example, extract from the top area, then from the middle area, then from the bottom area, and combine the results into one file. This helps when the page has different layouts, fonts, or font sizes.
- Setting the extraction area to exclude the header, footer, and/or side notes in the document may simplify text analysis greatly.
- Removing Background Noise: Lowering Gamma (with values below 1.4) and raising Contrast can effectively remove background noise from images.
- Extracting text from color photos or scans: Enhancing the gamma effect on color photos improves the extraction quality. Applying the Grayscale filter before Gamma may yield better gamma effects on color photos. Grayscale alone is generally less useful.
- Removing Parasite Dots and Artifacts that produce small garbled text snippets: Combining the Median filter with high-resolution rendering (600+ DPI) can help remove parasite dots from scanned images or fax rasterization artifacts. However, this approach may also remove punctuation symbols.
- Fixing Etched/Distorted Letters: The Dilate filter can be used to repair etched or distorted letters in images.
List of OCR Image Preprocessing Filters Supported By PDF Multitool and PDF.co API
Contrast - Adds the Contrast image filter, which enhances image quality for OCR by improving contrast. This filter is particularly helpful for images where the text color is gray or similar to the background color. Lowering gamma and raising contrast can effectively remove background noise from images.
Deskew - Applies the Deskew image filter with a default angle threshold of 0.4 degrees (minimal admissible skew angle). This filter is useful for fixing slight rotation of scanned images. For scans rotated 90, 180, or 270 degrees, use the RotationAngle parameter in profiles instead, for example { 'rotationAngle': 1 }. The available RotationAngle values are:
  0 - no rotation (default)
  1 - 90 degrees
  2 - 180 degrees
  3 - 270 degrees
Dilate - Incorporates the “Dilate” image filter, which improves image quality for OCR by thickening the letter strokes. The Dilate filter can be used to repair etched or distorted letters in images.
Fit - Adds the Fit image filter with a specified size limit. The image is proportionally resized when its width or height exceeds the limit, which improves text extraction performance from large images.
Gamma - Implements the Gamma Correction filter with a default value of 1.4. This filter enhances image quality for OCR by automatically lightening dark images.
Grayscale - Applies the “Grayscale” image filter. Applying the Grayscale filter before Gamma may yield better gamma effects on color photos, although Grayscale alone is less useful.
HorizontalLinesRemover - Integrates the “Horizontal Lines Remover” image filter. This filter enhances OCR text recognition quality inside borders and near borders by removing horizontal lines before text recognition. IMPORTANT: this filter is added by default in PDF.co cloud and on-prem. If you don’t need it, set profiles to { 'OCRImagePreprocessingFilters.Clear()': [] }.
VerticalLinesRemover - Implements the “Vertical Lines Remover” image filter. This filter enhances OCR text recognition quality inside borders and near borders by removing vertical lines before text recognition. IMPORTANT: this filter is added by default in PDF.co cloud and on-prem. If you don’t need it, set profiles to { 'OCRImagePreprocessingFilters.Clear()': [] }.
Invert - Adds the Invert (negative) image filter. Sometimes scanned documents are inverted (white text on a black background). This filter fixes the issue by inverting all colors before extracting text.
Median - Incorporates the “Median” image filter. Combining the Median filter with high-resolution rendering (`600`+ DPI) can help remove parasite dots from scanned images or fax rasterization artifacts. However, this approach may also remove punctuation symbols.
Scale - Adds the Scale image filter with a specified scale factor. For example, 2.0 doubles the size of the input image, improving the recognition quality of small letters.
ByteScout PDF Multitool - more information at https://bytescout.com/products/pdfmultitool/index.html.
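To illustrate how a profile might travel with an API request, here is a minimal Python sketch. It uses only the two profile keys quoted in this section (`OCRImagePreprocessingFilters.Clear()` and `rotationAngle`). The helper names and the sample URL are hypothetical, and sending the profile as a JSON string in a `profiles` field next to a `url` field is an assumption. In practice, the payload copied from PDF Multitool should be pasted in as-is.

```python
import json

def build_profiles(clear_default_filters=False, rotation_angle=0):
    """Combine the profile settings mentioned in this section into one dict."""
    if rotation_angle not in (0, 1, 2, 3):
        raise ValueError("rotation_angle must be 0, 1, 2, or 3")
    profile = {}
    if clear_default_filters:
        # Removes the filters that are added by default (see the list above).
        profile["OCRImagePreprocessingFilters.Clear()"] = []
    if rotation_angle:
        # 1 = 90 degrees, 2 = 180 degrees, 3 = 270 degrees
        profile["rotationAngle"] = rotation_angle
    return profile

def build_request_body(pdf_url, profile):
    # Assumption: the profile is passed as a JSON string in "profiles".
    return {"url": pdf_url, "profiles": json.dumps(profile)}

body = build_request_body(
    "https://example.com/scan.pdf",
    build_profiles(clear_default_filters=True, rotation_angle=1),
)
print(json.dumps(body, indent=2))
```

Building the profile as a dictionary first and serializing it once avoids hand-editing quoted strings when several settings are combined.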
How to use custom fonts?#
Due to possible security and licensing issues, we cannot add third-party fonts to our server. However, we offer a PDF.co Self-Hosted server that allows you to install custom fonts. The PDF.co Self-Hosted server is on-premises and must be hosted in your infrastructure.
Here’s a comparison of PDF.co Cloud and PDF.co Self-Hosted: https://pdf.co/pricing/on-demand-cloud-vs-dedicated-vs-on-prem.
Please let us know if you’re interested in the PDF.co Self-Hosted server.
Another way to use custom fonts is through the HTML to PDF API. There are two ways to use custom fonts in your HTML template:
- Use custom fonts, or find fonts similar to what you’d like to use, in Google Web Fonts: https://fonts.google.com/
- Embed fonts hosted on another resource: https://stackoverflow.com/questions/24990554/how-to-include-a-font-ttf-using-css
You can read about HTML Template to PDF here: https://pdf.co/html-template-to-pdf.
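The second option, embedding a font hosted elsewhere, can be sketched in Python as follows. The function generates an HTML page with a CSS `@font-face` rule. The font family name and font URL are placeholders, and wrapping the result in an `html` request field is an assumption about the HTML to PDF API, not a confirmed parameter.

```python
def html_with_custom_font(body_html, family, font_url):
    """Wrap body_html in a page that loads `family` from `font_url` via @font-face."""
    css = (
        "@font-face {"
        f" font-family: '{family}';"
        f" src: url('{font_url}') format('truetype');"
        " }"
        f" body {{ font-family: '{family}', sans-serif; }}"
    )
    return f"<html><head><style>{css}</style></head><body>{body_html}</body></html>"

html = html_with_custom_font(
    "<p>Invoice</p>",
    "MyBrandFont",                                # hypothetical font family
    "https://example.com/fonts/MyBrandFont.ttf",  # hypothetical font URL
)
request_body = {"html": html}  # assumed field name for the HTML source
print(html)
```

The `sans-serif` fallback keeps the text readable if the font file cannot be fetched at render time.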
IP addresses used by PDF.co Cloud#
The PDF.co Cloud is hosted on Amazon AWS infrastructure. For information about the IP addresses and IP address ranges used by AWS, you can refer to this link: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html.
Our servers currently run in the us-west-2 (Oregon) region. You can find details about AWS Regions and IP ranges at this link: https://docs.aws.amazon.com/quicksight/latest/user/regions.html.
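If you need to check whether an address belongs to the region mentioned above, a short Python sketch along these lines may help. It assumes the data follows the published AWS `ip-ranges.json` layout (a `prefixes` list whose entries carry `ip_prefix` and `region`). The sample below uses made-up documentation ranges, not real AWS addresses; real data should be downloaded from AWS.

```python
import ipaddress

def ip_in_region(ip, ranges, region="us-west-2"):
    """Return True if `ip` falls inside any IPv4 prefix listed for `region`."""
    addr = ipaddress.ip_address(ip)
    for entry in ranges.get("prefixes", []):
        if entry.get("region") != region:
            continue
        if addr in ipaddress.ip_network(entry["ip_prefix"]):
            return True
    return False

# Made-up sample in the ip-ranges.json shape (documentation address blocks).
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-west-2", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2"},
    ]
}

print(ip_in_region("192.0.2.10", SAMPLE_RANGES))    # True
print(ip_in_region("198.51.100.7", SAMPLE_RANGES))  # False
```

Because AWS updates its ranges over time, an allowlist built this way should be refreshed periodically and not hard-coded.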
Where can I find the PDF.co output in Zapier?#
The PDF.co output is temporary and expires after an hour by default. The expiration can be extended in the Business plan.
We recommend that you add a third step in your Zap to save the PDF output to a permanent cloud storage such as Google Drive, Dropbox, or similar.
Here’s a step-by-step guide on how to set it up. It starts at Step 6: https://pdf.co/make-pdf-searchable-and-upload-in-google-drive#6.
If you’d like to review the generated output, please check out Step 5 here: https://pdf.co/make-pdf-searchable-and-upload-in-google-drive#5.
Who can access the pdf-temp-files, and how long are files stored?#
The pdf-temp-files storage is a private Amazon S3 bucket that utilizes strong industry-standard encryption at rest. Uploaded and output files are temporarily stored in this bucket under highly randomized names generated using a secure random generator. Each file is set to expire in 60 minutes by default and is automatically and permanently deleted from the bucket upon expiration. Depending on your subscription plan, you may set the expiration timeout anywhere from 5 minutes to 1440 minutes (1,440 minutes = 24 hours) using the expiration parameter. You may also remove a file directly using the file/delete endpoint at any time.
Since the pdf-temp-files storage is a private bucket, files are accessed via a special “signed” link using the Amazon AWS signed links mechanism. This mechanism provides an additional layer of security when accessing the file.
The pdf-temp-files bucket is not included in any backups. Only our engineers have temporary access to this bucket, and 2FA is enforced and required for access. Each access session to the storage is automatically logged, and information about the files’ relation to a specific user is stored separately in a different database.
For additional encryption of the file content, you may utilize user-controlled encryption. This feature provides a way to encrypt output file content with your own encryption option using industry-standard AES encryption, which is supported by all platforms, including Salesforce and others.
How to increase output link expiration#
The output link expires in 1 hour by default. This can be extended up to 24 hours in Business 2 and Business 3 plans. For more information, please visit the Subscription page.
To extend the output link expiration, please add the expiration parameter in your code with value set in minutes. Setting the expiration value to 1440 will generate an output link that expires after 24 hours.
{
"expiration": 1440
}
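As a small sketch of how the expiration parameter might be attached to a request from code, the Python helper below validates the value against the 5 to 1440 minute range stated earlier in this document before adding it. The helper name and the sample request body are hypothetical; only the `expiration` key comes from the text above.

```python
MIN_EXPIRATION = 5     # minutes, lower bound stated in this document
MAX_EXPIRATION = 1440  # minutes (24 hours), upper bound stated in this document

def with_expiration(body, minutes=60):
    """Return a copy of `body` with a validated `expiration` value in minutes."""
    if not MIN_EXPIRATION <= minutes <= MAX_EXPIRATION:
        raise ValueError(
            f"expiration must be between {MIN_EXPIRATION} and {MAX_EXPIRATION} minutes"
        )
    updated = dict(body)  # leave the caller's dict untouched
    updated["expiration"] = minutes
    return updated

request_body = with_expiration({"url": "https://example.com/sample.pdf"}, 1440)
print(request_body)
```

Whether a given value is accepted still depends on your subscription plan, so a request that passes this local check may be limited on the server side.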