PDF to XML
Available Methods
/pdf/convert/to/xml
Convert a PDF to XML that includes text values, tables, fonts, images, and object positions.
Method: POST
Endpoint: /v1/pdf/convert/to/xml
Attributes
Note
Attributes are case-sensitive and must be sent as JSON in the body of the POST request, for example:
{
"url": "https://example.com/file1.pdf"
}
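A small Python sketch of building that JSON body with the standard `json` module. The helper name `make_body` and its `use_cache` flag are hypothetical, not part of the API; the flag applies the `cache:` URL prefix described in footnote 1.

```python
import json
from typing import Optional


def make_body(url: str, extra: Optional[dict] = None, use_cache: bool = False) -> str:
    """Serialize the POST attributes to JSON (hypothetical helper).

    Attribute names are case-sensitive, so keys must match the table exactly
    ("url", not "URL"). "async" is a Python keyword, so pass it through `extra`.
    """
    if use_cache and not url.startswith("cache:"):
        # Footnote 1: the "cache:" prefix enables built-in URL caching.
        url = "cache:" + url
    body = {"url": url}
    body.update(extra or {})
    return json.dumps(body)


print(make_body("https://example.com/file1.pdf"))
# {"url": "https://example.com/file1.pdf"}
print(make_body("https://example.com/file1.pdf", {"async": False}, use_cache=True))
# {"url": "cache:https://example.com/file1.pdf", "async": false}
```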
| Attribute | Description | Required |
|---|---|---|
| `url` | URL to the source file.[1] | yes |
| | HTTP auth username, if required to access the source. | no |
| | HTTP auth password, if required to access the source. | no |
| `pages` | Comma-separated indices of pages (or page ranges) that you want to convert. The first page always has index 0, and a range with no end runs to the last page. For example, for a 7-page document, `0, 1, 2-` selects page one, page two, and the rest of the pages (the same as `0, 1, 2-6`). You can also use inverted page numbers adding | no |
| | Unwrap lines into a single line within table cells when | no |
| | Defines coordinates for extraction, e.g. | no |
| | Language for OCR (text from images) to use when extracting text from scanned PDF, PNG, and JPG input. The default is | no |
| | Set to | no |
| | Line grouping within table cells. Set to | no |
| | Password of the PDF file, as a string. | no |
| | Set | no |
| | File name for the generated output, as a string. | no |
| | Expiration time for the output link, in minutes (default is | no |
| | Additional configurations for fine-tuning and extra options. See the Profiles section for more. | no |
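The `pages` semantics can be checked with a small expander. This sketch assumes that `a-b` is an inclusive range and that a trailing `-` (as in `2-`) runs to the last page, which matches the example in the table; inverted page numbers are not handled. `expand_pages` is a hypothetical helper, not part of the API.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a 0-based page spec such as "0, 1, 2-" into page indices.

    Assumptions: "a-b" is inclusive and "a-" runs to the last page.
    Inverted page numbers are not supported by this sketch.
    """
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "2-" runs to the last page.
            end = int(end_text) if end_text.strip() else page_count - 1
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(part))
    # Reject indices that fall outside the document.
    for index in indices:
        if not 0 <= index < page_count:
            raise ValueError(f"page index {index} out of range for {page_count} pages")
    return indices


print(expand_pages("0, 1, 2-", 7))  # [0, 1, 2, 3, 4, 5, 6]
```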
Query parameters
No query parameters accepted.
Payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
"async": false
}
Response [2]
{
"body": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<document>\r\n <page index=\"0\">\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"24.0\" fontStyle=\"Bold\" color=\"#538DD3\" x=\"36.00\" y=\"34.44\" width=\"242.81\" height=\"24.00\">Your Company Name</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"76.94\" width=\"66.62\" height=\"11.04\">Your Address</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"91.46\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"461.02\" y=\"115.94\" width=\"98.42\" height=\"11.04\">Invoice No. 123456</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"436.54\" y=\"130.46\" width=\"122.90\" height=\"11.04\">Invoice Date 01/01/2016</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"154.94\" width=\"63.62\" height=\"11.04\">Client Name</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"169.70\" width=\"40.34\" height=\"11.04\">Address</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"184.22\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"233.30\" width=\"28.70\" height=\"11.04\">Notes</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"316.25\" width=\"22.58\" height=\"11.04\">Item</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"247.61\" y=\"316.25\" width=\"44.64\" height=\"11.04\">Quantity</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"398.95\" y=\"316.25\" width=\"26.91\" height=\"11.04\">Price</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"533.14\" y=\"316.25\" width=\"26.30\" height=\"11.04\">Total</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"341.33\" width=\"30.62\" height=\"11.04\">Item 1</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"341.33\" width=\"6.12\" height=\"11.04\">1</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"341.33\" width=\"27.51\" height=\"11.04\">40.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"341.33\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"362.45\" width=\"30.62\" height=\"11.04\">Item 2</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"362.45\" width=\"6.12\" height=\"11.04\">2</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"362.45\" width=\"27.51\" height=\"11.04\">30.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"362.45\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"383.57\" width=\"30.62\" height=\"11.04\">Item 3</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"383.57\" width=\"6.12\" height=\"11.04\">3</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"383.57\" width=\"27.51\" height=\"11.04\">20.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"383.57\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"404.93\" width=\"30.62\" height=\"11.04\">Item 4</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"404.93\" width=\"6.12\" height=\"11.04\">4</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"404.93\" width=\"27.51\" height=\"11.04\">10.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"404.93\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"389.11\" y=\"425.83\" width=\"36.75\" height=\"11.04\">TOTAL</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"525.82\" y=\"425.83\" width=\"33.62\" height=\"11.04\">200.00</text>\r\n </column>\r\n </row>\r\n </page>\r\n</document>",
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample.xml",
"remainingCredits": 60563
}
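The `body` field holds the XML document as an escaped string, organized as `<page>`, `<row>`, `<column>`, and `<text>` elements. A minimal Python sketch for reading it with the standard library's `xml.etree.ElementTree`; the helper names `table_rows` and `positioned_text` are hypothetical, and the element and attribute names are taken from the sample response above rather than from a published schema.

```python
import json
import xml.etree.ElementTree as ET


def table_rows(response_text: str) -> list[list[str]]:
    """Return the cell texts of every <row> in the XML held by "body"."""
    response = json.loads(response_text)
    # Encode to bytes so the parser honours the XML encoding declaration.
    root = ET.fromstring(response["body"].encode("utf-8"))
    rows = []
    for row in root.iter("row"):
        cells = []
        for column in row.findall("column"):
            text = column.find("text")
            # Empty cells hold only whitespace, so strip it.
            cells.append((text.text or "").strip() if text is not None else "")
        rows.append(cells)
    return rows


def positioned_text(response_text: str) -> list[tuple[str, float, float]]:
    """Return (text, x, y) for every <text> element that carries coordinates."""
    root = ET.fromstring(json.loads(response_text)["body"].encode("utf-8"))
    return [
        (el.text.strip(), float(el.get("x")), float(el.get("y")))
        for el in root.iter("text")
        if el.get("x") is not None and el.text
    ]


# A trimmed excerpt of the sample response above (one table row).
sample = json.dumps({
    "body": '<?xml version="1.0" encoding="UTF-8"?>\r\n<document>\r\n <page index="0">\r\n'
            ' <row>\r\n <column>\r\n <text x="36.00" y="341.33">Item 1</text>\r\n </column>\r\n'
            ' <column>\r\n <text x="286.13" y="341.33">1</text>\r\n </column>\r\n </row>\r\n'
            ' </page>\r\n</document>',
    "error": False,
})
print(table_rows(sample))       # [['Item 1', '1']]
print(positioned_text(sample))  # [('Item 1', 36.0, 341.33), ('1', 286.13, 341.33)]
```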
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/xml' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
"async": false
}'
Code samples
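A minimal Python sketch of the same request using only the standard library (`urllib.request`). It mirrors the cURL call, sends the `x-api-key` and `Content-Type: application/json` headers, and maps the response codes listed in footnote 2 to messages. Whether errors arrive as HTTP status codes or only in the `error`/`status` fields of the JSON body is an assumption here, so the sketch checks both; `convert_pdf_to_xml`, `build_request`, and the `opener` test hook are hypothetical names.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/convert/to/xml"

# Short descriptions taken from the response-code table (footnote 2).
ERROR_MESSAGES = {
    400: "Bad request: check the input parameters and that the source URL is reachable",
    401: "Unauthorized",
    402: "Not enough credits",
    445: "Timeout: retry with async set to true or a smaller pages range",
}


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request; attribute names in payload are case-sensitive."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def convert_pdf_to_xml(api_key: str, payload: dict, opener=urllib.request.urlopen) -> dict:
    """Send the request and return the decoded JSON response.

    `opener` defaults to urllib's urlopen; it exists only as a testing hook.
    """
    try:
        with opener(build_request(api_key, payload)) as response:
            result = json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Error reported through the HTTP status line.
        detail = ERROR_MESSAGES.get(exc.code, exc.reason)
        raise RuntimeError(f"HTTP {exc.code}: {detail}") from exc
    if result.get("error"):
        # Error reported through the "error" and "status" fields of the JSON body.
        status = result.get("status")
        detail = ERROR_MESSAGES.get(status, "request failed")
        raise RuntimeError(f"API error {status}: {detail}")
    return result
```

For example, `convert_pdf_to_xml("YOUR_API_KEY", {"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf", "async": False})["body"]` would return an XML string like the one in the sample response.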
Footnotes
[1] Supports links from Google Drive, Dropbox, and PDF.co Built-In Files Storage. To upload files via the API, see the File Upload section. Note: if you experience intermittent Too Many Requests or Access Denied errors, try adding the `cache:` prefix to the source URL to enable built-in URL caching (e.g. `cache:https://example.com/file1.pdf`). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
[2] Main response codes are as follows:

Code | Description |
---|---|
200 | Success |
400 | Bad request. Typically happens because of bad input parameters, or because the input URLs can’t be reached, possibly due to access restrictions such as a required login or password. |
401 | Unauthorized |
402 | Not enough credits |
445 | Timeout error. To process large documents or files, use asynchronous mode (set the `async` parameter to `true`) and then check the status using the /job/check endpoint. If a file contains many pages, specify a page range using the `pages` parameter. The number of pages in a document can be obtained using the /pdf/info endpoint. |

Note: For more, see the complete list of available response codes.
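The 445 row recommends asynchronous mode followed by status checks via /job/check. Below is a minimal polling loop under stated assumptions: `check` is a hypothetical callable you would write to query /job/check for one job and return its status string, and the sketch assumes the status reads `"working"` while the job is still running. This page does not document the /job/check response, so verify the actual status values there.

```python
import time
from typing import Callable


def wait_for_job(
    check: Callable[[], str],
    interval: float = 3.0,
    max_attempts: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `check` until it reports a status other than "working".

    Assumption: "working" means the job is still running.
    """
    for _ in range(max_attempts):
        status = check()
        if status != "working":
            return status  # finished or failed; the caller decides what to do
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job still working after {max_attempts} checks")
```

Passing `sleep` explicitly keeps the loop testable without real delays, and capping `max_attempts` prevents an unbounded wait if a job never leaves the running state.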