
Changes to the pdf-text Endpoint

James A. Brannan
Developer Evangelist

The DynamicPDF API adds an important new feature to the pdf-text endpoint. The endpoint can now extract text in the order it appears visually on the page, either with extra space removed or with that extra space preserved. Previously, the endpoint always returned text in the order in which it was added to the PDF.


To support this, the endpoint now accepts an optional textOrder query parameter, which takes one of three values: stream, visible, or visibleExtraSpace.
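Because only these three values are documented, a client script may want to reject anything else before it sends a request. The following is a minimal Bash sketch, assuming the v1.0 endpoint URL used in the examples in this post; the function name pdf_text_url is hypothetical and is not part of the DynamicPDF API. It matches the values exactly as written here, which is an assumption about how strictly the API treats their spelling.

```shell
# Hypothetical helper (not part of the DynamicPDF API): prints the pdf-text
# endpoint URL for a textOrder value, or fails for any undocumented value.
pdf_text_url() {
  local order="${1:-stream}"   # stream is the default ordering
  case "$order" in
    stream|visible|visibleExtraSpace)
      printf 'https://api.dpdf.io/v1.0/pdf-text?textOrder=%s\n' "$order"
      ;;
    *)
      printf 'unsupported textOrder: %s\n' "$order" >&2
      return 1
      ;;
  esac
}

# Example usage: stop a script early if the value is not one of the three.
# url=$(pdf_text_url visible) || exit 1
```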

For example, the following PDF contains a header that was added because the PDF was created with an evaluation version of our product, DynamicPDF Converter.

When you extract text from this PDF, the pdf-text endpoint's JSON response lists the header after the body text, because that is the order of the text operators in the PDF document. This is the default behavior, and it is also the behavior when you specify textOrder=stream.

# Default ordering: text is returned in PDF operator order
curl --location 'https://api.dpdf.io/v1.0/pdf-text?textOrder=stream' \
--header 'Authorization: Bearer DP.--api-key--' \
--header 'Content-Type: application/pdf' \
--data-binary '@/C:/temp/sample.pdf'

Specifying textOrder=visible returns the extracted text in the order in which the PDF's reader sees it, with extra space removed.

# Visible ordering, with extra space removed
curl --location 'https://api.dpdf.io/v1.0/pdf-text?textOrder=visible' \
--header 'Authorization: Bearer DP.--api-key--' \
--header 'Content-Type: application/pdf' \
--data-binary '@/C:/temp/sample.pdf'

Specifying textOrder=visibleExtraSpace returns the extracted text in the order in which the PDF's reader sees it, but preserves the extra space that appears in the PDF.

# Visible ordering, with extra space preserved
curl --location 'https://api.dpdf.io/v1.0/pdf-text?textOrder=visibleExtraSpace' \
--header 'Authorization: Bearer DP.--api-key--' \
--header 'Content-Type: application/pdf' \
--data-binary '@/C:/temp/sample.pdf'

The textOrder parameter gives you more flexibility in how you extract text from PDF documents. You can extract text in the PDF's natural operator order or in the order in which it visibly appears, and with visible ordering you can choose to preserve a PDF's extra space rather than remove it.
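When deciding which ordering suits a given document, it can help to request all three and compare the results side by side. The following is a minimal Bash sketch built from the curl requests above. The function name extract_all_orders, the DPDF_API_KEY environment variable, and the output file naming are hypothetical choices for illustration, not part of the DynamicPDF API.

```shell
# Hypothetical helper: sends the same PDF to the pdf-text endpoint once per
# textOrder value and writes each JSON response to its own file in the
# current directory, e.g. sample-stream.json, sample-visible.json and
# sample-visibleExtraSpace.json.
# Assumes the API key is stored in the (hypothetical) DPDF_API_KEY variable.
extract_all_orders() {
  local pdf="$1"
  local name order
  name="$(basename "$pdf" .pdf)"   # sample.pdf -> sample
  for order in stream visible visibleExtraSpace; do
    curl --silent --location \
      "https://api.dpdf.io/v1.0/pdf-text?textOrder=${order}" \
      --header "Authorization: Bearer ${DPDF_API_KEY}" \
      --header 'Content-Type: application/pdf' \
      --data-binary "@${pdf}" \
      > "${name}-${order}.json" || return 1   # stop if curl itself fails
  done
}

# Example usage:
# export DPDF_API_KEY='DP.--api-key--'
# extract_all_orders sample.pdf
```

Writing each response to a separate file keeps the three orderings intact, so any diff or JSON viewer can show where the header and spacing differ between them.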

More Information

For more information, refer to the pdf-text endpoint's documentation.
