pdf-text

Use the pdf-text endpoint to extract text from a PDF. It uses an HTTP POST to send a PDF as binary and then returns the extracted text as a JSON response.
Request/Response
Request | Request Data | Response |
---|---|---|
POST | PDF as byte array | JSON document containing PDF text. |
The pdf-text endpoint takes an HTTP POST request, where the PDF is sent as binary data in the request body.
POST https://api.dpdf.io/v1.0/pdf-text HTTP/1.1
Authorization: Bearer DP.xxx-your-api-key-xxx
For this example using the DynamicPDF API's client libraries, refer to pdf-text in the Client Libraries section of the Users Guide.
Parameters
The endpoint takes three optional query parameters: startPage, pageCount, and textOrder.
Parameter | Type | Description | Required |
---|---|---|---|
Query String | | | |
startPage | integer | The page at which to start extracting text. | |
pageCount | integer | The number of pages from which to extract text. | |
textOrder | stream, visible, visibleExtraSpace | The order of the extracted text, and whether extra space is included or excluded. | |
Request Body | Type | | |
Body | binary | The PDF document to send to the endpoint. | REQUIRED |
Header | Value | | |
Content-Type | application/pdf | The content type sent in the request. | |
Authorization | Bearer DP.V9xxxxxxxxxxxxxxx | The API key. | REQUIRED |
The PDF is sent to the endpoint as binary data. In this example request, the binary comes from a file, but it could also come from a URL, a database, or another source.
Example cURL Request
curl -X POST "https://api.dpdf.io/v1.0/pdf-text?startPage=1&pageCount=2" \
  -H "accept: application/json" \
  -H "Authorization: Bearer DP.xxx-your-api-key-xxx" \
  -H "Content-Type: application/pdf" \
  --data-binary "@c:/holding/pdf-text/fw4.pdf"
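The same request can be sketched in Python using only the standard library. This is a minimal sketch, not the DynamicPDF client library: the helper names build_pdf_text_request and extract_text are hypothetical and introduced here for illustration, while the endpoint URL, headers, and query parameter names are taken from the cURL request and parameter table above.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.dpdf.io/v1.0/pdf-text"


def build_pdf_text_request(pdf_bytes, api_key, start_page=None,
                           page_count=None, text_order=None):
    """Build (but do not send) a POST request for the pdf-text endpoint.

    Hypothetical helper: not part of any DynamicPDF library.
    """
    # Include only the optional query parameters that were supplied.
    params = {}
    if start_page is not None:
        params["startPage"] = start_page
    if page_count is not None:
        params["pageCount"] = page_count
    if text_order is not None:
        params["textOrder"] = text_order

    url = ENDPOINT
    if params:
        url += "?" + urllib.parse.urlencode(params)

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/pdf",
        "Accept": "application/json",
    }
    # The PDF bytes are the raw request body, as with curl --data-binary.
    return urllib.request.Request(url, data=pdf_bytes, headers=headers,
                                  method="POST")


def extract_text(pdf_path, api_key, **query):
    """Read a PDF from disk, send it, and return the decoded JSON response."""
    with open(pdf_path, "rb") as f:
        request = build_pdf_text_request(f.read(), api_key, **query)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling extract_text("fw4.pdf", "DP.xxx-your-api-key-xxx", start_page=1, page_count=2) would issue the request shown in the cURL example. Separating request construction from sending keeps the URL and header logic easy to inspect without a network call.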
Example Response
[
  {
    "pageNumber": 1,
    "text": "[DynamicPDF Evaluation] Form W-4\n(Rev. December 2020)\nDepartment of the Treasury \nInternal Revenue Service \nEmployee’s Withholding Certificate\n ▶ Complete Form W-4 so that your employer can withhold the correct federal income tax from your pay. \n ▶ Give Form W-4 to your empl ....[Text Truncated - Please purchase a license or contact support for an evaluation license.]"
  },
  {
    "pageNumber": 2,
    "text": "[DynamicPDF Evaluation] Form W-4 (2021) Page 2\nGeneral Instructions\nFuture Developments\nFor the latest information about developments related to \nForm W-4, such as legislation enacted after it was published, \ngo to www.irs.gov/FormW4 .\nPurpose of Form\nComplete Form W-4 so that yo ....[Text Truncated - Please purchase a license or contact support for an evaluation license.]"
  }
]
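The response is a JSON array with one object per page, each carrying a pageNumber and a text field. As a rough sketch of how a client might consume it, the following Python maps page numbers to their text. The function name pages_to_text is a hypothetical helper, and the sample string is an abbreviated stand-in for a real response, not actual endpoint output.

```python
import json


def pages_to_text(response_json):
    """Map each pageNumber to its extracted text, ordered by page number.

    Hypothetical helper; assumes the response shape shown in the example
    above: a JSON array of objects with "pageNumber" and "text" keys.
    """
    pages = json.loads(response_json)
    pages.sort(key=lambda page: page["pageNumber"])
    return {page["pageNumber"]: page["text"] for page in pages}


# Abbreviated stand-in for a response body (not real endpoint output).
sample = (
    '[{"pageNumber": 2, "text": "General Instructions"},'
    ' {"pageNumber": 1, "text": "Form W-4"}]'
)

for number, text in pages_to_text(sample).items():
    print(f"Page {number}: {text}")
```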
Further Information
You can also use one of DynamicPDF's client libraries to integrate the endpoints into your applications. For more information, refer to the relevant Users Guide sections:
- DynamicPDF API's Endpoints (cloud-api-overview)
- DynamicPDF API's Client Libraries (cloud-api-client-libraries)