pdf-text
Use the pdf-text endpoint to extract text from a PDF.
The pdf-text endpoint takes an HTTP POST form submission, with the PDF sent as binary in the form's body, and returns the extracted text as a JSON response.
Refer to the following Users Guide page for more information on calling the endpoint directly as a REST call.
- Calling the pdf-text endpoint using REST (pdf-text REST API).
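To make the POST description above concrete, here is a minimal sketch that builds the request with Python's standard library without sending it. The versioned path (`v1.0/pdf-text`), the bearer-token `Authorization` header, and the `application/pdf` content type are assumptions that this page does not state, and `build_pdf_text_request` is a hypothetical helper name. Check the REST API page for the authoritative values.

```python
import json
import urllib.request

# Base URL as it appears in the Go example on this page.
BASE_URL = "https://api.dpdf.io/"
# Assumption: the endpoint lives under a versioned path.
ENDPOINT_URL = BASE_URL + "v1.0/pdf-text"


def build_pdf_text_request(pdf_bytes, api_key):
    """Build (but do not send) a POST request with the PDF as the raw body."""
    return urllib.request.Request(
        ENDPOINT_URL,
        data=pdf_bytes,
        method="POST",
        headers={
            # Assumption: the API key is passed as a bearer token.
            "Authorization": "Bearer " + api_key,
            # Assumption: the binary body is declared as a PDF.
            "Content-Type": "application/pdf",
        },
    )


def send_request(request):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the method, headers, and body before anything goes over the network.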
API
- C# (.NET)
- Java
- Node.js
- PHP
- Go
- Python
In C# (.NET), the PdfText class encapsulates the pdf-text endpoint. A PdfText instance takes a PdfResource instance. A PdfResource is constructed from a PDF coming from a file, a byte array, or a stream.
public PdfText(PdfResource resource);
public class PdfResource : Resource
{
    public PdfResource(string filePath, string resourceName = null);
    public PdfResource(byte[] value, string resourceName = null);
    public PdfResource(Stream data, string resourceName = null);
}
In Java, the PdfText class encapsulates the pdf-text endpoint. A PdfText instance takes a PdfResource instance. A PdfResource is constructed from a PDF coming from a file, a byte array, or an InputStream.
public PdfText(PdfResource resource);
public class PdfResource extends Resource
{
    public PdfResource(String filePath, String resourceName);
    public PdfResource(String filePath);
    public PdfResource(byte[] value, String resourceName);
    public PdfResource(byte[] value);
    public PdfResource(InputStream data, String resourceName);
    public PdfResource(InputStream data);
}
In Node.js, the PdfText class encapsulates the pdf-text endpoint. A PdfText instance takes a PdfResource instance. A PdfResource is constructed from a PDF coming from a file or a byte array.
export class PdfText extends Endpoint {
constructor(resource);
}
export class PdfResource extends Resource {
    /**
     * Initializes a new instance of the `PdfResource` class.
     * @param {string | Buffer[]} pdf The PDF file path, or the byte array of the PDF file.
     * @param {string} resourceName The name of the resource.
     */
    constructor(pdf, resourceName);
}
In PHP, the PdfText class encapsulates the pdf-text endpoint. A PdfText instance takes a PdfResource instance. A PdfResource is constructed from a PDF coming from a file, a byte array, or a stream.
class PdfText extends Endpoint
{
    public function __construct(PdfResource $resource)
}
class PdfResource extends Resource
{
    /**
     * Initializes a new instance of the PdfResource class.
     *
     * @param string|array|stream $file The PDF file path, the byte
     * array of the PDF file, or the stream of the PDF file.
     * @param ?string $resourceName The name of the resource.
     */
    public function __construct($file, ?string $resourceName = null)
}
In Go, the PdfText struct represents the pdf-text endpoint. NewPdfText takes a PdfResource together with a start page and a page count. A PdfResource is constructed from a PDF coming from a file path or a byte value.
/** Represents the pdf-text endpoint. */
type PdfText struct {
}
/**
 * Initializes a new instance of the `PdfText` struct.
 * @param { PdfResource } resource The PDF resource of type `PdfResource`.
 * @param { int } startpage The start page.
 * @param { int } pagecount The page count.
 */
func NewPdfText(resource resource.PdfResource, startpage int, pagecount int) *PdfText
/* Represents the PDF resource. */
type PdfResource struct {
	Resource
	fileExtension string
}
func NewPdfResourceWithResourcePath(resource string, resourceName string) PdfResource
func NewPdfResourceWithByteValue(resource string, resourceName string) PdfResource
In Python, the PdfText class encapsulates the pdf-text endpoint. PdfText extends the Endpoint class and takes a PdfResource instance. PdfResource extends the Resource class and is constructed from a PDF coming from a file, a byte array, or io.BytesIO.
class PdfText(Endpoint):
    def __init__(self, resource, start_page=1, page_count=0):
        super().__init__()
        self.resource = resource
        self.StartPage = start_page
        self.PageCount = page_count
        self.EndpointName = "pdf-text"
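The three input forms mentioned for the Python PdfResource (a file path, a byte array, or io.BytesIO) can all be prepared with the standard library. The sketch below uses a hypothetical helper, `load_pdf_inputs`, that returns the same PDF in each form. The commented usage assumes that PdfResource accepts an optional resource name as its second argument, which this page does not show for Python.

```python
import io
from pathlib import Path


def load_pdf_inputs(path):
    """Return one PDF as a path string, raw bytes, and an in-memory stream."""
    data = Path(path).read_bytes()
    return str(path), data, io.BytesIO(data)


# Hypothetical usage with the client library (not imported here):
#   from dynamicpdf_api.pdf_resource import PdfResource
#   file_path, pdf_bytes, pdf_stream = load_pdf_inputs("fw4.pdf")
#   resource = PdfResource(pdf_bytes, "fw4.pdf")  # assumed second argument
```

Passing bytes or a stream is useful when the PDF comes from a database or an upload and never touches the local disk.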
Example
A complete example is available via one of the following GitHub projects depending upon the language you wish to use.
Language | GitHub Users Guide Project | Class | Location/Namespace/Package |
---|---|---|---|
C# | https://github.com/dynamicpdf-api/dotnet-client-examples | Program.cs | namespace PdfTextExample |
Go | https://github.com/dynamicpdf-api/go-client-examples | pdf-text-example.go | go-client-examples |
Java | https://github.com/dynamicpdf-api/java-client-examples | PdfTextExample.java | com.dynamicpdf.api.client.examples |
Node.js | https://github.com/dynamicpdf-api/nodejs-client-examples | PdfTextExample.js | nodejs-users-guide |
PHP | https://github.com/dynamicpdf-api/php-client-examples | PdfTextExample.php | php-client-examples |
Python | https://github.com/dynamicpdf-api/python-client-examples | PdfTextExample.py | python-client-examples |
The processing and syntax are similar for all six languages.
- Create a new PdfText instance and pass a PdfResource instance to the constructor.
- Call the PdfText instance's Process method and get the results as a PdfTextResponse, which contains the extracted text as a JSON document.
- C# (.NET)
- Node.js
- Java
- PHP
- Go
- Python
using DynamicPDF.Api;
using System;

namespace PdfTextExample
{
    class Program
    {
        static void Main(string[] args)
        {
            Run("DP.xxx-api-key-xxx", "C:/temp/dynamicpdf-api-usersguide-examples/");
        }

        public static void Run(string apiKey, string basePath)
        {
            // basePath already ends with a slash, so append the file name directly.
            PdfResource resource = new PdfResource(basePath + "fw4.pdf");
            PdfText pdfText = new PdfText(resource);
            pdfText.ApiKey = apiKey;

            // Process() calls the endpoint; JsonContent holds the extracted text as JSON.
            PdfTextResponse response = pdfText.Process();
            Console.WriteLine(PrettyPrintUtil.JsonPrettify(response.JsonContent));
        }
    }
}
import {
    PdfResource,
    PdfText
} from "@dynamicpdf/api";

export class PdfTextExample {
    static async Run() {
        var basePath = "C:/temp/dynamicpdf-api-usersguide-examples/";
        var apiKey = "DP.xxx-api-key-xxx";

        var resource = new PdfResource(basePath + "fw4.pdf");
        var pdfText = new PdfText(resource);
        pdfText.apiKey = apiKey;

        // Print the parsed JSON only when the call succeeded.
        var res = await pdfText.process();
        if (res.isSuccessful) {
            console.log(JSON.parse(res.content));
        }
    }
}
await PdfTextExample.Run();
package com.dynamicpdf.api.examples;

import com.dynamicpdf.api.PdfResource;
import com.dynamicpdf.api.PdfText;
import com.dynamicpdf.api.PdfTextResponse;
import com.dynamicpdf.api.util.PrettyPrintUtility;

public class PdfTextExample {

    public static void Run(String apiKey, String basePath)
    {
        PdfResource resource = new PdfResource(basePath + "fw4.pdf");
        PdfText pdfText = new PdfText(resource);
        pdfText.setApiKey(apiKey);

        // process() calls the endpoint; getJsonContent() returns the extracted text as JSON.
        PdfTextResponse response = pdfText.process();
        System.out.println(PrettyPrintUtility.prettyPrintJSON(response.getJsonContent()));
    }

    public static void main(String[] args) {
        PdfTextExample.Run("DP.xxx--api-key--xxx",
                "C:/temp/dynamicpdf-api-usersguide-examples/");
    }
}
<?php
use DynamicPDF\Api\PdfResource;
use DynamicPDF\Api\PdfText;

require __DIR__ . '/vendor/autoload.php';

class PdfTextExample
{
    private static string $BasePath = "C:/temp/dynamicpdf-api-usersguide-examples/";
    private static string $ApiKey = "DP.xxx-api-key-xxx";

    public static function Run()
    {
        $resource = new PdfResource(PdfTextExample::$BasePath . "fw4.pdf");
        $pdfText = new PdfText($resource);
        $pdfText->ApiKey = PdfTextExample::$ApiKey;

        // Process() calls the endpoint; JsonContent holds the extracted text as JSON.
        $response = $pdfText->Process();
        echo ($response->JsonContent);
    }
}
PdfTextExample::Run();
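func main() {
	// Named pdfResource so the variable does not shadow the resource package.
	pdfResource := resource.NewPdfResourceWithResourcePath("C:/temp/dynamicpdf-api-samples/fw4.pdf", "fw4.pdf")
	// Start at page 1 with a page count of 3.
	txt := endpoint.NewPdfText(pdfResource, 1, 3)
	txt.Endpoint.BaseUrl = "https://api.dpdf.io/"
	txt.Endpoint.ApiKey = "DP.xxx-api-key-xxx"

	// Process returns a channel; receive from it to wait for the response.
	resp := txt.Process()
	res := <-resp

	if res.IsSuccessful() {
		fmt.Print(string(res.Content().Bytes()))
	}
}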
from dynamicpdf_api.pdf_text import PdfText
from dynamicpdf_api.pdf_resource import PdfResource
from Shared import *  # expected to supply api_key and base_path


def pdf_text_example(apikey, full_path):
    resource = PdfResource(full_path + "fw4.pdf")
    pdf_text = PdfText(resource)
    pdf_text.api_key = apikey
    # Start at page 1 with a page count of 2.
    pdf_text.start_page = 1
    pdf_text.page_count = 2
    response = pdf_text.process()
    print(response.json_content)


if __name__ == "__main__":
    pdf_text_example(api_key, base_path + "/pdf-info/")
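The Node.js and Go examples check for success before printing, while the Python example prints `json_content` unconditionally. A minimal sketch of the same guard in Python follows. The helper name `parsed_text_or_error` is hypothetical, and the `is_successful` and `error_message` attributes are assumptions modeled on the other clients; only `json_content` appears on this page. The helper is duck-typed, so it works with any object exposing those attributes.

```python
import json


def parsed_text_or_error(response):
    """Return the parsed JSON on success; raise RuntimeError otherwise."""
    # Assumption: the response exposes is_successful, like the other clients.
    if not getattr(response, "is_successful", False):
        # Assumption: error_message carries the failure detail, if present.
        message = getattr(response, "error_message", "unknown error")
        raise RuntimeError("pdf-text call failed: " + str(message))
    # json_content is the attribute used in the Python example above.
    return json.loads(response.json_content)
```

Raising on failure keeps a failed call from being mistaken for empty extracted text further down the pipeline.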