
Extract PDF Metadata


The pdf-info endpoint returns metadata from a PDF document.

In this tutorial, we use the pdf-info endpoint to extract a PDF's metadata and return it as a JSON document. We first call the pdf-info REST endpoint directly using cURL and then use the C# client library to invoke the endpoint programmatically.

Required Resources

To complete this tutorial, you must add the Get Pdf Information sample to your samples folder in your cloud storage space using the File Manager. After adding the sample resources, you should see a samples/get-pdf-info-pdf-info-endpoint folder containing the resources for this tutorial.

Sample | Sample Folder | Resources
Get Pdf Information | samples/get-pdf-info-pdf-info-endpoint | fw4.pdf
  • From the File Manager, download fw4.pdf to your local system; here we assume c:/temp/dynamicpdf-api-samples/get-pdf-info.
  • After downloading, delete the documents and instructions from your cloud storage space using the File Manager.
Resource | Cloud/Local
fw4.pdf | local
tip

See Sample Resources for instructions on adding sample resources.

Obtaining API Key

This tutorial assumes a valid API key obtained from the DynamicPDF API's Portal. Refer to the following for instructions on getting an API key.

tip

If you are not familiar with the File Manager or Apps and API Keys, refer to the following tutorial and relevant Users Guide pages.

Make Request Using API

Let's begin by invoking the pdf-info REST endpoint directly using cURL.

  • Create and execute the following cURL command.
curl -X POST "https://api.dpdf.io/v1.0/pdf-info" \
  -H "Authorization: Bearer DP.xxx-api-key-xxx" \
  -H "Content-Type: application/pdf" \
  --data-binary "@c:/temp/dynamicpdf-api-samples/get-pdf-info/fw4.pdf"

The cURL command makes a POST call to the pdf-info endpoint, passing the Authorization and Content-Type headers. It also sends the fw4.pdf file as binary data. The endpoint then returns a JSON document containing metadata describing the PDF.
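For readers who prefer to see the same request expressed in code, here is a minimal Python sketch that builds (but does not send) an equivalent POST request using only the standard library. The function name build_pdf_info_request is hypothetical and is not part of any DynamicPDF client library; the URL and headers are taken from the cURL command above.

```python
import urllib.request

# Endpoint URL copied from the cURL command.
PDF_INFO_URL = "https://api.dpdf.io/v1.0/pdf-info"


def build_pdf_info_request(api_key: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build the POST request described by the cURL command (hypothetical helper)."""
    return urllib.request.Request(
        PDF_INFO_URL,
        data=pdf_bytes,  # raw PDF bytes, the counterpart of --data-binary
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/pdf",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key and stand-in bytes; nothing is sent over the network here.
    request = build_pdf_info_request("DP.xxx-api-key-xxx", b"%PDF placeholder")
    print(request.get_method(), request.full_url)
    print(request.header_items())
```

Passing the resulting object to urllib.request.urlopen would send it; the sketch stops short of that so it can be inspected without an API key.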

Examine API Response

  • After executing the cURL command you should see the following JSON metadata.
{
"author": "SE:W:CAR:MP",
"subject": "Employee's Withholding Certificate",
"keywords": "Fillable",
"creator": "Adobe LiveCycle Designer ES 9.0",
"producer": "Adobe LiveCycle Designer ES 9.0",
"title": "2021 Form W-4",
"pages": [
{
"pageNumber": 1,
"width": 611.976,
"height": 791.968
},
{
"pageNumber": 2,
"width": 611.976,
"height": 791.968
},
{
"pageNumber": 3,
"width": 611.976,
"height": 791.968
},
{
"pageNumber": 4,
"width": 611.976,
"height": 791.968
}
],
"formFields": {
"signatureFields": null,
"textFields": [
{
"name": "topmostSubform[0].Page1[0].Step1a[0].f1_01[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].Step1a[0].f1_02[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].Step1a[0].f1_03[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].Step1a[0].f1_04[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_05[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].Step3_ReadOrder[0].f1_06[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].Step3_ReadOrder[0].f1_07[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_08[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_09[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_10[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_11[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_13[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_14[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page1[0].f1_15[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_01[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_02[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_03[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_04[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_05[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_06[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_07[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_08[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_09[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_10[0]",
"value": null,
"defaultValue": ""
},
{
"name": "topmostSubform[0].Page3[0].f3_11[0]",
"value": null,
"defaultValue": ""
}
],
"choiceFields": null,
"buttonFields": [
{
"name": "topmostSubform[0].Page1[0].c1_1[0]",
"type": "checkBox",
"value": null,
"defaultValue": "",
"exportValue": "1",
"exportValues": null
},
{
"name": "topmostSubform[0].Page1[0].c1_1[1]",
"type": "checkBox",
"value": null,
"defaultValue": "",
"exportValue": "2",
"exportValues": null
},
{
"name": "topmostSubform[0].Page1[0].c1_1[2]",
"type": "checkBox",
"value": null,
"defaultValue": "",
"exportValue": "3",
"exportValues": null
},
{
"name": "topmostSubform[0].Page1[0].Step2c[0].c1_2[0]",
"type": "checkBox",
"value": null,
"defaultValue": "",
"exportValue": "1",
"exportValues": null
}
],
"pushButtons": null,
"multiSelectListBoxFields": null
},
"customProperties": null,
"xmpMetaData": true,
"signed": false,
"tagged": true
}
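The response is fairly long, so it can help to condense it. The following Python sketch assumes the response has the shape shown above (a pages array and a formFields object whose groups may be null) and pulls out a few headline values. The function name summarize_pdf_info and the keys of the summary it returns are hypothetical choices for illustration.

```python
import json


def summarize_pdf_info(info: dict) -> dict:
    """Condense a pdf-info response into a few headline values (hypothetical helper)."""
    pages = info.get("pages") or []
    form_fields = info.get("formFields") or {}
    # Field groups such as "signatureFields" can be null, so fall back to empty lists.
    text_fields = form_fields.get("textFields") or []
    button_fields = form_fields.get("buttonFields") or []
    return {
        "title": info.get("title"),
        "page_count": len(pages),
        # Distinct (width, height) pairs across all pages.
        "page_sizes": sorted({(p["width"], p["height"]) for p in pages}),
        "text_field_names": [f["name"] for f in text_fields],
        "checkbox_names": [
            f["name"] for f in button_fields if f.get("type") == "checkBox"
        ],
    }


if __name__ == "__main__":
    # Heavily abbreviated copy of the response shown above.
    sample = json.loads("""
    {
      "title": "2021 Form W-4",
      "pages": [
        {"pageNumber": 1, "width": 611.976, "height": 791.968},
        {"pageNumber": 2, "width": 611.976, "height": 791.968}
      ],
      "formFields": {
        "signatureFields": null,
        "textFields": [
          {"name": "topmostSubform[0].Page1[0].f1_05[0]", "value": null, "defaultValue": ""}
        ],
        "buttonFields": [
          {"name": "topmostSubform[0].Page1[0].c1_1[0]", "type": "checkBox", "value": null}
        ]
      }
    }
    """)
    print(summarize_pdf_info(sample))
```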

Calling Endpoint Using Client Library

Although calling the pdf-info endpoint directly is straightforward, you can also use one of the DynamicPDF API client libraries.

Complete Source

You can access the complete source for this project at one of the following GitHub projects.

Language | File Name | Location (package/namespace/etc.) | GitHub Project
Java | GetPdfInfo.java | com.dynamicpdf.api.examples | https://github.com/dynamicpdf-api/java-client-examples
C# | Program.cs | GetPdfInfo | https://github.com/dynamicpdf-api/dotnet-client-examples
Nodejs | GetPdfInfo.js | nodejs-client-examples | https://github.com/dynamicpdf-api/nodejs-client-examples
PHP | GetPdfInfo.php | php-client-examples | https://github.com/dynamicpdf-api/php-client-examples
GO | pdf-info-example.go | go-client-examples | https://github.com/dynamicpdf-api/go-client-examples/tree/main
Python | PdfInfoExample.py | python-client-examples | https://github.com/dynamicpdf-api/python-client-examples

Available on NuGet:

Install-Package DynamicPDF.API
  • Create a new Visual Studio C# Console App (.NET Core) project named GetPdfInfo.
  • Add the DynamicPDF.API NuGet package to the project.
  • Create a new static method named Run that takes the API key and base path as strings.
  • Add the call to Run to Main.
  • Create a new PdfResource instance that takes the path to the PDF.
  • Create a new PdfInfo instance and pass the PdfResource instance to the constructor.
  • Call the endpoint using the PdfInfo instance's Process method.
  • Check whether the call was successful and, if so, write the PDF information (as JSON metadata) to the console.
using DynamicPDF.Api;
using System;

namespace GetPdfInfo
{
    class Program
    {
        static void Main(string[] args)
        {
            // Replace with your own API key and the folder that holds fw4.pdf.
            Run("DP.xxx-api-key-xxx", "c:/temp/dynamicpdf-api-samples/get-pdf-info/");
        }

        public static void Run(string apiKey, string basePath)
        {
            // Load the local PDF and wrap it in a PdfInfo request.
            PdfResource resource = new PdfResource(basePath + "fw4.pdf");
            PdfInfo pdfInfo = new PdfInfo(resource);
            pdfInfo.ApiKey = apiKey;

            // Call the pdf-info endpoint.
            PdfInfoResponse response = pdfInfo.Process();

            if (response.IsSuccessful)
            {
                // Print the PDF metadata as JSON.
                Console.WriteLine(response.JsonContent);
            }
            else
            {
                // Print the error returned by the endpoint.
                Console.WriteLine(response.ErrorJson);
            }
        }
    }
}

In all six languages, the steps are similar. First, we create a new PdfResource instance by passing the path to the PDF to the constructor. Next, we create a new instance of the PdfInfo class, which abstracts the pdf-info endpoint, and pass it the PdfResource instance. Finally, we call the Process method and, if the call succeeds, print the resulting JSON to the console.
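To make the success-or-error flow concrete without depending on any client library, here is a rough Python sketch that sends the request with the standard library and returns a success flag plus the response body. It is not the DynamicPDF Python client; the function get_pdf_info and its opener parameter are hypothetical, and the sketch assumes that a failed call comes back as an HTTP error status with a readable body, which the tutorial does not specify.

```python
import io
import urllib.error
import urllib.request
from typing import Callable, Tuple

PDF_INFO_URL = "https://api.dpdf.io/v1.0/pdf-info"


def get_pdf_info(
    api_key: str,
    pdf_bytes: bytes,
    opener: Callable = urllib.request.urlopen,
) -> Tuple[bool, str]:
    """POST the PDF and return (succeeded, body_text).

    Hypothetical helper: `opener` is injectable so the flow can be tried offline.
    """
    request = urllib.request.Request(
        PDF_INFO_URL,
        data=pdf_bytes,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/pdf",
        },
        method="POST",
    )
    try:
        with opener(request) as response:
            return True, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; return whatever body was sent.
        return False, error.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Offline demonstration with a fake opener that returns a canned body.
    def fake_opener(request):
        return io.BytesIO(b'{"title": "2021 Form W-4"}')

    ok, body = get_pdf_info("DP.xxx-api-key-xxx", b"%PDF placeholder", fake_opener)
    print(ok, body)
```

Calling get_pdf_info with the default opener and the real bytes of fw4.pdf would perform the actual network call; the boolean plays the role that IsSuccessful plays in the C# example.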
