
Extract XMP Metadata


Extract XMP metadata from PDF documents using the pdf-xmp endpoint.

The pdf-xmp endpoint extracts XMP metadata from a PDF document and returns it as XML. In this tutorial, we first call the pdf-xmp endpoint directly using REST.

We then call the same endpoint through the DynamicPDF API client libraries, which are available for C#, Java, Node.js, PHP, Go, and Python.

Required Resources

To complete this tutorial, you must add the Get XMP Metadata sample to your samples folder in your cloud storage space using the File Manager. After adding the sample resources, you should see a samples/get-xmp-metadata-pdf-endpoint folder containing the resources for this tutorial.

Sample | Sample Folder | Resources
Get XMP Metadata | samples/get-xmp-metadata-pdf-endpoint | fw4.pdf
  • From the File Manager, download fw4.pdf to your local system; this tutorial assumes the folder c:/temp/dynamicpdf-api-samples/get-xmp-metadata.
  • After downloading, delete fw4.pdf from your cloud storage space using the File Manager.
Resource | Cloud/Local
fw4.pdf | local
tip

See Sample Resources for instructions on adding sample resources.

Obtaining API Key

This tutorial assumes a valid API key obtained from the DynamicPDF API's Portal. Refer to the following for instructions on getting an API key.

tip

If you are not familiar with the File Manager or Apps and API Keys, refer to the following tutorial and relevant Users Guide pages.

Calling API Directly Using POST

The pdf-xmp endpoint accepts a POST request. When using cURL, you specify the endpoint URL, the HTTP method, the API key, and the local PDF to send. The following cURL command illustrates.

  • Create a cURL POST request, where you pass the API key as a header and the PDF as binary data.
curl -X POST "https://api.dpdf.io/v1.0/pdf-xmp" \
  -H "Content-Type: application/pdf" \
  -H "Authorization: Bearer DP.xxx-api-key-xxx" \
  --data-binary "@c:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf"
  • Execute the cURL command; the XMP metadata is written to the command line as XML.
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.398682, 2009/08/10-13:00:47 ">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>application/pdf</dc:format>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>Fillable</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Employee's Withholding Certificate</rdf:li>
        </rdf:Alt>
      </dc:description>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>SE:W:CAR:MP</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">2021 Form W-4</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
    <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreatorTool>Adobe LiveCycle Designer ES 9.0</xmp:CreatorTool>
      <xmp:MetadataDate>2020-12-31T09:12:43-05:00</xmp:MetadataDate>
      <xmp:ModifyDate>2020-12-31T09:12:43-05:00</xmp:ModifyDate>
      <xmp:CreateDate>2020-12-31T09:12:43-05:00</xmp:CreateDate>
    </rdf:Description>
    <rdf:Description rdf:about=""
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
      <pdf:Producer>Adobe LiveCycle Designer ES 9.0</pdf:Producer>
      <pdf:Keywords>Fillable</pdf:Keywords>
    </rdf:Description>
    <rdf:Description rdf:about=""
        xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
      <xmpMM:DocumentID>uuid:01d97a6e-5605-44ae-8015-54a82bc56c5c</xmpMM:DocumentID>
      <xmpMM:InstanceID>uuid:9d6007b3-eacb-4f13-8d6b-da9d46b7dfb3</xmpMM:InstanceID>
    </rdf:Description>
    <rdf:Description rdf:about=""
        xmlns:desc="http://ns.adobe.com/xfa/promoted-desc/">
      <desc:embeddedHref rdf:parseType="Resource">
        <rdf:value>..\..\..\..\..\..\..\TFACS\Misc\logo\pencil.bmp</rdf:value>
        <desc:ref>/template/subform[1]/subform[3]/draw[2]</desc:ref>
      </desc:embeddedHref>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
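
If cURL is not available, the same POST request can be sketched with Python's standard library. The endpoint URL and the two headers are taken from the cURL command above. The function names build_xmp_request and fetch_xmp are hypothetical helpers introduced for illustration, and decoding the response as UTF-8 is an assumption. Treat this as a minimal sketch, not a replacement for the official client libraries.

```python
import urllib.request

# Endpoint URL as used in the cURL command above.
ENDPOINT = "https://api.dpdf.io/v1.0/pdf-xmp"


def build_xmp_request(api_key: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request for the pdf-xmp endpoint.

    Hypothetical helper: mirrors the cURL flags -X POST, -H and --data-binary.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=pdf_bytes,  # raw PDF bytes, like --data-binary
        method="POST",
        headers={
            "Content-Type": "application/pdf",
            "Authorization": f"Bearer {api_key}",
        },
    )


def fetch_xmp(api_key: str, pdf_path: str) -> str:
    """Send the PDF at pdf_path and return the response body as text.

    Assumes the response is UTF-8 encoded XML. urlopen raises
    urllib.error.HTTPError for non-success status codes.
    """
    with open(pdf_path, "rb") as pdf_file:
        request = build_xmp_request(api_key, pdf_file.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling fetch_xmp("DP.xxx-api-key-xxx", "c:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf") with a valid key should return XML like the output above. Building the request separately from sending it makes it easy to inspect the headers without touching the network.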

Calling Endpoint Using Client Library

To simplify development, you can also use one of the DynamicPDF API client libraries. Use the client library of your choice to complete this tutorial section.

Complete Source

You can access the complete source for this project at one of the following GitHub projects.

Language | File Name | Location (package/namespace/etc.) | GitHub Project
Java | GetXmpMetaData.java | com.dynamicpdf.api.examples | https://github.com/dynamicpdf-api/java-client-examples
C# | Program.cs | GetXmpMetaData | https://github.com/dynamicpdf-api/dotnet-client-examples
Node.js | GetXmpMetaData.js | nodejs-client-examples | https://github.com/dynamicpdf-api/nodejs-client-examples
PHP | GetXmpMetaData.php | php-client-examples | https://github.com/dynamicpdf-api/php-client-examples
Go | pdf-xmp-example.go | go-client-examples | https://github.com/dynamicpdf-api/go-client-examples/tree/main
Python | PdfXmpExample.py | python-client-examples | https://github.com/dynamicpdf-api/python-client-examples
tip

The steps below use the C# client library. The other languages follow the same pattern, and their complete source is available in the GitHub projects listed above.

Available on NuGet:

Install-Package DynamicPDF.API
  • Create a new Console App (.NET Core) project named GetXmpMetaData.
  • Add the DynamicPDF.API NuGet package.
  • Create a new static method named Run.
  • Add the following code to the Run method.
  • Create a new PdfResource instance, passing the path to the PDF to its constructor, then create a new PdfXmp instance from that resource and set its API key.
  • Add a call to the Process method in the PdfXmp instance.
  • Ensure the call was successful and add code to print the results to the console.
  • Run the application and the XML metadata is printed to the console.
using DynamicPDF.Api;
using System;

namespace GetXmpMetaData
{
    class Program
    {
        static void Main(string[] args)
        {
            Run("DP.xxx-api-key-xxx", "C:/temp/dynamicpdf-api-samples/get-xmp-metadata");
        }

        public static void Run(string apiKey, string basePath)
        {
            // get the local PDF as a PdfResource
            PdfResource resource = new PdfResource(basePath + "/fw4.pdf");

            // load the PDF and call the endpoint
            PdfXmp pdfXmp = new PdfXmp(resource);
            pdfXmp.ApiKey = apiKey;
            XmlResponse response = pdfXmp.Process();

            // if successful, print the XML to the console; otherwise print the error
            if (response.IsSuccessful)
            {
                Console.WriteLine(response.Content);
            }
            else
            {
                Console.WriteLine(response.ErrorJson);
            }
        }
    }
}

In all six languages, the steps are similar. First, we create a new PdfResource instance by passing the path to the PDF to its constructor. Next, we create a new instance of the PdfXmp class, which abstracts the pdf-xmp endpoint, and set its API key. Finally, we call the Process method and, if the call succeeds, print the resulting XML to the console; otherwise we print the error.
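
Printing the raw XML is often only the first step. As a rough illustration of what can be done with the result, the following Python sketch pulls a few fields out of an XMP packet like the sample output shown earlier in this tutorial. It uses only the standard library. The function name summarize_xmp and the choice of fields are assumptions made for this example, and real documents may omit any of these properties.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the sample output earlier in this tutorial.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}


def summarize_xmp(xmp_xml: str) -> dict:
    """Return a few common fields from an XMP packet (hypothetical helper).

    Missing properties are reported as None (or an empty list for creators).
    """
    # The <?xpacket ...?> processing instructions are skipped by the parser.
    root = ET.fromstring(xmp_xml)

    def first_text(path: str):
        node = root.find(path, NS)
        return node.text if node is not None else None

    return {
        "title": first_text(".//dc:title/rdf:Alt/rdf:li"),
        "creators": [
            li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
        ],
        "creator_tool": first_text(".//xmp:CreatorTool"),
        "producer": first_text(".//pdf:Producer"),
    }
```

Passing the XML returned by the endpoint to summarize_xmp yields a plain dictionary, which is easier to log or store than the full RDF structure.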
