Extract XMP Metadata

Extract XMP metadata from PDF documents using the pdf-xmp endpoint.

The pdf-xmp endpoint extracts XMP metadata from a PDF document. In this tutorial, we first call the pdf-xmp endpoint directly using REST.

We then call the same endpoint using the DynamicPDF API client libraries for C#, Node.js, Java, PHP, Go, and Python.

Required Resources

To complete this tutorial, you must add the Get XMP Metadata sample to your samples folder in your cloud storage space using the File Manager. After adding the sample resources, you should see a samples/get-xmp-metadata-pdf-endpoint folder containing the resources for this tutorial.

Sample	Sample Folder	Resources
Get XMP Metadata	`samples/get-xmp-metadata-pdf-endpoint`	`fw4.pdf`

From the File Manager, download fw4.pdf to your local system; this tutorial assumes the folder C:/temp/dynamicpdf-api-samples/get-xmp-metadata.
After downloading, delete fw4.pdf from your cloud storage space using the File Manager.

Resource	Cloud/Local
`fw4.pdf`	local

Tip: See Sample Resources for instructions on adding sample resources.
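
Before calling the endpoint, it can be useful to confirm that the downloaded file is where the later examples expect it. The following is a minimal Python sketch, not part of the DynamicPDF samples. The helper name `looks_like_pdf` is hypothetical, and the folder is the one this tutorial assumes. The check relies only on the fact that PDF files begin with the bytes `%PDF-`.

```python
from pathlib import Path

# Local folder assumed by this tutorial; adjust to your own download location.
SAMPLE_DIR = Path("C:/temp/dynamicpdf-api-samples/get-xmp-metadata")


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF header bytes."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Prints True once fw4.pdf has been downloaded to the assumed folder.
print(looks_like_pdf(SAMPLE_DIR / "fw4.pdf"))
```

A result of False means the file is missing or is not a PDF, which is worth ruling out before troubleshooting an API call.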

Obtaining API Key

This tutorial assumes a valid API key obtained from the DynamicPDF API's Portal. Refer to the following for instructions on getting an API key.

Apps and API Keys

Tip: If you are not familiar with the File Manager or Apps and API Keys, refer to the related tutorial and the relevant Users Guide pages.

Calling API Directly Using POST

The pdf-xmp endpoint takes a POST request. When using cURL, you specify the endpoint, the HTTP method, the API key, and the local resource to send. The following cURL command illustrates.

Create a cURL POST request, where you pass the API key as a header and the PDF as binary data.

curl -X POST "https://api.dpdf.io/v1.0/pdf-xmp" \
  -H "Content-Type: application/pdf" \
  -H "Authorization: Bearer DP.xxx-api-key-xxx" \
  --data-binary "@c:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf"

Execute the cURL command; the XMP metadata is written to the command line as XML.

<?xpacket begin="﻿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.398682, 2009/08/10-13:00:47        ">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:format>application/pdf</dc:format>
            <dc:subject>
                <rdf:Bag>
                    <rdf:li>Fillable</rdf:li>
                </rdf:Bag>
            </dc:subject>
            <dc:description>
                <rdf:Alt>
                    <rdf:li xml:lang="x-default">Employee's Withholding Certificate</rdf:li>
                </rdf:Alt>
            </dc:description>
            <dc:creator>
                <rdf:Seq>
                    <rdf:li>SE:W:CAR:MP</rdf:li>
                </rdf:Seq>
            </dc:creator>
            <dc:title>
                <rdf:Alt>
                    <rdf:li xml:lang="x-default">2021 Form W-4</rdf:li>
                </rdf:Alt>
            </dc:title>
        </rdf:Description>
        <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/">
            <xmp:CreatorTool>Adobe LiveCycle Designer ES 9.0</xmp:CreatorTool>
            <xmp:MetadataDate>2020-12-31T09:12:43-05:00</xmp:MetadataDate>
            <xmp:ModifyDate>2020-12-31T09:12:43-05:00</xmp:ModifyDate>
            <xmp:CreateDate>2020-12-31T09:12:43-05:00</xmp:CreateDate>
        </rdf:Description>
        <rdf:Description rdf:about=""
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
            <pdf:Producer>Adobe LiveCycle Designer ES 9.0</pdf:Producer>
            <pdf:Keywords>Fillable</pdf:Keywords>
        </rdf:Description>
        <rdf:Description rdf:about=""
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
            <xmpMM:DocumentID>uuid:01d97a6e-5605-44ae-8015-54a82bc56c5c</xmpMM:DocumentID>
            <xmpMM:InstanceID>uuid:9d6007b3-eacb-4f13-8d6b-da9d46b7dfb3</xmpMM:InstanceID>
        </rdf:Description>
        <rdf:Description rdf:about=""
            xmlns:desc="http://ns.adobe.com/xfa/promoted-desc/">
            <desc:embeddedHref rdf:parseType="Resource">
                <rdf:value>..\..\..\..\..\..\..\TFACS\Misc\logo\pencil.bmp</rdf:value>
                <desc:ref>/template/subform[1]/subform[3]/draw[2]</desc:ref>
            </desc:embeddedHref>
        </rdf:Description>
    </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
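
The cURL request shown earlier in this section can also be sketched in Python using only the standard library. This is an illustration rather than an official sample. The function names `build_pdf_xmp_request` and `fetch_xmp` are hypothetical; the URL and headers are copied from the cURL command. Decoding the response body as UTF-8 is an assumption.

```python
import urllib.request

# Endpoint URL taken from the cURL command in this section.
PDF_XMP_URL = "https://api.dpdf.io/v1.0/pdf-xmp"


def build_pdf_xmp_request(api_key, pdf_bytes):
    """Build (but do not send) the POST request for the pdf-xmp endpoint."""
    return urllib.request.Request(
        PDF_XMP_URL,
        data=pdf_bytes,  # raw PDF bytes, like curl's --data-binary
        method="POST",
        headers={
            "Content-Type": "application/pdf",
            "Authorization": "Bearer " + api_key,
        },
    )


def fetch_xmp(api_key, pdf_path):
    """Send the PDF and return the response body as text (assumed UTF-8)."""
    with open(pdf_path, "rb") as pdf_file:
        request = build_pdf_xmp_request(api_key, pdf_file.read())
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_xmp("DP.xxx-api-key-xxx", "C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf")` with a valid key should return the same XML that cURL prints. Building the request separately from sending it keeps the headers easy to inspect without making a network call.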

Calling Endpoint Using Client Library

To simplify development, you can also use one of the DynamicPDF API client libraries. Use the client library of your choice to complete this tutorial section.

Complete Source

You can access the complete source for this project at one of the following GitHub projects.

Language	File Name	Location (package/namespace/etc.)	GitHub Project
Java	`GetXmpMetaData.java`	`com.dynamicpdf.api.examples`	https://github.com/dynamicpdf-api/java-client-examples
C#	`Program.cs`	`GetXmpMetaData`	https://github.com/dynamicpdf-api/dotnet-client-examples
Node.js	`GetXmpMetaData.js`	`nodejs-client-examples`	https://github.com/dynamicpdf-api/nodejs-client-examples
PHP	`GetXmpMetaData.php`	`php-client-examples`	https://github.com/dynamicpdf-api/php-client-examples
Go	`pdf-xmp-example.go`	`go-client-examples`	https://github.com/dynamicpdf-api/go-client-examples/tree/main
Python	`PdfXmpExample.py`	`python-client-examples`	https://github.com/dynamicpdf-api/python-client-examples

Tip: The tutorial steps for each language follow in this order: C#, Node.js, Java, PHP, Go, and Python.

Available on NuGet:

Install-Package DynamicPDF.API

Create a new Console App (.NET Core) project named GetXmpMetaData.
Add the DynamicPDF.API NuGet package.
Create a new static method named Run.
Add the following code to the Run method.
Create a PdfResource instance with the path to the PDF in its constructor, then pass it to a new PdfXmp instance.
Add a call to the Process method in the PdfXmp instance.
Ensure the call was successful and add code to print the results to the console.
Run the application and the XML metadata is printed to the console.

using DynamicPDF.Api;
using System;

namespace GetXmpMetaData
{
    class Program
    {
        static void Main(string[] args)
        {
            Run("DP.xxx-api-key-xxx", "C:/temp/dynamicpdf-api-samples/get-xmp-metadata");
        }

        public static void Run(String apiKey, String basePath)
        {
            //get the local pdf as pdf resource
            PdfResource resource = new PdfResource(basePath + "/fw4.pdf");
            
            //load the pdf and call the endpoint
            PdfXmp pdfXmp = new PdfXmp(resource);
            pdfXmp.ApiKey = apiKey;
            XmlResponse response = pdfXmp.Process();

            //if successful print results to console
            if (response.IsSuccessful)
            {
                Console.WriteLine(response.Content);
            }
            else
            {
                Console.WriteLine(response.ErrorJson);
            }
        }
    }
}

Available on NPM:

npm i @dynamicpdf/api

Use npm to install the DynamicPDF API module.
Create a new class named GetXmpMetaData.
Create a static Run method.
Create a PdfResource instance with the path to the PDF in its constructor, then pass it to a new PdfXmp instance.
Add a call to the process method of the PdfXmp instance.

import {
    PdfXmp,
    PdfResource
} from "@dynamicpdf/api";

export class GetXmpMetaData {

    static async Run() {
        
        //get Pdf as PdfResource and load into new PdfXmp
        var resource = new PdfResource("C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf");
        var pdfXmp = new PdfXmp(resource);
        pdfXmp.apiKey = "DP.xxx-api-key-xxx";

        //call the endpoint to get results
        var res = await pdfXmp.process();

        //if call was successful print xml to console
        if (res.isSuccessful) {
            console.log(res.content);
        } else {
            console.log(res.errorJson);
        }
    }
}
await GetXmpMetaData.Run();

Run the application with `node GetXmpMetaData.js`; the XML metadata is printed to the console.

Available on Maven:

https://search.maven.org/search?q=g:com.dynamicpdf.api

<dependency>
  <groupId>com.dynamicpdf.api</groupId>
  <artifactId>dynamicpdf-api</artifactId>
  <version>1.0.0</version>
</dependency>

Create a new Maven project and add the DynamicPDF API as a dependency.
Create a new class named GetXmpMetaData with a main method.
Create a new method named Run.
Add the Run method call to main.
Create a PdfResource instance with the path to the PDF in its constructor, then pass it to a new PdfXmp instance.
Add a call to the process method in the PdfXmp instance.
Ensure the call was successful and add code to print the results to the console.
Run the application and the XML metadata is printed to the console.

package com.dynamicpdf.api.examples;

import com.dynamicpdf.api.PdfResource;
import com.dynamicpdf.api.PdfXmp;
import com.dynamicpdf.api.XmlResponse;

public class GetXmpMetaData {

	public static void main(String[] args) {
		GetXmpMetaData.Run("DP.xxx-api-key-xxx",
				"C:/temp/dynamicpdf-api-samples/get-xmp-metadata");
	}

	public static void Run(String apiKey, String basePath) {

		// load the local PDF as a PdfResource and add it to a PdfXmp instance
		PdfResource resource = new PdfResource(basePath + "/fw4.pdf");
		PdfXmp pdfXmp = new PdfXmp(resource);
		pdfXmp.setApiKey(apiKey);

		// call the endpoint
		XmlResponse response = pdfXmp.process();

		// if successful, print the XML to the console; otherwise print the error
		if (response.getIsSuccessful()) {
			System.out.println(response.getContent());
		} else {
			System.out.println(response.getErrorJson());
		}
	}
}

Available as a Composer package:

composer require dynamicpdf/api

Use composer to ensure you have the required PHP libraries.
Create a new class named GetXmpMetaData.
Add a Run method.
Create a PdfResource instance with the path to the PDF in its constructor, then pass it to a new PdfXmp instance.
Add a call to the Process method of the PdfXmp instance.
Ensure the call was successful and add code to print the results to the console.
Add a call to the GetXmpMetaData::Run() method.

<?php
    
require __DIR__ . '/vendor/autoload.php';

use DynamicPDF\Api\PdfXmp;
use DynamicPDF\Api\PdfResource;

class GetXmpMetaData {

    private static string $BasePath = "C:/temp/dynamicpdf-api-samples/get-xmp-metadata";

    public static function Run()
    {
        //get the PDF and load as PdfResource then add to PdfXmp
        $resource = new PdfResource(GetXmpMetaData::$BasePath . "/fw4.pdf");
        $pdfXmp = new PdfXmp($resource);
        $pdfXmp->ApiKey = "DP.xxx-api-key-xxx";
        
        //call the endpoint to get the results
        $response = $pdfXmp->Process();
        
        //print xml results to console
        if($response->IsSuccessful)
        {
            echo($response->Content);
        } else {
            echo($response->ErrorMessage);
        }
    }
}
GetXmpMetaData::Run();

Run the application with `php GetXmpMetaData.php`; the XML metadata is printed to the console.

Available as a Go package: https://pkg.go.dev/github.com/dynamicpdf-api/go-client

Ensure you have the required Go libraries.
Create a new file named pdf-xmp-example.go.
Add a main function.
Create a PdfResource from the path to the PDF and pass it to a new PdfXmp instance.
Add a call to the Process method of the PdfXmp instance.
Ensure the call was successful and add code to print the results to the console.
Run the application with `go run pdf-xmp-example.go`; the XML metadata is printed to the console.

package main

import (
	"fmt"
	"github.com/dynamicpdf-api/go-client/endpoint"
	"github.com/dynamicpdf-api/go-client/resource"
)

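func main() {

	// load the local PDF as a PdfResource and add it to a PdfXmp instance
	resource := resource.NewPdfResourceWithResourcePath("C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf", "fw4.pdf")
	xmp := endpoint.NewPdfXmp(resource)
	xmp.Endpoint.BaseUrl = "https://api.dpdf.io/"
	xmp.Endpoint.ApiKey = "DP.xxx-api-key-xxx"

	// call the endpoint and wait for the response on the returned channel
	resp := xmp.Process()
	res := <-resp

	// if successful, print the XML to the console
	if res.IsSuccessful() {
		fmt.Print(string(res.Content().Bytes()))
	}
}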

Install with pip:

pip install dynamicpdf-api

Ensure you have the required Python libraries.
Create a new file named PdfXmpExample.py.
Add a run function.
Create a PdfResource instance with the path to the PDF in its constructor, then pass it to a new PdfXmp instance.
Add a call to the process method of the PdfXmp instance.
Add code to print the results to the console.
Run the application with `python PdfXmpExample.py`; the XML metadata is printed to the console.

from dynamicpdf_api.pdf_xmp import PdfXmp
from dynamicpdf_api.pdf_resource import PdfResource

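def run(api_key):
    # load the local PDF as a PdfResource and add it to a PdfXmp instance
    resource = PdfResource("C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf")
    pdf_xmp = PdfXmp(resource)
    pdf_xmp.api_key = api_key

    # call the endpoint and print the returned XML
    response = pdf_xmp.process()
    print(response.content)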

if __name__ == "__main__":
    api_key = "DP.xxx-api-key-xxx"
    run(api_key)

In all six languages, the steps were similar. First, we created a new PdfResource instance from the path to the PDF. Next, we created a new instance of the PdfXmp class, which abstracts the pdf-xmp endpoint, and passed it the resource. Finally, we called the PdfXmp instance's Process method and printed the resulting XML to the console.
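
Once the XML has been returned, individual fields can be read from it with any XML parser. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The function name `summarize_xmp` is hypothetical, and `SAMPLE_XMP` is a trimmed copy of the fw4.pdf response shown earlier in this tutorial. The paths assume the title, creator, and creator tool are laid out as in that sample; other PDFs may structure their XMP differently or omit these fields.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample response earlier in this tutorial.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Trimmed copy of the fw4.pdf response, used here as sample input.
SAMPLE_XMP = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:creator>
                <rdf:Seq>
                    <rdf:li>SE:W:CAR:MP</rdf:li>
                </rdf:Seq>
            </dc:creator>
            <dc:title>
                <rdf:Alt>
                    <rdf:li xml:lang="x-default">2021 Form W-4</rdf:li>
                </rdf:Alt>
            </dc:title>
        </rdf:Description>
        <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/">
            <xmp:CreatorTool>Adobe LiveCycle Designer ES 9.0</xmp:CreatorTool>
        </rdf:Description>
    </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def summarize_xmp(xmp_xml):
    """Return the title, creators, and creator tool found in an XMP packet."""
    root = ET.fromstring(xmp_xml)
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NAMESPACES)
    return {
        # findtext returns None when a field is absent from the packet.
        "title": root.findtext(".//dc:title/rdf:Alt/rdf:li", namespaces=NAMESPACES),
        "creators": [item.text for item in creators],
        "creator_tool": root.findtext(".//xmp:CreatorTool", namespaces=NAMESPACES),
    }


print(summarize_xmp(SAMPLE_XMP))
```

In practice, the string passed to `summarize_xmp` would be the content returned by the endpoint rather than the embedded sample.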
