Extract XMP Metadata
Extract XMP metadata from PDF documents using the pdf-xmp endpoint.
The pdf-xmp endpoint extracts XMP metadata from PDF documents. In this tutorial we demonstrate how easy it is to extract XMP metadata from a PDF document via the pdf-xmp endpoint. We first call the pdf-xmp endpoint directly using REST. We then use the DynamicPDF client libraries to illustrate calling pdf-xmp with the C#, Java, Node.js, PHP, Go, and Python client libraries.
Required Resources
To complete this tutorial, you must add the Get XMP Metadata sample to your samples folder in your cloud storage space using the File Manager. After adding the sample resources, you should see a samples/get-xmp-metadata-pdf-endpoint folder containing the resources for this tutorial.
Sample | Sample Folder | Resources |
---|---|---|
Get XMP Metadata | samples/get-xmp-metadata-pdf-endpoint | fw4.pdf |
- From the File Manager, download fw4.pdf to your local system; here we assume C:/temp/dynamicpdf-api-samples/get-xmp-metadata.
- After downloading, delete fw4.pdf from your cloud storage space using the File Manager.
Resource | Cloud/Local |
---|---|
fw4.pdf | local |
See Sample Resources for instructions on adding sample resources.
Obtaining API Key
This tutorial assumes a valid API key obtained from the DynamicPDF API's Portal. Refer to the following for instructions on getting an API key.
If you are not familiar with the File Manager or Apps and API Keys, refer to the following tutorial and relevant Users Guide pages.
Calling API Directly Using POST
The pdf-xmp endpoint takes a POST request. When using cURL, you specify the endpoint, the HTTP method, the API key, and the local resources required. The following cURL command illustrates this.
- Create a cURL POST request, where you pass the API key as a header and the PDF as binary data.
curl -X POST "https://api.dpdf.io/v1.0/pdf-xmp" \
  -H "Content-Type: application/pdf" \
  -H "Authorization: Bearer DP.xxx-api-key-xxx" \
  --data-binary "@c:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf"
- Execute the cURL command and the XML metadata is written to the command line.
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.398682, 2009/08/10-13:00:47 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:subject>
<rdf:Bag>
<rdf:li>Fillable</rdf:li>
</rdf:Bag>
</dc:subject>
<dc:description>
<rdf:Alt>
<rdf:li xml:lang="x-default">Employee's Withholding Certificate</rdf:li>
</rdf:Alt>
</dc:description>
<dc:creator>
<rdf:Seq>
<rdf:li>SE:W:CAR:MP</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">2021 Form W-4</rdf:li>
</rdf:Alt>
</dc:title>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:CreatorTool>Adobe LiveCycle Designer ES 9.0</xmp:CreatorTool>
<xmp:MetadataDate>2020-12-31T09:12:43-05:00</xmp:MetadataDate>
<xmp:ModifyDate>2020-12-31T09:12:43-05:00</xmp:ModifyDate>
<xmp:CreateDate>2020-12-31T09:12:43-05:00</xmp:CreateDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>Adobe LiveCycle Designer ES 9.0</pdf:Producer>
<pdf:Keywords>Fillable</pdf:Keywords>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:01d97a6e-5605-44ae-8015-54a82bc56c5c</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:9d6007b3-eacb-4f13-8d6b-da9d46b7dfb3</xmpMM:InstanceID>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:desc="http://ns.adobe.com/xfa/promoted-desc/">
<desc:embeddedHref rdf:parseType="Resource">
<rdf:value>..\..\..\..\..\..\..\TFACS\Misc\logo\pencil.bmp</rdf:value>
<desc:ref>/template/subform[1]/subform[3]/draw[2]</desc:ref>
</desc:embeddedHref>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
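The same POST request can be issued from any HTTP client. The following is a minimal Python sketch using only the standard library's urllib, assuming the request shape shown in the cURL command (the PDF as the raw request body, the API key as a Bearer token) and assuming the returned XMP packet is UTF-8 text. The function names build_xmp_request and fetch_xmp are hypothetical and are not part of any DynamicPDF client library.

```python
import urllib.error
import urllib.request

# Endpoint URL taken from the cURL example.
PDF_XMP_URL = "https://api.dpdf.io/v1.0/pdf-xmp"


def build_xmp_request(api_key: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a POST request mirroring the cURL command:
    the PDF is the raw body and the API key is sent as a Bearer token."""
    return urllib.request.Request(
        PDF_XMP_URL,
        data=pdf_bytes,
        method="POST",
        headers={
            "Content-Type": "application/pdf",
            "Authorization": f"Bearer {api_key}",
        },
    )


def fetch_xmp(api_key: str, pdf_path: str) -> str:
    """Send a local PDF to pdf-xmp and return the XMP packet as text.

    Assumption: the response body is UTF-8 encoded XML.
    """
    with open(pdf_path, "rb") as pdf_file:
        request = build_xmp_request(api_key, pdf_file.read())
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx statuses; the body is assumed
        # to carry the API's error details (the client libraries expose
        # these as ErrorJson).
        body = err.read().decode("utf-8", "replace")
        raise RuntimeError(f"pdf-xmp failed with HTTP {err.code}: {body}") from err
```

For example, fetch_xmp("DP.xxx-api-key-xxx", "C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf") should return the same XML packet that the cURL command prints, while an unsuccessful call surfaces as a RuntimeError carrying the HTTP status code and response body.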
Calling Endpoint Using Client Library
To simplify development, you can also use one of the DynamicPDF API client libraries. Use the client library of your choice to complete this tutorial section.
Complete Source
You can access the complete source for this project at one of the following GitHub projects.
Language | File Name | Location (package/namespace/etc.) | GitHub Project |
---|---|---|---|
Java | GetXmpMetaData.java | com.dynamicpdf.api.examples | https://github.com/dynamicpdf-api/java-client-examples |
C# | Program.cs | GetXmpMetaData | https://github.com/dynamicpdf-api/dotnet-client-examples |
Node.js | GetXmpMetaData.js | nodejs-client-examples | https://github.com/dynamicpdf-api/nodejs-client-examples |
PHP | GetXmpMetaData.php | php-client-examples | https://github.com/dynamicpdf-api/php-client-examples |
Go | pdf-xmp-example.go | go-client-examples | https://github.com/dynamicpdf-api/go-client-examples/tree/main |
Python | PdfXmpExample.py | python-client-examples | https://github.com/dynamicpdf-api/python-client-examples |
C# (.NET)
Available on NuGet:
Install-Package DynamicPDF.API
- Create a new Console App (.NET Core) project named GetXmpMetaData.
- Add the DynamicPDF.API NuGet package.
- Create a new static method named Run.
- Add the following code to the Run method.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance, passing the PdfResource to its constructor.
- Add a call to the Process method of the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Run the application and the XML metadata is printed to the console.
using DynamicPDF.Api;
using System;
namespace GetXmpMetaData
{
    class Program
    {
        static void Main(string[] args)
        {
            Run("DP.xxx-api-key-xxx", "C:/temp/dynamicpdf-api-samples/get-xmp-metadata");
        }
        public static void Run(String apiKey, String basePath)
        {
            //get the local PDF as a PdfResource
            PdfResource resource = new PdfResource(basePath + "/fw4.pdf");
            //load the PDF into a PdfXmp instance and call the endpoint
            PdfXmp pdfXmp = new PdfXmp(resource);
            pdfXmp.ApiKey = apiKey;
            XmlResponse response = pdfXmp.Process();
            //if successful, print the results to the console
            if (response.IsSuccessful)
            {
                Console.WriteLine(response.Content);
            }
            else
            {
                Console.WriteLine(response.ErrorJson);
            }
        }
    }
}
Node.js
Available on NPM:
npm i @dynamicpdf/api
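- Use npm to install the DynamicPDF API module.
- Create a new class named GetXmpMetaData.
- Create a static Run method.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance, passing the PdfResource to its constructor.
- Add a call to the process method of the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.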
import { PdfXmp, PdfResource } from "@dynamicpdf/api";
export class GetXmpMetaData {
    static async Run() {
        //get the PDF as a PdfResource and load it into a new PdfXmp
        const resource = new PdfResource("C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf");
        const pdfXmp = new PdfXmp(resource);
        pdfXmp.apiKey = "DP.xxx-api-key-xxx";
        //call the endpoint to get the results
        const res = await pdfXmp.process();
        //if the call was successful, print the XML to the console
        if (res.isSuccessful) {
            console.log(res.content);
        } else {
            console.log(res.errorJson);
        }
    }
}
await GetXmpMetaData.Run();
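- Run the application with node GetXmpMetaData.js and the XML is output to the console. Because the example uses import statements and a top-level await, Node.js must treat the file as an ES module (for example, by setting "type": "module" in package.json).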
Java
Available on Maven:
https://search.maven.org/search?q=g:com.dynamicpdf.api
<dependency>
<groupId>com.dynamicpdf.api</groupId>
<artifactId>dynamicpdf-api</artifactId>
<version>1.0.0</version>
</dependency>
- Create a new Maven project and add the DynamicPDF API as a dependency.
- Create a new class named GetXmpMetaData with a main method.
- Create a new method named Run.
- Add the Run method call to main.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance, passing the PdfResource to its constructor.
- Add a call to the process method of the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Run the application and the XML metadata is printed to the console.
package com.dynamicpdf.api.examples;
import com.dynamicpdf.api.PdfResource;
import com.dynamicpdf.api.PdfXmp;
import com.dynamicpdf.api.XmlResponse;
public class GetXmpMetaData {
    public static void main(String[] args) {
        GetXmpMetaData.Run("DP.xxx-api-key-xxx",
                "C:/temp/dynamicpdf-api-samples/get-xmp-metadata");
    }
    public static void Run(String apiKey, String basePath) {
        //load the local PDF as a PdfResource and add it to a PdfXmp instance
        PdfResource resource = new PdfResource(basePath + "/fw4.pdf");
        PdfXmp pdfXmp = new PdfXmp(resource);
        pdfXmp.setApiKey(apiKey);
        //call the endpoint
        XmlResponse response = pdfXmp.process();
        //if successful, print the XML to the console
        if (response.getIsSuccessful()) {
            System.out.println(response.getContent());
        } else {
            System.out.println(response.getErrorJson());
        }
    }
}
PHP
Available as a Composer package:
composer require dynamicpdf/api
- Use Composer to ensure you have the required PHP libraries.
- Create a new class named GetXmpMetaData.
- Add a Run method.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance, passing the PdfResource to its constructor.
- Add a call to the Process method of the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Add a call to the GetXmpMetaData::Run() method.
<?php
require __DIR__ . '/vendor/autoload.php';
use DynamicPDF\Api\PdfXmp;
use DynamicPDF\Api\PdfResource;
class GetXmpMetaData {
    private static string $BasePath = "C:/temp/dynamicpdf-api-samples/get-xmp-metadata";
    public static function Run()
    {
        //get the PDF, load it as a PdfResource, then add it to a PdfXmp
        $resource = new PdfResource(GetXmpMetaData::$BasePath . "/fw4.pdf");
        $pdfXmp = new PdfXmp($resource);
        $pdfXmp->ApiKey = "DP.xxx-api-key-xxx";
        //call the endpoint to get the results
        $response = $pdfXmp->Process();
        //print the XML results to the console
        if ($response->IsSuccessful)
        {
            echo($response->Content);
        } else {
            echo($response->ErrorMessage);
        }
    }
}
GetXmpMetaData::Run();
- Run the application with php GetXmpMetaData.php and the XML metadata is printed to the console.
Go
Available as a Go package: https://pkg.go.dev/github.com/dynamicpdf-api/go-client
- Ensure you have the required Go libraries.
- Create a new file named pdf-xmp-example.go.
- Add a main function.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance, passing the PdfResource to its constructor.
- Add a call to the Process method of the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Run the application with go run pdf-xmp-example.go and the XML metadata is printed to the console.
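package main
import (
	"fmt"
	"github.com/dynamicpdf-api/go-client/endpoint"
	"github.com/dynamicpdf-api/go-client/resource"
)
func main() {
	// load the local PDF as a PdfResource
	pdfResource := resource.NewPdfResourceWithResourcePath("C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf", "fw4.pdf")
	// create the PdfXmp endpoint and set the base URL and API key
	xmp := endpoint.NewPdfXmp(pdfResource)
	xmp.Endpoint.BaseUrl = "https://api.dpdf.io/"
	xmp.Endpoint.ApiKey = "DP.xxx-api-key-xxx"
	// Process returns a channel; receive the response from it
	resp := xmp.Process()
	res := <-resp
	// if successful, print the XML to the console
	if res.IsSuccessful() {
		fmt.Print(string(res.Content().Bytes()))
	} else {
		fmt.Println("the pdf-xmp call was not successful")
	}
}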
Python
Available on PyPI: pip install dynamicpdf-api
- Ensure you have the required Python libraries.
- Create a new file named PdfXmpExample.py.
- Add a run function.
- Create a new PdfResource instance with the path to the PDF in its constructor, then create a new PdfXmp instance, passing the PdfResource to its constructor.
- Add a call to the process method of the PdfXmp instance.
- Ensure the call was successful and add code to print the results to the console.
- Run the application with python PdfXmpExample.py and the XML metadata is printed to the console.
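from dynamicpdf_api.pdf_xmp import PdfXmp
from dynamicpdf_api.pdf_resource import PdfResource
def run(api_key):
    # load the local PDF as a PdfResource and add it to a PdfXmp instance
    resource = PdfResource("C:/temp/dynamicpdf-api-samples/get-xmp-metadata/fw4.pdf")
    pdf_xmp = PdfXmp(resource)
    pdf_xmp.api_key = api_key
    # call the endpoint
    response = pdf_xmp.process()
    # if successful, print the XML to the console
    if response.is_successful:
        print(response.content)
    else:
        print(response.error_json)
if __name__ == "__main__":
    api_key = "DP.xxx-api-key-xxx"
    run(api_key)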
In all six languages, the steps were similar. First, we created a new PdfResource instance by passing the path to the PDF to its constructor. Next, we created a new instance of the PdfXmp class, which abstracts the pdf-xmp endpoint, passed it the PdfResource, and set the API key. Finally, we called the Process method and, if the call was successful, printed the resulting XML metadata to the console.
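Whichever approach you use, the result is the XMP packet as XML text. If you need individual fields rather than the raw XML, the packet can be parsed with any namespace-aware XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree and the namespace URIs that appear in the fw4.pdf output shown earlier. The function summarize_xmp and the fields it picks are hypothetical, for illustration only; other PDFs may use different XMP schemas or omit these properties, in which case the sketch returns None (or an empty list) for them.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the fw4.pdf XMP output.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}


def summarize_xmp(xmp_text: str) -> dict:
    """Pull a few common properties out of an XMP packet string."""
    # The <?xpacket ...?> wrappers are XML processing instructions, which are
    # allowed before and after the root element, so the packet parses as-is.
    # The returned root is the <x:xmpmeta> element.
    root = ET.fromstring(xmp_text.strip())

    def first_text(path: str):
        # Return the text of the first matching element, or None if absent.
        node = root.find(path, NS)
        return node.text if node is not None else None

    return {
        # dc:title and dc:description hold language alternatives (rdf:Alt).
        "title": first_text(".//dc:title/rdf:Alt/rdf:li"),
        "description": first_text(".//dc:description/rdf:Alt/rdf:li"),
        # dc:creator is an ordered list (rdf:Seq) that may hold several names.
        "creators": [li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)],
        "keywords": first_text(".//pdf:Keywords"),
        "create_date": first_text(".//xmp:CreateDate"),
    }
```

Applied to the fw4.pdf output shown earlier, summarize_xmp should yield the title "2021 Form W-4", the description "Employee's Withholding Certificate", the creators ["SE:W:CAR:MP"], the keywords "Fillable", and the create date "2020-12-31T09:12:43-05:00".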