PDF Parser

Attributes

URI: http://www.xmlpipe.org/xpe/pdf/filter/chunk
Type: filter
Namespace: http://www.xmlpipe.org/xpe/pdf
Owner: http://www.xmlpipe.org/xpe/pdf

Description

This filter converts a PDF file into an XML document.

Parse a PDF document

To convert a PDF document into an XML document, pass the filter the following XML fragment, to which it reacts:


   <pdf:parse  href="{the URI of the PDF document}"  xmlns:pdf="http://www.xmlpipe.org/xpe/pdf" >
   </pdf:parse>
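If the request fragment is generated programmatically, the namespace and the href attribute must match the form above exactly. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name make_parse_request and the example path file:///tmp/report.pdf are hypothetical, and how the fragment is then delivered to the filter is not covered by this page and is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the filter description above.
PDF_NS = "http://www.xmlpipe.org/xpe/pdf"
# Serialize elements in this namespace with the "pdf" prefix.
ET.register_namespace("pdf", PDF_NS)


def make_parse_request(href: str) -> str:
    """Build a <pdf:parse> fragment (hypothetical helper) for the PDF at `href`."""
    elem = ET.Element(f"{{{PDF_NS}}}parse", {"href": href})
    return ET.tostring(elem, encoding="unicode")


# Example with a hypothetical file URI.
print(make_parse_request("file:///tmp/report.pdf"))
```

ElementTree emits the element without a separate closing tag, which is equivalent XML to the empty <pdf:parse></pdf:parse> pair shown above.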

The filter extracts only the text of the PDF document; layout information, images, and other objects are discarded. The result has the following form:


   <pdf:document  total="{the total number of pages}"  xmlns:pdf="http://www.xmlpipe.org/xpe/pdf" >
      <pdf:page  number="{page number}" >
         {Page content}
      </pdf:page>
   </pdf:document>
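A consumer of the filter's output usually needs the text of each page keyed by its page number. The sketch below reads a pdf:document result with Python's standard xml.etree.ElementTree, assuming the page content is plain text as described above. The function name read_document and the sample document are hypothetical; it does not assume that every page counted in total appears as a pdf:page element, so it returns the declared total alongside the pages it found.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

PDF_NS = "http://www.xmlpipe.org/xpe/pdf"
NS = {"pdf": PDF_NS}


def read_document(xml_text: str) -> Tuple[int, Dict[int, str]]:
    """Return (total, {page number: page text}) for a <pdf:document> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{PDF_NS}}}document":
        raise ValueError("expected a pdf:document root element")
    total = int(root.get("total"))
    pages: Dict[int, str] = {}
    for page in root.findall("pdf:page", NS):
        # itertext() concatenates all text nodes below the page element.
        pages[int(page.get("number"))] = "".join(page.itertext()).strip()
    return total, pages


# Hypothetical two-page result in the format shown above.
sample = """<pdf:document total="2" xmlns:pdf="http://www.xmlpipe.org/xpe/pdf">
  <pdf:page number="1">Hello</pdf:page>
  <pdf:page number="2">World</pdf:page>
</pdf:document>"""
print(read_document(sample))
```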

Elements

Element name      Description
pdf:parse         The input element for parsing a PDF document.

   Attributes

   Attribute name    Description
   href              The URI of the PDF document to be parsed.

pdf:document      Represents a PDF document.

   Attributes

   Attribute name    Description
   total             The total number of pages in the document.

pdf:page          Represents a single PDF page.

   Attributes

   Attribute name    Description
   number            The number of this page.