Access Primitive and High-Level PDF Objects

A PDF document consists of primitive and high-level PDF objects. At the lowest level, a PDF document can be interpreted as a graph of linked primitive PDF objects, where each object is one of the following nine types defined in the PDF specification:

  • PDF array

  • PDF bool

  • PDF dictionary

  • PDF name

  • PDF null

  • PDF number

  • PDF reference

  • PDF stream

  • PDF string

All high-level PDF objects in the GcPdf object model (such as Page, AnnotationBase, Action, etc.) are implemented as wrappers around primitive PDF objects. A wrapper holds a reference to the underlying primitive PDF type (PdfDict, PdfArray, PdfDictObject, etc.) and provides methods and properties for accessing and manipulating that object. The root class for all high-level objects is PdfWrapperBase; it holds a reference to the underlying primitive PDF object, which is exposed through the IPdfObject interface.
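The wrapper relationship described above can be illustrated with a minimal, self-contained sketch. The types below (SketchDict, SketchWrapperBase, SketchPage) are hypothetical stand-ins invented for this illustration. They are not GcPdf types, and the actual members of PdfWrapperBase and IPdfObject may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a primitive PDF dictionary (not a GcPdf type).
public sealed class SketchDict
{
    public Dictionary<string, object> Entries { get; } = new Dictionary<string, object>();
}

// Hypothetical root wrapper: it only holds a reference to the primitive object.
public abstract class SketchWrapperBase
{
    protected SketchWrapperBase(SketchDict primitive)
    {
        Primitive = primitive ?? throw new ArgumentNullException(nameof(primitive));
    }

    // The underlying primitive object that the wrapper reads from and writes to.
    public SketchDict Primitive { get; }
}

// Hypothetical high-level object: a typed property maps onto a dictionary entry.
public sealed class SketchPage : SketchWrapperBase
{
    public SketchPage(SketchDict primitive) : base(primitive) { }

    public int Rotate
    {
        get => Primitive.Entries.TryGetValue("Rotate", out var v) ? (int)v : 0;
        set => Primitive.Entries["Rotate"] = value;
    }
}

public static class WrapperSketch
{
    public static void Main()
    {
        var dict = new SketchDict();
        var page = new SketchPage(dict);

        // Writing through the wrapper changes the primitive dictionary.
        page.Rotate = 90;
        Console.WriteLine(dict.Entries["Rotate"]);

        // Entries the wrapper has no property for remain reachable
        // through the primitive object.
        dict.Entries["CustomKey"] = "custom value";
        Console.WriteLine(page.Primitive.Entries["CustomKey"]);
    }
}
```

The point of the design is that the wrapper stores no state of its own: every typed property is a view onto the primitive object, so changes made at either level are visible at the other.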

GcPdf lets you work directly with the primitive objects from which all high-level entities in a PDF document are built (for example, DocumentInfo is a PDF dictionary). To do so, use the following interfaces and classes, together with their methods and properties, from the GrapeCity.Documents.Pdf.Spec namespace:

Note: All the types and members mentioned below are intended for advanced users only. Readers should have a basic understanding of the PDF specification, of direct and indirect PDF objects, and of how a PDF file is organized.

| Interface/Class | Description |
| --- | --- |
| IPdfObject | It is the common interface supported by all PDF objects in a GcPdfDocument that are persisted in a PDF file. Its Indirect and ObjID properties allow you to identify indirect PDF objects and the IDs of PDF objects. |
| IPdfArray | It is the common interface implemented by PdfArray, PdfArrayObject, and PdfArrayWrapper types. |
| IPdfArrayExt | It contains extension methods for the IPdfArray interface. |
| IPdfDict | It is the common interface implemented by PdfDict, PdfDictObject, and PdfDictWrapper types. |
| IPdfDictExt | It contains extension methods for the IPdfDict interface. |
| IPdfName | It is the common interface for PdfName and PdfNameObject. |
| IPdfNameExt | It contains extension methods for the IPdfName interface. |
| IPdfNumber | It is the common interface for PdfNumber and PdfNumberObject. |
| IPdfNumberExt | It contains extension methods for the IPdfNumber interface. |
| IPdfRef | It is the common interface for PdfRef and PdfRefObject. |
| IPdfRefExt | It contains extension methods for the IPdfRef interface. |
| IPdfString | It is the common interface for PdfString and PdfStringObject. |
| IPdfStringExt | It contains extension methods for the IPdfString interface. |
| IPdfBool | It is the common interface for PdfBool and PdfBoolObject. |
| IPdfBoolExt | It contains extension methods for the IPdfBool interface. |
| IPdfNull | It is the common interface for PdfNull and PdfNullObject. |
| IPdfNullExt | It contains extension methods for the IPdfNull interface. |
| PdfArray | It represents a direct PDF array object. |
| PdfArrayObject | It represents an indirect PDF array object. |
| PdfArrayWrapper | It represents an array wrapper object. |
| PdfDict | It represents a direct PDF dictionary object. |
| PdfDictObject | It represents an indirect PDF dictionary object. |
| PdfDictWrapper | It represents a dictionary wrapper object. |
| PdfName | It represents a direct PDF name object. This class overrides GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable. |
| PdfNameObject | It represents an indirect PDF name object. |
| PdfNumber | It represents a direct PDF number object. This class overrides GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable. |
| PdfNumberObject | It represents an indirect PDF number object. |
| PdfStreamObjectBase | It represents a PDF stream. It is always an indirect object, as a stream cannot be a direct object in PDF. |
| PdfRef | It represents a direct PDF reference object. This class overrides GetHashCode() and Equals(object) methods. This class is immutable. |
| PdfRefObject | It represents an indirect PDF reference object. |
| PdfString | It represents a direct PDF string object. This class overrides GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable. |
| PdfStringObject | It represents an indirect PDF string object. |
| PdfBool | It represents a direct PDF bool object. You cannot create instances of this class from user code; the two predefined instances are PdfBool.True and PdfBool.False. This class overrides GetHashCode() and Equals(object) methods and defines the equality and inequality operators. |
| PdfBoolObject | It represents an indirect PDF bool object. |
| PdfNull | It represents a direct PDF null object. You cannot create instances of this class from user code; instead, use the PdfNull.Instance predefined instance. This class overrides GetHashCode() and Equals(object) methods and defines the equality and inequality operators. This class is immutable. |
| PdfNullObject | It represents an indirect PDF null object. |
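Several of the direct types listed above are described as immutable and as overriding Equals(object) and GetHashCode(). The sketch below shows why that combination matters for a type used as a dictionary key. SketchName is a hypothetical type written for this illustration; it is not the GcPdf PdfName class, whose implementation is not shown in this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable name type with value equality (not the GcPdf PdfName).
public sealed class SketchName : IEquatable<SketchName>
{
    public SketchName(string value)
    {
        Value = value ?? throw new ArgumentNullException(nameof(value));
    }

    // Read-only after construction, so the hash code cannot change
    // while an instance is stored as a dictionary key.
    public string Value { get; }

    public bool Equals(SketchName other) =>
        !(other is null) && string.Equals(Value, other.Value, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as SketchName);

    public override int GetHashCode() => StringComparer.Ordinal.GetHashCode(Value);

    // "is null" avoids calling the overloaded operator recursively.
    public static bool operator ==(SketchName a, SketchName b) =>
        a is null ? b is null : a.Equals(b);

    public static bool operator !=(SketchName a, SketchName b) => !(a == b);
}

public static class NameSketch
{
    public static void Main()
    {
        var entries = new Dictionary<SketchName, string>
        {
            [new SketchName("Author")] = "example value"
        };

        // A separately constructed but equal name finds the same entry.
        var key = new SketchName("Author");
        Console.WriteLine(entries.ContainsKey(key));
        Console.WriteLine(key == new SketchName("Author"));
    }
}
```

Without the overrides, two name instances with the same text would compare by reference, and a dictionary lookup with a newly created key would fail.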

For example, the PDF specification defines the properties that can be present in the DocumentInfo dictionary (Creator, Author, etc.), but PDF producers can add arbitrary custom properties, such as the SourceModified property, which is found in many real-world PDF files. Types from the GrapeCity.Documents.Pdf.Spec namespace allow you to access (read, write, or edit) such custom elements.

Most high-level objects in a PDF file are PDF dictionaries, so the corresponding GcPdf objects are derived from the PdfDictWrapper class, which in turn is derived from PdfWrapperBase and uses IPdfDict as the underlying object. The GetPdfStream, GetPdfStreamInfo, and GetPdfStreamData methods of PdfWrapperBase retrieve data from the PDF stream associated with the PDF dictionary.

Each high-level PDF object (depending on its type) implements one of the primitive interfaces, so the extension methods defined for that interface (for example, those in IPdfDictExt for objects based on IPdfDict) can be called on it.

Refer to the following example code to get image properties from a PDF document:

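using System;
using System.Collections.Generic;
using GrapeCity.Documents.Pdf;
using GrapeCity.Documents.Pdf.Spec;

// Initialize GcPdfDocument.
GcPdfDocument doc = new GcPdfDocument();

// Load the PDF document. Here, fs is a readable stream opened on the
// source PDF file; its declaration is not shown in this snippet.
doc.Load(fs);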

// Get image from the PDF document.
var imgs = doc.GetImages();
var pi = imgs[0].Image;

// Write image ID.
Console.WriteLine($"PdfImage object ID: {pi.ObjID}");

/* PdfImage derives from PdfDictWrapper, which provides many methods for reading
   properties and data from the underlying PDF stream object. */
using (PdfStreamInfo psi = pi.GetPdfStreamInfo())
{
    // Get image information such as stream length, filter name, filter decode parameters, etc.
    Console.WriteLine($"    Image stream length: {psi.Stream.Length}");
    Console.WriteLine($"        ImageFilterName: {psi.ImageFilterName}");
    Console.WriteLine($"ImageFilterDecodeParams: {psi.ImageFilterDecodeParams}");
    
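    // Decode parameters are optional in a PDF stream dictionary,
    // so guard against a missing value before reading from them.
    if (psi.ImageFilterDecodeParams != null)
    {
        // Dump content of ImageFilterDecodeParams.
        foreach (var kvp in psi.ImageFilterDecodeParams.Dict)
        {
            Console.WriteLine($"{kvp.Key}: {kvp.Value}");
        }

        // Get value of BlackIs1.
        var blackIs1 = psi.ImageFilterDecodeParams.GetBool(PdfName.Std.BlackIs1, null);
        Console.WriteLine($"BlackIs1: {blackIs1}");
    }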
}
                
// Dump properties of PdfImage dictionary.
Console.WriteLine();
Console.WriteLine("Properties of PdfImage dictionary:");
foreach (KeyValuePair<PdfName, IPdfObject> kvp in pi.PdfDict.Dict)
{
    Console.WriteLine($"{kvp.Key}: {kvp.Value}");
}
                
// Get color space and bits per component.
var cs = pi.Get<IPdfObject>(PdfName.Std.ColorSpace);
Console.WriteLine($"ColorSpace: {cs?.GetType().Name} {cs}");
var bpc = pi.Get<IPdfObject>(PdfName.Std.BitsPerComponent);
Console.WriteLine($"BitsPerComponent: {bpc?.GetType().Name} {bpc}");