GcPdf lets you search a PDF document for all occurrences of a specified text. The library supports the common find options, including whole-word search, case-sensitive search, and regular expressions. The search also works across line breaks, so logically connected text that is rendered on different lines can be found. To search a document, use the FindText method of the GcPdfDocument class. The method accepts a FindTextParams object and an OutputRange object as parameters and finds all occurrences of the searched string in the loaded document. The FindTextParams class represents the text to search for and also carries the search options discussed below.
The FindText method returns a list of all occurrences of the searched text. You can iterate through the list and highlight the search results using the FillPolygon and DrawPolygon methods of the GcGraphics class.
The example below shows how to search and highlight a text string in a PDF document:
public void CreatePDF(Stream stream)
{
    // load the file
    var doc = new GcPdfDocument();
    using var fs = File.OpenRead("TimeSheet.pdf");
    doc.Load(fs);
    // define the search parameters: text "HOURS", wholeWord = true, matchCase = false
    var findText = new FindTextParams("HOURS", true, false);
    // find all occurrences
    IList<FoundPosition> findTextList = doc.FindText(findText);
    // highlight each occurrence
    foreach (FoundPosition text in findTextList)
    {
        // graphics of the page that contains this occurrence
        var g = doc.Pages[text.PageIndex].Graphics;
        // highlight every quadrilateral in Bounds, not only the first one
        foreach (Quadrilateral quad in text.Bounds)
        {
            g.DrawPolygon(quad, Color.Yellow, 1);
            g.FillPolygon(quad, Color.FromArgb(100, Color.OrangeRed));
        }
    }
    // save the highlighted document
    doc.Save("FindText.pdf");
}
Case sensitivity is one of the criteria you can apply when searching for a text string. With GcPdf, you can specify whether the search is case sensitive. To match case, set the matchCase parameter of the FindTextParams constructor to true.
The example below shows how to run a case-sensitive search in a PDF document:
//find word “time”, the word “Time” or “TIME” will be ignored
var findWord = new FindTextParams("time", false, true);
var findText = doc.FindText(findWord);
GcPdf lets you search for a whole word or for instances that are part of a longer word in the PDF document. To search for whole words only, set the wholeWord parameter of the FindTextParams constructor to true.
The example below shows how to search for whole words in a PDF document:
//find word “time”, the word “overtime” will be ignored
var findWord = new FindTextParams("Time", true, false);
var findText = doc.FindText(findWord);
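The two options can also be combined. The following is a minimal sketch, not a definitive implementation: it reuses the TimeSheet.pdf file and the positional argument order (text, wholeWord, matchCase) from the examples above. The `using` directives for the GcPdf namespaces and the class name are assumptions and may need adjusting for your installed version.

```csharp
using System;
using System.IO;
using GrapeCity.Documents.Pdf;          // assumed namespace for GcPdfDocument
using GrapeCity.Documents.Pdf.TextMap;  // assumed namespace for FindTextParams

class WholeWordMatchCase   // hypothetical class name, for illustration only
{
    static void Main()
    {
        var doc = new GcPdfDocument();
        using var fs = File.OpenRead("TimeSheet.pdf");
        doc.Load(fs);

        // wholeWord = true and matchCase = true:
        // "Time" is found; "time", "TIME" and "Overtime" are ignored
        var findWord = new FindTextParams("Time", true, true);
        var found = doc.FindText(findWord);

        // FindText returns a list, so Count gives the number of occurrences
        Console.WriteLine($"Occurrences: {found.Count}");
    }
}
```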
Regular expressions are useful when you want to search for variable text that follows a common pattern, such as a date, time, or email address, instead of a particular word or phrase. To search using a regular expression, pass the expression as the string parameter of the FindTextParams constructor and set its regex parameter to true.
//finds all the dates present in PDF document, using regular expressions
var findWord = new FindTextParams(@"\d+[/-]\w+[/-]\d\d", false, false, 72, 72, false, true);
var findText = doc.FindText(findWord);
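The date pattern above can be checked in isolation with .NET's own Regex class before it is used in a search. This is a minimal sketch that assumes GcPdf's regex option follows the same syntax as System.Text.RegularExpressions. The sample strings and the class name are hypothetical and not taken from any real PDF.

```csharp
using System;
using System.Text.RegularExpressions;

class DatePatternCheck   // hypothetical class name, for illustration only
{
    static void Main()
    {
        // Same pattern as in the FindTextParams call above:
        // digits, "/" or "-", word characters, "/" or "-", exactly two digits.
        var pattern = new Regex(@"\d+[/-]\w+[/-]\d\d");

        // Hypothetical sample strings.
        string[] samples = { "12/03/21", "5-Jan-2022", "2022.01.05", "Invoice 42" };

        foreach (string s in samples)
        {
            Match m = pattern.Match(s);
            Console.WriteLine(m.Success
                ? $"{s}: matched \"{m.Value}\""
                : $"{s}: no match");
        }
    }
}
```

Run as written, this reports a full match for `12/03/21`, a partial match `5-Jan-20` for `5-Jan-2022` (the pattern ends in exactly two digits, so a four-digit year is cut short), and no match for the other two strings. If full four-digit years matter, the trailing `\d\d` would need widening, for example to `\d{2,4}`.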
For more information about implementing text search with GcPdf, see the GcPdf sample browser.
With GcPdf, you can replace text in the whole document or on a specific page by using the ReplaceText method, which is available in the GcPdfDocument and Page classes and on the ITextMap interface. The method accepts a FindTextParams object and the new text string, along with other parameters, and replaces all occurrences of the target text. It also adjusts the surrounding space to accommodate the replacement text.
The code below shows how to replace text in the whole document:
// replace ".NET Standard 2.0" with ".NET 6" in the whole document
using (FileStream fs = new FileStream(@"..\..\..\DotnetFramework.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams(".NET Standard 2.0", true, false);
    doc.ReplaceText(ftp, ".NET 6", null, null, null);
    doc.Save("DotnetFramework_Document.pdf");
}
GcPdf lets you delete text from the whole document or from a specific page by using the DeleteText method, which is available in the GcPdfDocument and Page classes and on the ITextMap interface. The method accepts a FindTextParams object and a DeleteTextMode enumeration value. The DeleteTextMode enumeration provides two options, Standard and PreserveSpace, which represent the two modes of deleting text in a PDF document.
In Standard mode, the text that follows the deleted text shifts to fill the gap. In PreserveSpace mode, the document keeps an empty space where the deleted text was, and the text after it does not move.
The code below shows how to delete a text string from the first page of a PDF document using Standard mode:
// delete word "wetlands" from the first page using DeleteTextMode.Standard
using (FileStream fs = new FileStream(@"..\..\..\Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.Pages[0].DeleteText(ftp, DeleteTextMode.Standard);
    doc.Save("wetlands_deleted.pdf");
}
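For comparison, the same deletion can be sketched with the PreserveSpace mode described above, so that the text after each deleted word stays in place. This is a hedged variation of the preceding example, not a definitive implementation: the `using` directives for the GcPdf namespaces, the class name, and the output file name are assumptions.

```csharp
using System.IO;
using GrapeCity.Documents.Pdf;          // assumed namespace for GcPdfDocument
using GrapeCity.Documents.Pdf.TextMap;  // assumed namespace for FindTextParams

class DeletePreserveSpace   // hypothetical class name, for illustration only
{
    static void Main()
    {
        // same input file as in the Standard mode example
        using (FileStream fs = new FileStream(@"..\..\..\Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            GcPdfDocument doc = new GcPdfDocument();
            doc.Load(fs);

            // whole-word, case-insensitive search for "wetlands"
            FindTextParams ftp = new FindTextParams("wetlands", true, false);

            // PreserveSpace leaves an empty gap where each occurrence was
            doc.Pages[0].DeleteText(ftp, DeleteTextMode.PreserveSpace);

            // hypothetical output file name
            doc.Save("wetlands_preserved.pdf");
        }
    }
}
```

Opening the two output files side by side shows the difference between the modes: Standard reflows the following text, while PreserveSpace keeps the page layout unchanged apart from the blank gaps.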