foxit::pdf::TextPage Class Reference
Inheritance diagram for foxit::pdf::TextPage:
foxit::Base

Public Types

enum  TextParseFlags { e_ParseTextNormal = 0x0000, e_ParseTextOutputHyphen = 0x0001, e_ParseTextUseStreamOrder = 0x0002 }
 Enumeration for parsing flags used for text page.
 

Public Member Functions

 TextPage (const PDFPage &page, int flags=foxit::pdf::TextPage::e_ParseTextNormal)
 Constructor, from a parsed PDF page.

 TextPage (const TextPage &other)
 Constructor, with another TextPage object.

 ~TextPage ()
 Destructor.

common::Rotation GetBaselineRotation (int rect_index)
 Get the text trend (as rotation) of a specified rectangle.

int GetCharCount () const
 Get the count of all the characters.

WString GetChars (int start=0, int count=-1) const
 Get all the characters within a range specified by a start index and count.

int GetIndexAtPos (float x, float y, float tolerance) const
 Get the character index at or around a specified position on the page, in PDF coordinate system.

WString GetTextInRect (const RectF &rect) const
 Get the text within a rectangle, in PDF coordinate system.

RectF GetTextRect (int rect_index) const
 Get the text rectangle by the index.

RectFArray GetTextRectArrayByRect (const RectF rect)
 Get the array of all text rectangles within the specified rectangle region.

int GetTextRectCount (int start=0, int count=-1)
 Count the text rectangles within a range specified by a start index and count.

common::Range GetWordAtPos (float x, float y, float tolerance) const
 Get the character range of a word at or around a specified position on the page, in PDF coordinate system.

bool IsEmpty () const
 Check whether current object is empty or not.

bool operator!= (const TextPage &other) const
 Not equal operator.
 
TextPage & operator= (const TextPage &other)
 Assign operator.
 
bool operator== (const TextPage &other) const
 Equal operator.

Public Member Functions inherited from foxit::Base
FS_HANDLE Handle () const
 Get the handle of current object.
 

Detailed Description

A PDF text page represents all the text content of a PDF page, parsed according to the specified parsing flags. Class TextPage can be used to retrieve information about the text in a PDF page, such as a single character, a single word, or the text content within a specified character range or rectangle.
A TextPage object can also be used to construct objects of other text-related classes in order to perform further operations on the text content or to access specific information from it:

  • To search for text in the text content of a PDF page, construct a TextSearch object with a TextPage object.
  • To access text that is used as a hypertext link, construct a PageTextLinks object with a TextPage object.
See also
TextSearch
PageTextLinks

Member Enumeration Documentation

◆ TextParseFlags

Enumeration for parsing flags used for text page.

Values of this enumeration can be used alone or in combination.

Enumerator
e_ParseTextNormal 

No special parsing options for text page.

e_ParseTextOutputHyphen 

Parse the text content of a PDF page, outputting the hyphen at a line feed.

e_ParseTextUseStreamOrder 

Parse the text content of a PDF page in stream order.
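
Because the enumerator values are distinct bits, options can be combined with bitwise OR and passed as the flags argument of the constructor. The following is a minimal sketch only: the enum is a local mirror of the values listed above so that the code compiles without the SDK headers, and HasParseFlag and CheckedParseFlags are hypothetical helpers, not SDK functions.

```cpp
#include <cassert>

// Local mirror of the documented values; real code would use
// foxit::pdf::TextPage::TextParseFlags from the SDK headers instead.
enum TextParseFlags {
  e_ParseTextNormal = 0x0000,
  e_ParseTextOutputHyphen = 0x0001,
  e_ParseTextUseStreamOrder = 0x0002
};

// Hypothetical helper: is option `flag` switched on in `flags`?
// e_ParseTextNormal is 0, so it matches only when no option bit is set.
inline bool HasParseFlag(int flags, TextParseFlags flag) {
  if (flag == e_ParseTextNormal) return flags == e_ParseTextNormal;
  return (flags & flag) == flag;
}

// Hypothetical helper: reject bits that are not documented parse flags
// before the value is handed to the TextPage constructor.
inline int CheckedParseFlags(int flags) {
  const int known = e_ParseTextOutputHyphen | e_ParseTextUseStreamOrder;
  assert((flags & ~known) == 0);
  return flags;
}
```

For example, CheckedParseFlags(e_ParseTextOutputHyphen | e_ParseTextUseStreamOrder) yields 0x0003, which is the kind of value one would pass as flags.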

Constructor & Destructor Documentation

◆ TextPage() [1/2]

foxit::pdf::TextPage::TextPage ( const PDFPage & page,
int  flags = foxit::pdf::TextPage::e_ParseTextNormal 
)
explicit

Constructor, from a parsed PDF page.

Parameters
[in] page: A valid PDF page object. This page should have been parsed.
[in] flags: Parsing flags for the text page. Please refer to values starting from TextPage::e_ParseTextNormal; this can be one or a combination of these values.

◆ TextPage() [2/2]

foxit::pdf::TextPage::TextPage ( const TextPage & other)

Constructor, with another TextPage object.

Parameters
[in] other: Another TextPage object.

Member Function Documentation

◆ GetBaselineRotation()

common::Rotation foxit::pdf::TextPage::GetBaselineRotation ( int  rect_index)

Get the text trend (as rotation) of a specified rectangle.

Parameters
[in] rect_index: The index of the rectangle to be retrieved. Valid range: from 0 to (count - 1), where count is returned by function TextPage::GetTextRectCount.
Returns
Text trend, as a rotation value. Please refer to values starting from common::e_Rotation0; this will be one of these values.

◆ GetCharCount()

int foxit::pdf::TextPage::GetCharCount ( ) const

Get the count of all the characters.

Returns
Count of characters.

◆ GetChars()

WString foxit::pdf::TextPage::GetChars ( int  start = 0,
int  count = -1 
) const

Get all the characters within a range specified by a start index and count.

Parameters
[in] start: Index of the start character, which is the first character of the expected text content. Valid range: from 0 to (charcount - 1), where charcount is returned by function TextPage::GetCharCount. Default value: 0.
[in] count: Count of characters to be retrieved. -1 means to get all characters from start to the end of the PDF page. In particular, when count is larger than (charcount - start), all the remaining characters (from start) will be retrieved. charcount is returned by function TextPage::GetCharCount. Default value: -1.
Returns
The characters within the specified character index range.
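
The interaction between start, count and the total character count can be illustrated with a small, self-contained sketch. EffectiveCharCount is a hypothetical helper, not an SDK function; it only models the rules stated above. How the SDK treats an out-of-range start or a negative count other than -1 is not stated here, so those branches are assumptions.

```cpp
#include <cassert>

// Hypothetical helper (not part of the SDK): given the documented
// (start, count) arguments and the page's total character count,
// compute how many characters the request covers.
inline int EffectiveCharCount(int start, int count, int char_count) {
  assert(char_count >= 0);
  // Assumption: a start outside 0..char_count-1 covers no characters.
  if (start < 0 || start >= char_count) return 0;
  const int remaining = char_count - start;
  // -1 means "from start to the end of the page"; a count larger than
  // the remainder is clamped to the remainder.
  if (count == -1 || count > remaining) return remaining;
  // Assumption: any other negative count is treated as an empty request.
  return count < 0 ? 0 : count;
}
```

With 10 characters on the page, a request for (start = 4, count = 100) and a request for (start = 4, count = -1) both cover the last 6 characters.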

◆ GetIndexAtPos()

int foxit::pdf::TextPage::GetIndexAtPos ( float  x,
float  y,
float  tolerance 
) const

Get the character index at or around a specified position on the page, in PDF coordinate system.

Parameters
[in] x: Value of x position, in PDF coordinate system.
[in] y: Value of y position, in PDF coordinate system.
[in] tolerance: Tolerance value for character hit detection, in point units. This should not be negative.
Returns
Index of the character at or near point (x, y), starting from 0. If several characters are near point (x, y), the smallest character index is returned. If there is no character at or near the point, -1 is returned.
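
The return contract (smallest matching index, or -1) can be sketched without the SDK as follows. CharBox and IndexAtPos are hypothetical stand-ins, and modeling "nearby" as a rectangle grown by the tolerance on every side is an assumption; the SDK's actual hit-detection rule is not described here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a character's bounding box in PDF
// coordinates (y grows upward, so bottom <= top).
struct CharBox {
  float left, bottom, right, top;
};

// Return the smallest index whose box, grown by `tolerance` on every
// side, contains (x, y); return -1 when no box qualifies.
inline int IndexAtPos(const std::vector<CharBox>& boxes,
                      float x, float y, float tolerance) {
  assert(tolerance >= 0.0f);  // the tolerance should not be negative
  for (int i = 0; i < static_cast<int>(boxes.size()); ++i) {
    const CharBox& b = boxes[i];
    const bool in_x = x >= b.left - tolerance && x <= b.right + tolerance;
    const bool in_y = y >= b.bottom - tolerance && y <= b.top + tolerance;
    if (in_x && in_y) return i;  // first hit is the smallest index
  }
  return -1;
}
```

Scanning in index order and returning on the first hit is what gives the "smallest index wins" behavior when several boxes are within tolerance.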

◆ GetTextInRect()

WString foxit::pdf::TextPage::GetTextInRect ( const RectF & rect) const

Get the text within a rectangle, in PDF coordinate system.

Parameters
[in] rect: A rectangle region, in PDF coordinate system.
Returns
Text string within the specified rectangle.
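
Rectangles often originate from a viewer or image space whose origin is the top-left corner, whereas the default PDF page space has its origin at the bottom-left with y growing upward. The sketch below shows one way to convert such a rectangle before querying text. PdfRect and TopLeftToPdfRect are hypothetical names, not SDK types, and the conversion assumes an unrotated, unscaled page whose lower-left corner is at (0, 0), with all values in points.

```cpp
#include <cassert>

// Hypothetical stand-in for a rectangle in PDF page coordinates.
struct PdfRect {
  float left, bottom, right, top;
};

// Convert a rectangle given by its top-left corner (x, y), width and
// height in a top-left-origin space into PDF page coordinates.
// Assumes an unrotated page of height `page_height` with its
// lower-left corner at (0, 0).
inline PdfRect TopLeftToPdfRect(float x, float y, float width,
                                float height, float page_height) {
  assert(width >= 0.0f && height >= 0.0f);
  PdfRect r;
  r.left = x;
  r.right = x + width;
  r.top = page_height - y;                 // flip the y axis
  r.bottom = page_height - (y + height);
  return r;
}
```

On a page 792 points tall, a box 72 points from the left and top edges, 200 wide and 50 tall, maps to left 72, bottom 670, right 272, top 720.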

◆ GetTextRect()

RectF foxit::pdf::TextPage::GetTextRect ( int  rect_index) const

Get the text rectangle by the index.

Parameters
[in] rect_index: The index of the rectangle to be retrieved. Valid range: from 0 to (count - 1), where count is returned by function TextPage::GetTextRectCount.
Returns
A specified text rectangle.

◆ GetTextRectArrayByRect()

RectFArray foxit::pdf::TextPage::GetTextRectArrayByRect ( const RectF  rect)

Get the array of all text rectangles within the specified rectangle region.

Parameters
[in] rect: A rectangle region, in PDF coordinate system.
Returns
Text rectangle array within the specified rectangle.

◆ GetTextRectCount()

int foxit::pdf::TextPage::GetTextRectCount ( int  start = 0,
int  count = -1 
)

Count the text rectangles within a range specified by a start index and count.

Parameters
[in] start: Index of the start character in the character index range. Valid range: from 0 to (charcount - 1), where charcount is returned by function TextPage::GetCharCount.
[in] count: Count of characters in the character index range. -1 means all characters from start to the end of the PDF page.
Returns
The count of text rectangles in the specified character index range. -1 means error.
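
GetTextRectCount and GetTextRect are typically used together: count the rectangles for a character range first, then fetch each one by index. The sketch below uses only these two documented member functions. It is written as a template so that it compiles without the SDK; foxit::pdf::TextPage is the intended Page type, while CollectTextRects, FakeRect and FakeTextPage are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <vector>

// Collect the text rectangles covering a character range.
// Page is taken by non-const reference because GetTextRectCount is
// declared without const in this reference.
template <class Page>
auto CollectTextRects(Page& page, int start = 0, int count = -1)
    -> std::vector<decltype(page.GetTextRect(0))> {
  std::vector<decltype(page.GetTextRect(0))> rects;
  const int n = page.GetTextRectCount(start, count);
  if (n < 0) return rects;  // -1 means error: return an empty list
  // Valid rect_index values run from 0 to n - 1.
  for (int i = 0; i < n; ++i) rects.push_back(page.GetTextRect(i));
  return rects;
}

// Hypothetical stand-ins used only to exercise the sketch without the SDK.
struct FakeRect {
  float left, bottom, right, top;
};

struct FakeTextPage {
  std::vector<FakeRect> rects;
  bool fail = false;  // when true, simulate the documented -1 error
  int GetTextRectCount(int /*start*/ = 0, int /*count*/ = -1) {
    return fail ? -1 : static_cast<int>(rects.size());
  }
  FakeRect GetTextRect(int rect_index) const { return rects[rect_index]; }
};
```

Checking for a negative count before looping keeps the error case from being mistaken for "no rectangles" further down the call chain; a caller that needs to tell the two apart would have to report the error separately.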

◆ GetWordAtPos()

common::Range foxit::pdf::TextPage::GetWordAtPos ( float  x,
float  y,
float  tolerance 
) const

Get the character range of a word at or around a specified position on the page, in PDF coordinate system.

Currently, for Chinese/Japanese/Korean text, only a single character at or around the specified position can be retrieved.

Parameters
[in] x: Value of x position, in PDF coordinate system.
[in] y: Value of y position, in PDF coordinate system.
[in] tolerance: Tolerance value for word hit detection, in point units. This should not be negative.
Returns
The character range that represents the expected word. There will be at most one valid range segment in this Range object. If the returned Range object is empty, no such word was found.

◆ IsEmpty()

bool foxit::pdf::TextPage::IsEmpty ( ) const

Check whether current object is empty or not.

If the current object is empty, it cannot be used.

Returns
true means the current object is empty, while false means it is not.
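
Since an empty object cannot be used, calling code may want to check IsEmpty before reading any text. The following sketch chains IsEmpty, GetCharCount and GetChars, which are all documented on this page. It is a template so that it compiles without the SDK; foxit::pdf::TextPage is the intended Page type, and TextOrEmpty and FakeTextPage are hypothetical names. It also assumes the string type returned by GetChars is default-constructible.

```cpp
#include <cassert>
#include <string>

// Return all characters of the page, or an empty string when the
// object is empty or holds no characters.
template <class Page>
auto TextOrEmpty(const Page& page) -> decltype(page.GetChars(0, -1)) {
  typedef decltype(page.GetChars(0, -1)) Text;
  if (page.IsEmpty() || page.GetCharCount() == 0) return Text();
  return page.GetChars(0, -1);  // start = 0, count = -1: whole page
}

// Hypothetical stand-in used only to exercise the sketch without the SDK.
struct FakeTextPage {
  bool empty;
  std::wstring text;
  bool IsEmpty() const { return empty; }
  int GetCharCount() const { return static_cast<int>(text.size()); }
  std::wstring GetChars(int start = 0, int count = -1) const {
    assert(start >= 0 && start <= static_cast<int>(text.size()));
    return count < 0 ? text.substr(start) : text.substr(start, count);
  }
};
```

The guard keeps GetChars from being called on an object that reports itself as empty.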

◆ operator!=()

bool foxit::pdf::TextPage::operator!= ( const TextPage & other) const

Not equal operator.

Parameters
[in] other: Another TextPage object. This function checks whether the current object is not equal to it.
Returns
true means not equal, while false means equal.

◆ operator=()

TextPage& foxit::pdf::TextPage::operator= ( const TextPage & other)

Assign operator.

Parameters
[in] other: Another TextPage object, whose value will be assigned to the current object.
Returns
Reference to current object itself.

◆ operator==()

bool foxit::pdf::TextPage::operator== ( const TextPage & other) const

Equal operator.

Parameters
[in] other: Another TextPage object. This function checks whether the current object is equal to it.
Returns
true means equal, while false means not equal.

@2018 Foxit Software Incorporated. All rights reserved.