Foxit PDF SDK
FoxitPDFSDKPython2.TextPage Class Reference
Inherits FoxitPDFSDKPython2.Base.

Public Member Functions

def TextPage (other)
 Constructor, with another text page object.
 
def TextPage (page, flags)
 Constructor, from a parsed PDF page.
 
def GetBaselineRotation (rect_index)
 Get the text trend (as rotation) of a specified rectangle.
 
def GetCharCount ()
 Get the count of all the characters.
 
def GetCharInfo (char_index)
 Get character information of a specific character.
 
def GetCharRange (rect)
 Get the character index range of all text rectangles within the specified rectangle region.
 
def GetChars (start, count)
 Get all the characters within a range specified by a start index and count.
 
def GetIndexAtPos (x, y, tolerance)
 Get the character index at or around a specified position on the page, in the PDF coordinate system.
 
def GetText (flag)
 Get the page text.
 
def GetTextInRect (rect)
 Get the text within a rectangle, in the PDF coordinate system.
 
def GetTextRect (rect_index)
 Get the text rectangle by the index.
 
def GetTextRectArrayByRect (rect)
 Get the array of all text rectangles within the specified rectangle region.
 
def GetTextRectCount (start, count)
 Count the text rectangles within a range specified by a start index and count.
 
def GetTextUnderAnnot (annot)
 Get the page text that intersects with a specified annotation.
 
def GetWordAtPos (x, y, tolerance)
 Get the character range of a word at or around a specified position on the page, in the PDF coordinate system.
 
def IsEmpty ()
 Check whether the current object is empty.
 

Static Public Attributes

 e_ParseTextNormal = _fsdk.TextPage_e_ParseTextNormal
 Parse the text content of a PDF page by normalizing characters based on their positions in the PDF page.

 
 e_ParseTextOutputHyphen = _fsdk.TextPage_e_ParseTextOutputHyphen
 Parse the text content of a PDF page, outputting the hyphen at a line feed.

 
 e_ParseTextUseStreamOrder = _fsdk.TextPage_e_ParseTextUseStreamOrder
 Parse the text content of a PDF page by the stream order.

 
 e_TextDisplayOrder = _fsdk.TextPage_e_TextDisplayOrder
 If set, the text content of a PDF page is retrieved in display order.

 
 e_TextStreamOrder = _fsdk.TextPage_e_TextStreamOrder
 If set, the text content of a PDF page is retrieved in stream order.

 

Detailed Description

A PDF text page represents all the text content in a PDF page, parsed according to a specified parsing flag. Class TextPage can be used to retrieve information about text in a PDF page, such as a single character, a single word, or the text content within a specified character range or rectangle.
An object of this class can also be used to construct objects of other text-related classes, in order to perform further operations on the text content or to access specific information from it:

  • To search for text in the text content of a PDF page, construct a TextSearch object with the text page object.
  • To access text that is used as a hypertext link, construct a PageTextLinks object with the text page object.


See also
TextSearch
PageTextLinks
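
The typical flow described above (construct a text page from a parsed page, check that it is usable, then read the text) can be sketched as follows. This is a minimal sketch, not the SDK's prescribed usage: the helper name extract_page_text and the idea of passing the imported FoxitPDFSDKPython2 module in as the sdk argument are assumptions made for illustration. Only the TextPage constructor, IsEmpty, GetText and the flag names come from this reference.

```python
def extract_page_text(sdk, page):
    """Return the text of an already-parsed PDF page in display order.

    `sdk` stands for the imported FoxitPDFSDKPython2 module (passing it in
    is a choice made for this sketch, not an SDK convention). `page` must
    already have been parsed, as the TextPage constructor requires.
    """
    text_page = sdk.TextPage(page, sdk.TextPage.e_ParseTextNormal)
    if text_page.IsEmpty():
        # An empty object cannot be queried, so report no text.
        return ""
    return text_page.GetText(sdk.TextPage.e_TextDisplayOrder)
```

Passing the module as an argument keeps the helper independent of how the SDK is imported and initialized in a given application.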

Constructor & Destructor Documentation

◆ TextPage() [1/2]

def FoxitPDFSDKPython2.TextPage.TextPage (page, flags)

Constructor, from a parsed PDF page.

Parameters
[in] page: A valid PDF page object. This page should have been parsed.
[in] flags: Parsing flags for the text page. This can be one or a combination of the values starting from FoxitPDFSDKPython2.TextPage.e_ParseTextNormal.
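
Since the flags parameter accepts a combination of values, a small helper can assemble it. The sketch below assumes the values combine with bitwise OR, as is usual for flag constants; this reference does not state the combining operator, and the helper name build_parse_flags and its keyword arguments are hypothetical.

```python
def build_parse_flags(text_page_cls, output_hyphen=False, stream_order=False):
    """Combine TextPage parsing flags into a single value.

    `text_page_cls` stands for FoxitPDFSDKPython2.TextPage. Assumption:
    the e_ParseText* values are bit flags that combine with bitwise OR.
    """
    flags = text_page_cls.e_ParseTextNormal
    if output_hyphen:
        flags |= text_page_cls.e_ParseTextOutputHyphen
    if stream_order:
        flags |= text_page_cls.e_ParseTextUseStreamOrder
    return flags
```

The result would then be passed as the second constructor argument, for example TextPage(page, build_parse_flags(TextPage, output_hyphen=True)).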

◆ TextPage() [2/2]

def FoxitPDFSDKPython2.TextPage.TextPage (other)

Constructor, with another text page object.

Parameters
[in] other: Another text page object.

Member Function Documentation

◆ GetBaselineRotation()

def FoxitPDFSDKPython2.TextPage.GetBaselineRotation (rect_index)

Get the text trend (as rotation) of a specified rectangle.

Parameters
[in] rect_index: The index of the rectangle to be retrieved. Valid range: from 0 to (count - 1), where count is returned by FoxitPDFSDKPython2.TextPage.GetTextRectCount.
Returns
Text trend, as a rotation value. This is one of the values starting from FoxitPDFSDKPython2.e_Rotation0.

◆ GetCharCount()

def FoxitPDFSDKPython2.TextPage.GetCharCount ()

Get the count of all the characters.

Returns
Count of characters.

◆ GetCharInfo()

def FoxitPDFSDKPython2.TextPage.GetCharInfo (char_index)

Get character information of a specific character.

Parameters
[in] char_index: A zero-based character index. Valid range: from 0 to (charcount - 1), where charcount is returned by FoxitPDFSDKPython2.TextPage.GetCharCount.
Returns
Character information for the character specified by the character index.

◆ GetCharRange()

def FoxitPDFSDKPython2.TextPage.GetCharRange (rect)

Get the character index range of all text rectangles within the specified rectangle region.

Parameters
[in] rect: A rectangle region, in the PDF coordinate system.
Returns
Character index range of all text rectangles within the specified rectangle region.

◆ GetChars()

def FoxitPDFSDKPython2.TextPage.GetChars (start, count)

Get all the characters within a range specified by a start index and count.

Parameters
[in] start: Index of the start character, which is the first character of the expected text content. Valid range: from 0 to (charcount - 1), where charcount is returned by FoxitPDFSDKPython2.TextPage.GetCharCount.
[in] count: Count of characters to be retrieved. -1 means all the characters from start to the end of the PDF page. If count is larger than (charcount - start), all the remaining characters (from start) are retrieved.
Returns
The characters within the specified character index range.
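
The start and count rules above make it possible to read a page's characters in fixed-size pieces. The following is a minimal sketch that uses only GetCharCount and GetChars as documented here; the helper name read_char_chunks and the chunk_size parameter are hypothetical.

```python
def read_char_chunks(text_page, chunk_size):
    """Return the page's characters as a list of strings of at most
    `chunk_size` characters each.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = text_page.GetCharCount()
    chunks = []
    start = 0
    while start < total:
        # A count that runs past the end is allowed: the remaining
        # characters from `start` are returned.
        chunks.append(text_page.GetChars(start, chunk_size))
        start += chunk_size
    return chunks
```

The loop never calls GetChars with a start index outside the valid range, so an empty page simply yields an empty list.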

◆ GetIndexAtPos()

def FoxitPDFSDKPython2.TextPage.GetIndexAtPos (x, y, tolerance)

Get the character index at or around a specified position on the page, in the PDF coordinate system.

Parameters
[in] x: Value of the x position, in the PDF coordinate system.
[in] y: Value of the y position, in the PDF coordinate system.
[in] tolerance: Tolerance value for character hit detection, in points. This should not be negative.
Returns
Zero-based index of the character at or near point (x, y). If there are several characters near point (x, y), the smallest character index is returned. If there is no character at or near the point, -1 is returned.
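
A hit test usually needs the character itself, not only its index. The sketch below wraps GetIndexAtPos and GetChars and handles the -1 result; the helper name char_at and the default tolerance of 2.0 points are hypothetical choices, not SDK defaults.

```python
def char_at(text_page, x, y, tolerance=2.0):
    """Return the character at or near (x, y), or None if there is none.

    Coordinates are in the PDF coordinate system; `tolerance` is in points.
    """
    if tolerance < 0:
        raise ValueError("tolerance must not be negative")
    index = text_page.GetIndexAtPos(x, y, tolerance)
    if index == -1:
        # No character at or near the point.
        return None
    return text_page.GetChars(index, 1)
```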

◆ GetText()

def FoxitPDFSDKPython2.TextPage.GetText (flag)

Get the page text.

Parameters
[in] flag: Text order flag that decides how to get the text content of the related PDF page. This should be one of the values starting from FoxitPDFSDKPython2.TextPage.e_TextStreamOrder.
Returns
All the text content of the related PDF page, in the specified text order.

◆ GetTextInRect()

def FoxitPDFSDKPython2.TextPage.GetTextInRect (rect)

Get the text within a rectangle, in the PDF coordinate system.

Parameters
[in] rect: A rectangle region, in the PDF coordinate system.
Returns
Text string within the specified rectangle.

◆ GetTextRect()

def FoxitPDFSDKPython2.TextPage.GetTextRect (rect_index)

Get the text rectangle by the index.

Parameters
[in] rect_index: The index of the rectangle to be retrieved. Valid range: from 0 to (count - 1), where count is returned by FoxitPDFSDKPython2.TextPage.GetTextRectCount.
Returns
A specified text rectangle.

◆ GetTextRectArrayByRect()

def FoxitPDFSDKPython2.TextPage.GetTextRectArrayByRect (rect)

Get the array of all text rectangles within the specified rectangle region.

Parameters
[in] rect: A rectangle region, in the PDF coordinate system.
Returns
Text rectangle array within the specified rectangle.

◆ GetTextRectCount()

def FoxitPDFSDKPython2.TextPage.GetTextRectCount (start, count)

Count the text rectangles within a range specified by a start index and count.

Parameters
[in] start: Index of the start character in the character index range. Valid range: from 0 to (charcount - 1), where charcount is returned by FoxitPDFSDKPython2.TextPage.GetCharCount.
[in] count: Count of characters in the character index range. -1 means all the characters from start to the end of the PDF page.
Returns
The count of text rectangles in the specified character index range. -1 means error.
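
GetTextRectCount supplies the count that bounds the index passed to GetTextRect and GetBaselineRotation, so the three are normally used together. The following is a minimal sketch under that reading; the helper name collect_text_rects is hypothetical, and raising RuntimeError on a -1 result is a choice made for this example.

```python
def collect_text_rects(text_page, start=0, count=-1):
    """Return a list of (rect, rotation) pairs for the text rectangles
    covering the character range given by `start` and `count`.

    With the defaults, the range runs from the first character to the
    end of the page.
    """
    rect_count = text_page.GetTextRectCount(start, count)
    if rect_count < 0:
        # -1 is documented as the error result.
        raise RuntimeError("GetTextRectCount reported an error")
    return [
        (text_page.GetTextRect(i), text_page.GetBaselineRotation(i))
        for i in range(rect_count)
    ]
```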

◆ GetTextUnderAnnot()

def FoxitPDFSDKPython2.TextPage.GetTextUnderAnnot (annot)

Get the page text that intersects with a specified annotation.

A character is retrieved by this function if the whole character or most of it intersects with the annotation.

Parameters
[in] annot: An annotation. Page text that intersects with this annotation is retrieved. Currently, only text markup annotations (highlight, underline, strike-out, and squiggly) are supported; for other annotation types, this function throws exception FoxitPDFSDKPython2.e_ErrUnsupported.
Returns
The text that intersects with the specified annotation.
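
When iterating over annotations of mixed types, the unsupported ones have to be skipped. The sketch below is hedged accordingly: this reference names the error code but not the Python exception class that carries it, so the example catches Exception broadly, which is an assumption and will also hide unrelated errors. The helper name text_under_annots is hypothetical.

```python
def text_under_annots(text_page, annots):
    """Return one entry per annotation: the intersecting page text, or
    None when the call fails (for example, an unsupported annotation type).
    """
    results = []
    for annot in annots:
        try:
            results.append(text_page.GetTextUnderAnnot(annot))
        except Exception:
            # Assumption: unsupported annotation types surface as a Python
            # exception; the exact class is not stated in this reference.
            results.append(None)
    return results
```

In real code, narrowing the except clause to the SDK's actual exception type and checking its error code would be preferable once that type is confirmed.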

◆ GetWordAtPos()

def FoxitPDFSDKPython2.TextPage.GetWordAtPos (x, y, tolerance)

Get the character range of a word at or around a specified position on the page, in the PDF coordinate system.

Currently, for Chinese, Japanese, and Korean text, only a single character at or around the specified position can be retrieved.

Parameters
[in] x: Value of the x position, in the PDF coordinate system.
[in] y: Value of the y position, in the PDF coordinate system.
[in] tolerance: Tolerance value for word hit detection, in points. This should not be negative.
Returns
The character range that represents the expected word. There is at most one valid range segment in this range object. If the returned range object is empty, no such word was found.
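
The returned range can be turned into the word's text with GetChars. The sketch below relies on assumptions about the range object that this page does not document: that it offers IsEmpty, GetSegmentStart and GetSegmentEnd, and that the segment end index is inclusive. The helper name word_at and the default tolerance are hypothetical as well.

```python
def word_at(text_page, x, y, tolerance=2.0):
    """Return the word at or near (x, y) as a string, or None if no word
    is found there.
    """
    word_range = text_page.GetWordAtPos(x, y, tolerance)
    if word_range.IsEmpty():
        return None
    # Assumed range accessors; at most one segment is expected, and the
    # segment end is treated as an inclusive character index.
    start = word_range.GetSegmentStart(0)
    end = word_range.GetSegmentEnd(0)
    return text_page.GetChars(start, end - start + 1)
```

If the segment end turns out to be exclusive in the SDK version in use, the count passed to GetChars should be end - start instead.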

◆ IsEmpty()

def FoxitPDFSDKPython2.TextPage.IsEmpty ()

Check whether the current object is empty.

An empty object is unusable.

Returns
True means the current object is empty; False means it is not.