Foxit PDF SDK
Public Member Functions
def | TextPage (other) |
Constructor, with another text page object.
def | TextPage (page, flags) |
Constructor, from a parsed PDF page.
def | GetBaselineRotation (rect_index) |
Get the text trend (as rotation) of a specified rectangle.
def | GetCharCount () |
Get the count of all the characters.
def | GetCharInfo (char_index) |
Get character information of a specific character.
def | GetCharRange (rect) |
Get the character index range of all text rectangles within the specified rectangle region.
def | GetChars (start, count) |
Get all the characters within a range specified by a start index and count.
def | GetIndexAtPos (x, y, tolerance) |
Get the character index at or around a specified position on the page, in the PDF coordinate system.
def | GetText (flag) |
Get the page text.
def | GetTextInRect (rect) |
Get the text within a rectangle, in the PDF coordinate system.
def | GetTextRect (rect_index) |
Get the text rectangle by the index.
def | GetTextRectArrayByRect (rect) |
Get the array of all text rectangles within the specified rectangle region.
def | GetTextRectCount (start, count) |
Count the text rectangles within a range specified by a start index and count.
def | GetTextUnderAnnot (annot) |
Get the page text that intersects with a specified annotation.
def | GetWordAtPos (x, y, tolerance) |
Get the character range of a word at or around a specified position on the page, in the PDF coordinate system.
def | IsEmpty () |
Check whether the current object is empty or not.
Static Public Attributes
e_ParseTextNormal = _fsdk.TextPage_e_ParseTextNormal
Parse the text content of a PDF page by normalizing characters based on their positions in the PDF page.
e_ParseTextOutputHyphen = _fsdk.TextPage_e_ParseTextOutputHyphen
Parse the text content of a PDF page, outputting the hyphen at a line feed.
e_ParseTextUseStreamOrder = _fsdk.TextPage_e_ParseTextUseStreamOrder
Parse the text content of a PDF page in stream order.
e_TextDisplayOrder = _fsdk.TextPage_e_TextDisplayOrder
If this is set, the text content of a PDF page is retrieved in display order.
e_TextStreamOrder = _fsdk.TextPage_e_TextStreamOrder
If this is set, the text content of a PDF page is retrieved in stream order.
A PDF text page represents all the text content in a PDF page, parsed according to a specified parsing flag. Class TextPage can be used to retrieve information about text in a PDF page, such as a single character, a single word, or the text content within a specified character range or rectangle.
Objects of this class can also be used to construct objects of other text-related classes, in order to perform further operations on the text content or to access specific information from it.
def FoxitPDFSDKPython2.TextPage.TextPage (page, flags)
Constructor, from a parsed PDF page.
[in] | page | A valid PDF page object. This page should have been parsed. |
[in] | flags | Parsing flags for the text page. Please refer to values starting from FoxitPDFSDKPython2.TextPage.e_ParseTextNormal; this can be one or a combination of these values. |
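Because flags can be one value or a combination of values, the usual way to build it is a bitwise OR. The following is a minimal sketch under stated assumptions: the class `ParseFlagsStandIn` and its numeric values are placeholders invented here so the snippet runs without the SDK, and `combine_flags` is a hypothetical helper. The real constants are the ones exposed on FoxitPDFSDKPython2.TextPage.

```python
from functools import reduce
from operator import or_


class ParseFlagsStandIn:
    """Hypothetical stand-in for the TextPage parsing flags.

    The numeric values are placeholders, not the SDK's actual values.
    """
    e_ParseTextNormal = 0x0
    e_ParseTextOutputHyphen = 0x1
    e_ParseTextUseStreamOrder = 0x2


def combine_flags(*flags):
    """Combine one or more parsing flags with bitwise OR."""
    return reduce(or_, flags, 0)


# Combine two flags into the single value expected by the constructor.
flags = combine_flags(ParseFlagsStandIn.e_ParseTextOutputHyphen,
                      ParseFlagsStandIn.e_ParseTextUseStreamOrder)
```

With the SDK installed, the equivalent call would presumably be `TextPage(page, TextPage.e_ParseTextOutputHyphen | TextPage.e_ParseTextUseStreamOrder)`, where `page` has already been parsed.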
def FoxitPDFSDKPython2.TextPage.TextPage (other)
Constructor, with another text page object.
[in] | other | Another text page object. |
def FoxitPDFSDKPython2.TextPage.GetBaselineRotation (rect_index)
Get the text trend (as rotation) of a specified rectangle.
[in] | rect_index | The index of the rectangle to be retrieved. Valid range: from 0 to (count - 1). count is returned by function FoxitPDFSDKPython2.TextPage.GetTextRectCount. |
def FoxitPDFSDKPython2.TextPage.GetCharCount ()
Get the count of all the characters.
def FoxitPDFSDKPython2.TextPage.GetCharInfo (char_index)
Get character information of a specific character.
[in] | char_index | A zero-based index of the character. Range: from 0 to (charcount - 1). charcount is returned by function FoxitPDFSDKPython2.TextPage.GetCharCount. |
def FoxitPDFSDKPython2.TextPage.GetCharRange (rect)
Get the character index range of all text rectangles within the specified rectangle region.
[in] | rect | A rectangle region, in the PDF coordinate system. |
def FoxitPDFSDKPython2.TextPage.GetChars (start, count)
Get all the characters within a range specified by a start index and count.
[in] | start | Index of the start character, which is the first character of the expected text content. Valid range: from 0 to (charcount - 1). charcount is returned by function FoxitPDFSDKPython2.TextPage.GetCharCount. |
[in] | count | Count of characters to be retrieved. -1 means to get all the characters from start to the end of the PDF page. In particular, when count is larger than (charcount - start), all the remaining characters (from start) will be retrieved. charcount is returned by function FoxitPDFSDKPython2.TextPage.GetCharCount. |
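The rules for start and count can be restated as a small range calculation. The sketch below is not part of the SDK: `effective_char_range` is a hypothetical helper that only mirrors the documented parameter rules, and rejecting negative counts other than -1 is an assumption, since the text above does not say how they are handled.

```python
def effective_char_range(start, count, char_count):
    """Return (start, n): the characters a (start, count) request covers.

    Mirrors the documented rules: start must lie in [0, char_count - 1];
    count == -1, or a count larger than the remaining characters,
    selects everything from start to the end of the page.
    """
    if not 0 <= start < char_count:
        raise IndexError("start must be in [0, char_count - 1]")
    remaining = char_count - start
    if count == -1 or count > remaining:
        return start, remaining
    if count < 0:
        # Assumption: other negative counts are treated as invalid.
        raise ValueError("count must be -1 or non-negative")
    return start, count
```

For a page with 10 characters, a request of (4, 100) and a request of (4, -1) both resolve to the 6 characters from index 4 onwards.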
def FoxitPDFSDKPython2.TextPage.GetIndexAtPos (x, y, tolerance)
Get the character index at or around a specified position on the page, in the PDF coordinate system.
[in] | x | Value of x position, in the PDF coordinate system. |
[in] | y | Value of y position, in the PDF coordinate system. |
[in] | tolerance | Tolerance value for character hit detection, in point units. This should not be negative. |
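Since tolerance should not be negative, a caller may want to check it before the call. The following is a small sketch assuming only the signature shown above; `index_at_point` is a hypothetical wrapper, and `text_page` is any object exposing GetIndexAtPos (x, y, tolerance). The meaning of the return value when no character is hit is not stated in this section, so the wrapper passes it through unchanged.

```python
def index_at_point(text_page, x, y, tolerance=0.0):
    """Validate tolerance, then forward the hit test to the text page.

    x and y are in the PDF coordinate system; tolerance is in points.
    """
    if tolerance < 0:
        raise ValueError("tolerance must not be negative")
    return text_page.GetIndexAtPos(x, y, tolerance)
```

A call such as `index_at_point(text_page, 72.0, 700.0, 2.0)` then fails early on a bad tolerance instead of passing it to the SDK.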
def FoxitPDFSDKPython2.TextPage.GetText (flag)
Get the page text.
[in] | flag | Text order flag that decides how to get the text content of the related PDF page. Please refer to values starting from FoxitPDFSDKPython2.TextPage.e_TextStreamOrder; this should be one of these values. |
def FoxitPDFSDKPython2.TextPage.GetTextInRect (rect)
Get the text within a rectangle, in the PDF coordinate system.
[in] | rect | A rectangle region, in the PDF coordinate system. |
def FoxitPDFSDKPython2.TextPage.GetTextRect (rect_index)
Get the text rectangle by the index.
[in] | rect_index | The index of the rectangle to be retrieved. Valid range: from 0 to (count - 1). count is returned by function FoxitPDFSDKPython2.TextPage.GetTextRectCount. |
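The valid index range depends on the count returned by GetTextRectCount, so the two functions are naturally used together. The sketch below assumes only the signatures documented on this page; `collect_text_rects` is a hypothetical helper, and `text_page` is any object exposing GetTextRectCount (start, count) and GetTextRect (rect_index).

```python
def collect_text_rects(text_page, start=0, count=-1):
    """Return the text rectangles for a character range as a list.

    GetTextRectCount(start, count) is called first; valid rectangle
    indexes then run from 0 to (rect_count - 1).
    """
    rect_count = text_page.GetTextRectCount(start, count)
    return [text_page.GetTextRect(i) for i in range(rect_count)]
```

With the defaults (start 0, count -1) the helper covers the character range from the first character to the end of the page.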
def FoxitPDFSDKPython2.TextPage.GetTextRectArrayByRect (rect)
Get the array of all text rectangles within the specified rectangle region.
[in] | rect | A rectangle region, in the PDF coordinate system. |
def FoxitPDFSDKPython2.TextPage.GetTextRectCount (start, count)
Count the text rectangles within a range specified by a start index and count.
[in] | start | Index of the start character in the character index range. Valid range: from 0 to (charcount - 1). charcount is returned by function FoxitPDFSDKPython2.TextPage.GetCharCount. |
[in] | count | Count of characters in the character index range. -1 means to get all the characters from start to the end of the PDF page. |
def FoxitPDFSDKPython2.TextPage.GetTextUnderAnnot (annot)
Get the page text that intersects with a specified annotation.
If the whole character or most of the character intersects with the annotation, the character will be retrieved by this function.
[in] | annot | An annotation. Page text that intersects with this annotation is to be retrieved. Currently, only text markup annotations (highlight, underline, strike-out, and squiggly annotations) are supported; for annotations of other types, this function will throw exception FoxitPDFSDKPython2.e_ErrUnsupported. |
def FoxitPDFSDKPython2.TextPage.GetWordAtPos (x, y, tolerance)
Get the character range of a word at or around a specified position on the page, in the PDF coordinate system.
Currently, for Chinese, Japanese, and Korean text, only a single character at or around the specified position can be retrieved.
[in] | x | Value of x position, in the PDF coordinate system. |
[in] | y | Value of y position, in the PDF coordinate system. |
[in] | tolerance | Tolerance value for word hit detection, in point units. This should not be negative. |
def FoxitPDFSDKPython2.TextPage.IsEmpty ()
Check whether the current object is empty or not.
When the current object is empty, it cannot be used.
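Because an empty object cannot be used, it is reasonable to check IsEmpty before calling other functions. The following is a minimal sketch assuming only the signatures documented on this page; `page_text_or_none` is a hypothetical helper, and returning None for an empty object is a design choice made here, not SDK behavior.

```python
def page_text_or_none(text_page, order_flag):
    """Return the page text, or None when the text page object is empty.

    order_flag is one of the text order flags, for example
    TextPage.e_TextStreamOrder or TextPage.e_TextDisplayOrder.
    """
    if text_page.IsEmpty():
        return None
    return text_page.GetText(order_flag)
```

The guard keeps GetText from being called on an unusable object, so the caller can treat None as "no text page available".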