public class PDFTextPage
extends java.lang.Object
PDFTextPage represents an object that contains all the text of a PDF page.
The PDFTextPage class defines methods for working with the page text and for obtaining related text information, such as text links and text selections.
It can be constructed in either of the following ways:
Example:
// Default parsing:
PDFTextPage textPage = PDFTextPage.create(pdfPage);
// Or, parsing in stream order:
PDFTextPage textPage = PDFTextPage.create(pdfPage, PDFTextPage.TEXT_PARSEOPTION_STREAMORDER);
See Also:
PDFPage, PDFTextLink, PDFTextSearch, PDFTextSelection
Modifier and Type | Class and Description |
---|---|
static class | PDFTextPage.CharInfo - An inner class that holds information about a character. |
Modifier and Type | Field and Description |
---|---|
static int | CHARSTATE_GENERATED - Character is generated by Foxit, such as a space character. |
static int | CHARSTATE_NONUNICODE - Character doesn't have its own Unicode value. |
static int | CHARSTATE_NORMAL - Normal character. |
static int | TEXT_PARSEOPTION_OUTPUTHYPHEN - Parse the text content of a PDF page, outputting the hyphen at a line feed. |
static int | TEXT_PARSEOPTION_STREAMORDER - Parse the text content of a PDF page in stream order. |
static int | TEXTDIRECTION_DOWN - Text direction: down. |
static int | TEXTDIRECTION_LEFT - Text direction: left. |
static int | TEXTDIRECTION_RIGHT - Text direction: right. |
static int | TEXTDIRECTION_UP - Text direction: up. |
Modifier and Type | Method and Description |
---|---|
int | countChars() - Get the count of characters in a page. |
static PDFTextPage | create(PDFPage page) - Create a new PDFTextPage object from a specified PDFPage object. |
static PDFTextPage | create(PDFPage page, int option) - Create a new PDFTextPage object from a specified PDFPage object, using a parsing option. |
void | exportToFile(FileHandler file) - Export the text content of a page to a specific file. |
PDFTextLink | extractLinks() - Process a PDF page text object to get URL-formatted text (as hyperlinks). |
int | getCharIndexAtPos(float x, float y, float tolerance) - Get the index of the character at or near a position on the page. |
PDFTextPage.CharInfo | getCharInfo(int charIndex) - Get character information for a specific character. |
java.lang.String | getChars(int start, int count) - Get the text content of a page within a specific character range. |
long | getHandle() - Get the text page handle. |
int | getNextCharIndexByDirection(int curIndex, int direction) - Deprecated. This function will be deprecated in the future; its use is no longer recommended. Get the index of the next character after a specific character in a specific direction. |
void | release() - Release all resources allocated for a PDFTextPage object. |
PDFTextSelection | selectByRange(int start, int count) - Get a text selection handle for a specific character range. |
PDFTextSelection | selectByRectangle(android.graphics.RectF rect) - Get a text selection handle for a specific rectangle. |
PDFTextSearch | startSearch(java.lang.String text, int flags, int startPostion) - Start a PDF text search process. |
public static final int TEXTDIRECTION_LEFT
public static final int TEXTDIRECTION_RIGHT
public static final int TEXTDIRECTION_UP
public static final int TEXTDIRECTION_DOWN
public static final int CHARSTATE_NORMAL
public static final int CHARSTATE_GENERATED
public static final int CHARSTATE_NONUNICODE
public static final int TEXT_PARSEOPTION_STREAMORDER
public static final int TEXT_PARSEOPTION_OUTPUTHYPHEN
public long getHandle()
public static PDFTextPage create(PDFPage page) throws PDFException
Create a new PDFTextPage object from a specified PDFPage object.
Prepare information about all characters in a page.
The application must call release() to release the created text page handle when it is no longer needed.
Parameters:
page - The specified PDFPage object.
Returns:
A PDFTextPage object that receives the new PDF text page object, if successful.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, PDFPage
public void release() throws PDFException
Release all resources allocated for a PDFTextPage object.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException
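The create/release contract described above maps naturally onto try/finally. The snippet below is a minimal, self-contained sketch of that pattern, not SDK code: FakeTextPage is a hypothetical stand-in for PDFTextPage, and the value returned by its countChars() is a placeholder, so the pattern can run without the Foxit library.

```java
// Hypothetical stand-in for PDFTextPage; not part of the Foxit SDK.
class FakeTextPage {
    boolean released = false;

    // Placeholder value; the real countChars() reports the page's character count.
    int countChars() {
        return 42;
    }

    // Marks the object as released, standing in for the documented release() call.
    void release() {
        released = true;
    }
}

class TextPageLifecycleSketch {
    // Uses the text page, then always releases it, even if countChars() throws.
    static int countThenRelease(FakeTextPage textPage) {
        try {
            return textPage.countChars();
        } finally {
            textPage.release();
        }
    }
}
```

With the real class, the same shape would presumably wrap a PDFTextPage obtained from create(...), with PDFException handled by the caller.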
public int countChars() throws PDFException
Get the count of characters in a page.
Generated characters, such as additional space and new-line characters, are also counted.
Characters in a page come from a "stream". Inside the stream, each character has an index. This index is used by most functions related to PDF text pages, and the first character in the page has an index of zero.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException
public java.lang.String getChars(int start, int count) throws PDFException
Get the text content of a page within a specific character range.
Parameters:
start - A zero-based index of the first character of the text content. Range: from 0 to (charCount - 1). charCount is returned by countChars().
count - Count of characters to get. -1 means all characters in the page; otherwise the value must be above -1. If count is larger than (charCount - start), all remaining characters (from start) are returned.
Returns:
A String object that receives the text string.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException
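The range rules for start and count can be illustrated with a small helper. This is only a sketch of the documented rules, not the SDK's implementation: CharRangeSketch and effectiveCount are hypothetical names, and it assumes that a count of -1 means "everything from start to the end of the page".

```java
// Hypothetical helper; it models the documented rules and does not call the SDK.
class CharRangeSketch {
    // Returns how many characters a getChars(start, count)-style call would cover.
    // Assumption: count == -1 is treated as "all characters from start onward".
    static int effectiveCount(int charCount, int start, int count) {
        if (start < 0 || start >= charCount) {
            throw new IllegalArgumentException("start must be in [0, charCount - 1]");
        }
        if (count < -1) {
            throw new IllegalArgumentException("count must be -1 or greater");
        }
        int remaining = charCount - start;
        // -1 or an oversized count both fall back to the remaining characters.
        if (count == -1 || count > remaining) {
            return remaining;
        }
        return count;
    }
}
```

For a page with 10 characters and start = 3, a count of -1 or 100 both resolve to 7 characters, while a count of 4 is used unchanged.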
public int getCharIndexAtPos(float x, float y, float tolerance) throws PDFException
Get the index of the character at or near a position on the page.
Parameters:
x - X position in PDF "user space".
y - Y position in PDF "user space".
tolerance - Tolerance value for character hit detection, in points. This should not be negative.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException
public int getNextCharIndexByDirection(int curIndex, int direction) throws PDFException
Deprecated. This function will be deprecated in the future; its use is no longer recommended.
Get the index of the next character after a specific character in a specific direction.
Parameters:
curIndex - A zero-based index of the current character. Range: from 0 to (charCount - 1). charCount is returned by countChars().
direction - The direction in which to get the next character. This should be one of the TEXTDIRECTION_XXX constants.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException
public PDFTextLink extractLinks() throws PDFException
Process a PDF page text object to get URL-formatted text (as hyperlinks).
This function must be called before any other hyperlink-related functions can be used. The application should release this handle by calling PDFTextLink.release() when it is no longer needed.
Returns:
A PDFTextLink object that receives the new PDF text link object, if successful.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, PDFTextLink
public PDFTextSearch startSearch(java.lang.String text, int flags, int startPostion) throws PDFException
Start a PDF text search process.
This function starts a search process. PDFTextSearch.findNext() or PDFTextSearch.findPrev() should then be called to find the first match.
Parameters:
text - The string to be found.
flags - The find options. 0 means no special find options. It can also be one find option flag or a combination of them.
startPostion - A zero-based index specifying the character from which the search starts. Range: from -1 to (charCount - 1). -1 means from the end of the page. charCount is returned by countChars().
Returns:
A PDFTextSearch object that receives the new PDF text search object, if successful.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, PDFTextSearch
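The documented range for the start position (-1 to charCount - 1, with -1 meaning the end of the page) can be checked before a search is started. The helper below is a hypothetical sketch and does not call the SDK; in particular, mapping -1 to the index of the last character is an assumption about what "from the end of the page" means.

```java
// Hypothetical helper; it models the documented range and does not call the SDK.
class SearchStartSketch {
    // Validates a start position and resolves -1 to the last character index.
    // Assumption: "-1 means from the end of the page" maps to index charCount - 1.
    static int resolveStart(int charCount, int startPosition) {
        if (startPosition < -1 || startPosition >= charCount) {
            throw new IllegalArgumentException(
                "startPosition must be in [-1, charCount - 1]");
        }
        return startPosition == -1 ? charCount - 1 : startPosition;
    }
}
```

A caller that wants to search backward from the end of the page would pass -1 and then use PDFTextSearch.findPrev(), while a forward search from the top would pass 0 and use PDFTextSearch.findNext().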
public void exportToFile(FileHandler file) throws PDFException
Export the text content of a page to a specific file.
Parameters:
file - A FileHandler object representing the file to which the text is exported.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, FileHandler
public PDFTextSelection selectByRange(int start, int count) throws PDFException
Get a text selection handle for a specific character range.
The application should release this handle by calling PDFTextSelection.release() when it is no longer needed.
Parameters:
start - A zero-based index of the start character. Range: from 0 to (charCount - 1). charCount is returned by countChars().
count - Count of characters to be extracted. -1 means covering all characters in the page.
Returns:
A PDFTextSelection object that receives the new PDF text selection object, if successful.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, PDFTextSelection
public PDFTextSelection selectByRectangle(android.graphics.RectF rect) throws PDFException
Get a text selection handle for a specific rectangle.
The application should release this handle by calling PDFTextSelection.release() when it is no longer needed.
Parameters:
rect - A RectF object that specifies the rectangle for the selection.
Returns:
A PDFTextSelection object that receives the new PDF text selection object, if successful.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, PDFTextSelection
public PDFTextPage.CharInfo getCharInfo(int charIndex) throws PDFException
Get character information for a specific character.
Parameters:
charIndex - A zero-based index of the character. Range: from 0 to (charCount - 1). charCount is returned by countChars().
Returns:
A CharInfo object that holds the state of the character, the font size of the character measured in points (about 1/72 inch), the XY position of the character origin, the matrix of the character, etc., or null.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, PDFTextPage.CharInfo
public static PDFTextPage create(PDFPage page, int option) throws PDFException
Create a new PDFTextPage object from a specified PDFPage object, using a parsing option.
Prepare information about all characters in a page, according to the specified option.
The application must call release() to release the created text page handle when it is no longer needed.
Parameters:
page - The specified PDFPage object.
option - An integer value that specifies the parsing option. This should be one or a combination of the PDFTextPage.TEXT_PARSEOPTION_XXX constants. See also create(PDFPage).
Returns:
A PDFTextPage object that receives the new PDF text page handle, if successful.
Throws:
PDFException - For more exception information, see the PDFException.ERRCODE_XXX definitions.
See Also:
PDFException, PDFPage
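Because the option parameter accepts "one or a combination" of the parsing constants, the constants are presumably bit flags that can be joined with bitwise OR. The sketch below illustrates that idea under stated assumptions: the numeric values are placeholders chosen for illustration (the real values are defined by PDFTextPage), and ParseOptionSketch, combine and has are hypothetical names.

```java
// Hypothetical sketch; the values below are placeholders, not the SDK's constants.
class ParseOptionSketch {
    static final int STREAMORDER = 0x1;   // stands in for TEXT_PARSEOPTION_STREAMORDER
    static final int OUTPUTHYPHEN = 0x2;  // stands in for TEXT_PARSEOPTION_OUTPUTHYPHEN

    // Joins any number of flag values into a single option value with bitwise OR.
    static int combine(int... options) {
        int combined = 0;
        for (int option : options) {
            combined |= option;
        }
        return combined;
    }

    // Reports whether every bit of the given flag is set in the combined value.
    static boolean has(int combined, int option) {
        return (combined & option) == option;
    }
}
```

With the real constants, the equivalent call would presumably look like PDFTextPage.create(pdfPage, PDFTextPage.TEXT_PARSEOPTION_STREAMORDER | PDFTextPage.TEXT_PARSEOPTION_OUTPUTHYPHEN).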