Documentation ¶
Index ¶
- Constants
- func FilePathsForFileGrp(mets, fg string) ([]string, error)
- func FindUnicodesInRegionSorted(region *xmlquery.Node) []*xmlquery.Node
- func NodeForToken(doc *xmlquery.Node, t apoco.Token) (*xmlquery.Node, error)
- func Tokenize(mets string, fgs ...string) apoco.StreamFunc
- func TokenizeDirs(ext string, dirs ...string) apoco.StreamFunc
Constants ¶
const MIMEType = "application/vnd.prima.page+xml"
MIMEType defines the MIME type for PAGE XML documents.
Variables ¶
This section is empty.
Functions ¶
func FilePathsForFileGrp ¶
func FilePathsForFileGrp(mets, fg string) ([]string, error)
FilePathsForFileGrp returns the list of file paths for the given file group. The returned file paths are adjusted to be relative to the base directory of the METS file.
func FindUnicodesInRegionSorted ¶
func FindUnicodesInRegionSorted(region *xmlquery.Node) []*xmlquery.Node
FindUnicodesInRegionSorted searches for the TextEquiv / Unicode nodes beneath a text region (TextRegion, Line, Word, Glyph). The returned node list is ordered by the index attributes of the TextEquiv elements, interpreted as integers.
func NodeForToken ¶
func NodeForToken(doc *xmlquery.Node, t apoco.Token) (*xmlquery.Node, error)
NodeForToken is a temporary function intended for testing only.
func Tokenize ¶
func Tokenize(mets string, fgs ...string) apoco.StreamFunc
Tokenize returns a function that reads tokens from the PAGE XML files of the given file groups. An empty token is inserted as a sentinel between the tokens of different file groups. The returned function ignores the input stream; it only writes tokens to the output stream.
func TokenizeDirs ¶
func TokenizeDirs(ext string, dirs ...string) apoco.StreamFunc
TokenizeDirs returns a function that reads PAGE XML files with a matching file extension from the given directories. The returned function ignores the input stream; it only writes tokens to the output stream.
Types ¶
This section is empty.