Documentation ¶
Index ¶
- Variables
- func GetAttributeByKey(node *html.Node, key string) (html.Attribute, error)
- func GetChildren(node *html.Node) []*html.Node
- func GetElementNodeByTagName(name string, startNode *html.Node) *html.Node
- func GetElementsInTableRowByConditionForOneOfTheElements(tableNode *html.Node, cond func(n *html.Node) bool) []*html.Node
- func GetFirstTextNode(startNode *html.Node) *html.Node
- func GetFirstTextNodeWithCondition(startNode *html.Node, cond func(s string) bool) *html.Node
- func GetNextNodeByCondition(startNode *html.Node, cond func(node *html.Node) bool) *html.Node
- func GetNextNodesByCondition(startNode *html.Node, cond func(node *html.Node) bool) []*html.Node
- func GetNodeByCondition(startNode *html.Node, cond func(node *html.Node) bool) *html.Node
- func GetNodesByCondition(startNode *html.Node, cond func(node *html.Node) bool) []*html.Node
- func GetTextNodes(startNode *html.Node) []*html.Node
- func GetTextNodesByCondition(startNode *html.Node, cond func(s string) bool) []*html.Node
- func MakeByAttributeNameAndValueCondition(attributeName, attributeValue string) func(node *html.Node) bool
- func MakeByClassNameCondition(className string) func(node *html.Node) bool
- func MakeByIdCondition(id string) func(node *html.Node) bool
- func MakeByTagNameCondition(name string) func(node *html.Node) bool
- func MakeTextNodeComposite(textNodes []*html.Node, compositeRune string) string
- func MakeTextNodeCompositeWithNormalizerFunc(textNodes []*html.Node, compositeDelimiter string, ...) string
- func ParseSelectHTMLNode(selectNode *html.Node) (map[string]string, string, error)
- func WalkHtmlTree(node *html.Node, f func(n *html.Node) bool)
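- type HtmlTable
- func ParseHtmlTable(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string) (*HtmlTable, error)
- func ParseHtmlTableWithNormalizer(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string, normalizerFunc func(string) string, allowCompositeTexts bool, compositeDelimiter string) (*HtmlTable, error)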
- func (ht HtmlTable) GetColumnByIndex(j int) ([]string, string)
- func (ht HtmlTable) GetColumnByKey(key string) ([]string, int, bool)
- func (ht HtmlTable) GetColumnByKeyNum(key string, occurrence int) ([]string, int, bool)
- func (ht HtmlTable) GetElementByIndex(i, j int) string
- func (ht HtmlTable) GetElementByKeys(rowKey, columnKey string) (string, int, int, bool)
- func (ht HtmlTable) GetElementByKeysNum(rowKey, columnKey string, rowOccurrence, columnOccurrence int) (string, int, int, bool)
- func (ht HtmlTable) GetRowByIndex(i int) ([]string, string)
- func (ht HtmlTable) GetRowByKey(key string) ([]string, int, bool)
- func (ht HtmlTable) GetRowByKeyNum(key string, occurrence int) ([]string, int, bool)
Constants ¶
This section is empty.
Variables ¶
var TextRegex = regexp.MustCompile("[^!-~]") // without space
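To illustrate what this pattern matches, the sketch below recompiles the same expression locally. The names `textRegex` and `stripNonPrintable` are illustrative assumptions and not part of this package. The character class `[^!-~]` matches any character outside the printable ASCII range `!` through `~`, so spaces, control characters, and non-ASCII characters all match.

```go
package main

import (
	"fmt"
	"regexp"
)

// textRegex is a local copy of the documented pattern. It matches every
// character that is NOT in the printable ASCII range '!'..'~', which
// means the space character matches as well.
var textRegex = regexp.MustCompile("[^!-~]")

// stripNonPrintable is a hypothetical helper (not part of the package)
// that removes every character matched by textRegex.
func stripNonPrintable(s string) string {
	return textRegex.ReplaceAllString(s, "")
}

func main() {
	// The non-breaking space, the regular spaces, and the newline are removed.
	fmt.Println(stripNonPrintable("Total:\u00a0 42 \n")) // prints "Total:42"
}
```

How the package itself uses TextRegex is not stated in this documentation; the replacement shown here is only one plausible use.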
Functions ¶
func GetAttributeByKey ¶
func GetChildren ¶
GetChildren returns the children of node. As with GetNodesByCondition and GetNextNodesByCondition, it returns a slice of pointers, even though this is considered bad practice, so that callers can directly modify substructures of a bigger tree.
func GetElementNodeByTagName ¶
GetElementNodeByTagName returns the first node with the given tag name, searching from the provided starting node. It returns nil if none is found.
func GetElementsInTableRowByConditionForOneOfTheElements ¶
func GetElementsInTableRowByConditionForOneOfTheElements(tableNode *html.Node, cond func(n *html.Node) bool) []*html.Node
GetElementsInTableRowByConditionForOneOfTheElements returns all child elements (with tag <td>) of the table row node (with tag <tr>) in which at least one child fulfills the provided condition cond.
func GetNextNodeByCondition ¶
GetNextNodeByCondition returns the first node for which the provided condition yields true, excluding the start node.
func GetNextNodesByCondition ¶
GetNextNodesByCondition returns all nodes in the tree of startNode for which the provided condition yields true, excluding startNode. Note that this returns a slice of pointers to structs, which is considered bad practice. However, we do not want copies of the nodes but the actual pointers, in case we want to modify nodes that are part of a bigger tree structure.
func GetNodeByCondition ¶
GetNodeByCondition returns the first node for which the provided condition yields true, including the start node.
func GetNodesByCondition ¶
GetNodesByCondition returns all nodes in the tree of startNode for which the provided condition yields true, including startNode. Note that this returns a slice of pointers to structs, which is considered bad practice. However, we do not want copies of the nodes but the actual pointers, in case we want to modify nodes that are part of a bigger tree structure.
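The difference between the "including" and "excluding" variants, and the reason for returning pointers, can be sketched on a simplified tree. Everything below is an assumption made for illustration: `node` is a hypothetical stand-in for html.Node, the lowercase functions are not the package's implementation, and "excluding startNode" is assumed to mean that only descendants of the start node are searched.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for html.Node.
type node struct {
	Tag                     string
	FirstChild, NextSibling *node
}

// collect appends every node in the subtree rooted at n (n included)
// for which cond is true, in depth-first order.
func collect(n *node, cond func(*node) bool, out []*node) []*node {
	if n == nil {
		return out
	}
	if cond(n) {
		out = append(out, n)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		out = collect(c, cond, out)
	}
	return out
}

// nodesByCondition tests the start node itself as well as its descendants.
func nodesByCondition(start *node, cond func(*node) bool) []*node {
	return collect(start, cond, nil)
}

// nextNodesByCondition skips the start node and tests only its descendants
// (an assumption about what "excluding startNode" means).
func nextNodesByCondition(start *node, cond func(*node) bool) []*node {
	var out []*node
	if start == nil {
		return out
	}
	for c := start.FirstChild; c != nil; c = c.NextSibling {
		out = collect(c, cond, out)
	}
	return out
}

// byTag builds a condition, in the spirit of MakeByTagNameCondition.
func byTag(tag string) func(*node) bool {
	return func(n *node) bool { return n.Tag == tag }
}

func main() {
	inner := &node{Tag: "div"}
	span := &node{Tag: "span"}
	inner.NextSibling = span
	root := &node{Tag: "div", FirstChild: inner}

	fmt.Println(len(nodesByCondition(root, byTag("div"))))     // prints 2
	fmt.Println(len(nextNodesByCondition(root, byTag("div")))) // prints 1

	// Because pointers are returned, a change is visible in the tree itself.
	nextNodesByCondition(root, byTag("span"))[0].Tag = "b"
	fmt.Println(span.Tag) // prints b
}
```

Returning `[]*node` rather than `[]node` is what makes the last step work: a slice of values would hold copies, and the edit would not reach the original tree.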
func GetTextNodesByCondition ¶
func MakeTextNodeComposite ¶
func ParseSelectHTMLNode ¶
ParseSelectHTMLNode parses the html node with tag 'select' into its different options. It returns a map of key: value pairs as strings, in which the key is the text content of an option and the value is the content of that option's 'value' attribute.
If multiple options have the same text content, earlier entries are overwritten and only the last one is kept. It also returns the currently selected option, which is the option with the 'selected' attribute if one exists, otherwise the first occurring option.
If multiple options have the 'selected' attribute, the last option that has it is returned as the selected option. It returns a nil map and a nil error if no options were found.
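The selection and overwrite rules described above can be sketched on a flat list of options. This is not the package's implementation: the `option` type and `summarizeOptions` function are hypothetical, no HTML is parsed, and the sketch assumes the returned string identifies the selected option by its text content, which this documentation does not specify.

```go
package main

import "fmt"

// option is a simplified, hypothetical stand-in for an <option> element.
type option struct {
	Text     string // text content of the option
	Value    string // content of the 'value' attribute
	Selected bool   // whether the 'selected' attribute is present
}

// summarizeOptions applies the documented rules to a list of options:
//   - later options with the same text overwrite earlier ones,
//   - the last option marked selected wins,
//   - without any selected option, the first option is used,
//   - an empty list yields a nil map.
//
// ASSUMPTION: the selected option is reported by its text content.
func summarizeOptions(opts []option) (map[string]string, string) {
	if len(opts) == 0 {
		return nil, ""
	}
	values := make(map[string]string, len(opts))
	selected := opts[0].Text // default: first occurring option
	for _, o := range opts {
		values[o.Text] = o.Value
		if o.Selected {
			selected = o.Text
		}
	}
	return values, selected
}

func main() {
	values, selected := summarizeOptions([]option{
		{Text: "Red", Value: "r1"},
		{Text: "Green", Value: "g", Selected: true},
		{Text: "Red", Value: "r2", Selected: true},
	})
	fmt.Println(values["Red"], selected) // prints "r2 Red"
}
```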
Types ¶
type HtmlTable ¶
type HtmlTable struct {
	Headers   []string   // Headers, equal to TableData[0, :] in numpy expression
	Index     []string   // Index, equal to TableData[:, 0] in numpy expression
	TableData [][]string // All data excluding headers and index
	// contains filtered or unexported fields
}
HtmlTable represents an HTML table as a struct. It contains only text content.
func ParseHtmlTable ¶
func ParseHtmlTable(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string) (*HtmlTable, error)
ParseHtmlTable parses a given html.Node, which should point to a <table> ElementNode in an HTML tree, into an HtmlTable struct that can be used to easily look up existing indices, headers, and values. Content is set after normalizing with the identity normalizer func, normalizer(s) = s. Keys that appear multiple times are made unique by appending '{suffix}_{keyCount}'; the first occurrence is left unchanged.
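The '{suffix}_{keyCount}' scheme for duplicate keys can be sketched as follows. The function `uniquifyKeys` is hypothetical and is not taken from the package. In particular, the sketch assumes that keyCount is the 1-based occurrence number, so the second occurrence receives 2; the package may count differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// uniquifyKeys leaves the first occurrence of each key unchanged and
// appends suffix + "_" + n to every later occurrence.
// ASSUMPTION: n is the 1-based occurrence number (second occurrence = 2).
func uniquifyKeys(keys []string, suffix string) []string {
	seen := make(map[string]int, len(keys))
	out := make([]string, len(keys))
	for i, k := range keys {
		seen[k]++
		if seen[k] == 1 {
			out[i] = k // first occurrence keeps its original name
			continue
		}
		out[i] = k + suffix + "_" + strconv.Itoa(seen[k])
	}
	return out
}

func main() {
	keys := []string{"Name", "Price", "Price", "Name", "Price"}
	fmt.Println(uniquifyKeys(keys, "_dup"))
	// prints [Name Price Price_dup_2 Name_dup_2 Price_dup_3]
}
```

This sketch does not guard against a generated key colliding with a key that already exists in the table, so a suffix that is unlikely to appear in real headers is the safer choice.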
func ParseHtmlTableWithNormalizer ¶
func ParseHtmlTableWithNormalizer(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string, normalizerFunc func(string) string, allowCompositeTexts bool, compositeDelimiter string) (*HtmlTable, error)
ParseHtmlTableWithNormalizer parses a given html.Node, which should point to a <table> ElementNode in an HTML tree, into an HtmlTable struct that can be used to easily look up existing indices, headers, and values. Content is set after normalizing with normalizerFunc. Keys that appear multiple times are made unique by appending '{suffix}_{keyCount}'; the first occurrence is left unchanged. TODO: describe the meaning of allowCompositeTexts and compositeDelimiter parameters
func (HtmlTable) GetColumnByIndex ¶
GetColumnByIndex is analogous to GetRowByIndex, but for columns. The number of columns is given by the length of Headers. GetColumnByIndex(0) returns the index column.
func (HtmlTable) GetColumnByKey ¶
GetColumnByKey is analogous to GetRowByKey, but for columns.
func (HtmlTable) GetColumnByKeyNum ¶
GetColumnByKeyNum returns the column for the given occurrence of the original key, which may appear multiple times in the table.
func (HtmlTable) GetElementByIndex ¶
GetElementByIndex returns the element in table data for row i and column j. It panics if either index is out of bounds.
func (HtmlTable) GetElementByKeys ¶
GetElementByKeys returns the element in table data with the provided row key and column key. It returns "" and false if at least one key is missing.
func (HtmlTable) GetElementByKeysNum ¶
func (ht HtmlTable) GetElementByKeysNum(rowKey, columnKey string, rowOccurrence, columnOccurrence int) (string, int, int, bool)
GetElementByKeysNum returns the element in table data with the provided row key and column key at the corresponding occurrences. It returns "" and false if at least one key is missing.
func (HtmlTable) GetRowByIndex ¶
GetRowByIndex returns a copy of the table row with index i, together with the corresponding index key. It panics if i is out of bounds; the number of rows is given by the length of Index. GetRowByIndex(0) returns the header row, GetRowByIndex(1) returns the first row below the header row, and so on. Note: there is always a header row. Even if no header row was specified during parsing, the resulting table has an artificial header row like (Index 1 2 3 4 ...).
func (HtmlTable) GetRowByKey ¶
GetRowByKey returns a copy of the row whose index key equals the given key, if it exists; otherwise it returns nil and false.