HTMLAnalyzer Toolkit 1.0 Guide


HTMLDocAnalyzer Use

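Dim oAnalyzer As New HTMLDocAnalyzer
Dim oItem As HTMLAObject
Dim nResult As Long
Dim i As Long

' Analyze returns an SHTAErrorCode value; SHTAErrNo means success
nResult = oAnalyzer.Analyze("TEST.HTML")
If nResult <> SHTAErrNo Then Debug.Print "Analyze returned " & nResult

' Object indexes run from 1 to Count
For i = 1 To oAnalyzer.Count
    Set oItem = oAnalyzer.GetObject(i)
    ' process oItem
Next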
 

HTMLDocAnalyzer Properties

ConvertCharRefs  Returns or sets the handling of character references. True converts character references to text objects.
 
Count  Returns the number of HTML objects in the HTML document.
 
File  Returns the name of the analyzed HTML file.
 
SN  Returns the serial number.
 
UpperCaseAttributeNames  Returns or sets upper-case conversion for attribute names.
 
UpperCaseTagNames  Returns or sets upper-case conversion for tag names.
 

HTMLDocAnalyzer Methods

Analyze

Analyzes an HTML file and divides it into a sequence of HTML objects: start/end tags, text, decimal character references, hexadecimal character references (HTML 4.0 feature), named character references, line breaks, DOCTYPE declarations, comments and errors.

Syntax:
Long Analyze( fileName )

String fileName

Return Value:
SHTAErrNo
SHTAErrFileError
SHTAErrParseError
SHTAErrInvalidToken
SHTAErrMemoryError
SHTAErrUnknownError
SHTAErrLicenseError
SHTAErrParserError

Note:
The analyzing process depends on FilterAdd, TagNameFilterAdd, ConvertCharRefs, UpperCaseAttributeNames and UpperCaseTagNames.
 

Clear

Frees any memory used by the current HTML file.

Syntax:
Clear( )
 
FilterAdd

Restricts analyzing to specific HTML object types.

Syntax:
Boolean FilterAdd( objectType )

Long objectType

objectType can be one of the following:

SHTAObjectTypeUnknown
SHTAObjectTypeTagStart
SHTAObjectTypeTagEnd
SHTAObjectTypeText
SHTAObjectTypeDocType
SHTAObjectTypeCharRefNumDec
SHTAObjectTypeCharRefNumHex
SHTAObjectTypeCharRefName
SHTAObjectTypeComment
SHTAObjectTypeEol

Note:
Adding SHTAObjectTypeText automatically enables the following types:

SHTAObjectTypeCharRefNumDec,
SHTAObjectTypeCharRefNumHex,
SHTAObjectTypeCharRefName and
SHTAObjectTypeEol

SHTAObjectTypeError is always enabled.
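
The filter calls could be combined as in the following sketch. It assumes that the project references the toolkit's type library, that a False return from FilterAdd signals failure (the guide does not say), and that filters are set before Analyze is called, since the Note under Analyze states that the analyzing process depends on FilterAdd. The procedure name and TEST.HTML are placeholders.

```vb
' Sketch only: list the types of all objects that pass the filter.
Sub ListStartTagsAndText()
    Dim oAnalyzer As New HTMLDocAnalyzer
    Dim i As Long

    ' Start from a clean filter, then enable the wanted types
    oAnalyzer.FilterClear
    ' Assumption: False means the type could not be added
    If Not oAnalyzer.FilterAdd(SHTAObjectTypeTagStart) Then Exit Sub
    ' Adding the text type also enables character references and line breaks
    If Not oAnalyzer.FilterAdd(SHTAObjectTypeText) Then Exit Sub

    If oAnalyzer.Analyze("TEST.HTML") <> SHTAErrNo Then
        Debug.Print "Analyze reported an error"
    End If

    ' Error objects may still appear, because SHTAObjectTypeError is always enabled
    For i = 1 To oAnalyzer.Count
        Debug.Print i, oAnalyzer.Type(i)
    Next
End Sub
```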
 

FilterClear

Enables all HTML object types.

Syntax:
FilterClear( )
 
GetObject

Returns a reference to an HTML object.

Syntax:
HTMLAObject GetObject( i )

Long i

Note:
The index i must range from 1 to Count.
 

Register

Registers the HTMLAnalyzer Toolkit on the computer.

Syntax:
Boolean Register( sn, key )

String sn
String key

 

TagName

Returns the name of a start or end tag without creating a COM object.

Syntax:
String TagName( i )

Long i

 

TagNameFilterAdd

Restricts start tags and end tags by name. The filter is NOT case sensitive.

Syntax:
Boolean TagNameFilterAdd( tagName )

String tagName
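
As an illustration, a tag name filter might be used to print the link targets of a document. This is a sketch under assumptions: the guide does not say whether AttributeFind and AttributeValue match attribute names case-insensitively, so "href" may need to be "HREF" when UpperCaseAttributeNames is enabled. The procedure name and TEST.HTML are placeholders.

```vb
' Sketch only: print the href value of every <a> start tag.
Sub ListLinkTargets()
    Dim oAnalyzer As New HTMLDocAnalyzer
    Dim oItem As HTMLAObject
    Dim i As Long

    ' The tag name filter is not case sensitive, so "a" also matches "A"
    oAnalyzer.TagNameFilterAdd "a"
    oAnalyzer.Analyze "TEST.HTML"

    For i = 1 To oAnalyzer.Count
        ' Check the type first; this avoids creating an object for non-tags
        If oAnalyzer.Type(i) = SHTAObjectTypeTagStart Then
            Set oItem = oAnalyzer.GetObject(i)
            ' AttributeFind returns 0 if the attribute does not exist
            If oItem.AttributeFind("href") > 0 Then
                Debug.Print oItem.AttributeValue("href")
            End If
        End If
    Next
End Sub
```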

 

TagNameFilterClear

Enables ALL start tags and end tags.

Syntax:
TagNameFilterClear( )
 
Type

Returns the type of an HTML object without creating a COM object.

Syntax:
Long Type( i )

Long i

Return Value:
SHTAObjectTypeUnknown
SHTAObjectTypeTagStart
SHTAObjectTypeTagEnd
SHTAObjectTypeText
SHTAObjectTypeDocType
SHTAObjectTypeCharRefNumDec
SHTAObjectTypeCharRefNumHex
SHTAObjectTypeCharRefName
SHTAObjectTypeComment
SHTAObjectTypeEol
SHTAObjectTypeError

See Also:
HTMLAObject.Type
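
Because Type and TagName work without creating a COM object, they suit simple scans over a whole document. The function below is a minimal sketch of that idea; its name and parameters are hypothetical, and it compares names with StrComp in text mode so that it does not depend on the UpperCaseTagNames setting.

```vb
' Sketch only: count the start tags with a given name.
Function CountStartTags(ByVal sFile As String, ByVal sTag As String) As Long
    Dim oAnalyzer As New HTMLDocAnalyzer
    Dim i As Long

    oAnalyzer.Analyze sFile

    For i = 1 To oAnalyzer.Count
        If oAnalyzer.Type(i) = SHTAObjectTypeTagStart Then
            ' vbTextCompare makes the comparison case-insensitive
            If StrComp(oAnalyzer.TagName(i), sTag, vbTextCompare) = 0 Then
                CountStartTags = CountStartTags + 1
            End If
        End If
    Next
End Function
```

A call such as Debug.Print CountStartTags("TEST.HTML", "img") would then print the number of matching start tags.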
 

 


HTMLAObject Use

 
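Dim oAnalyzer As New HTMLDocAnalyzer
Dim oItem As HTMLAObject
Dim i As Long

' oItem must be set before use; here it is the first object of a file
oAnalyzer.Analyze "TEST.HTML"
If oAnalyzer.Count > 0 Then
    Set oItem = oAnalyzer.GetObject(1)

    ' List the attribute names of a start tag
    If oItem.Type = SHTAObjectTypeTagStart Then
        For i = 1 To oItem.AttributeCount
            Debug.Print oItem.AttributeName(i)
        Next
    End If
End If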

 

HTMLAObject Properties 

AttributeCount Returns the number of attributes.
 
Data Returns the associated data.

Type                         Data
SHTAObjectTypeUnknown        Information found in the HTML file
SHTAObjectTypeTagStart       Tag name
SHTAObjectTypeTagEnd         Tag name
SHTAObjectTypeText           Normal text
SHTAObjectTypeDocType        "DOCTYPE"
SHTAObjectTypeCharRefNumDec  Decimal number
SHTAObjectTypeCharRefNumHex  Hexadecimal number
SHTAObjectTypeCharRefName    Name
SHTAObjectTypeComment        Comment content without "<!--" and "-->"
SHTAObjectTypeEol            Line delimiter CR, LF, CRLF
SHTAObjectTypeError          Full or partial HTML data that has generated the error

 
DocTypeParamCount  Returns the number of DOCTYPE parameters.
 
ErrorNumber Returns an error number. Only error objects can return a value other than SHTAOErrNo.

Return Value:
SHTAOErrNo
SHTAOErrParseError
SHTAOErrCriticalParseError
SHTAOErrInvalidToken
SHTAOErrInvalidCharRef
SHTAOErrFile
 
IsEmptySign Returns true if the start tag contains an empty element sign "/>" (HTML 4.0 feature).
 
Line Returns the starting line.
 
Offset Returns the starting offset.
 
Type Returns the object type.

Return Value:
SHTAObjectTypeUnknown
SHTAObjectTypeTagStart
SHTAObjectTypeTagEnd
SHTAObjectTypeText
SHTAObjectTypeDocType
SHTAObjectTypeCharRefNumDec
SHTAObjectTypeCharRefNumHex
SHTAObjectTypeCharRefName
SHTAObjectTypeComment
SHTAObjectTypeEol
SHTAObjectTypeError
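
The Line, Offset, ErrorNumber and Data properties could be combined into a simple error report, as sketched below. The procedure name is hypothetical, and the analyzer passed in is assumed to have already analyzed a file.

```vb
' Sketch only: print position and cause of every error object.
Sub ReportErrors(ByVal oAnalyzer As HTMLDocAnalyzer)
    Dim oItem As HTMLAObject
    Dim i As Long

    For i = 1 To oAnalyzer.Count
        If oAnalyzer.Type(i) = SHTAObjectTypeError Then
            Set oItem = oAnalyzer.GetObject(i)
            ' Data holds the HTML data that generated the error
            Debug.Print "Line " & oItem.Line & ", offset " & oItem.Offset & _
                        ": error " & oItem.ErrorNumber & " in " & oItem.Data
        End If
    Next
End Sub
```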
 
 

 

HTMLAObject Methods

AttributeFind

Returns the position of an attribute or 0 if it does not exist.

Syntax:
Long AttributeFind( attributeName )

String attributeName

 

AttributeIsBoolean

Returns true if an attribute is of type boolean.
For example: <.... nowrap ...>

Syntax:
Boolean AttributeIsBoolean( position )

Long position

 

AttributeName

Returns the attribute name.

Syntax:
String AttributeName( position )

Long position

 

AttributeUnitData

Returns the unit data of an attribute.

Syntax:
String AttributeUnitData( position )

Long position

 

AttributeUnitType

Returns the unit type of an attribute.

Syntax:
Long AttributeUnitType( position )

Long position

Return Value:
SHTAUnitTypeNull
SHTAUnitTypePercent
SHTAUnitTypeRel
SHTAUnitTypeUnknown
 

AttributeValue

Returns an attribute value using an attribute name. The value contains neither type-specific decoration (#, ", ') nor unit information (%, *).

Syntax:
String AttributeValue( attributeName )

String attributeName

 

AttributeValueData

Returns the value data of an attribute. The data contains neither type-specific decoration (#, ", ') nor unit information (%, *).

Syntax:
String AttributeValueData( position )

Long position

 

AttributeValueType

Returns the value type of an attribute.

Syntax:
Long AttributeValueType( position )

Long position

Return Value:
SHTAValueTypeNull
SHTAValueTypeNumber
SHTAValueTypeHexNumber
SHTAValueTypeString
SHTAValueTypeText
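
The position-based attribute methods above might be used together as follows. This is a sketch, not a definitive usage pattern: the procedure name is hypothetical, and it assumes that attribute positions run from 1 to AttributeCount, as in the HTMLAObject Use example.

```vb
' Sketch only: print name, value, unit and value type of each attribute.
Sub DumpAttributes(ByVal oItem As HTMLAObject)
    Dim i As Long
    Dim sLine As String

    If oItem.Type <> SHTAObjectTypeTagStart Then Exit Sub

    For i = 1 To oItem.AttributeCount
        sLine = oItem.AttributeName(i)
        If oItem.AttributeIsBoolean(i) Then
            ' Boolean attributes such as nowrap carry no value
            sLine = sLine & " (boolean)"
        Else
            sLine = sLine & " = " & oItem.AttributeValueData(i)
            Select Case oItem.AttributeUnitType(i)
                Case SHTAUnitTypePercent
                    sLine = sLine & " [percent]"
                Case SHTAUnitTypeRel
                    sLine = sLine & " [relative]"
            End Select
            If oItem.AttributeValueType(i) = SHTAValueTypeNumber Then
                sLine = sLine & " (decimal number)"
            End If
        End If
        Debug.Print sLine
    Next
End Sub
```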
 

DocTypeParamType 

Returns the parameter type.

Syntax:
Long DocTypeParamType( position )

Long position

Return Value:
SHTAValueTypeNull
SHTAValueTypeNumber
SHTAValueTypeHexNumber
SHTAValueTypeString
SHTAValueTypeText
 

DocTypeParamValue

Returns the parameter value. Type-specific decoration is removed.

Syntax:
String DocTypeParamValue( position )

Long position
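
A DOCTYPE object's parameters could be listed as in the sketch below. It assumes that parameter positions run from 1 to DocTypeParamCount, by analogy with attribute positions; the guide does not state this explicitly. The procedure name is hypothetical.

```vb
' Sketch only: print position, type and value of each DOCTYPE parameter.
Sub DumpDocType(ByVal oItem As HTMLAObject)
    Dim i As Long

    If oItem.Type <> SHTAObjectTypeDocType Then Exit Sub

    ' Assumption: positions are 1-based, like attribute positions
    For i = 1 To oItem.DocTypeParamCount
        Debug.Print i, oItem.DocTypeParamType(i), oItem.DocTypeParamValue(i)
    Next
End Sub
```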

 

 


URLAnalyzer Use

 
Dim oAnalyzer As New URLAnalyzer
Dim i As Long

' Analyze returns False if an error occurred
If oAnalyzer.Analyze("http://www.software-systems.de") Then
    For i = 1 To oAnalyzer.Count
        Debug.Print oAnalyzer.ComponentData(i)
    Next
End If

URLAnalyzer Properties

ConvertEscapes  Returns or sets handling of URL escape sequences. If True, the sequences '%xx', '+' and '%%' are converted.
 
Count Returns the number of URL components.
 
Data Returns the original URL text.
 
SplitNet Returns or sets splitting of network locations. If True, the analyzer splits "user:password@host:port" into separate components.
 

URLAnalyzer Methods

Analyze

Analyzes a URL string and divides it into components. URLs are processed from left to right.

Syntax:
Boolean Analyze( urlText )

String urlText

Return Value:
False if an error occurred; otherwise True.
 

ComponentData

Returns the component data of an entry. Any type-specific decoration (:, //, ?, #) is removed.

Syntax:
String ComponentData( i )

Long i

 

ComponentType

Returns the component type of an entry.

Syntax:
Long ComponentType( i )

Long i

Return Value:
SHTAURLCompTypeUnknown
SHTAURLCompTypeScheme
SHTAURLCompTypeNet
SHTAURLCompTypeNetHost
SHTAURLCompTypeNetPort
SHTAURLCompTypeNetUser
SHTAURLCompTypeNetPW
SHTAURLCompTypePathAbs
SHTAURLCompTypePathRel
SHTAURLCompTypeQuery
SHTAURLCompTypeParams
SHTAURLCompTypeFragment
SHTAURLCompTypeNewsMsgId
SHTAURLCompTypeNewsMsgName
SHTAURLCompTypeMailAddress
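
ComponentType and ComponentData can be read side by side to label each part of a URL. The following sketch assumes that SplitNet is a Boolean property, as its description suggests; the procedure name and the output labels are hypothetical.

```vb
' Sketch only: print each URL component with a readable label.
Sub DumpUrlComponents(ByVal sUrl As String)
    Dim oUrl As New URLAnalyzer
    Dim i As Long

    ' Split "user:password@host:port" into separate components
    oUrl.SplitNet = True

    If Not oUrl.Analyze(sUrl) Then
        Debug.Print "Could not analyze: " & sUrl
        Exit Sub
    End If

    For i = 1 To oUrl.Count
        Select Case oUrl.ComponentType(i)
            Case SHTAURLCompTypeScheme
                Debug.Print "scheme:   " & oUrl.ComponentData(i)
            Case SHTAURLCompTypeNetHost
                Debug.Print "host:     " & oUrl.ComponentData(i)
            Case SHTAURLCompTypeNetPort
                Debug.Print "port:     " & oUrl.ComponentData(i)
            Case SHTAURLCompTypePathAbs, SHTAURLCompTypePathRel
                Debug.Print "path:     " & oUrl.ComponentData(i)
            Case SHTAURLCompTypeQuery
                Debug.Print "query:    " & oUrl.ComponentData(i)
            Case SHTAURLCompTypeFragment
                Debug.Print "fragment: " & oUrl.ComponentData(i)
            Case Else
                ' Remaining types are printed with their numeric code
                Debug.Print "type " & oUrl.ComponentType(i) & ": " & oUrl.ComponentData(i)
        End Select
    Next
End Sub
```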
 

Join

Helper method that takes two URLs and generates a new one. Level controls ('.', '..') are processed accordingly.

Syntax:
String Join( baseURL, secondURL )

String baseURL
String secondURL

Return Value:
Concatenated URL

baseURL    secondURL  result
absolute   relative   baseURL + secondURL
absolute   absolute   secondURL
relative   absolute   secondURL
empty      absolute   secondURL
empty      relative   secondURL

Note:
baseURL should be an absolute URL or empty. If Join() cannot process the two URLs, an empty string is returned.
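
A call to Join might look like the sketch below. The example.com and example.org URLs and the procedure name are placeholders, and no exact output is shown because the guide does not specify how the combined URL is formatted.

```vb
' Sketch only: join a base URL with a relative and an absolute URL.
Sub JoinExamples()
    Dim oUrl As New URLAnalyzer
    Dim sResult As String

    ' Absolute base plus relative second URL; ".." is a level control
    sResult = oUrl.Join("http://www.example.com/docs/index.html", "../images/logo.gif")
    If Len(sResult) = 0 Then
        ' An empty string means Join could not process the two URLs
        Debug.Print "Join failed"
    Else
        Debug.Print sResult
    End If

    ' Absolute plus absolute: according to the table, the result is secondURL
    Debug.Print oUrl.Join("http://www.example.com/", "http://www.example.org/a.html")
End Sub
```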
 

URLDecode

Helper method that decodes URL escapes.

Syntax:
String URLDecode( urlText )

String urlText

 

 


ValueAnalyzer Use

 
Dim oAnalyzer As New ValueAnalyzer
Dim i As Long

' Analyze returns True if analyzing was successful
If oAnalyzer.Analyze("10%, 50%") Then
    For i = 1 To oAnalyzer.Count
        Debug.Print oAnalyzer.ValueData(i)
    Next
End If

ValueAnalyzer Properties

Count Returns the number of values.
 
Data Returns the original text.
 
Separator Returns or sets the separator dividing values. The default is ASCII 44 (',').
 

ValueAnalyzer Methods

Analyze

Analyzes a string and divides it into separate values. The string may contain several delimited values. The default delimiter is ',' and can be changed through the Separator property.

Syntax:
Boolean Analyze( text )

String text

Return Value:
True if analyzing was successful; otherwise False.
 

UnitData

Returns the unit data of an entry.

Syntax:
String UnitData( i )

Long i

 

UnitType

Returns the unit type of an entry.

Syntax:
Long UnitType( i )

Long i

Return Value:
SHTAUnitTypeNull
SHTAUnitTypePercent
SHTAUnitTypeRel
SHTAUnitTypeUnknown
 

ValueData

Returns the value data of an entry.

Syntax:
String ValueData( i )

Long i

 

ValueType

Returns the value type of an entry.

Syntax:
Long ValueType( i )

Long i

Return Value:
SHTAValueTypeNull
SHTAValueTypeNumber
SHTAValueTypeHexNumber
SHTAValueTypeString
SHTAValueTypeText
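
The four entry methods can be read together to see how a value list was split. The sketch below uses the default separator and only the calls documented above; the procedure name is hypothetical.

```vb
' Sketch only: print value data, value type, unit data and unit type per entry.
Sub DumpValues(ByVal sText As String)
    Dim oValues As New ValueAnalyzer
    Dim i As Long

    If Not oValues.Analyze(sText) Then
        Debug.Print "Could not analyze: " & sText
        Exit Sub
    End If

    For i = 1 To oValues.Count
        Debug.Print oValues.ValueData(i), oValues.ValueType(i), _
                    oValues.UnitData(i), oValues.UnitType(i)
    Next
End Sub
```

A call such as DumpValues "10%, 50%" prints one line per value. The numeric type codes can be looked up in the SHTAValueType and SHTAUnitType tables under Constants.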
 


Constants

SHTAObjectType Values

Symbol                       Description                                         Value
SHTAObjectTypeUnknown        Unknown object                                      0
SHTAObjectTypeTagStart       Start tag                                           1
SHTAObjectTypeTagEnd         End tag                                             2
SHTAObjectTypeText           Normal text                                         3
SHTAObjectTypeDocType        DOCTYPE declaration                                 4
SHTAObjectTypeCharRefNumDec  Decimal character reference                         5
SHTAObjectTypeCharRefNumHex  Hexadecimal character reference (HTML 4.0 feature)  6
SHTAObjectTypeCharRefName    Named character reference                           7
SHTAObjectTypeComment        Normal comment                                      8
SHTAObjectTypeEol            End of line                                         9
SHTAObjectTypeError          Error                                               10

SHTAValueType Values

Symbol                  Description                           Value
SHTAValueTypeNull       No value                              0
SHTAValueTypeNumber     Decimal number                        1
SHTAValueTypeHexNumber  Hexadecimal number                    2
SHTAValueTypeString     Text within double or single quotes   3
SHTAValueTypeText       Text without quotes                   4

SHTAUnitType Values

Symbol               Description          Value
SHTAUnitTypeNull     No type information  0
SHTAUnitTypePercent  Percent (%)          1
SHTAUnitTypeRel      Relative (*)         2
SHTAUnitTypeUnknown  Unknown type         3

SHTAOErrorCode Values

Symbol                      Description                                      Value
SHTAOErrNo                  Success                                          0
SHTAOErrParseError          Parse error                                      1
SHTAOErrCriticalParseError  Unrecoverable parse error                        2
SHTAOErrInvalidToken        Invalid token found                              3
SHTAOErrInvalidCharRef      Character reference is invalid, e.g. wrong name  4
SHTAOErrFile                Problem with HTML file                           5

SHTAErrorCode Values

Symbol               Description                         Value
SHTAErrNo            Success                             0
SHTAErrFileError     OS file error                       1
SHTAErrParseError    Parse error detected                2
SHTAErrInvalidToken  Invalid token found during parsing  3
SHTAErrMemoryError   Not enough memory                   4
SHTAErrUnknownError  Error cause is unknown              5
SHTAErrLicenseError  Version is not licensed             6
SHTAErrParserError   Problem with parser engine          7