Documentation ¶
Overview ¶
Package xmltext provides a streaming XML 1.0 tokenizer that returns caller-owned bytes and avoids building a DOM.
Index ¶
- type Attr
- type Decoder
- func (d *Decoder) InputOffset() int64
- func (d *Decoder) ReadTokenInto(dst *Token) error
- func (d *Decoder) ReadValueInto(dst []byte) (int, error)
- func (d *Decoder) Reset(r io.Reader, opts ...Options)
- func (d *Decoder) SkipValue() error
- func (d *Decoder) StackPointer() string
- func (d *Decoder) UnescapeInto(dst, data []byte) (int, error)
- type Kind
- type Options
- func CoalesceCharData(value bool) Options
- func EmitComments(value bool) Options
- func EmitDirectives(value bool) Options
- func EmitPI(value bool) Options
- func FastValidation() Options
- func JoinOptions(srcs ...Options) Options
- func MaxAttrs(value int) Options
- func MaxDepth(value int) Options
- func MaxQNameInternEntries(value int) Options
- func MaxTokenSize(value int) Options
- func ResolveEntities(value bool) Options
- func Strict(value bool) Options
- func TrackLineColumn(value bool) Options
- func WithCharsetReader(fn func(label string, r io.Reader) (io.Reader, error)) Options
- func WithEntityMap(values map[string]string) Options
- type SyntaxError
- type Token
- type TokenSizes
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Attr ¶ added in v0.0.7
type Attr struct {
Name []byte
Value []byte
// ValueNeeds reports whether Value includes unresolved entity references.
ValueNeeds bool
}
Attr holds an attribute name and value for a start element token. Name and Value are backed by the Token that produced them.
type Decoder ¶
type Decoder struct {
// contains filtered or unexported fields
}
Decoder streams XML tokens and copies token bytes into caller-owned storage.
func NewDecoder ¶
NewDecoder creates a new XML decoder for the reader.
func (*Decoder) InputOffset ¶
InputOffset reports the absolute byte offset of the next read position.
func (*Decoder) ReadTokenInto ¶
ReadTokenInto reads the next XML token into dst. Slices in dst are overwritten on the next call that reuses dst.
func (*Decoder) ReadValueInto ¶ added in v0.0.7
ReadValueInto writes the next element subtree or token into dst and returns the number of bytes written. If dst is too small, it returns io.ErrShortBuffer; the value is consumed regardless, so the call cannot be retried with a larger buffer.
func (*Decoder) StackPointer ¶
StackPointer renders the current stack path using local names.
type Options ¶
type Options struct {
// contains filtered or unexported fields
}
Options holds decoder configuration values. The zero value means no overrides.
func CoalesceCharData ¶
CoalesceCharData merges adjacent text tokens into a single CharData token.
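As a rough illustration of what merging adjacent text tokens means, the sketch below collapses runs of text entries in a slice. The tok type and coalesce function are hypothetical and only mimic the idea; they say nothing about how the decoder implements it or which token kinds it treats as text.

```go
package main

import "fmt"

// tok is a hypothetical, simplified token: text is true for character
// data and false for markup such as start or end elements.
type tok struct {
	text bool
	data string
}

// coalesce merges each run of adjacent text tokens into a single token
// and leaves markup tokens untouched.
func coalesce(in []tok) []tok {
	out := make([]tok, 0, len(in))
	for _, t := range in {
		if n := len(out); t.text && n > 0 && out[n-1].text {
			out[n-1].data += t.data
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	in := []tok{
		{false, "<p>"},
		{true, "a "},
		{true, "b"},
		{false, "</p>"},
	}
	for _, t := range coalesce(in) {
		fmt.Printf("%v %q\n", t.text, t.data)
	}
}
```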
func EmitComments ¶
EmitComments controls whether comment tokens are emitted.
func EmitDirectives ¶
EmitDirectives controls whether directive tokens are emitted.
func FastValidation ¶
func FastValidation() Options
FastValidation returns a preset tuned for validation throughput.
func JoinOptions ¶
JoinOptions combines multiple option sets into one, applying them in argument order. A value set in a later option set overrides the same value set in an earlier one.
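The override rule can be pictured with a small model in which each field carries a "set" flag. The opts type and join function below are hypothetical; the real Options fields are unexported and may be represented differently.

```go
package main

import "fmt"

// opts is a hypothetical model of an option set. Each value has a
// companion flag recording whether it was explicitly set.
type opts struct {
	maxDepth    int
	maxDepthSet bool
	strict      bool
	strictSet   bool
}

// join merges option sets left to right. A later set overrides an
// earlier one only for the fields it actually sets.
func join(srcs ...opts) opts {
	var out opts
	for _, s := range srcs {
		if s.maxDepthSet {
			out.maxDepth, out.maxDepthSet = s.maxDepth, true
		}
		if s.strictSet {
			out.strict, out.strictSet = s.strict, true
		}
	}
	return out
}

func main() {
	base := opts{maxDepth: 64, maxDepthSet: true, strict: true, strictSet: true}
	override := opts{maxDepth: 16, maxDepthSet: true} // strict left unset
	merged := join(base, override)
	fmt.Println(merged.maxDepth, merged.strict)
}
```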
func MaxQNameInternEntries ¶
MaxQNameInternEntries limits the number of interned QNames. Zero means no limit.
func MaxTokenSize ¶
MaxTokenSize limits the maximum size of a single token in bytes. Tokens exactly MaxTokenSize bytes long are allowed.
func ResolveEntities ¶
ResolveEntities controls whether entity references are expanded.
func Strict ¶ added in v0.0.7
Strict enables XML declaration validation. It enforces the ordering and values of the version, encoding, and standalone pseudo-attributes.
func TrackLineColumn ¶
TrackLineColumn controls whether line and column tracking is enabled.
func WithCharsetReader ¶
WithCharsetReader registers a decoder for encodings other than UTF-8 and UTF-16.
func WithEntityMap ¶
WithEntityMap configures custom named entity replacements.
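The kind of replacement a map[string]string of entities describes can be sketched as follows. The expand function is hypothetical and is not the decoder's implementation; in particular, leaving unknown references untouched is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// expand replaces &name; references whose name appears in entities.
// Unknown or unterminated references are copied through unchanged.
func expand(s string, entities map[string]string) string {
	var b strings.Builder
	for {
		amp := strings.IndexByte(s, '&')
		if amp < 0 {
			b.WriteString(s)
			return b.String()
		}
		b.WriteString(s[:amp])
		semi := strings.IndexByte(s[amp:], ';')
		if semi < 0 {
			b.WriteString(s[amp:]) // no terminator: keep the rest verbatim
			return b.String()
		}
		name := s[amp+1 : amp+semi]
		if v, ok := entities[name]; ok {
			b.WriteString(v)
		} else {
			b.WriteString(s[amp : amp+semi+1])
		}
		s = s[amp+semi+1:]
	}
}

func main() {
	entities := map[string]string{"co": "ACME"}
	fmt.Println(expand("a &co; b &amp; c", entities))
}
```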
func (Options) QNameInternEntries ¶ added in v0.0.10
QNameInternEntries reports the configured QName interner limit.
type SyntaxError ¶
type SyntaxError struct {
// Err is the underlying parser error.
Err error
// Path is the stack path at the error location.
Path string
// Snippet is a short input slice near the failure point.
Snippet []byte
// Offset is the absolute byte offset in the input stream.
Offset int64
// Line is the 1-based line number when tracking is enabled.
Line int
// Column is the 1-based column number when tracking is enabled.
Column int
}
SyntaxError reports a well-formedness error with location context.
func (*SyntaxError) Error ¶
func (e *SyntaxError) Error() string
Error formats the syntax error with location and cause.
func (*SyntaxError) Unwrap ¶
func (e *SyntaxError) Unwrap() error
Unwrap exposes the underlying error.
type Token ¶
type Token struct {
Attrs []Attr
Text []byte
Name []byte
Line int
Column int
TextNeeds bool
IsXMLDecl bool
Kind Kind
// contains filtered or unexported fields
}
Token is a decoded XML token with caller-owned byte slices. Slices are backed by the Token's internal buffers and remain valid until the next ReadTokenInto call that reuses the Token.
func (*Token) Reserve ¶ added in v0.0.9
func (t *Token) Reserve(sizes TokenSizes)
Reserve ensures the token has at least the requested capacities. It resets the buffer lengths to zero.
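For a single buffer, "at least the requested capacity, with length reset to zero" can be sketched as below. The reserve helper is hypothetical; the fields of TokenSizes are not shown here, so the example takes a plain byte count.

```go
package main

import "fmt"

// reserve returns buf with length zero and capacity of at least n.
// It allocates only when the existing capacity is too small.
func reserve(buf []byte, n int) []byte {
	if cap(buf) < n {
		buf = make([]byte, 0, n)
	}
	return buf[:0]
}

func main() {
	buf := reserve(nil, 64)
	buf = append(buf, "hello"...)

	// A smaller request keeps the existing allocation but clears the length.
	buf = reserve(buf, 16)
	fmt.Println(len(buf), cap(buf) >= 64)
}
```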