goabnf

package module
v0.4.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 7, 2026 License: MIT Imports: 10 Imported by: 12

README

Go-ABNF

reference go report Coverage Status
License CI CodeQL
OpenSSF Scoreboard OpenSSF Best Practices Summary

Go module to handle Augmented Backus-Naur Form (ABNF), providing a large API. It implements RFC 5234 and 7405, with Errata 2968 and 3076.

Capabilities:

  • parse ABNF (to manipulable datastructure ; with cycle detection)
  • compile ABNF to regex
  • create a minimal set of tests that covers the full grammar
  • generate a visual representation of the ABNF grammar provided (mermaid)
  • create an ABNF fuzzer for your modules (version >= Go1.18beta1)
  • support Unicode rather than UTF-8

How it works

Under the hood, go-abnf is a dependency-free brute-force parser. It enumerates all possibilities for a given grammar and an input, and returns all possible paths. Those then have to be evaluated in order to produce a new grammar.

As this implementation is not adhesive to the ABNF grammar of the ABNF grammar as defined in RFC 5234, updated by RFC 7405 and fixed by Erratum 2968 and 3076, it enables genericity. This imply that for any valid grammar in ABNF that is properly evaluated, if you can write an evaluator for this grammar, you can parse new input with the original grammar. To init this loop, we had to hardcode the manual parsing and evaluation of the ABNF grammar, reviewed multiple times.

Examples can be found in the examples directory

Fuzzing

As go-abnf revolves around grammars, you can use a random walk to traverse its graph and efficiently generate valid inputs according to a given grammar.

This is particularly powerful when you want to fuzz Go implementations that require a very specific input format that the Go's fuzzing engine can't produce.

You can use go-abnf to efficiently produce test cases, as follows.

package example

import (
	_ "embed"
	"testing"

	goabnf "github.com/pandatix/go-abnf"
)

//go:embed my-grammar.abnf
var myGrammar []byte

func FuzzFunction(f *testing.F) {
	g, err := goabnf.ParseABNF(myGrammar)
	if err != nil {
		f.Fatal(err)
	}

	f.Fuzz(func(t *testing.T, seed int64) {
		// Generate a random test case based on the seed
		b, _ := g.Generate(seed, "a",
			goabnf.WithRepMax(15),      // Limit repetitions to 15
			goabnf.WithThreshold(1024), // Stop ASAP input generation if reached 1024 bytes
		)

		Function(b)
	})
}

Troubleshooting

My ABNF grammar does not work

Q: My ABNF grammar does not work. Do you have any idea why ?

R: There could be many reasons to this. First make sure your grammar ends up by a newline (LF), and especially that the input content has a CR LF. As those appear the same, it is often a source of error.

Difference between pap and bap

Q: Is there a difference between pap and bap ?

R: Yes, first of all the language (i.e. Go) enables more portability thus integration in workflows. But the real difference between pap and bap resides is the way they work: pap is built on an opportunity to challenge bap whether bap is built to generate meaningfull errors to the end user. Out of this, pap goes further as it enables you to build the transition graph from a given grammar and fuzz Go code, but also support Unicode code points that might be usefull (e.g., TOML is specified in ABNF)

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrNoSolutionFound is an error returned when parsing an ABNF
	// grammar and no solution has been found.
	ErrNoSolutionFound = errors.New("no solution found, input ABNF grammar may be invalid")

	// ErrHandlingProseVal is an error returned when an operation tried
	// to produce something on a prose-val, but it can't be handled properly.
	ErrHandlingProseVal = errors.New("can't handle prose-val descriptions")
)
View Source
var ABNF = &Grammar{
	Rulemap: map[string]*Rule{
		abnfRulelist.Name:              abnfRulelist,
		abnfRule.Name:                  abnfRule,
		abnfRulename.Name:              abnfRulename,
		abnfDefinedAs.Name:             abnfDefinedAs,
		abnfElements.Name:              abnfElements,
		abnfCWsp.Name:                  abnfCWsp,
		abnfCNl.Name:                   abnfCNl,
		abnfComment.Name:               abnfComment,
		abnfAlternation.Name:           abnfAlternation,
		abnfConcatenation.Name:         abnfConcatenation,
		abnfRepetition.Name:            abnfRepetition,
		abnfRepeat.Name:                abnfRepeat,
		abnfElement.Name:               abnfElement,
		abnfGroup.Name:                 abnfGroup,
		abnfOption.Name:                abnfOption,
		abnfCharVal.Name:               abnfCharVal,
		abnfCaseInsensitiveString.Name: abnfCaseInsensitiveString,
		abnfCaseSensitiveString.Name:   abnfCaseSensitiveString,
		abnfQuotedString.Name:          abnfQuotedString,
		abnfNumVal.Name:                abnfNumVal,
		abnfBinVal.Name:                abnfBinVal,
		abnfDecVal.Name:                abnfDecVal,
		abnfHexVal.Name:                abnfHexVal,
		abnfProseVal.Name:              abnfProseVal,
	},
}

ABNF is the manually parsed+evaluated+validated ABNF grammar.

Functions

func SemvalABNF

func SemvalABNF(g *Grammar) error

SemvalABNF proceed to semantic validations of an ABNF grammar. It currently support the following checks: - for all rules, its dependencies (rules) exist - for repetition, min <= max - for num-val, that the value fits in 7-bits (US-ASCII encoded) To update this list, please open an issue.

func WithRepMax

func WithRepMax(repMax int) repMaxOption

WithRepMax defines the maximum repetition to stop generating at.

func WithThreshold

func WithThreshold(threshold int) thresholdOption

WithThreshold defines the length threshold to stop generating at.

Types

type ABNFOption added in v0.3.0

type ABNFOption interface {
	// contains filtered or unexported methods
}

ABNFOption defines the interface for parsing ABNF functional options.

func WithRedefineCoreRules added in v0.3.0

func WithRedefineCoreRules(redefine bool) ABNFOption

WithRedefineCoreRules returns a functional option to enable redefining a core rule. Default is false.

WARNING: use with caution, as we left to the user the responsibility to ensure the redefinition keeps the ABNF grammar coherent (i.e. there is an isomorphism between the core rule and the redefinition, not especially the same textual representation).

func WithValidation

func WithValidation(validate bool) ABNFOption

WithValidation returns a functional option to proceed to validation or not. Default is true.

type Alternation

type Alternation struct {
	// Concatenations contains the variants that validate the upper
	// object (i.e. rule, group or option).
	Concatenations []Concatenation
}

Alternation represents an ABNF alternation object.

This is exposed for custom evaluation purposes, please don't use it else.

func (Alternation) String

func (alt Alternation) String() string

type Concatenation

type Concatenation struct {
	// Repetitions contains the following repetitions, order matter.
	Repetitions []Repetition
}

Concatenation represents an ABNF concatenation object.

This is exposed for custom evaluation purposes, please don't use it else.

func (Concatenation) String

func (cnt Concatenation) String() string

type Depgraph

type Depgraph map[string]*node

Depgraph contains for each rule the rules it depends on. Notice it does not differentiate between a mandatory dependency or an avoidable one.

func (Depgraph) Mermaid

func (dg Depgraph) Mermaid() string

Mermaid returns a flowchart of the dependency graph.

type ElemCharVal

type ElemCharVal struct {
	// Sensitive is by default false, support clarrified by RFC 7405 hence
	// only works on a-z and A-Z.
	Sensitive bool
	Values    []rune
}

ElemCharVal represents an ABNF char-val object.

This is exposed for custom evaluation purposes, please don't use it else.

func (ElemCharVal) String

func (ecvl ElemCharVal) String() string

type ElemGroup

type ElemGroup struct {
	Alternation Alternation
}

ElemGroup represents an ABNF group object. It will be straightly passed through with its underlying alternation.

This is exposed for custom evaluation purposes, please don't use it else.

func (ElemGroup) String

func (egrp ElemGroup) String() string

type ElemItf

type ElemItf interface {
	fmt.Stringer
	// contains filtered or unexported methods
}

ElemItf defines the interface of all the element alternations: - ElemRulename - ElemGroup - ElemOption - ElemProseVal - ElemNumVal - ElemCharVal

This is exposed for custom evaluation purposes, please don't use it else.

type ElemNumVal

type ElemNumVal struct {
	Base string
	// Status could be:
	// - `statSeries`: `elems` contains all the expected
	//   values in the order of the grammar defined them ;
	// - `statRange`: `elems` contains the start and end
	//   bounds (so no more than two).
	Status Status
	Elems  []string
}

ElemNumVal represents an ABNF num-val object.

This is exposed for custom evaluation purposes, please don't use it else.

func (ElemNumVal) String

func (envl ElemNumVal) String() string

type ElemOption

type ElemOption struct {
	Alternation Alternation
}

ElemOption represents an ABNF option object. It will be straightly converted to a 0*1 repetition.

This is exposed for custom evaluation purposes, please don't use it else.

func (ElemOption) String

func (eopt ElemOption) String() string

type ElemProseVal

type ElemProseVal struct {
	// contains filtered or unexported fields
}

ElemProseVal represents an ABNF prose-val object.

This is exposed for custom evaluation purposes, please don't use it else.

func (ElemProseVal) String

func (epvl ElemProseVal) String() string

type ElemRulename

type ElemRulename struct {
	// Name is the unique rule name.
	// Notice it is case-insensitive according to RFC 5234 Section 2.1.
	Name string
}

ElemRulename represents an ABNF rulename object.

This is exposed for custom evaluation purposes, please don't use it else.

func (ElemRulename) String

func (erln ElemRulename) String() string

type ErrCoreRuleModify

type ErrCoreRuleModify struct {
	CoreRulename string
}

ErrCoreRuleModify is an error returned when an incremental alternative for a core rule.

func (ErrCoreRuleModify) Error

func (err ErrCoreRuleModify) Error() string

type ErrCyclicRule

type ErrCyclicRule struct {
	Rulename string
}

ErrCyclicRule is an error returned when can't work due to an unavoidable cyclic rule.

func (ErrCyclicRule) Error

func (err ErrCyclicRule) Error() string

type ErrDependencyNotFound

type ErrDependencyNotFound struct {
	Rulename string
}

ErrDependencyNotFound is an error returned during ABNF grammar semantic vaildation, if a rule depends on an unexisting rule.

func (ErrDependencyNotFound) Error

func (err ErrDependencyNotFound) Error() string

type ErrDuplicatedRule

type ErrDuplicatedRule struct {
	Rulename string
}

ErrDuplicatedRule is an error returned when the rule already exist as part of the grammar.

func (ErrDuplicatedRule) Error

func (err ErrDuplicatedRule) Error() string

type ErrMultipleSolutionsFound

type ErrMultipleSolutionsFound struct {
	Paths []*Path
}

ErrMultipleSolutionsFound is an error returned when a parser found multiple paths/solutions when none or one were expected.

func (ErrMultipleSolutionsFound) Error

func (err ErrMultipleSolutionsFound) Error() string

type ErrRuleNotFound

type ErrRuleNotFound struct {
	Rulename string
}

ErrRuleNotFound is an error returned when the rule was not found as part of the grammar.

func (ErrRuleNotFound) Error

func (err ErrRuleNotFound) Error() string

type ErrSemanticRepetition

type ErrSemanticRepetition struct {
	Repetition Repetition
}

ErrSemanticRepetition is an error returned during ABNF grammar semantic validation, if a repetition has min < max.

func (ErrSemanticRepetition) Error

func (err ErrSemanticRepetition) Error() string

type ErrTooLargeNumeral

type ErrTooLargeNumeral struct {
	Base, Value string
}

ErrTooLargeNumeral is an error returned when the numeral value provided to parse cannot be handled as a 7-bit US-ASCII valid value.

func (ErrTooLargeNumeral) Error

func (err ErrTooLargeNumeral) Error() string

type GenerateOption

type GenerateOption interface {
	// contains filtered or unexported methods
}

GenerateOption is an option for the *Grammar.Generate method.

type Grammar

type Grammar struct {
	Rulemap map[string]*Rule
}

Grammar represents an ABNF grammar as defined by RFC 5234. It is constituted of a set of rules with a unique name.

func EvaluateABNF added in v0.3.0

func EvaluateABNF(input []byte, path *Path, opts ...ABNFOption) (*Grammar, error)

EvaluateABNF is an evaluator for the ABNF structural model as per RFCs and Erratum.

func LexABNF deprecated

func LexABNF(input []byte, path *Path) (*Grammar, error)

LexABNF has been replaced by EvaluateABNF, please refer to it.

Deprecated: replace by EvaluateABNF.

func ParseABNF

func ParseABNF(input []byte, opts ...ABNFOption) (*Grammar, error)

ParseABNF is a helper facilitating the call to Parse using the pre-computed ABNF grammar and evaluated the resulting to produce a ready-to-use grammar.

func (*Grammar) DependencyGraph

func (g *Grammar) DependencyGraph() Depgraph

DependencyGraph creates a dependency graph for a whole grammar. It includes core rules only if necessary.

func (*Grammar) Generate

func (g *Grammar) Generate(seed int64, rulename string, opts ...GenerateOption) ([]byte, error)

Generate is an experimental feature that consumes a seed for a pseudo-random number generator, used to randomly travel through the grammar given a rulename to work on. With this, it provides reproducibility, usefull for fuzz crashers.

You can leverage its capacities (maximum repetitions, length threshold, etc.) with the functional options.

It is a good capability for testing and fuzzing parsers during testing, compliance, fuzzing or optimization.

func (*Grammar) IsDAG

func (g *Grammar) IsDAG() bool

IsDag find Strongly Connected Components using Tarjan's algorithm and returns whether it contains cycle or not. Could be improved by Nuutila's or Pearce's algorithms, or replaced by Kosaraju's algorithm.

func (*Grammar) IsLeftTerminating

func (g *Grammar) IsLeftTerminating(rulename string) (bool, error)

IsLeftTerminating returns whether the rule is not left terminating. It travels through the whole rule dependency graph, such that it checks if the rule has a way to left terminate.

Notice that it depends on the ordering your grammar, which could be illustrated by the ABNF rule "element" that begins with the alternation of a "rulename", which is terminating, and not by "option" or "group" which are not.

WARNING: it is different than RuleContainsCycle, refer to its doc.

func (*Grammar) IsValid

func (g *Grammar) IsValid(rulename string, input []byte) (bool, error)

IsValid checks there exist at least a path that completly consumes input, hence is valid given this grammar and especially one of its rule.

func (*Grammar) PrettyPrint

func (g *Grammar) PrettyPrint() string

PrettyPrint returns a prettified string that represents the grammar.

func (*Grammar) Regex

func (g *Grammar) Regex(rulename string) (string, error)

Regex builds a regex that validates the given rulename.

It imply that the rulename does not contain any cycle in its dependency graph as it would not be able to compile it.

Notice it produces non-optimised regular expressions, such that it is easy to produce a better-performing one by hand.

func (*Grammar) RuleContainsCycle

func (g *Grammar) RuleContainsCycle(rulename string) (bool, error)

RuleContainsCycle returns whether the rule contains a cycle or not. It travels through the whole rule dependency graph, such that it checks if the rule is cyclic AND if one of its dependency is too.

WARNING: it is different than IsLeftTerminating, refer to its doc.

func (*Grammar) String

func (g *Grammar) String() string

String returns the representation of the grammar that is valid according to the ABNF specifications/RFCs. This notably imply the use of CRLF instead of LF, and does not preserve the initial order nor pretty print it.

func (*Grammar) TransitionGraph added in v0.2.0

func (g *Grammar) TransitionGraph(rulename string, opts ...TGOption) (*TransitionGraph, error)

TransitionGraph builds a transition graph out of a grammar and the given rulename. TODO it is possible to build transition graph out of cylic rules iff it is not concatenated to another repetition (can't pipe O->I as there is no O). For instance, `a = "a" a` can exist.

type Node added in v0.2.0

type Node struct {
	ID   string
	Elem ElemItf

	Nexts []*Node
}

Node of a TransitionGraph. Unically identifiate the node of the transition graph along with its underlying element and its next nodes.

type Path

type Path struct {
	// Subpaths aka children. Ordering applies
	Subpaths []*Path
	// MatchRule in source's grammar ruleset
	MatchRule string
	// Start ≤ End
	Start, End int
}

Path represents a portion of an input that matched a rule from an index to another, with a composite structure.

Notice it does not matches specifically ABNF grammar, but any compatible grammar. The most common case is parsing an input with the ABNF grammar as source, which is then evaluated to fall back into a ready-to-go ABNF grammar of this input. There may exist specific cases where you want to use another grammar as source (e.g. EBNF grammar provided by parsing EBNF specification input written in ABNF with the ABNF grammar as source, which as itself been implemented from the ABNF specification of ABNF in the ABNF structure). For those cases, you can use this implementation as it uses a generic behavior, by parsing your source ABNF grammar first then use it to validate inputs.

func Parse

func Parse(input []byte, grammar *Grammar, rootRulename string) ([]*Path, error)

Parse parses an ABNF-compliant input using a grammar. It uses uses a top-down parsing strategy using backtracking in order to look for solutions. If many are found, it raises an error. If the input is invalid (gramatically, incomplete...) it returns an error of type *ErrParse.

type Repetition

type Repetition struct {
	Min, Max int
	Element  ElemItf
}

Repetition represents an ABNF repetition object.

This is exposed for custom evaluation purposes, please don't use it else.

func (Repetition) String

func (rep Repetition) String() string

type Rule

type Rule struct {
	// Name is the unique rule name.
	// Notice it is case-insensitive according to RFC 5234 Section 2.1.
	Name string

	Alternation Alternation
}

Rule represents an ABNF rule, with its name and underlying alternation.

This is exposed for custom evaluation purposes, please don't use it else.

func GetRule

func GetRule(rulename string, rulemap map[string]*Rule) *Rule

GetRule returns the rule by the given rulename, whether it is a core rule or present in the grammar, or nil if not found. It validates the RFC 5234 Section 2.1 "rule names are case insensitive".

func (Rule) String

func (rl Rule) String() string

type Status

type Status int

Status defines the type of a num-val, rather StatSeries or StatRange.

This is exposed for custom evaluation purposes, please don't use it else.

const (
	// StatSeries represents a serie of unique byte value.
	StatSeries Status = iota
	// StatRange represents an interval of possible bytes.
	StatRange
)

type TGOption added in v0.2.0

type TGOption interface {
	// contains filtered or unexported methods
}

func WithDeflateCharVals added in v0.2.0

func WithDeflateCharVals(deflate bool) TGOption

WithDeflateCharVals when passed to true, will replace a single node representing the char value to an exhaustive concatenation of all characters (lower and upper case) and the transitions between them.

func WithDeflateNumVals added in v0.2.0

func WithDeflateNumVals(deflate bool) TGOption

WithDeflateNumVals when passed to true, will replace a single node representing the ranges or series of multiple numeric values to individual nodes. It generates a more complex but exhaustive transition graph.

func WithDeflateRules added in v0.2.0

func WithDeflateRules(deflate bool) TGOption

WithDeflateRules when passed to true, will recursively build the transition graphs of the rules it uses (reuse them if possible) and concatenate the sub transition graphs. It generates a more complex but exhaustive transition graph.

func WithRepetitionThreshold added in v0.2.0

func WithRepetitionThreshold(threshold int) TGOption

WithRepetitionThreshold defines the threshold to block a large repetition to occur, elseway it may consum all memory to build the transition graph. For instance, a rule defined as follows may self-DoS go-abnf.

a = 9555("a")

WARNING it does not avoid such memory consumption on chained repetitions. For instance, the two following cases could produce large transition graphs without control with this functional option.

a = 10( 10( 10("a") ) )

a = 10 ( 10*"a" ( 10"a" 9"a" 8"a" ) / ( *10"a" *10"a" *10"a" ) ) 10"a"

Defaults to 256.

type TransitionGraph added in v0.2.0

type TransitionGraph struct {
	Entrypoints []*Node
	Endpoints   []*Node
	// contains filtered or unexported fields
}

TransitionGraph represent a grammar transition graph. It contains the list of entrypoints and endpoints. You could travel through the graph starting from the entrypoints.

func (*TransitionGraph) Reader added in v0.2.0

func (tg *TransitionGraph) Reader() *TransitionGraphReader

func (*TransitionGraph) ToMermaid added in v0.2.0

func (tg *TransitionGraph) ToMermaid() string

ToMermaid produces a mermaid-encoded representation of the transition graph.

type TransitionGraphReader added in v0.2.0

type TransitionGraphReader struct {
	// contains filtered or unexported fields
}

func (*TransitionGraphReader) Next added in v0.2.0

func (tgr *TransitionGraphReader) Next() bool

func (*TransitionGraphReader) Scan added in v0.2.0

func (tgr *TransitionGraphReader) Scan() []byte

Directories

Path Synopsis
cmd
pap module
examples

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL