Glob flexible searches

Globbing is an advanced form of wildcard search, more powerful than DNSDB’s Standard Search left-hand or right-hand wildcards, but not as advanced as Farsight Compatible Regular Expressions (FCRE). Globs can be simpler to write, especially for API users who are not familiar with regular expressions.

In general, Farsight’s glob implementation follows standard Unix glob(7) semantics, but not what’s sometimes referred to as “extended globbing.”

Glob searches are evaluated against the DNS master file form of the hostnames (aka rrnames) and rdata values, which by design contains only printable ASCII characters. All non-printable characters, including octets outside the ASCII range, are converted to “\DDD” escape sequences, where “DDD” is a three digit decimal number per RFC 1035. This is only applicable to RData (RHS) queries.
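The escaping rule above can be illustrated with a minimal Python sketch. The function name to_master_form is hypothetical, and the sketch covers only the non-printable octet rule described here; real master file rendering may also escape some printable characters, which is ignored in this example.

```python
def to_master_form(data: bytes) -> str:
    """Render raw octets as text, escaping non-printable octets as \\DDD.

    Simplified illustration: printable ASCII (0x20 through 0x7E) is kept
    as-is; every other octet becomes a backslash plus three decimal digits.
    """
    out = []
    for octet in data:
        if 0x20 <= octet <= 0x7E:
            out.append(chr(octet))
        else:
            out.append("\\%03d" % octet)
    return "".join(out)


if __name__ == "__main__":
    # A tab (octet 9) inside a string becomes the four characters \009.
    print(to_master_form(b"a\tb"))
```

A glob is then matched against the escaped text, which is why an escaped octet has to be matched as a literal backslash followed by three digits.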

Glob Syntax

A glob is a string of printable characters with the following characters given special meaning:

  • * — Match any zero or more characters.
  • ? — Match exactly any one character.
  • [ — Begin a character class. Any of the contained characters or ranges will match.
  • ] — End a character class.
  • \ — Escape the next character (but not within a character class).

Any other characters in a globbing pattern are matched exactly as written, except that matching is not case sensitive.
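When a literal string has to be embedded in a glob, each special character listed above can be backslash-escaped. The helper below is a small sketch of that idea; glob_escape is a hypothetical name, not part of any DNSDB client library, and it applies only outside character classes, where the backslash has no special meaning.

```python
# Characters with special meaning in a glob, per the list above.
GLOB_SPECIAL = "*?[]\\"


def glob_escape(literal: str) -> str:
    """Backslash-escape glob metacharacters so the text matches literally."""
    return "".join("\\" + ch if ch in GLOB_SPECIAL else ch for ch in literal)


if __name__ == "__main__":
    # The escape sequence \004 in data needs its backslash escaped: \\004
    print(glob_escape("\\004"))
```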

Character Class Syntax

A character class is a set of characters enclosed between an opening [ and a closing ]. A simple example is [m-z1-3] to match characters m through z and 1 to 3.

Within the character class, the following characters are handled specially:

  • ! — If the first character after the opening [, denotes a negated character class, i.e. a class which matches any character not listed in the remainder of the class.
  • ] — If the first character after the opening [ or [!, encodes a literal ] as a member of the class. A ] after the first character after the opening [ or [! ends the character class.
  • - — If the first character after the opening [ or [! or the last character before the closing ], encodes a literal - as a member of the character class.
    • If between two characters A and B, encodes the range of characters between A and B, inclusive, as members of the character class. The character A must occur before B in ASCII encoding.

The sequences [. and [= are not allowed between the opening [ or [! and the closing ], to prevent confusion with unsupported POSIX collation sequences and collation classes.

If the sequence [: appears in a character class, it must be the beginning of one of the following POSIX character classes:

  • [:alnum:] — Alphanumeric characters 0-9, A-Z, and a-z
  • [:alpha:] — Alphabetic characters A-Z, a-z
  • [:blank:] — Blank characters (space and tab)
    • Only printable characters occur in searchable strings and space is the only printable whitespace character, thus use of [:blank:] is equivalent to a space character.
    • Tabs in data appear as the escape sequence \009 and can be matched with \\009.
  • [:cntrl:] — Control characters
    • Only printable characters occur in searchable strings, so [:cntrl:] will not match any characters.
    • Control characters in data will appear as \DDD escape sequences. To match one of those, you will need to backslash-quote the backslash. Thus to match \004, use \\004.
  • [:digit:] — Decimal digits 0-9
  • [:graph:] — Any printable character other than space.
    • Only printable characters occur in searchable strings, thus a character class containing [:graph:] is equivalent to [! ] (negated character class containing only a space).
  • [:lower:] — Lower case alphabetic characters a-z
    • Hostnames will be folded to lower case, thus use of [:lower:] is equivalent to [:alpha:].
  • [:print:] — Any printable character
    • Only printable characters occur in searchable strings, so [:print:] will match any character.
  • [:punct:] — Punctuation characters (printable characters other than space and [:alnum:])
  • [:space:] — Any whitespace character
    • The space character is the only printable whitespace character, thus use of [:space:] is equivalent to a space character.
  • [:upper:] — Upper case alphabetic characters A-Z
    • Since all of our data is indexed as lower-case, this is not useful as it is equivalent to [:lower:].
  • [:xdigit:] — Hexadecimal digits 0-9, a-f, A-F
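The equivalences noted in the list above can be checked with a short Python sketch. It assumes the usual ASCII definitions of the POSIX classes and models searchable strings as printable ASCII folded to lower case; the names SEARCHABLE, POSIX, and effective are illustrative only.

```python
import string

# Characters that can occur in searchable strings under the assumptions
# above: printable ASCII, folded to lower case.
SEARCHABLE = {chr(c).lower() for c in range(0x20, 0x7F)}

# Usual ASCII membership of each POSIX class name.
POSIX = {
    "alnum": string.ascii_letters + string.digits,
    "alpha": string.ascii_letters,
    "blank": " \t",
    "cntrl": "".join(map(chr, range(0x20))) + "\x7f",
    "digit": string.digits,
    "graph": "".join(map(chr, range(0x21, 0x7F))),
    "lower": string.ascii_lowercase,
    "print": "".join(map(chr, range(0x20, 0x7F))),
    "punct": string.punctuation,
    "space": string.whitespace,
    "upper": string.ascii_uppercase,
    "xdigit": string.hexdigits,
}


def effective(name):
    """Members of a POSIX class that can actually match searchable data."""
    return {ch.lower() for ch in POSIX[name]} & SEARCHABLE


if __name__ == "__main__":
    for name in sorted(POSIX):
        print(name, "".join(sorted(effective(name))))
```

Under these assumptions, effective("blank") and effective("space") both reduce to a single space, effective("cntrl") is empty, and the lower, upper, and alpha classes all reduce to the same set.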

The above named character classes must appear inside an enclosing [ and ], e.g. [[:digit:][:punct:]] to match a digit or punctuation character. Without the enclosing brackets, [:digit:] will match the characters :, d, i, g, or t.

Neither the above character classes nor a character range may begin or end a character range. For example, the character class expressions [0-[:alpha:]] and [a-n-z] are invalid.

All other characters between the opening [ or [! and the closing ] are added to the character class, including the backslash \ character.

There is no way to express a character class containing only a single ! character, because a ! directly after the opening [ always denotes negation.
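The character class rules in this section can be summarized as a small parser. This is a sketch written from the rules as described here, not Farsight's implementation; parse_class and the POSIX table are hypothetical names, and case folding is left out for brevity.

```python
import string

POSIX = {
    "alnum": string.ascii_letters + string.digits,
    "alpha": string.ascii_letters, "blank": " \t",
    "cntrl": "".join(map(chr, range(0x20))) + "\x7f",
    "digit": string.digits,
    "graph": "".join(map(chr, range(0x21, 0x7F))),
    "lower": string.ascii_lowercase,
    "print": "".join(map(chr, range(0x20, 0x7F))),
    "punct": string.punctuation, "space": string.whitespace,
    "upper": string.ascii_uppercase, "xdigit": string.hexdigits,
}


def parse_class(pattern, start=0):
    """Parse the class whose opening [ is at pattern[start].

    Returns (negated, members, end), where end is the index just past
    the closing ]. Raises ValueError for the invalid forms listed above.
    """
    i = start + 1
    negated = pattern.startswith("!", i)
    if negated:
        i += 1
    body_start = i      # a ] or - at this position is a literal
    members = set()
    prev = None         # last single character; may start a range
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1:i + 2]
        if ch == "]" and i != body_start:
            return negated, members, i + 1
        if ch == "[" and nxt in (".", "="):
            raise ValueError("collation syntax is not supported")
        if ch == "[" and nxt == ":":
            close = pattern.find(":]", i + 2)
            name = pattern[i + 2:close] if close != -1 else None
            if name not in POSIX:
                raise ValueError("unknown POSIX character class")
            members.update(POSIX[name])
            prev = None  # a named class may not begin a range
            i = close + 2
            continue
        if ch == "-" and i != body_start and nxt != "]":
            if prev is None or nxt == "":
                raise ValueError("invalid range")
            if nxt == "[" and pattern[i + 2:i + 3] in (":", ".", "="):
                raise ValueError("a class cannot end a range")
            if ord(prev) > ord(nxt):
                raise ValueError("range start must sort before range end")
            members.update(chr(c) for c in range(ord(prev), ord(nxt) + 1))
            prev = None  # a range may not begin another range
            i += 2
            continue
        members.add(ch)  # everything else, including \, is a literal
        prev = ch
        i += 1
    raise ValueError("unterminated character class")


if __name__ == "__main__":
    print(parse_class("[m-z1-3]"))
```

For instance, parse_class("[!]a]") yields a negated class containing ] and a, while [a-n-z], [0-[:alpha:]], and [!] are all rejected.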

Important notes

  • Glob searches are not case sensitive.
  • Globbing patterns are “anchored” front and back by default. (This is a major difference from FCRE.)
  • All hostnames (rrnames) in the DNS dataset end in a ., which must be accounted for in globs.
    • Therefore, a search for *.com will not match any hostnames. A glob that searches in rrnames must end in something that matches a ., so *.com. would match what was intended.
  • All well-formed rdata we currently index in the DNS dataset ends in a . or a ", which should be accounted for in globs.
    • Therefore, a glob that searches in rdata should end in something that matches a . or a ".
  • There must be at least two consecutive non-wildcard characters in the pattern. The implicit front and back anchors each count as a non-wildcard character.
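Two of the notes above lend themselves to a quick pre-flight check before submitting a glob. The sketch below is an illustration under stated assumptions, not the server's validation logic: it treats only * and ? as wildcards, counts each implicit anchor as one non-wildcard character, rejects an empty pattern, and does not handle character classes, since the notes do not say how a class is counted. All function names are hypothetical.

```python
def tokenize(pattern):
    """Split a class-free glob into ("wild", ch) and ("lit", ch) tokens."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            raise ValueError("character classes are outside this sketch")
        if ch == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("dangling backslash")
            tokens.append(("lit", pattern[i + 1]))  # escaped character
            i += 2
            continue
        tokens.append(("wild" if ch in "*?" else "lit", ch))
        i += 1
    return tokens


def has_two_consecutive_literals(pattern):
    """Check the two-consecutive-non-wildcard rule, anchors included."""
    tokens = tokenize(pattern)
    if not tokens:
        return False            # assumption: an empty glob is rejected
    run = 1                     # the implicit front anchor
    for kind, _ in tokens:
        run = run + 1 if kind == "lit" else 0
        if run >= 2:
            return True
    return run + 1 >= 2         # the implicit back anchor


def can_match_trailing_dot(pattern):
    """True if the last token could match the final . of an rrname."""
    tokens = tokenize(pattern)
    return bool(tokens) and (tokens[-1][0] == "wild" or tokens[-1][1] == ".")


if __name__ == "__main__":
    for glob in ("*.com", "*.com.", "?.", "*a*"):
        print(glob, has_two_consecutive_literals(glob),
              can_match_trailing_dot(glob))
```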

Examples

  • To match hostnames with a label containing the word “smoke”:

    • use glob *smoke* in an rrnames search
    • Examples of results:
      • smokeping.pdf.ac.
      • smoke.tesla.ac.
  • To match hostnames with a label containing the word “cider” but not containing “hard”:

    • use glob *cider* in an rrnames search, with an exclude filter of *hard*
    • Examples of results:
      • ciderpress.ca.
      • colombus.citycider2018.eventbrite.ca.
  • To match hostnames with a label ending in “www.” and a later label starting with “.com”:

    • use glob *www.*.com* in an rrnames search
    • Examples of results:
      • www.example.com.
      • dev-www.subdomain.example.com.
      • www.example.com.cdn.net.
      • stage-www.dev.community.org.
  • To match hostnames starting with “www.” and ending in “.com.”:

    • use glob www.*.com. in an rrnames search
    • Examples of results:
      • www.example.com.
      • www.subdomain.example.com.
  • To match hostnames starting with “www.” and ending with “.com” with no other dots in between,

    • this cannot be done in a general way using globs; use a regular expression instead.
  • To match hostnames starting with “www” optionally preceded by a “dev-” or “stage-” prefix in a .net or .edu domain,

    • this cannot be done in a general way using globs; use a regular expression instead.
  • To match TXT records encoding an SPF policy with a ~all default:

    • use glob "v=spf1 * ~all" in an rdata search
    • Examples of results:
      • "v=spf1 a mx ~all"
      • "v=spf1 a 10.2.0.0/16 ~all"
  • To match single character domain names (which are really two character domain names when you add the implicit trailing ‘.’),

    • use glob ?. in an rrnames or rdata search
    • Examples of results:
      • a.
      • 0.
  • To match “bri” followed by exactly three characters of any kind, followed by “morning”, followed by anything or nothing (a question mark matches exactly one character):

    • use glob bri???morning* in an rrnames search
    • Examples of results:
      • brightmorning.com.
      • brightmorningtoday.com.
  • To match “ns” followed by any single digit, followed by anything or nothing, and ending in “.net.”:

    • use glob ns[0-9]*.net. in an rrnames search
    • Examples of results:
      • ns0.fsi.net.
      • ns0abc.fsi.net.
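Several of the examples above can be tried offline with Python's standard fnmatch module, which is a rough stand-in rather than the DNSDB implementation: it anchors patterns at both ends and supports *, ?, [seq], and [!seq], but it does not support backslash escapes or named classes such as [:digit:]. The helper name approx_glob_match is hypothetical, and lower-casing both sides approximates case-insensitive matching.

```python
from fnmatch import fnmatchcase


def approx_glob_match(pattern: str, value: str) -> bool:
    """Approximate a glob search: anchored, case-insensitive match."""
    return fnmatchcase(value.lower(), pattern.lower())


def cider_not_hard(name: str) -> bool:
    """The *cider* search combined with an exclude filter of *hard*."""
    return (approx_glob_match("*cider*", name)
            and not approx_glob_match("*hard*", name))


if __name__ == "__main__":
    checks = [
        ("*smoke*", "smokeping.pdf.ac."),
        ("*www.*.com*", "stage-www.dev.community.org."),
        ("www.*.com.", "www.subdomain.example.com."),
        ("*.com", "example.com."),          # no match: missing final dot
        ("?.", "a."),
        ("bri???morning*", "brightmorning.com."),
        ("ns[0-9]*.net.", "ns0abc.fsi.net."),
        ('"v=spf1 * ~all"', '"v=spf1 a mx ~all"'),
    ]
    for pattern, value in checks:
        print(pattern, value, approx_glob_match(pattern, value))
```

The hostname hardcider.com. used in the exclude-filter check is an invented value for illustration, not a result taken from the dataset.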

Additional Information