Class RegExp


  • public class RegExp
    extends java.lang.Object
    Regular Expression extension to Automaton.

    Regular expressions are built from the following abstract syntax:

    regexp ::= unionexp
    |
    unionexp ::= interexp | unionexp (union)
    | interexp
    interexp ::= concatexp & interexp (intersection) [OPTIONAL]
    | concatexp
    concatexp ::= repeatexp concatexp (concatenation)
    | repeatexp
    repeatexp ::= repeatexp ? (zero or one occurrence)
    | repeatexp * (zero or more occurrences)
    | repeatexp + (one or more occurrences)
    | repeatexp {n} (n occurrences)
    | repeatexp {n,} (n or more occurrences)
    | repeatexp {n,m} (n to m occurrences, including both)
    | complexp
    complexp ::= ~ complexp (complement) [OPTIONAL]
    | charclassexp
    charclassexp ::= [ charclasses ] (character class)
    | [^ charclasses ] (negated character class)
    | simpleexp
    charclasses ::= charclass charclasses
    | charclass
    charclass ::= charexp - charexp (character range, including end-points)
    | charexp
    simpleexp ::= charexp
    | . (any single character)
    | # (the empty language) [OPTIONAL]
    | @ (any string) [OPTIONAL]
    | " <Unicode string without double-quotes> " (a string)
    | ( ) (the empty string)
    | ( unionexp ) (precedence override)
    | < <identifier> > (named automaton) [OPTIONAL]
    | <n-m> (numerical interval) [OPTIONAL]
    charexp ::= <Unicode character> (a single non-reserved character)
    | \ <Unicode character> (a single character)

    Productions marked [OPTIONAL] are allowed only if enabled by the syntax flags passed to the RegExp constructor. Reserved characters used in the enabled syntax must be escaped with a backslash (\) or enclosed in double quotes ("..."). In contrast to other regexp syntaxes, this also applies inside character classes. Note that a dash (-) has a special meaning in charclass expressions. An identifier is a string that contains neither a right angle bracket (>) nor a dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points. If n and m have the same number of digits, conforming strings must have exactly that length, padded with leading 0's where necessary.
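
The escaping and numerical-interval rules above can be illustrated with a small standalone sketch. It does not call the library: the class and method names (RegExpSyntaxSketch, escapeLiteral, inInterval) are hypothetical and only model the rules as documented. Two assumptions are made that the text above does not settle: n is taken to be no greater than m, and when n and m differ in length the sketch accepts any digit string whose value lies in the range.

```java
import java.math.BigInteger;

class RegExpSyntaxSketch {

    // Escapes every char of a literal with a backslash, following the
    // backslash production of charexp. Escaping a non-reserved character
    // is redundant but permitted by the grammar, so no list of reserved
    // characters is needed here.
    public static String escapeLiteral(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            sb.append('\\').append(literal.charAt(i));
        }
        return sb.toString();
    }

    // Models membership in the interval <n-m>: both end points are
    // included, and if n and m have the same number of digits the
    // candidate must have exactly that many digits.
    // ASSUMPTION: n <= m; for bounds of different lengths only the
    // numeric value is checked.
    public static boolean inInterval(String candidate, String n, String m) {
        if (!isDigits(candidate) || !isDigits(n) || !isDigits(m)) {
            return false;
        }
        if (n.length() == m.length() && candidate.length() != n.length()) {
            return false;
        }
        BigInteger value = new BigInteger(candidate);
        return value.compareTo(new BigInteger(n)) >= 0
            && value.compareTo(new BigInteger(m)) <= 0;
    }

    // True if s is a non-empty string of ASCII decimal digits.
    private static boolean isDigits(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(escapeLiteral("a.b"));              // prints \a\.\b
        System.out.println(inInterval("007", "000", "123"));   // prints true
        System.out.println(inInterval("7", "000", "123"));     // prints false
    }
}
```

In the last call the value 7 is in range, but the bounds both have three digits, so only the zero-padded form 007 conforms.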

    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int ALL
      Syntax flag, enables all optional regexp syntax.
      static int ANYSTRING
      Syntax flag, enables anystring (@).
      static int AUTOMATON
      Syntax flag, enables named automata (<identifier>).
      static int COMPLEMENT
      Syntax flag, enables complement (~).
      static int EMPTY
      Syntax flag, enables empty language (#).
      static int INTERSECTION
      Syntax flag, enables intersection (&).
      static int INTERVAL
      Syntax flag, enables numerical intervals (<n-m>).
      static int NONE
      Syntax flag, enables no optional regexp syntax.
    • Constructor Summary

      Constructors 
      Constructor Description
      RegExp(java.lang.String s)
      Constructs new RegExp from a string.
      RegExp(java.lang.String s, int syntax_flags)
      Constructs new RegExp from a string.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      java.util.Set<java.lang.String> getIdentifiers()
      Returns set of automaton identifiers that occur in this regular expression.
      boolean setAllowMutate(boolean flag)
      Sets or resets allow mutate flag.
      Automaton toAutomaton()
      Constructs new Automaton from this RegExp.
      Automaton toAutomaton(java.util.Map<java.lang.String,Automaton> automata)
      Constructs new Automaton from this RegExp.
      Automaton toAutomaton(AutomatonProvider automaton_provider)
      Constructs new Automaton from this RegExp.
      java.lang.String toString()
      Constructs string from parsed regular expression.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • INTERSECTION

        public static final int INTERSECTION
        Syntax flag, enables intersection (&).
        See Also:
        Constant Field Values
      • COMPLEMENT

        public static final int COMPLEMENT
        Syntax flag, enables complement (~).
        See Also:
        Constant Field Values
      • EMPTY

        public static final int EMPTY
        Syntax flag, enables empty language (#).
        See Also:
        Constant Field Values
      • ANYSTRING

        public static final int ANYSTRING
        Syntax flag, enables anystring (@).
        See Also:
        Constant Field Values
      • AUTOMATON

        public static final int AUTOMATON
        Syntax flag, enables named automata (<identifier>).
        See Also:
        Constant Field Values
      • INTERVAL

        public static final int INTERVAL
        Syntax flag, enables numerical intervals (<n-m>).
        See Also:
        Constant Field Values
      • ALL

        public static final int ALL
        Syntax flag, enables all optional regexp syntax.
        See Also:
        Constant Field Values
      • NONE

        public static final int NONE
        Syntax flag, enables no optional regexp syntax.
        See Also:
        Constant Field Values
    • Constructor Detail

      • RegExp

        public RegExp(java.lang.String s)
               throws java.lang.IllegalArgumentException
        Constructs new RegExp from a string. Same as RegExp(s, ALL).
        Parameters:
        s - regexp string
        Throws:
        java.lang.IllegalArgumentException - if an error occurred while parsing the regular expression
      • RegExp

        public RegExp(java.lang.String s,
                      int syntax_flags)
               throws java.lang.IllegalArgumentException
        Constructs new RegExp from a string.
        Parameters:
        s - regexp string
        syntax_flags - bitwise OR of the syntax flags for the optional constructs to be enabled
        Throws:
        java.lang.IllegalArgumentException - if an error occurred while parsing the regular expression
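
Because syntax_flags is an OR-combination of the flag constants, each optional construct can be switched on independently. The following standalone sketch models that idea without calling the library. The class name SyntaxFlagsSketch, the helper isEnabled, and all bit values are placeholders chosen for illustration; the real values are the ones listed under Constant Field Values and may differ.

```java
class SyntaxFlagsSketch {

    // ASSUMPTION: placeholder values, one distinct bit per optional
    // construct. They mirror the names of the documented constants but
    // are not taken from the library.
    public static final int INTERSECTION = 1 << 0;
    public static final int COMPLEMENT   = 1 << 1;
    public static final int EMPTY        = 1 << 2;
    public static final int ANYSTRING    = 1 << 3;
    public static final int AUTOMATON    = 1 << 4;
    public static final int INTERVAL     = 1 << 5;

    // No optional syntax at all.
    public static final int NONE = 0;

    // Every optional construct defined in this sketch.
    public static final int ALL =
        INTERSECTION | COMPLEMENT | EMPTY | ANYSTRING | AUTOMATON | INTERVAL;

    // True if the given construct is switched on in syntaxFlags.
    public static boolean isEnabled(int syntaxFlags, int flag) {
        return (syntaxFlags & flag) != 0;
    }

    public static void main(String[] args) {
        // Enable intersection (&) and complement (~), nothing else.
        int flags = INTERSECTION | COMPLEMENT;
        System.out.println(isEnabled(flags, COMPLEMENT)); // prints true
        System.out.println(isEnabled(flags, INTERVAL));   // prints false
    }
}
```

With the actual constants, the same pattern would be passed as the second constructor argument, for example INTERSECTION | COMPLEMENT to allow & and ~ while keeping the other optional productions disabled.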
    • Method Detail

      • toAutomaton

        public Automaton toAutomaton()
        Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).
      • toAutomaton

        public Automaton toAutomaton(AutomatonProvider automaton_provider)
                              throws java.lang.IllegalArgumentException
        Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
        Parameters:
        automaton_provider - provider of automata for named identifiers
        Throws:
        java.lang.IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider
      • toAutomaton

        public Automaton toAutomaton(java.util.Map<java.lang.String,Automaton> automata)
                              throws java.lang.IllegalArgumentException
        Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
        Parameters:
        automata - a map from automaton identifiers to automata (of type Automaton).
        Throws:
        java.lang.IllegalArgumentException - if this regular expression uses a named identifier that does not occur in the automaton map
      • setAllowMutate

        public boolean setAllowMutate(boolean flag)
        Sets or resets allow mutate flag. If this flag is set, then automata construction uses mutable automata, which is slightly faster but not thread-safe. By default, the flag is not set.
        Parameters:
        flag - if true, the flag is set
        Returns:
        previous value of the flag
      • toString

        public java.lang.String toString()
        Constructs string from parsed regular expression.
        Overrides:
        toString in class java.lang.Object
      • getIdentifiers

        public java.util.Set<java.lang.String> getIdentifiers()
        Returns set of automaton identifiers that occur in this regular expression.