Class Util


  • public final class Util
    extends java.lang.Object
    Static helper methods.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  Util.FSTPath<T>
      Represents a path in TopNSearcher.
      static class  Util.MinResult<T>
      Holds a single input (IntsRef) + output, returned by shortestPaths().
      static class  Util.TopNSearcher<T>
      Utility class to find top N shortest paths from start point(s).
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static <T> T get​(FST<T> fst, BytesRef input)
      Looks up the output for this input, or null if the input is not accepted
      static <T> T get​(FST<T> fst, IntsRef input)
      Looks up the output for this input, or null if the input is not accepted.
      static IntsRef getByOutput​(FST<java.lang.Long> fst, long targetOutput)
      Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending.
      static IntsRef getByOutput​(FST<java.lang.Long> fst, long targetOutput, FST.BytesReader in, FST.Arc<java.lang.Long> arc, FST.Arc<java.lang.Long> scratchArc, IntsRef result)
      Expert: like getByOutput(FST, long) except reusing BytesReader, initial and scratch Arc, and result.
      static <T> FST.Arc<T> readCeilArc​(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
      Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise return null.
      static <T> Util.MinResult<T>[] shortestPaths​(FST<T> fst, FST.Arc<T> fromNode, T startOutput, java.util.Comparator<T> comparator, int topN, boolean allowEmptyString)
      Starting from node, find the top N min cost completions to a final node.
      static BytesRef toBytesRef​(IntsRef input, BytesRef scratch)
      Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
      static <T> void toDot​(FST<T> fst, java.io.Writer out, boolean sameRank, boolean labelStates)
      Dumps an FST to a GraphViz's dot language description for visualization.
      static IntsRef toIntsRef​(BytesRef input, IntsRef scratch)
      Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
      static IntsRef toUTF16​(java.lang.CharSequence s, IntsRef scratch)
      Just maps each UTF16 unit (char) to the ints in an IntsRef.
      static IntsRef toUTF32​(char[] s, int offset, int length, IntsRef scratch)
      Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
      static IntsRef toUTF32​(java.lang.CharSequence s, IntsRef scratch)
      Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • get

        public static <T> T get​(FST<T> fst,
                                IntsRef input)
                         throws java.io.IOException
        Looks up the output for this input, or null if the input is not accepted.
        Throws:
        java.io.IOException
      • get

        public static <T> T get​(FST<T> fst,
                                BytesRef input)
                         throws java.io.IOException
        Looks up the output for this input, or null if the input is not accepted
        Throws:
        java.io.IOException
      • getByOutput

        public static IntsRef getByOutput​(FST<java.lang.Long> fst,
                                          long targetOutput)
                                   throws java.io.IOException
        Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending. This locates the input/output pair where the output is equal to the target, and will return null if that output does not exist.

        NOTE: this only works with FST<Long>, only works when the outputs are ascending in order with the inputs. For example, simple ordinals (0, 1, 2, ...), or file offets (when appending to a file) fit this.

        Throws:
        java.io.IOException
      • getByOutput

        public static IntsRef getByOutput​(FST<java.lang.Long> fst,
                                          long targetOutput,
                                          FST.BytesReader in,
                                          FST.Arc<java.lang.Long> arc,
                                          FST.Arc<java.lang.Long> scratchArc,
                                          IntsRef result)
                                   throws java.io.IOException
        Expert: like getByOutput(FST, long) except reusing BytesReader, initial and scratch Arc, and result.
        Throws:
        java.io.IOException
      • shortestPaths

        public static <T> Util.MinResult<T>[] shortestPaths​(FST<T> fst,
                                                            FST.Arc<T> fromNode,
                                                            T startOutput,
                                                            java.util.Comparator<T> comparator,
                                                            int topN,
                                                            boolean allowEmptyString)
                                                     throws java.io.IOException
        Starting from node, find the top N min cost completions to a final node.
        Throws:
        java.io.IOException
      • toDot

        public static <T> void toDot​(FST<T> fst,
                                     java.io.Writer out,
                                     boolean sameRank,
                                     boolean labelStates)
                              throws java.io.IOException
        Dumps an FST to a GraphViz's dot language description for visualization. Example of use:
         PrintWriter pw = new PrintWriter("out.dot");
         Util.toDot(fst, pw, true, true);
         pw.close();
         
        and then, from command line:
         dot -Tpng -o out.png out.dot
         

        Note: larger FSTs (a few thousand nodes) won't even render, don't bother. If the FST is > 2.1 GB in size then this method will throw strange exceptions.

        Parameters:
        sameRank - If true, the resulting dot file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.
        labelStates - If true states will have labels equal to their offsets in their binary format. Expands the graph considerably.
        Throws:
        java.io.IOException
        See Also:
        "http://www.graphviz.org/"
      • toUTF16

        public static IntsRef toUTF16​(java.lang.CharSequence s,
                                      IntsRef scratch)
        Just maps each UTF16 unit (char) to the ints in an IntsRef.
      • toUTF32

        public static IntsRef toUTF32​(java.lang.CharSequence s,
                                      IntsRef scratch)
        Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
      • toUTF32

        public static IntsRef toUTF32​(char[] s,
                                      int offset,
                                      int length,
                                      IntsRef scratch)
        Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
      • toIntsRef

        public static IntsRef toIntsRef​(BytesRef input,
                                        IntsRef scratch)
        Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
      • toBytesRef

        public static BytesRef toBytesRef​(IntsRef input,
                                          BytesRef scratch)
        Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
      • readCeilArc

        public static <T> FST.Arc<T> readCeilArc​(int label,
                                                 FST<T> fst,
                                                 FST.Arc<T> follow,
                                                 FST.Arc<T> arc,
                                                 FST.BytesReader in)
                                          throws java.io.IOException
        Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise return null.
        Parameters:
        label - the label to ceil on
        fst - the fst to operate on
        follow - the arc to follow reading the label from
        arc - the arc to read into in place
        in - the fst's FST.BytesReader
        Throws:
        java.io.IOException