Class codecs


  • public class codecs
    extends java.lang.Object
    This class implements the codec registry and utility methods supporting codecs, such as those providing the standard replacement strategies ("ignore", "backslashreplace", etc.). The _codecs module relies heavily on apparatus implemented here, and therefore so does the Python codecs module (in Lib/codecs.py). It corresponds approximately to CPython's Python/codecs.c.

    The class also contains the inner methods of the standard Unicode codecs, available for transcoding of text at the Java level. These are also exposed through the _codecs module. In CPython, the implementations are found in Objects/unicodeobject.c.

    Since:
    Jython 2.0
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  codecs.CodecState  
    • Constructor Summary

      Constructors 
      Constructor Description
      codecs()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static java.lang.StringBuilder backslashreplace​(int start, int end, java.lang.String toReplace)  
      static PyObject backslashreplace_errors​(PyObject[] args, java.lang.String[] kws)  
      static int calcNewPosition​(int size, PyObject errorTuple)
      Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position.
      static PyObject decode​(PyString v, java.lang.String encoding, java.lang.String errors)
      Decode the bytes v using the codec registered for the encoding.
      static PyObject decoding_error​(java.lang.String errors, java.lang.String encoding, java.lang.String toDecode, int start, int end, java.lang.String reason)
      Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject).
      static java.lang.String encode​(PyString v, java.lang.String encoding, java.lang.String errors)
      Encode v using the codec registered for the encoding.
      static PyObject encoding_error​(java.lang.String errors, java.lang.String encoding, java.lang.String toEncode, int start, int end, java.lang.String reason)
      Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject).
      static java.lang.String getDefaultEncoding()  
      static PyObject ignore_errors​(PyObject[] args, java.lang.String[] kws)  
      static int insertReplacementAndGetResume​(java.lang.StringBuilder partialDecode, java.lang.String errors, java.lang.String encoding, java.lang.String toDecode, int start, int end, java.lang.String reason)
      Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
      static PyTuple lookup​(java.lang.String encoding)  
      static PyObject lookup_error​(java.lang.String handlerName)  
      static java.lang.String PyUnicode_DecodeASCII​(java.lang.String str, int size, java.lang.String errors)  
      static PyUnicode PyUnicode_DecodeIDNA​(java.lang.String input, java.lang.String errors)  
      static java.lang.String PyUnicode_DecodeLatin1​(java.lang.String str, int size, java.lang.String errors)  
      static PyUnicode PyUnicode_DecodePunycode​(java.lang.String input, java.lang.String errors)  
      static java.lang.String PyUnicode_DecodeRawUnicodeEscape​(java.lang.String str, java.lang.String errors)  
      static java.lang.String PyUnicode_DecodeUTF7​(java.lang.String bytes, java.lang.String errors)
      Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object.
      static java.lang.String PyUnicode_DecodeUTF7Stateful​(java.lang.String bytes, java.lang.String errors, int[] consumed)
      Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object, and the amount of input consumed.
      static java.lang.String PyUnicode_DecodeUTF8​(java.lang.String str, java.lang.String errors)  
      static java.lang.String PyUnicode_DecodeUTF8Stateful​(java.lang.String str, java.lang.String errors, int[] consumed)  
      static java.lang.String PyUnicode_EncodeASCII​(java.lang.String str, int size, java.lang.String errors)  
      static java.lang.String PyUnicode_EncodeIDNA​(PyUnicode input, java.lang.String errors)  
      static java.lang.String PyUnicode_EncodeLatin1​(java.lang.String str, int size, java.lang.String errors)  
      static java.lang.String PyUnicode_EncodePunycode​(PyUnicode input, java.lang.String errors)  
      static java.lang.String PyUnicode_EncodeRawUnicodeEscape​(java.lang.String str, java.lang.String errors, boolean modifed)  
      static java.lang.String PyUnicode_EncodeUTF7​(java.lang.String unicode, boolean base64SetO, boolean base64WhiteSpace, java.lang.String errors)
      Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String.
      static java.lang.String PyUnicode_EncodeUTF8​(java.lang.String str, java.lang.String errors)  
      static void register​(PyObject search_function)  
      static void register_error​(java.lang.String name, PyObject error)  
      static PyObject replace_errors​(PyObject[] args, java.lang.String[] kws)  
      static void setDefaultEncoding​(java.lang.String encoding)  
      static PyObject strict_errors​(PyObject[] args, java.lang.String[] kws)  
      static java.lang.StringBuilder xmlcharrefreplace​(int start, int end, java.lang.String toReplace)  
      static PyObject xmlcharrefreplace_errors​(PyObject[] args, java.lang.String[] kws)  
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • codecs

        public codecs()
    • Method Detail

      • getDefaultEncoding

        public static java.lang.String getDefaultEncoding()
      • setDefaultEncoding

        public static void setDefaultEncoding​(java.lang.String encoding)
      • lookup_error

        public static PyObject lookup_error​(java.lang.String handlerName)
      • register_error

        public static void register_error​(java.lang.String name,
                                          PyObject error)
      • register

        public static void register​(PyObject search_function)
      • lookup

        public static PyTuple lookup​(java.lang.String encoding)
      • decode

        public static PyObject decode​(PyString v,
                                      java.lang.String encoding,
                                      java.lang.String errors)
        Decode the bytes v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that decoding errors raise a ValueError. This method is exposed through the _codecs module as _codecs#decode(PyString, String, String).
        Parameters:
        v - bytes to be decoded
        encoding - name of encoding (to look up in codec registry)
        errors - error policy name (e.g. "ignore", "replace")
        Returns:
        Unicode string decoded from bytes
      • encode

        public static java.lang.String encode​(PyString v,
                                              java.lang.String encoding,
                                              java.lang.String errors)
        Encode v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that encoding errors raise a ValueError.
        Parameters:
        v - unicode string to be encoded
        encoding - name of encoding (to look up in codec registry)
        errors - error policy name (e.g. "ignore")
        Returns:
        bytes object encoding v
      • strict_errors

        public static PyObject strict_errors​(PyObject[] args,
                                             java.lang.String[] kws)
      • ignore_errors

        public static PyObject ignore_errors​(PyObject[] args,
                                             java.lang.String[] kws)
      • replace_errors

        public static PyObject replace_errors​(PyObject[] args,
                                              java.lang.String[] kws)
      • xmlcharrefreplace_errors

        public static PyObject xmlcharrefreplace_errors​(PyObject[] args,
                                                        java.lang.String[] kws)
      • xmlcharrefreplace

        public static java.lang.StringBuilder xmlcharrefreplace​(int start,
                                                                int end,
                                                                java.lang.String toReplace)
      • backslashreplace_errors

        public static PyObject backslashreplace_errors​(PyObject[] args,
                                                       java.lang.String[] kws)
      • backslashreplace

        public static java.lang.StringBuilder backslashreplace​(int start,
                                                               int end,
                                                               java.lang.String toReplace)
      • PyUnicode_DecodeUTF7Stateful

        public static java.lang.String PyUnicode_DecodeUTF7Stateful​(java.lang.String bytes,
                                                                    java.lang.String errors,
                                                                    int[] consumed)
        Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object, and the amount of input consumed. The only state we preserve is our read position, i.e. how many bytes we have consumed. So if the input ends part way through a Base64 sequence, the data reported as consumed is just that up to and not including the Base64 start marker ('+'). Performance will be poor (quadratic cost) on runs of Base64 data long enough to exceed the input quantum in incremental decoding. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
        Parameters:
        bytes - input represented as String (Jython PyString convention)
        errors - error policy name (e.g. "ignore", "replace")
        consumed - returns number of bytes consumed in element 0, or is null if a "final" call
        Returns:
        unicode result (as UTF-16 Java String)
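The "consumed" rule described above can be illustrated without Jython. The following is a minimal sketch in plain Java, not the Jython implementation: the class `Utf7ConsumedSketch` and the helper `consumable` are hypothetical names, and the scan only tracks whether the input ends inside a Base64 run. It assumes, as in UTF-7, that a run opens with '+' and closes at the first byte outside the Base64 alphabet.

```java
class Utf7ConsumedSketch {

    /** True for characters of the Base64 alphabet used inside a UTF-7 shifted run. */
    static boolean isBase64(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/';
    }

    /**
     * Hypothetical helper: number of leading bytes a decoder that remembers only its
     * read position could report as consumed. If the input ends inside a Base64 run,
     * that is the index of the '+' that opened the run; otherwise it is the full length.
     */
    static int consumable(String bytes) {
        boolean inBase64 = false;
        int runStart = 0;
        for (int i = 0; i < bytes.length(); i++) {
            char c = bytes.charAt(i);
            if (inBase64) {
                if (!isBase64(c)) {
                    inBase64 = false; // '-' or any other non-Base64 byte ends the run
                }
            } else if (c == '+') {
                inBase64 = true;      // start marker of a Base64 run
                runStart = i;
            }
        }
        return inBase64 ? runStart : bytes.length();
    }

    public static void main(String[] args) {
        System.out.println(consumable("Hi +AKM-")); // run is closed: prints 8
        System.out.println(consumable("Hi +AK"));   // run is open: prints 3
    }
}
```

Because an unfinished run is re-read from its '+' on the next call, a long Base64 run fed in small pieces is scanned repeatedly, which is the source of the quadratic cost mentioned above.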
      • PyUnicode_DecodeUTF7

        public static java.lang.String PyUnicode_DecodeUTF7​(java.lang.String bytes,
                                                            java.lang.String errors)
        Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
        Parameters:
        bytes - input represented as String (Jython PyString convention)
        errors - error policy name (e.g. "ignore", "replace")
        Returns:
        unicode result (as UTF-16 Java String)
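As a reminder of what the UTF-16 convention means for callers, here is a small sketch in plain Java (it does not call Jython; the class `SurrogatePairSketch` and its helper are hypothetical names). It shows that a character above the BMP occupies two `char` values in a Java String but counts as one code point.

```java
class SurrogatePairSketch {

    /** Number of Unicode code points in a UTF-16 Java String. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies above the BMP, so UTF-16 stores it as a surrogate pair.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                             // 2
        System.out.println(codePoints(s));                          // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```

Code that indexes the returned String by `char` should therefore use the code point methods of `String` and `Character` when it needs whole characters.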
      • PyUnicode_EncodeUTF7

        public static java.lang.String PyUnicode_EncodeUTF7​(java.lang.String unicode,
                                                            boolean base64SetO,
                                                            boolean base64WhiteSpace,
                                                            java.lang.String errors)
        Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String. (String representation for byte data is chosen so that it may immediately become a PyString.) This method differs from the CPython equivalent (in Objects/unicodeobject.c) which works with an array of code points that are, in a wide build, Unicode code points.
        Parameters:
        unicode - to be encoded
        base64SetO - true if characters in "set O" should be translated to base64
        base64WhiteSpace - true if white-space characters should be translated to base64
        errors - error policy name (e.g. "ignore", "replace")
        Returns:
        bytes representing the encoded unicode string
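The "bytes in a String" convention used for the return value can be sketched in plain Java as follows. This is an illustration only: the class `ByteStringSketch` and its two helpers are hypothetical, and the sketch assumes each byte maps to one `char` in the range 0 to 255, which is what decoding as ISO-8859-1 produces.

```java
import java.nio.charset.StandardCharsets;

class ByteStringSketch {

    /** Pack bytes into a String, one char (value 0..255) per byte. */
    static String toByteString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    /** Recover the bytes by taking the low byte of each char. */
    static byte[] toBytes(String byteString) {
        byte[] out = new byte[byteString.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (byteString.charAt(i) & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA9};   // two arbitrary byte values
        String packed = toByteString(raw);
        System.out.println(packed.length());        // 2
        System.out.println((int) packed.charAt(0)); // 195
    }
}
```

Since UTF-7 output is pure ASCII, every `char` in the String returned by this method would in practice be below 128.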
      • PyUnicode_DecodeUTF8

        public static java.lang.String PyUnicode_DecodeUTF8​(java.lang.String str,
                                                            java.lang.String errors)
      • PyUnicode_DecodeUTF8Stateful

        public static java.lang.String PyUnicode_DecodeUTF8Stateful​(java.lang.String str,
                                                                    java.lang.String errors,
                                                                    int[] consumed)
      • PyUnicode_EncodeUTF8

        public static java.lang.String PyUnicode_EncodeUTF8​(java.lang.String str,
                                                            java.lang.String errors)
      • PyUnicode_DecodeASCII

        public static java.lang.String PyUnicode_DecodeASCII​(java.lang.String str,
                                                             int size,
                                                             java.lang.String errors)
      • PyUnicode_DecodeLatin1

        public static java.lang.String PyUnicode_DecodeLatin1​(java.lang.String str,
                                                              int size,
                                                              java.lang.String errors)
      • PyUnicode_EncodeASCII

        public static java.lang.String PyUnicode_EncodeASCII​(java.lang.String str,
                                                             int size,
                                                             java.lang.String errors)
      • PyUnicode_EncodeLatin1

        public static java.lang.String PyUnicode_EncodeLatin1​(java.lang.String str,
                                                              int size,
                                                              java.lang.String errors)
      • PyUnicode_EncodeRawUnicodeEscape

        public static java.lang.String PyUnicode_EncodeRawUnicodeEscape​(java.lang.String str,
                                                                        java.lang.String errors,
                                                                        boolean modifed)
      • PyUnicode_DecodeRawUnicodeEscape

        public static java.lang.String PyUnicode_DecodeRawUnicodeEscape​(java.lang.String str,
                                                                        java.lang.String errors)
      • PyUnicode_EncodePunycode

        public static java.lang.String PyUnicode_EncodePunycode​(PyUnicode input,
                                                                java.lang.String errors)
      • PyUnicode_DecodePunycode

        public static PyUnicode PyUnicode_DecodePunycode​(java.lang.String input,
                                                         java.lang.String errors)
      • PyUnicode_EncodeIDNA

        public static java.lang.String PyUnicode_EncodeIDNA​(PyUnicode input,
                                                            java.lang.String errors)
      • PyUnicode_DecodeIDNA

        public static PyUnicode PyUnicode_DecodeIDNA​(java.lang.String input,
                                                     java.lang.String errors)
      • encoding_error

        public static PyObject encoding_error​(java.lang.String errors,
                                              java.lang.String encoding,
                                              java.lang.String toEncode,
                                              int start,
                                              int end,
                                              java.lang.String reason)
        Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject). The return value is the return from the error handler indicating the replacement codec input and the position at which to resume encoding. Invokes the mechanism described in PEP-293.
        Parameters:
        errors - name of the error policy (or null meaning "strict")
        encoding - name of encoding that encountered the error
        toEncode - unicode string being encoded
        start - index of first char it couldn't encode
        end - index+1 of last char it couldn't encode (usually becomes the resume point)
        reason - contribution to error message if any
        Returns:
        the handler's result, which must be a tuple (replacement_unicode, resume_index)
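The PEP-293 flow for encoding can be sketched in plain Java without Jython types. Everything named here is an assumption made for illustration: `Result` stands in for the (replacement_unicode, resume_index) tuple, `Handler` for the registered Python callable, and the `HANDLERS` map for the registry filled by register_error. The toy ASCII encoder shows how a codec might use the handler's return.

```java
import java.util.HashMap;
import java.util.Map;

class EncodeErrorSketch {

    /** Hypothetical stand-in for the (replacement_unicode, resume_index) tuple. */
    static final class Result {
        final String replacement;
        final int resume;

        Result(String replacement, int resume) {
            this.replacement = replacement;
            this.resume = resume;
        }
    }

    /** Hypothetical handler type; in Jython the handlers are Python callables. */
    interface Handler {
        Result handle(String toEncode, int start, int end, String reason);
    }

    /** Hypothetical registry keyed by error policy name. */
    static final Map<String, Handler> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("strict", (s, start, end, reason) -> {
            throw new IllegalArgumentException(reason + " at " + start + "-" + end);
        });
        HANDLERS.put("ignore", (s, start, end, reason) -> new Result("", end));
        HANDLERS.put("replace", (s, start, end, reason) -> {
            StringBuilder marks = new StringBuilder();
            for (int i = start; i < end; i++) {
                marks.append('?');
            }
            return new Result(marks.toString(), end);
        });
    }

    /** Toy ASCII encoder that delegates unencodable runs to the named handler. */
    static String encodeAscii(String text, String errors) {
        Handler handler = HANDLERS.get(errors == null ? "strict" : errors);
        if (handler == null) {
            throw new IllegalArgumentException("unknown error handler name");
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
                i++;
                continue;
            }
            int end = i + 1; // extend over the whole run of unencodable chars
            while (end < text.length() && text.charAt(end) >= 128) {
                end++;
            }
            Result r = handler.handle(text, i, end, "ordinal not in range(128)");
            out.append(r.replacement); // replacement goes to the output
            i = r.resume;              // continue where the handler says
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAscii("caf\u00e9!", "replace")); // caf?!
        System.out.println(encodeAscii("caf\u00e9!", "ignore"));  // caf!
    }
}
```

In this sketch every handler resumes at `end`, which matches the note on the end parameter that it usually becomes the resume point.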
      • insertReplacementAndGetResume

        public static int insertReplacementAndGetResume​(java.lang.StringBuilder partialDecode,
                                                        java.lang.String errors,
                                                        java.lang.String encoding,
                                                        java.lang.String toDecode,
                                                        int start,
                                                        int end,
                                                        java.lang.String reason)
        Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
        Parameters:
        partialDecode - output buffer of unicode (as UTF-16) that the codec is building
        errors - name of the error policy (or null meaning "strict")
        encoding - name of encoding that encountered the error
        toDecode - bytes being decoded
        start - index of first byte it couldn't decode
        end - index+1 of last byte it couldn't decode (usually becomes the resume point)
        reason - contribution to error message if any
        Returns:
        the resume position: index of next byte to decode
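How a decoder might use such a method can be sketched in plain Java. This is a simplified, hypothetical counterpart, not the Jython code: `insertAndResume` supports only the policy names "ignore" and "replace" and treats anything else (including null) as strict, and `decodeAscii` is a toy codec written for the example.

```java
class DecodeResumeSketch {

    /**
     * Hypothetical simplified handler: adjust the output buffer and return the index
     * of the next byte to decode, or throw if the policy is strict.
     */
    static int insertAndResume(StringBuilder partialDecode, String errors,
            String toDecode, int start, int end, String reason) {
        if ("ignore".equals(errors)) {
            return end;                     // add nothing, skip the bad bytes
        } else if ("replace".equals(errors)) {
            partialDecode.append('\uFFFD'); // U+FFFD REPLACEMENT CHARACTER
            return end;
        }
        throw new IllegalArgumentException(reason + " at position " + start);
    }

    /** Toy ASCII decoder over bytes held one per char, as in a PyString. */
    static String decodeAscii(String bytes, String errors) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < bytes.length()) {
            char b = bytes.charAt(i);
            if (b < 128) {
                out.append(b);
                i++;
            } else {
                i = insertAndResume(out, errors, bytes, i, i + 1,
                        "ordinal not in range(128)");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeAscii("a\u00ffb", "ignore")); // ab
    }
}
```

Passing the output buffer to the handler keeps the decoding loop short: the loop only needs the returned resume index.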
      • decoding_error

        public static PyObject decoding_error​(java.lang.String errors,
                                              java.lang.String encoding,
                                              java.lang.String toDecode,
                                              int start,
                                              int end,
                                              java.lang.String reason)
        Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject). The return value is the return from the error handler indicating the replacement codec output and the position at which to resume decoding. Invokes the mechanism described in PEP-293.
        Parameters:
        errors - name of the error policy (or null meaning "strict")
        encoding - name of encoding that encountered the error
        toDecode - bytes being decoded
        start - index of first byte it couldn't decode
        end - index+1 of last byte it couldn't decode (usually becomes the resume point)
        reason - contribution to error message if any
        Returns:
        the handler's result, which must be a tuple (replacement_unicode, resume_index)
      • calcNewPosition

        public static int calcNewPosition​(int size,
                                          PyObject errorTuple)
        Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position. Negative indexes in the error handler return are interpreted as "from the end". If the result would be out of bounds in the input, an IndexError exception is raised.
        Parameters:
        size - length of the input being encoded or decoded
        errorTuple - returned from error handler
        Returns:
        absolute resume position.
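The interpretation of the resume position can be sketched in plain Java. The helper below is hypothetical and takes the requested index as an `int` rather than extracting it from a PyObject tuple. It also assumes that a position equal to `size` (resume at end of input) is accepted, and it throws Java's `IndexOutOfBoundsException` where Jython would raise a Python IndexError.

```java
class ResumePositionSketch {

    /**
     * Hypothetical helper: convert a handler's requested resume index to an absolute
     * position, treating negative values as offsets from the end of the input.
     */
    static int resolve(int size, int requested) {
        int pos = requested < 0 ? size + requested : requested;
        if (pos < 0 || pos > size) {
            throw new IndexOutOfBoundsException(
                    requested + " out of bounds for input of length " + size);
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(resolve(10, 3));  // 3
        System.out.println(resolve(10, -1)); // 9
    }
}
```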