Package org.python.core
Class codecs
java.lang.Object
    org.python.core.codecs
public class codecs extends java.lang.Object
This class implements the codec registry and utility methods supporting codecs, such as those providing the standard replacement strategies ("ignore", "backslashreplace", etc.). The _codecs module relies heavily on apparatus implemented here, and therefore so does the Python codecs module (in Lib/codecs.py). It corresponds approximately to CPython's Python/codecs.c.

The class also contains the inner methods of the standard Unicode codecs, available for transcoding text at the Java level. These are also exposed through the _codecs module. In CPython, the implementations are found in Objects/unicodeobject.c.

Since:
    Jython 2.0
Nested Class Summary
static class codecs.CodecState

Field Summary
static java.lang.String BACKSLASHREPLACE
static java.lang.String IGNORE
static java.lang.String REPLACE
static java.lang.String XMLCHARREFREPLACE

Constructor Summary
codecs()
Method Summary
static java.lang.StringBuilder backslashreplace(int start, int end, java.lang.String toReplace)
static PyObject backslashreplace_errors(PyObject[] args, java.lang.String[] kws)
static int calcNewPosition(int size, PyObject errorTuple)
    Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position.
static PyObject decode(PyString v, java.lang.String encoding, java.lang.String errors)
    Decode the bytes v using the codec registered for the encoding.
static PyObject decoding_error(java.lang.String errors, java.lang.String encoding, java.lang.String toDecode, int start, int end, java.lang.String reason)
    Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject).
static java.lang.String encode(PyString v, java.lang.String encoding, java.lang.String errors)
    Encode v using the codec registered for the encoding.
static PyObject encoding_error(java.lang.String errors, java.lang.String encoding, java.lang.String toEncode, int start, int end, java.lang.String reason)
    Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject).
static java.lang.String getDefaultEncoding()
static PyObject ignore_errors(PyObject[] args, java.lang.String[] kws)
static int insertReplacementAndGetResume(java.lang.StringBuilder partialDecode, java.lang.String errors, java.lang.String encoding, java.lang.String toDecode, int start, int end, java.lang.String reason)
    Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
static PyTuple lookup(java.lang.String encoding)
static PyObject lookup_error(java.lang.String handlerName)
static java.lang.String PyUnicode_DecodeASCII(java.lang.String str, int size, java.lang.String errors)
static PyUnicode PyUnicode_DecodeIDNA(java.lang.String input, java.lang.String errors)
static java.lang.String PyUnicode_DecodeLatin1(java.lang.String str, int size, java.lang.String errors)
static PyUnicode PyUnicode_DecodePunycode(java.lang.String input, java.lang.String errors)
static java.lang.String PyUnicode_DecodeRawUnicodeEscape(java.lang.String str, java.lang.String errors)
static java.lang.String PyUnicode_DecodeUTF7(java.lang.String bytes, java.lang.String errors)
    Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object.
static java.lang.String PyUnicode_DecodeUTF7Stateful(java.lang.String bytes, java.lang.String errors, int[] consumed)
    Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object, and the amount of input consumed.
static java.lang.String PyUnicode_DecodeUTF8(java.lang.String str, java.lang.String errors)
static java.lang.String PyUnicode_DecodeUTF8Stateful(java.lang.String str, java.lang.String errors, int[] consumed)
static java.lang.String PyUnicode_EncodeASCII(java.lang.String str, int size, java.lang.String errors)
static java.lang.String PyUnicode_EncodeIDNA(PyUnicode input, java.lang.String errors)
static java.lang.String PyUnicode_EncodeLatin1(java.lang.String str, int size, java.lang.String errors)
static java.lang.String PyUnicode_EncodePunycode(PyUnicode input, java.lang.String errors)
static java.lang.String PyUnicode_EncodeRawUnicodeEscape(java.lang.String str, java.lang.String errors, boolean modifed)
static java.lang.String PyUnicode_EncodeUTF7(java.lang.String unicode, boolean base64SetO, boolean base64WhiteSpace, java.lang.String errors)
    Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String.
static java.lang.String PyUnicode_EncodeUTF8(java.lang.String str, java.lang.String errors)
static void register(PyObject search_function)
static void register_error(java.lang.String name, PyObject error)
static PyObject replace_errors(PyObject[] args, java.lang.String[] kws)
static void setDefaultEncoding(java.lang.String encoding)
static PyObject strict_errors(PyObject[] args, java.lang.String[] kws)
static java.lang.StringBuilder xmlcharrefreplace(int start, int end, java.lang.String toReplace)
static PyObject xmlcharrefreplace_errors(PyObject[] args, java.lang.String[] kws)
Field Detail
-
BACKSLASHREPLACE
public static final java.lang.String BACKSLASHREPLACE
- See Also:
- Constant Field Values
-
IGNORE
public static final java.lang.String IGNORE
- See Also:
- Constant Field Values
-
REPLACE
public static final java.lang.String REPLACE
- See Also:
- Constant Field Values
-
XMLCHARREFREPLACE
public static final java.lang.String XMLCHARREFREPLACE
- See Also:
- Constant Field Values
Method Detail
-
getDefaultEncoding
public static java.lang.String getDefaultEncoding()
-
setDefaultEncoding
public static void setDefaultEncoding(java.lang.String encoding)
-
lookup_error
public static PyObject lookup_error(java.lang.String handlerName)
-
register_error
public static void register_error(java.lang.String name, PyObject error)
-
register
public static void register(PyObject search_function)
-
lookup
public static PyTuple lookup(java.lang.String encoding)
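The register and lookup entries carry no description here. As a rough orientation only, the sketch below models the kind of CPython-style registry this class corresponds to: search functions are tried in registration order, and the first non-null answer is cached under the normalized name. Everything in it is hypothetical: CodecInfo stands in for the PyTuple that lookup returns, lower-casing is an assumed normalization rule, and IllegalArgumentException replaces Python's LookupError.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class CodecRegistrySketch {

    /** Hypothetical stand-in for the codec tuple that lookup returns. */
    static final class CodecInfo {
        final String name;

        CodecInfo(String name) {
            this.name = name;
        }
    }

    private static final List<Function<String, CodecInfo>> SEARCH_PATH = new ArrayList<>();
    private static final Map<String, CodecInfo> CACHE = new HashMap<>();

    /** Analogue of register(search_function): append to the search path. */
    static synchronized void register(Function<String, CodecInfo> searchFunction) {
        SEARCH_PATH.add(searchFunction);
    }

    /** Analogue of lookup(encoding): cached result, else the first non-null search result. */
    static synchronized CodecInfo lookup(String encoding) {
        String key = encoding.toLowerCase(Locale.ROOT); // assumed normalization rule
        CodecInfo info = CACHE.get(key);
        if (info != null) {
            return info;
        }
        for (Function<String, CodecInfo> search : SEARCH_PATH) {
            info = search.apply(key);
            if (info != null) {
                CACHE.put(key, info);
                return info;
            }
        }
        throw new IllegalArgumentException("unknown encoding: " + encoding); // LookupError in Python
    }

    public static void main(String[] args) {
        register(name -> name.equals("latin-1") ? new CodecInfo("latin-1") : null);
        System.out.println(lookup("Latin-1").name); // prints latin-1
    }
}
```

Caching after the first successful search means a search function registered later cannot change the answer for a name that has already been looked up.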
-
decode
public static PyObject decode(PyString v, java.lang.String encoding, java.lang.String errors)
Decode the bytes v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that decoding errors raise a ValueError. This method is exposed through the _codecs module as _codecs#decode(PyString, String, String).
Parameters:
    v - bytes to be decoded
    encoding - name of encoding (to look up in codec registry)
    errors - error policy name (e.g. "ignore", "replace")
Returns:
    Unicode string decoded from bytes
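To picture how the errors argument changes the outcome, the sketch below decodes ASCII from a String holding one byte per char (the Jython PyString convention mentioned in the UTF-7 entries) under the three simplest policies. It is an illustrative stand-in, not the codec's own code: any policy name other than "ignore" or "replace" is treated as "strict", and IllegalArgumentException plays the part of Python's UnicodeDecodeError.

```java
final class AsciiDecodeSketch {

    /**
     * Decode ASCII bytes held one per char in a String, applying a named error policy.
     * Any policy other than "ignore" or "replace" is treated as "strict" in this sketch.
     */
    static String decodeAscii(String bytes, String errors) {
        StringBuilder out = new StringBuilder(bytes.length());
        for (int i = 0; i < bytes.length(); i++) {
            char b = bytes.charAt(i);
            if (b < 0x80) {
                out.append(b);                  // valid ASCII byte
            } else if ("ignore".equals(errors)) {
                continue;                       // drop the undecodable byte
            } else if ("replace".equals(errors)) {
                out.append('\uFFFD');           // U+FFFD REPLACEMENT CHARACTER
            } else {
                // stands in for Python's UnicodeDecodeError
                throw new IllegalArgumentException("'ascii' codec can't decode byte 0x"
                        + Integer.toHexString(b) + " in position " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String bytes = "caf\u00e9";                       // the last "byte" is 0xE9
        System.out.println(decodeAscii(bytes, "ignore")); // prints caf
        System.out.println(decodeAscii(bytes, "replace").endsWith("\uFFFD")); // prints true
    }
}
```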
-
encode
public static java.lang.String encode(PyString v, java.lang.String encoding, java.lang.String errors)
Encode v using the codec registered for the encoding. The encoding defaults to the system default encoding (see getDefaultEncoding()). The string errors may name a different error handling policy (built-in or registered with register_error(String, PyObject)). The default error policy is 'strict', meaning that encoding errors raise a ValueError.
Parameters:
    v - unicode string to be encoded
    encoding - name of encoding (to look up in codec registry)
    errors - error policy name (e.g. "ignore")
Returns:
    bytes object encoding v
-
xmlcharrefreplace_errors
public static PyObject xmlcharrefreplace_errors(PyObject[] args, java.lang.String[] kws)
-
xmlcharrefreplace
public static java.lang.StringBuilder xmlcharrefreplace(int start, int end, java.lang.String toReplace)
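xmlcharrefreplace is undocumented here. Judging from its name, its signature and the Python policy of the same name, it presumably builds the replacement text for toReplace[start, end), turning each character into a decimal XML character reference. A sketch under those assumptions, treating start and end as UTF-16 indices and emitting one reference per code point (so a surrogate pair yields a single reference):

```java
final class XmlCharRefSketch {

    /**
     * Replace each code point in toReplace[start, end) by a decimal reference such as &#233;.
     * start and end are assumed to be UTF-16 (char) indices.
     */
    static StringBuilder xmlcharrefreplace(int start, int end, String toReplace) {
        StringBuilder replacement = new StringBuilder();
        toReplace.substring(start, end).codePoints()
                .forEach(cp -> replacement.append("&#").append(cp).append(';'));
        return replacement;
    }

    public static void main(String[] args) {
        // characters 1 and 2 are U+00E9 and U+20AC
        System.out.println(xmlcharrefreplace(1, 3, "a\u00e9\u20acb")); // prints &#233;&#8364;
    }
}
```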
-
backslashreplace_errors
public static PyObject backslashreplace_errors(PyObject[] args, java.lang.String[] kws)
-
backslashreplace
public static java.lang.StringBuilder backslashreplace(int start, int end, java.lang.String toReplace)
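backslashreplace is likewise undocumented. Assuming it mirrors Python's "backslashreplace" policy with code point (wide-build) semantics, each character in toReplace[start, end) becomes \xNN below U+0100, \uNNNN within the BMP, and \UNNNNNNNN above it, using lower-case hex. A minimal sketch under those assumptions:

```java
final class BackslashReplaceSketch {

    /** Replace each code point in toReplace[start, end) by a Python-style backslash escape. */
    static StringBuilder backslashreplace(int start, int end, String toReplace) {
        StringBuilder replacement = new StringBuilder();
        toReplace.substring(start, end).codePoints().forEach(cp -> {
            if (cp < 0x100) {
                replacement.append(String.format("\\x%02x", cp)); // 2 hex digits
            } else if (cp < 0x10000) {
                replacement.append(String.format("\\u%04x", cp)); // 4 hex digits
            } else {
                replacement.append(String.format("\\U%08x", cp)); // 8 hex digits
            }
        });
        return replacement;
    }

    public static void main(String[] args) {
        // prints the escapes for U+00E9 and U+20AC, one after the other
        System.out.println(backslashreplace(0, 2, "\u00e9\u20ac"));
    }
}
```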
-
PyUnicode_DecodeUTF7Stateful
public static java.lang.String PyUnicode_DecodeUTF7Stateful(java.lang.String bytes, java.lang.String errors, int[] consumed)
Decode (perhaps partially) a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object, and the amount of input consumed. The only state preserved is the read position, i.e. how many bytes have been consumed. So if the input ends partway through a Base64 sequence, the amount reported as consumed extends only up to, and not including, the Base64 start marker ('+'). Performance will therefore be poor (quadratic cost) on runs of Base64 data long enough to exceed the input quantum in incremental decoding, since each call must decode such a run again from its start marker. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
Parameters:
    bytes - input represented as String (Jython PyString convention)
    errors - error policy name (e.g. "ignore", "replace")
    consumed - receives the number of bytes consumed in element 0, or null for a "final" call
Returns:
    unicode result (as UTF-16 Java String)
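The consumed rule above can be stated as a small scan. The helper below is hypothetical (it is not a method of this class) and ignores malformed input; it only computes how many bytes of a UTF-7 chunk a non-final call would report as consumed: all of them, unless the chunk ends inside a Base64 run, in which case only those before that run's '+' marker.

```java
final class Utf7ConsumedSketch {

    /** Characters of the Base64 alphabet used inside a UTF-7 run. */
    private static boolean isBase64Char(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/';
    }

    /** Bytes a partial decode would report as consumed for this chunk (hypothetical helper). */
    static int consumed(String bytes) {
        boolean inBase64 = false;
        int runStart = 0; // index of the '+' that opened the current run
        for (int i = 0; i < bytes.length(); i++) {
            char c = bytes.charAt(i);
            if (inBase64) {
                if (!isBase64Char(c)) {
                    inBase64 = false; // '-' or any other non-Base64 byte closes the run
                }
            } else if (c == '+') {
                inBase64 = true;
                runStart = i;
            }
        }
        return inBase64 ? runStart : bytes.length();
    }

    public static void main(String[] args) {
        System.out.println(consumed("Hi +Jjo"));   // prints 3
        System.out.println(consumed("Hi +Jjo-!")); // prints 9
    }
}
```

A chunk ending in a bare '+' also reports the bytes before it, since the next byte may turn out to be the '-' of an escaped plus sign.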
-
PyUnicode_DecodeUTF7
public static java.lang.String PyUnicode_DecodeUTF7(java.lang.String bytes, java.lang.String errors)
Decode completely a sequence of bytes representing the UTF-7 encoded form of a Unicode string and return (the Jython internal representation of) the unicode object. The returned Java String is a UTF-16 representation of the Unicode result, in line with Java conventions. Unicode characters above the BMP are represented as surrogate pairs.
Parameters:
    bytes - input represented as String (Jython PyString convention)
    errors - error policy name (e.g. "ignore", "replace")
Returns:
    unicode result (as UTF-16 Java String)
-
PyUnicode_EncodeUTF7
public static java.lang.String PyUnicode_EncodeUTF7(java.lang.String unicode, boolean base64SetO, boolean base64WhiteSpace, java.lang.String errors)
Encode a UTF-16 Java String as UTF-7 bytes represented by the low bytes of the characters in a String. (The String representation for byte data is chosen so that it may immediately become a PyString.) This method differs from the CPython equivalent (in Objects/unicodeobject.c), which works with an array of code units that are, in a wide build, Unicode code points.
Parameters:
    unicode - text to be encoded
    base64SetO - true if characters in "set O" should be translated to base64
    base64WhiteSpace - true if white-space characters should be translated to base64
    errors - error policy name (e.g. "ignore", "replace")
Returns:
    bytes representing the encoded unicode string
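A compact sketch of the encoding side, following RFC 2152: characters in set D (and, depending on the two flags, set O and white space) are written directly, a literal '+' becomes "+-", and every other run of characters is converted to UTF-16 big-endian bytes and written as unpadded Base64 between '+' and '-'. It is an illustration under simplifying assumptions, not Jython's encoder: it omits the errors argument, always closes a Base64 run with an explicit '-' (which RFC 2152 permits but does not always require), and returns the bytes as a String with one byte per char.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Utf7EncodeSketch {

    // RFC 2152 character classes (ASCII only).
    private static final String SET_D =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?";
    private static final String SET_O = "!\"#$%&*;<=>@[]^_`{|}";
    private static final String WHITE_SPACE = " \t\r\n";

    /** True if c may be written as itself under the given flags. */
    private static boolean isDirect(char c, boolean base64SetO, boolean base64WhiteSpace) {
        if (SET_D.indexOf(c) >= 0) {
            return true;
        }
        if (SET_O.indexOf(c) >= 0) {
            return !base64SetO;
        }
        if (WHITE_SPACE.indexOf(c) >= 0) {
            return !base64WhiteSpace;
        }
        return false; // backslash, tilde, control characters and all non-ASCII go to Base64
    }

    /** Write a pending run as '+', unpadded Base64 of its UTF-16BE bytes, then '-'. */
    private static void flush(StringBuilder run, StringBuilder out) {
        if (run.length() == 0) {
            return;
        }
        byte[] utf16 = run.toString().getBytes(StandardCharsets.UTF_16BE);
        out.append('+')
           .append(Base64.getEncoder().withoutPadding().encodeToString(utf16))
           .append('-'); // always terminated explicitly in this sketch
        run.setLength(0);
    }

    static String encodeUtf7(String unicode, boolean base64SetO, boolean base64WhiteSpace) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder(); // characters awaiting Base64 encoding
        for (int i = 0; i < unicode.length(); i++) {
            char c = unicode.charAt(i);
            if (c == '+') {
                flush(run, out);
                out.append("+-");          // a literal '+'
            } else if (isDirect(c, base64SetO, base64WhiteSpace)) {
                flush(run, out);
                out.append(c);
            } else {
                run.append(c);             // surrogate pairs stay together in one run
            }
        }
        flush(run, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeUtf7("Hi Mom -\u263A-!", false, false)); // prints Hi Mom -+Jjo--!
    }
}
```

Encoding a whole run at once, rather than character by character, matters because Base64 groups bits across character boundaries.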
-
PyUnicode_DecodeUTF8
public static java.lang.String PyUnicode_DecodeUTF8(java.lang.String str, java.lang.String errors)
-
PyUnicode_DecodeUTF8Stateful
public static java.lang.String PyUnicode_DecodeUTF8Stateful(java.lang.String str, java.lang.String errors, int[] consumed)
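PyUnicode_DecodeUTF8Stateful is undocumented here; its signature suggests the same consumed contract as the UTF-7 variant above. Assuming that contract, the JDK's CharsetDecoder gives a compact model: decoding with endOfInput set to false leaves an incomplete trailing sequence unread, so the buffer position is the consumed count. The sketch omits the errors argument (malformed input simply raises IllegalArgumentException) and takes its input in the one-byte-per-char String form.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8StatefulSketch {

    /**
     * Decode UTF-8 held one byte per char in str. If consumed is non-null, a truncated
     * trailing sequence is left undecoded and consumed[0] receives the bytes used;
     * if consumed is null the call is "final" and a truncated sequence is an error.
     */
    static String decodeUtf8Stateful(String str, int[] consumed) {
        byte[] input = str.getBytes(StandardCharsets.ISO_8859_1); // chars 0..255 back to bytes
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(input);
        // UTF-8 never produces more UTF-16 units than it has bytes.
        CharBuffer out = CharBuffer.allocate(input.length);
        boolean isFinal = consumed == null;
        CoderResult result = decoder.decode(in, out, isFinal);
        if (result.isError()) {
            throw new IllegalArgumentException("invalid UTF-8 at byte " + in.position());
        }
        if (isFinal) {
            decoder.flush(out);
        } else {
            consumed[0] = in.position(); // stops before any incomplete sequence
        }
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        int[] consumed = new int[1];
        // "h" followed by only the first byte (0xC3) of the two-byte sequence for U+00E9
        String text = decodeUtf8Stateful("h\u00c3", consumed);
        System.out.println(text + " " + consumed[0]); // prints h 1
    }
}
```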
-
PyUnicode_EncodeUTF8
public static java.lang.String PyUnicode_EncodeUTF8(java.lang.String str, java.lang.String errors)
-
PyUnicode_DecodeASCII
public static java.lang.String PyUnicode_DecodeASCII(java.lang.String str, int size, java.lang.String errors)
-
PyUnicode_DecodeLatin1
public static java.lang.String PyUnicode_DecodeLatin1(java.lang.String str, int size, java.lang.String errors)
-
PyUnicode_EncodeASCII
public static java.lang.String PyUnicode_EncodeASCII(java.lang.String str, int size, java.lang.String errors)
-
PyUnicode_EncodeLatin1
public static java.lang.String PyUnicode_EncodeLatin1(java.lang.String str, int size, java.lang.String errors)
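The ASCII and Latin-1 methods appear to exchange bytes as a String with one byte per char, the Jython PyString convention described in the UTF-7 entries. The helpers below are hypothetical, not part of this class, and show that convention on its own: packing a byte[] this way produces the same String as decoding it as ISO-8859-1, which is why Latin-1 decoding amounts to keeping each char value, while Latin-1 encoding (here under a strict policy) only has to check that every char is below 0x100.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteStringSketch {

    /** Pack bytes into a String, one byte value (0..255) per char. */
    static String fromBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xff)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    /** Latin-1 "encode" under a strict policy: every char must fit in one byte. */
    static byte[] toBytesStrict(String str) {
        byte[] bytes = new byte[str.length()];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c > 0xff) {
                throw new IllegalArgumentException("char at index " + i + " is not Latin-1");
            }
            bytes[i] = (byte) c;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0xE9, (byte) 0xFF};
        String packed = fromBytes(raw);
        System.out.println(packed.equals(new String(raw, StandardCharsets.ISO_8859_1))); // prints true
        System.out.println(Arrays.equals(toBytesStrict(packed), raw));                   // prints true
    }
}
```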
-
PyUnicode_EncodeRawUnicodeEscape
public static java.lang.String PyUnicode_EncodeRawUnicodeEscape(java.lang.String str, java.lang.String errors, boolean modifed)
-
PyUnicode_DecodeRawUnicodeEscape
public static java.lang.String PyUnicode_DecodeRawUnicodeEscape(java.lang.String str, java.lang.String errors)
-
PyUnicode_EncodePunycode
public static java.lang.String PyUnicode_EncodePunycode(PyUnicode input, java.lang.String errors)
-
PyUnicode_DecodePunycode
public static PyUnicode PyUnicode_DecodePunycode(java.lang.String input, java.lang.String errors)
-
PyUnicode_EncodeIDNA
public static java.lang.String PyUnicode_EncodeIDNA(PyUnicode input, java.lang.String errors)
-
PyUnicode_DecodeIDNA
public static PyUnicode PyUnicode_DecodeIDNA(java.lang.String input, java.lang.String errors)
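For comparison at the Java level (not a claim about how these methods are implemented), the JDK's java.net.IDN class performs the IDNA 2003 ToASCII and ToUnicode conversions, including the Punycode step, on whole domain names:

```java
import java.net.IDN;

final class IdnaSketch {
    public static void main(String[] args) {
        String ascii = IDN.toASCII("b\u00fccher.example");
        System.out.println(ascii);                                 // prints xn--bcher-kva.example
        String unicode = IDN.toUnicode(ascii);
        System.out.println(unicode.equals("b\u00fccher.example")); // prints true
    }
}
```

The label "bücher" becomes "xn--bcher-kva": "bcher" is the ASCII part, and "kva" encodes which non-ASCII code point to insert and where.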
-
encoding_error
public static PyObject encoding_error(java.lang.String errors, java.lang.String encoding, java.lang.String toEncode, int start, int end, java.lang.String reason)
Invoke a user-defined error-handling mechanism, for errors encountered during encoding, as registered through register_error(String, PyObject). The return value is the return from the error handler, indicating the replacement codec input and the position at which to resume encoding. Invokes the mechanism described in PEP-293.
Parameters:
    errors - name of the error policy (or null meaning "strict")
    encoding - name of encoding that encountered the error
    toEncode - unicode string being encoded
    start - index of first char it couldn't encode
    end - index+1 of last char it couldn't encode (usually becomes the resume point)
    reason - contribution to error message if any
Returns:
    must be a tuple (replacement_unicode, resume_index)
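The PEP 293 protocol behind encoding_error and register_error can be modelled in plain Java: a named handler receives the details of the failure and answers with a replacement and a resume index, and the codec appends the replacement and carries on from that index. In this sketch Handler and Replacement are hypothetical stand-ins for the Python callable and the (replacement_unicode, resume_index) tuple, null is looked up as "strict" as the errors parameter above describes, and the replacement is assumed to be encodable already (a real codec must encode it too).

```java
import java.util.HashMap;
import java.util.Map;

final class ErrorHandlerSketch {

    /** Stand-in for the (replacement_unicode, resume_index) tuple. */
    static final class Replacement {
        final String text;
        final int resume;

        Replacement(String text, int resume) {
            this.text = text;
            this.resume = resume;
        }
    }

    /** Stand-in for a Python error-handler callable. */
    interface Handler {
        Replacement handle(String encoding, String input, int start, int end, String reason);
    }

    private static final Map<String, Handler> HANDLERS = new HashMap<>();

    static {
        HANDLERS.put("strict", (enc, in, start, end, reason) -> {
            throw new IllegalArgumentException("'" + enc + "' codec can't encode: " + reason);
        });
        HANDLERS.put("ignore", (enc, in, start, end, reason) -> new Replacement("", end));
        // one '?' per unencodable char (String.repeat needs Java 11 or later)
        HANDLERS.put("replace", (enc, in, start, end, reason) -> new Replacement("?".repeat(end - start), end));
    }

    static void registerError(String name, Handler handler) {
        HANDLERS.put(name, handler);
    }

    static Handler lookupError(String name) {
        Handler handler = HANDLERS.get(name == null ? "strict" : name); // null means "strict"
        if (handler == null) {
            throw new IllegalArgumentException("unknown error handler name " + name);
        }
        return handler;
    }

    /** ASCII encoder that hands each run of unencodable chars to the named handler. */
    static String encodeAscii(String input, String errors) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) < 0x80) {
                out.append(input.charAt(i++));
                continue;
            }
            int end = i + 1;
            while (end < input.length() && input.charAt(end) >= 0x80) {
                end++;
            }
            Replacement r = lookupError(errors).handle("ascii", input, i, end, "ordinal not in range(128)");
            out.append(r.text);
            i = r.resume; // assumes a non-negative resume index; see calcNewPosition
        }
        return out.toString();
    }

    public static void main(String[] args) {
        registerError("dash", (enc, in, start, end, reason) -> new Replacement("-", end));
        System.out.println(encodeAscii("na\u00efve", "replace")); // prints na?ve
        System.out.println(encodeAscii("na\u00efve", "dash"));    // prints na-ve
    }
}
```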
-
insertReplacementAndGetResume
public static int insertReplacementAndGetResume(java.lang.StringBuilder partialDecode, java.lang.String errors, java.lang.String encoding, java.lang.String toDecode, int start, int end, java.lang.String reason)
Handler for errors encountered during decoding, adjusting the output buffer contents and returning the correct position to resume decoding (if the handler does not simply raise an exception).
Parameters:
    partialDecode - output buffer of unicode (as UTF-16) that the codec is building
    errors - name of the error policy (or null meaning "strict")
    encoding - name of encoding that encountered the error
    toDecode - bytes being decoded
    start - index of first byte it couldn't decode
    end - index+1 of last byte it couldn't decode (usually becomes the resume point)
    reason - contribution to error message if any
Returns:
    the resume position: index of next byte to decode
-
decoding_error
public static PyObject decoding_error(java.lang.String errors, java.lang.String encoding, java.lang.String toDecode, int start, int end, java.lang.String reason)
Invoke a user-defined error-handling mechanism, for errors encountered during decoding, as registered through register_error(String, PyObject). The return value is the return from the error handler, indicating the replacement codec output and the position at which to resume decoding. Invokes the mechanism described in PEP-293.
Parameters:
    errors - name of the error policy (or null meaning "strict")
    encoding - name of encoding that encountered the error
    toDecode - bytes being decoded
    start - index of first byte it couldn't decode
    end - index+1 of last byte it couldn't decode (usually becomes the resume point)
    reason - contribution to error message if any
Returns:
    must be a tuple (replacement_unicode, resume_index)
-
calcNewPosition
public static int calcNewPosition(int size, PyObject errorTuple)
Given the return from some codec error handler (invoked while encoding or decoding), which specifies a resume position, and the length of the input being encoded or decoded, check and interpret the resume position. Negative indexes in the error handler return are interpreted as "from the end". If the result would be out of bounds in the input, an IndexError exception is raised.
Parameters:
    size - length of the input being encoded or decoded
    errorTuple - tuple returned from the error handler
Returns:
    absolute resume position
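The interpretation just described can be sketched in a few lines. This version is hypothetical in its details: it takes the resume index as a plain int rather than extracting it from the handler's tuple, throws IndexOutOfBoundsException where Jython raises IndexError, and, as an assumption modelled on CPython, accepts a resume position equal to size (meaning the whole input has been handled).

```java
final class ResumePositionSketch {

    /**
     * Turn a handler's resume index into an absolute position in an input of length size.
     * Negative values count from the end; the result must lie in [0, size].
     */
    static int calcNewPosition(int size, int resumeIndex) {
        int newPosition = resumeIndex < 0 ? size + resumeIndex : resumeIndex;
        if (newPosition < 0 || newPosition > size) {
            // Jython raises IndexError; the sketch uses the closest Java exception.
            throw new IndexOutOfBoundsException(
                    "position " + resumeIndex + " out of bounds for input of length " + size);
        }
        return newPosition;
    }

    public static void main(String[] args) {
        System.out.println(calcNewPosition(10, 4));  // prints 4
        System.out.println(calcNewPosition(10, -3)); // prints 7
    }
}
```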