We used to convert the unicode object directly to a string in the server
encoding by calling Python's PyUnicode_AsEncodedString function. In other
words, we used Python's routines to do the encoding. However, that has a
few problems. First of all, it required keeping a mapping table of Python
encoding names and PostgreSQL encodings. But the real killer was that Python
doesn't support EUC_TW and MULE_INTERNAL encodings at all.
Instead, convert the Python unicode object to UTF-8, and use PostgreSQL's
encoding conversion functions to convert from UTF-8 to server encoding. We
were already doing the same in the other direction in PLyUnicode_FromString,
so this is more consistent, too.
Note: This makes SQL_ASCII to behave more leniently. We used to map
SQL_ASCII to Python's 'ascii', which on Python means strict 7-bit ASCII
only, so you got an error if the python string contained anything but pure
ASCII. You no longer get an error; you get the UTF-8 representation of the
string instead.
Backpatch to 9.0, where these conversions were introduced.
Jan Urbański
That caused the plpython_unicode regression test to fail on SQL_ASCII
encoding, as evidenced by the buildfarm. The reason is that with the patch,
you don't get the detail in the error message that you got before. That
detail is actually very informative, so rather than just adjust the expected
output, let's revert that part of the patch for now to make the buildfarm
green again, and figure out some other way to avoid the recursion of
PLy_elog() that doesn't lose the detail.
Windows encodings, "win1252" and so forth, are named differently in Python,
like "cp1252". Also, if the PyUnicode_AsEncodedString() function call fails
for some reason, use a plain ereport(), not a PLy_elog(), to report that
error. That avoids recursion and crash, if PLy_elog() tries to call
PLyUnicode_Bytes() again.
This fixes bug reported by Asif Naeem. Backpatch down to 9.0, before that
plpython didn't even try these conversions.
Jan Urbański, with minor comment improvements by me.
Module initialization functions in Python 3 must have external
linkage, because PyMODINIT_FUNC does dllexport on Windows-like
platforms. Without this change, the build with Python 3 fails on
Windows.
Apparently there is no buildfarm critter exercising this case after all,
because it fails in several places. With this patch, build, install,
check-world, and installcheck-world pass for me on OS X.
We must stay in the function's SPI context until done calling the iterator
that returns the set result. Otherwise, any attempt to invoke SPI features
in the python code called by the iterator will malfunction. Diagnosis and
patch by Jan Urbanski, per bug report from Jean-Baptiste Quenot.
Back-patch to 8.2; there was no support for SRFs in previous versions of
plpython.
This is reproducibly possible in Python 2.7 if the user turned
PendingDeprecationWarning into an error, but it's theoretically also possible
in earlier versions in case of exceptional conditions.
backpatched to 8.0
(_PG_init should be called only once anyway, but as long as it's got an
internal guard against repeat calls, that should be in front of the
version check.)
memory if the result had zero rows, and also if there was any sort of error
while converting the result tuples into Python data. Reported and partially
fixed by Andres Freund.
Back-patch to all supported versions. Note: I haven't tested the 7.4 fix.
7.4's configure check for python is so obsolete it doesn't work on my
current machines :-(. The logic change is pretty straightforward though.
In PLy_spi_execute_plan, use the data-type specific Python-to-PostgreSQL
conversion function instead of passing everything through InputFunctionCall
as a string. The equivalent fix was already done months ago for function
parameters and return values, but this other gateway between Python and
PostgreSQL was apparently forgotten. As a result, data types that need
special treatment, such as bytea, would misbehave when used with
plpy.execute.
old memory context in plpython. Before only one of them was marked
volatile, but per report from Zdenek Kotala, some compilers do the
wrong thing here.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
Mimic the Python interpreter's own logic for printing exceptions instead
of just using the straight str() call, so that
you get
plpy.SPIError
instead of
<class 'plpy.SPIError'>
and for built-in exceptions merely
UnicodeEncodeError
Besides looking better this cuts down on the endless version differences
in the regression test expected files.
Behaves more or less unchanged compared to Python 2, but the new language
variant is called plpython3u. Documentation describing the naming scheme
is included.
In PLy_output(), when the elog() call in the TRY branch throws an exception
(this can happen when a statement timeout kicks in, for example), the
PyErr_SetString() call in the CATCH branch can cause a segfault, because the
Py_XDECREF(so) call before it releases memory that is still used by the sv
variable that PyErr_SetString() uses as argument, because sv points into
memory owned by so.
Backpatched back to 8.0, where this code was introduced.
I also threw in a couple of volatile declarations for variables that are used
before and after the TRY. I don't think they caused the crash that I
observed, but they could become issues.
Check calls of PyUnicode_AsEncodedString() for NULL return, probably
because the encoding name is not known. Add special treatment for
SQL_ASCII, which Python definitely does not know.
Since using SQL_ASCII produces errors in the regression tests when
non-ASCII characters are involved, we have to put back various regression
test result variants.
PL/Python now accepts Unicode objects where it previously only accepted string
objects (for example, as return value). Unicode objects are converted to the
PostgreSQL server encoding as necessary.
This change is also necessary for future Python 3 support, which treats all
strings as Unicode objects.
Since this removes the error conditions that the plpython_unicode test file
tested for, the alternative result files are no longer necessary.
Before, PL/Python converted data between SQL and Python by going
through a C string representation. This broke for bytea in two ways:
- On input (function parameters), you would get a Python string that
contains bytea's particular external representation with backslashes
etc., instead of a sequence of bytes, which is what you would expect
in a Python environment. This problem is exacerbated by the new
bytea output format.
- On output (function return value), null bytes in the Python string
would cause truncation before the data gets stored into a bytea
datum.
This is now fixed by converting directly between the PostgreSQL datum
and the Python representation.
The required generalized infrastructure also allows for other
improvements in passing:
- When returning a boolean value, the SQL datum is now true if and
only if Python considers the value that was passed out of the
PL/Python function to be true. Previously, this determination was
left to the boolean data type input function. So, now returning
'foo' results in true, because Python considers it true, rather than
false because PostgreSQL considers it false.
- On input, we can convert the integer and float types directly to
their Python equivalents without having to go through an
intermediate string representation.
original patch by Caleb Welton, with updates by myself