Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
*
|
|
|
|
|
* tqueue.c
|
|
|
|
|
* Use shm_mq to send & receive tuples between parallel backends
|
|
|
|
|
*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Most of the complexity in this module arises from transient RECORD types,
|
|
|
|
|
* which all have type RECORDOID and are distinguished by typmod numbers
|
|
|
|
|
* that are managed per-backend (see src/backend/utils/cache/typcache.c).
|
|
|
|
|
* The sender's set of RECORD typmod assignments probably doesn't match the
|
|
|
|
|
* receiver's. To deal with this, we make the sender send a description
|
|
|
|
|
* of each transient RECORD type appearing in the data it sends. The
|
|
|
|
|
* receiver finds or creates a matching type in its own typcache, and then
|
|
|
|
|
* maps the sender's typmod for that type to its own typmod.
|
|
|
|
|
*
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
* A DestReceiver of type DestTupleQueue, which is a TQueueDestReceiver
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* under the hood, writes tuples from the executor to a shm_mq. If
|
|
|
|
|
* necessary, it also writes control messages describing transient
|
|
|
|
|
* record types used within the tuple.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* A TupleQueueReader reads tuples, and control messages if any are sent,
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* from a shm_mq and returns the tuples. If transient record types are
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* in use, it registers those types locally based on the control messages
|
|
|
|
|
* and rewrites the typmods sent by the remote side to the corresponding
|
|
|
|
|
* local record typmods.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
*
|
2016-01-02 13:33:40 -05:00
|
|
|
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
|
*
|
|
|
|
|
* IDENTIFICATION
|
|
|
|
|
* src/backend/executor/tqueue.c
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
#include "postgres.h"
|
|
|
|
|
|
|
|
|
|
#include "access/htup_details.h"
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
#include "catalog/pg_type.h"
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
#include "executor/tqueue.h"
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
#include "funcapi.h"
|
|
|
|
|
#include "lib/stringinfo.h"
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
#include "miscadmin.h"
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
#include "utils/array.h"
|
|
|
|
|
#include "utils/lsyscache.h"
|
|
|
|
|
#include "utils/memutils.h"
|
|
|
|
|
#include "utils/rangetypes.h"
|
|
|
|
|
#include "utils/syscache.h"
|
|
|
|
|
#include "utils/typcache.h"
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* The data transferred through the shm_mq is divided into messages.
|
|
|
|
|
* One-byte messages are mode-switch messages, telling the receiver to switch
|
|
|
|
|
* between "control" and "data" modes. (We always start up in "data" mode.)
|
|
|
|
|
* Otherwise, when in "data" mode, each message is a tuple. When in "control"
|
|
|
|
|
* mode, each message defines one transient-typmod-to-tupledesc mapping to
|
|
|
|
|
* let us interpret future tuples. Both of those cases certainly require
|
|
|
|
|
* more than one byte, so no confusion is possible.
|
|
|
|
|
*/
|
|
|
|
|
#define TUPLE_QUEUE_MODE_CONTROL 'c' /* mode-switch message contents */
|
|
|
|
|
#define TUPLE_QUEUE_MODE_DATA 'd'
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Both the sender and receiver build trees of TupleRemapInfo nodes to help
|
|
|
|
|
* them identify which (sub) fields of transmitted tuples are composite and
|
|
|
|
|
* may thus need remap processing. We might need to look within arrays and
|
|
|
|
|
* ranges, not only composites, to find composite sub-fields. A NULL
|
|
|
|
|
* TupleRemapInfo pointer indicates that it is known that the described field
|
|
|
|
|
* is not composite and has no composite substructure.
|
|
|
|
|
*
|
|
|
|
|
* Note that we currently have to look at each composite field at runtime,
|
|
|
|
|
* even if we believe it's of a named composite type (i.e., not RECORD).
|
|
|
|
|
* This is because we allow the actual value to be a compatible transient
|
|
|
|
|
* RECORD type. That's grossly inefficient, and it would be good to get
|
|
|
|
|
* rid of the requirement, but it's not clear what would need to change.
|
|
|
|
|
*
|
|
|
|
|
* Also, we allow the top-level tuple structure, as well as the actual
|
|
|
|
|
* structure of composite subfields, to change from one tuple to the next
|
|
|
|
|
* at runtime. This may well be entirely historical, but it's mostly free
|
|
|
|
|
* to support given the previous requirement; and other places in the system
|
|
|
|
|
* also permit this, so it's not entirely clear if we could drop it.
|
|
|
|
|
*/
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
typedef enum
|
|
|
|
|
{
|
|
|
|
|
TQUEUE_REMAP_ARRAY, /* array */
|
|
|
|
|
TQUEUE_REMAP_RANGE, /* range */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQUEUE_REMAP_RECORD /* composite type, named or transient */
|
|
|
|
|
} TupleRemapClass;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
typedef struct TupleRemapInfo TupleRemapInfo;
|
|
|
|
|
|
|
|
|
|
typedef struct ArrayRemapInfo
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
int16 typlen; /* array element type's storage properties */
|
|
|
|
|
bool typbyval;
|
|
|
|
|
char typalign;
|
|
|
|
|
TupleRemapInfo *element_remap; /* array element type's remap info */
|
|
|
|
|
} ArrayRemapInfo;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
typedef struct RangeRemapInfo
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TypeCacheEntry *typcache; /* range type's typcache entry */
|
|
|
|
|
TupleRemapInfo *bound_remap; /* range bound type's remap info */
|
|
|
|
|
} RangeRemapInfo;
|
|
|
|
|
|
|
|
|
|
typedef struct RecordRemapInfo
|
|
|
|
|
{
|
|
|
|
|
/* Original (remote) type ID info last seen for this composite field */
|
|
|
|
|
Oid rectypid;
|
|
|
|
|
int32 rectypmod;
|
|
|
|
|
/* Local RECORD typmod, or -1 if unset; not used on sender side */
|
|
|
|
|
int32 localtypmod;
|
|
|
|
|
/* If no fields of the record require remapping, these are NULL: */
|
|
|
|
|
TupleDesc tupledesc; /* copy of record's tupdesc */
|
|
|
|
|
TupleRemapInfo **field_remap; /* each field's remap info */
|
|
|
|
|
} RecordRemapInfo;
|
|
|
|
|
|
|
|
|
|
struct TupleRemapInfo
|
|
|
|
|
{
|
|
|
|
|
TupleRemapClass remapclass;
|
|
|
|
|
union
|
|
|
|
|
{
|
|
|
|
|
ArrayRemapInfo arr;
|
|
|
|
|
RangeRemapInfo rng;
|
|
|
|
|
RecordRemapInfo rec;
|
|
|
|
|
} u;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* DestReceiver object's private contents
|
|
|
|
|
*
|
|
|
|
|
* queue and tupledesc are pointers to data supplied by DestReceiver's caller.
|
|
|
|
|
* The recordhtab and remap info are owned by the DestReceiver and are kept
|
|
|
|
|
* in mycontext. tmpcontext is a tuple-lifespan context to hold cruft
|
|
|
|
|
* created while traversing each tuple to find record subfields.
|
|
|
|
|
*/
|
|
|
|
|
typedef struct TQueueDestReceiver
|
|
|
|
|
{
|
|
|
|
|
DestReceiver pub; /* public fields */
|
|
|
|
|
shm_mq_handle *queue; /* shm_mq to send to */
|
|
|
|
|
MemoryContext mycontext; /* context containing TQueueDestReceiver */
|
|
|
|
|
MemoryContext tmpcontext; /* per-tuple context, if needed */
|
|
|
|
|
HTAB *recordhtab; /* table of transmitted typmods, if needed */
|
|
|
|
|
char mode; /* current message mode */
|
|
|
|
|
TupleDesc tupledesc; /* current top-level tuple descriptor */
|
|
|
|
|
TupleRemapInfo **field_remapinfo; /* current top-level remap info */
|
2016-06-09 18:02:36 -04:00
|
|
|
} TQueueDestReceiver;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/*
|
|
|
|
|
* Hash table entries for mapping remote to local typmods.
|
|
|
|
|
*/
|
|
|
|
|
typedef struct RecordTypmodMap
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
int32 remotetypmod; /* hash key (must be first!) */
|
|
|
|
|
int32 localtypmod;
|
|
|
|
|
} RecordTypmodMap;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/*
|
|
|
|
|
* TupleQueueReader object's private contents
|
|
|
|
|
*
|
|
|
|
|
* queue and tupledesc are pointers to data supplied by reader's caller.
|
|
|
|
|
* The typmodmap and remap info are owned by the TupleQueueReader and
|
|
|
|
|
* are kept in mycontext.
|
|
|
|
|
*
|
|
|
|
|
* "typedef struct TupleQueueReader TupleQueueReader" is in tqueue.h
|
|
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
struct TupleQueueReader
|
|
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
shm_mq_handle *queue; /* shm_mq to receive from */
|
|
|
|
|
MemoryContext mycontext; /* context containing TupleQueueReader */
|
|
|
|
|
HTAB *typmodmap; /* RecordTypmodMap hashtable, if needed */
|
|
|
|
|
char mode; /* current message mode */
|
|
|
|
|
TupleDesc tupledesc; /* current top-level tuple descriptor */
|
|
|
|
|
TupleRemapInfo **field_remapinfo; /* current top-level remap info */
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
};
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Local function prototypes */
|
|
|
|
|
static void TQExamine(TQueueDestReceiver *tqueue,
|
|
|
|
|
TupleRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQExamineArray(TQueueDestReceiver *tqueue,
|
|
|
|
|
ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQExamineRange(TQueueDestReceiver *tqueue,
|
|
|
|
|
RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQExamineRecord(TQueueDestReceiver *tqueue,
|
|
|
|
|
RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQSendRecordInfo(TQueueDestReceiver *tqueue, int32 typmod,
|
|
|
|
|
TupleDesc tupledesc);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
static void TupleQueueHandleControlMessage(TupleQueueReader *reader,
|
|
|
|
|
Size nbytes, char *data);
|
|
|
|
|
static HeapTuple TupleQueueHandleDataMessage(TupleQueueReader *reader,
|
|
|
|
|
Size nbytes, HeapTupleHeader data);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
static HeapTuple TQRemapTuple(TupleQueueReader *reader,
|
|
|
|
|
TupleDesc tupledesc,
|
|
|
|
|
TupleRemapInfo **field_remapinfo,
|
|
|
|
|
HeapTuple tuple);
|
|
|
|
|
static Datum TQRemap(TupleQueueReader *reader, TupleRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static Datum TQRemapArray(TupleQueueReader *reader, ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static Datum TQRemapRange(TupleQueueReader *reader, RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static Datum TQRemapRecord(TupleQueueReader *reader, RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static TupleRemapInfo *BuildTupleRemapInfo(Oid typid, MemoryContext mycontext);
|
|
|
|
|
static TupleRemapInfo *BuildArrayRemapInfo(Oid elemtypid,
|
|
|
|
|
MemoryContext mycontext);
|
|
|
|
|
static TupleRemapInfo *BuildRangeRemapInfo(Oid rngtypid,
|
|
|
|
|
MemoryContext mycontext);
|
|
|
|
|
static TupleRemapInfo **BuildFieldRemapInfo(TupleDesc tupledesc,
|
|
|
|
|
MemoryContext mycontext);
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Receive a tuple from a query, and send it to the designated shm_mq.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Returns TRUE if successful, FALSE if shm_mq has been detached.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
*/
|
2016-06-06 14:52:58 -04:00
|
|
|
static bool
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
|
|
|
|
|
{
|
|
|
|
|
TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
TupleDesc tupledesc = slot->tts_tupleDescriptor;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
HeapTuple tuple;
|
2016-06-06 14:52:58 -04:00
|
|
|
shm_mq_result result;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* If first time through, compute remapping info for the top-level fields.
|
|
|
|
|
* On later calls, if the tupledesc has changed, set up for the new
|
|
|
|
|
* tupledesc. (This is a strange test both because the executor really
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* shouldn't change the tupledesc, and also because it would be unsafe if
|
|
|
|
|
* the old tupledesc could be freed and a new one allocated at the same
|
2015-11-18 08:25:33 -05:00
|
|
|
* address. But since some very old code in printtup.c uses a similar
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* approach, we adopt it here as well.)
|
|
|
|
|
*
|
|
|
|
|
* Here and elsewhere in this module, when replacing remapping info we
|
|
|
|
|
* pfree the top-level object because that's easy, but we don't bother to
|
|
|
|
|
* recursively free any substructure. This would lead to query-lifespan
|
|
|
|
|
* memory leaks if the mapping info actually changed frequently, but since
|
|
|
|
|
* we don't expect that to happen, it doesn't seem worth expending code to
|
|
|
|
|
* prevent it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
2015-11-18 08:25:33 -05:00
|
|
|
if (tqueue->tupledesc != tupledesc)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (tqueue->field_remapinfo != NULL)
|
|
|
|
|
pfree(tqueue->field_remapinfo);
|
|
|
|
|
tqueue->field_remapinfo = BuildFieldRemapInfo(tupledesc,
|
|
|
|
|
tqueue->mycontext);
|
2015-11-18 08:25:33 -05:00
|
|
|
tqueue->tupledesc = tupledesc;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* When, because of the types being transmitted, no record typmod mapping
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* can be needed, we can skip a good deal of work.
|
|
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (tqueue->field_remapinfo != NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TupleRemapInfo **remapinfo = tqueue->field_remapinfo;
|
|
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
MemoryContext oldcontext = NULL;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Deform the tuple so we can examine fields, if not done already. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
slot_getallattrs(slot);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Iterate over each attribute and search it for transient typmods. */
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
/* Ignore nulls and types that don't need special handling. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (slot->tts_isnull[i] || remapinfo[i] == NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
/* Switch to temporary memory context to avoid leaking. */
|
|
|
|
|
if (oldcontext == NULL)
|
|
|
|
|
{
|
|
|
|
|
if (tqueue->tmpcontext == NULL)
|
|
|
|
|
tqueue->tmpcontext =
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
AllocSetContextCreate(tqueue->mycontext,
|
|
|
|
|
"tqueue sender temp context",
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
|
|
|
ALLOCSET_DEFAULT_SIZES);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
oldcontext = MemoryContextSwitchTo(tqueue->tmpcontext);
|
|
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Examine the value. */
|
|
|
|
|
TQExamine(tqueue, remapinfo[i], slot->tts_values[i]);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* If we used the temp context, reset it and restore prior context. */
|
|
|
|
|
if (oldcontext != NULL)
|
|
|
|
|
{
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
MemoryContextReset(tqueue->tmpcontext);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* If we entered control mode, switch back to data mode. */
|
|
|
|
|
if (tqueue->mode != TUPLE_QUEUE_MODE_DATA)
|
|
|
|
|
{
|
|
|
|
|
tqueue->mode = TUPLE_QUEUE_MODE_DATA;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
shm_mq_send(tqueue->queue, sizeof(char), &tqueue->mode, false);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Send the tuple itself. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
tuple = ExecMaterializeSlot(slot);
|
|
|
|
|
result = shm_mq_send(tqueue->queue, tuple->t_len, tuple->t_data, false);
|
2016-06-06 14:52:58 -04:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Check for failure. */
|
2016-06-06 14:52:58 -04:00
|
|
|
if (result == SHM_MQ_DETACHED)
|
|
|
|
|
return false;
|
|
|
|
|
else if (result != SHM_MQ_SUCCESS)
|
|
|
|
|
ereport(ERROR,
|
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
errmsg("could not send tuple to shared-memory queue")));
|
2016-06-06 14:52:58 -04:00
|
|
|
|
|
|
|
|
return true;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Examine the given datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
|
|
|
|
*
|
|
|
|
|
* remapinfo is previously-computed remapping info about the datum's type.
|
|
|
|
|
*
|
|
|
|
|
* This function just dispatches based on the remap class.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamine(TQueueDestReceiver *tqueue, TupleRemapInfo *remapinfo, Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* This is recursive, so it could be driven to stack overflow. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
check_stack_depth();
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
switch (remapinfo->remapclass)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
case TQUEUE_REMAP_ARRAY:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamineArray(tqueue, &remapinfo->u.arr, value);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
break;
|
|
|
|
|
case TQUEUE_REMAP_RANGE:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamineRange(tqueue, &remapinfo->u.rng, value);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
break;
|
|
|
|
|
case TQUEUE_REMAP_RECORD:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamineRecord(tqueue, &remapinfo->u.rec, value);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Examine a record datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamineRecord(TQueueDestReceiver *tqueue, RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
HeapTupleHeader tup;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
Oid typid;
|
|
|
|
|
int32 typmod;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
TupleDesc tupledesc;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Extract type OID and typmod from tuple. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
tup = DatumGetHeapTupleHeader(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
typid = HeapTupleHeaderGetTypeId(tup);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
typmod = HeapTupleHeaderGetTypMod(tup);
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* If first time through, or if this isn't the same composite type as last
|
|
|
|
|
* time, consider sending a control message, and then look up the
|
|
|
|
|
* necessary information for examining the fields.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (typid != remapinfo->rectypid || typmod != remapinfo->rectypmod)
|
|
|
|
|
{
|
|
|
|
|
/* Free any old data. */
|
|
|
|
|
if (remapinfo->tupledesc != NULL)
|
|
|
|
|
FreeTupleDesc(remapinfo->tupledesc);
|
|
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
pfree(remapinfo->field_remap);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Look up tuple descriptor in typcache. */
|
|
|
|
|
tupledesc = lookup_rowtype_tupdesc(typid, typmod);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If this is a transient record type, send the tupledesc in a control
|
|
|
|
|
* message. (TQSendRecordInfo is smart enough to do this only once
|
|
|
|
|
* per typmod.)
|
|
|
|
|
*/
|
|
|
|
|
if (typid == RECORDOID)
|
|
|
|
|
TQSendRecordInfo(tqueue, typmod, tupledesc);
|
|
|
|
|
|
|
|
|
|
/* Figure out whether fields need recursive processing. */
|
|
|
|
|
remapinfo->field_remap = BuildFieldRemapInfo(tupledesc,
|
|
|
|
|
tqueue->mycontext);
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
{
|
|
|
|
|
/*
|
|
|
|
|
* We need to inspect the record contents, so save a copy of the
|
|
|
|
|
* tupdesc. (We could possibly just reference the typcache's
|
|
|
|
|
* copy, but then it's problematic when to release the refcount.)
|
|
|
|
|
*/
|
|
|
|
|
MemoryContext oldcontext = MemoryContextSwitchTo(tqueue->mycontext);
|
|
|
|
|
|
|
|
|
|
remapinfo->tupledesc = CreateTupleDescCopy(tupledesc);
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
/* No fields of the record require remapping. */
|
|
|
|
|
remapinfo->tupledesc = NULL;
|
|
|
|
|
}
|
|
|
|
|
remapinfo->rectypid = typid;
|
|
|
|
|
remapinfo->rectypmod = typmod;
|
|
|
|
|
|
|
|
|
|
/* Release reference count acquired by lookup_rowtype_tupdesc. */
|
|
|
|
|
DecrTupleDescRefCount(tupledesc);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* If field remapping is required, deform the tuple and examine each
|
|
|
|
|
* field.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (remapinfo->field_remap != NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
Datum *values;
|
|
|
|
|
bool *isnull;
|
|
|
|
|
HeapTupleData tdata;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Deform the tuple so we can check each column within. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
tupledesc = remapinfo->tupledesc;
|
|
|
|
|
values = (Datum *) palloc(tupledesc->natts * sizeof(Datum));
|
|
|
|
|
isnull = (bool *) palloc(tupledesc->natts * sizeof(bool));
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
tdata.t_len = HeapTupleHeaderGetDatumLength(tup);
|
|
|
|
|
ItemPointerSetInvalid(&(tdata.t_self));
|
|
|
|
|
tdata.t_tableOid = InvalidOid;
|
|
|
|
|
tdata.t_data = tup;
|
|
|
|
|
heap_deform_tuple(&tdata, tupledesc, values, isnull);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Recursively check each interesting non-NULL attribute. */
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
|
|
|
|
{
|
|
|
|
|
if (!isnull[i] && remapinfo->field_remap[i])
|
|
|
|
|
TQExamine(tqueue, remapinfo->field_remap[i], values[i]);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Need not clean up, since we're in a short-lived context. */
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Examine an array datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamineArray(TQueueDestReceiver *tqueue, ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
ArrayType *arr = DatumGetArrayTypeP(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
Oid typid = ARR_ELEMTYPE(arr);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
Datum *elem_values;
|
|
|
|
|
bool *elem_nulls;
|
|
|
|
|
int num_elems;
|
|
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
/* Deconstruct the array. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
deconstruct_array(arr, typid, remapinfo->typlen,
|
|
|
|
|
remapinfo->typbyval, remapinfo->typalign,
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
&elem_values, &elem_nulls, &num_elems);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Examine each element. */
|
|
|
|
|
for (i = 0; i < num_elems; i++)
|
|
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
if (!elem_nulls[i])
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamine(tqueue, remapinfo->element_remap, elem_values[i]);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Examine a range datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamineRange(TQueueDestReceiver *tqueue, RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
RangeType *range = DatumGetRangeType(value);
|
|
|
|
|
RangeBound lower;
|
|
|
|
|
RangeBound upper;
|
|
|
|
|
bool empty;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Extract the lower and upper bounds. */
|
|
|
|
|
range_deserialize(remapinfo->typcache, range, &lower, &upper, &empty);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Nothing to do for an empty range. */
|
|
|
|
|
if (empty)
|
|
|
|
|
return;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Examine each bound, if present. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
if (!upper.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamine(tqueue, remapinfo->bound_remap, upper.val);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
if (!lower.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQExamine(tqueue, remapinfo->bound_remap, lower.val);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Send tuple descriptor information for a transient typmod, unless we've
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* already done so previously.
|
|
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQSendRecordInfo(TQueueDestReceiver *tqueue, int32 typmod, TupleDesc tupledesc)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
StringInfoData buf;
|
|
|
|
|
bool found;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Initialize hash table if not done yet. */
|
|
|
|
|
if (tqueue->recordhtab == NULL)
|
|
|
|
|
{
|
|
|
|
|
HASHCTL ctl;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
MemSet(&ctl, 0, sizeof(ctl));
|
|
|
|
|
/* Hash table entries are just typmods */
|
|
|
|
|
ctl.keysize = sizeof(int32);
|
|
|
|
|
ctl.entrysize = sizeof(int32);
|
|
|
|
|
ctl.hcxt = tqueue->mycontext;
|
|
|
|
|
tqueue->recordhtab = hash_create("tqueue sender record type hashtable",
|
2016-07-28 02:08:52 -04:00
|
|
|
100, &ctl,
|
|
|
|
|
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Have we already seen this record type? If not, must report it. */
|
|
|
|
|
hash_search(tqueue->recordhtab, &typmod, HASH_ENTER, &found);
|
|
|
|
|
if (found)
|
|
|
|
|
return;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
elog(DEBUG3, "sending tqueue control message for record typmod %d", typmod);
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/* If message queue is in data mode, switch to control mode. */
|
|
|
|
|
if (tqueue->mode != TUPLE_QUEUE_MODE_CONTROL)
|
|
|
|
|
{
|
|
|
|
|
tqueue->mode = TUPLE_QUEUE_MODE_CONTROL;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
shm_mq_send(tqueue->queue, sizeof(char), &tqueue->mode, false);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Assemble a control message. */
|
|
|
|
|
initStringInfo(&buf);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
appendBinaryStringInfo(&buf, (char *) &typmod, sizeof(int32));
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
appendBinaryStringInfo(&buf, (char *) &tupledesc->natts, sizeof(int));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
appendBinaryStringInfo(&buf, (char *) &tupledesc->tdhasoid, sizeof(bool));
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
|
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
appendBinaryStringInfo(&buf, (char *) tupledesc->attrs[i],
|
|
|
|
|
sizeof(FormData_pg_attribute));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Send control message. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
shm_mq_send(tqueue->queue, buf.len, buf.data, false);
|
|
|
|
|
|
|
|
|
|
/* We assume it's OK to leak buf because we're in a short-lived context. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
/*
|
|
|
|
|
* Prepare to receive tuples from executor.
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
tqueueStartupReceiver(DestReceiver *self, int operation, TupleDesc typeinfo)
|
|
|
|
|
{
|
|
|
|
|
/* do nothing */
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Clean up at end of an executor run
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
tqueueShutdownReceiver(DestReceiver *self)
|
|
|
|
|
{
|
2015-09-28 21:55:57 -04:00
|
|
|
TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
shm_mq_detach(shm_mq_get_queue(tqueue->queue));
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Destroy receiver when done with it
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
tqueueDestroyReceiver(DestReceiver *self)
|
|
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
|
|
|
|
|
|
|
|
|
|
if (tqueue->tmpcontext != NULL)
|
|
|
|
|
MemoryContextDelete(tqueue->tmpcontext);
|
|
|
|
|
if (tqueue->recordhtab != NULL)
|
|
|
|
|
hash_destroy(tqueue->recordhtab);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (tqueue->field_remapinfo != NULL)
|
|
|
|
|
pfree(tqueue->field_remapinfo);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
pfree(self);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Create a DestReceiver that writes tuples to a tuple queue.
|
|
|
|
|
*/
|
|
|
|
|
DestReceiver *
|
|
|
|
|
CreateTupleQueueDestReceiver(shm_mq_handle *handle)
|
|
|
|
|
{
|
|
|
|
|
TQueueDestReceiver *self;
|
|
|
|
|
|
|
|
|
|
self = (TQueueDestReceiver *) palloc0(sizeof(TQueueDestReceiver));
|
|
|
|
|
|
|
|
|
|
self->pub.receiveSlot = tqueueReceiveSlot;
|
|
|
|
|
self->pub.rStartup = tqueueStartupReceiver;
|
|
|
|
|
self->pub.rShutdown = tqueueShutdownReceiver;
|
|
|
|
|
self->pub.rDestroy = tqueueDestroyReceiver;
|
|
|
|
|
self->pub.mydest = DestTupleQueue;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
self->queue = handle;
|
|
|
|
|
self->mycontext = CurrentMemoryContext;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
self->tmpcontext = NULL;
|
|
|
|
|
self->recordhtab = NULL;
|
|
|
|
|
self->mode = TUPLE_QUEUE_MODE_DATA;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Top-level tupledesc is not known yet */
|
|
|
|
|
self->tupledesc = NULL;
|
|
|
|
|
self->field_remapinfo = NULL;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
|
|
|
|
return (DestReceiver *) self;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* Create a tuple queue reader.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
TupleQueueReader *
|
|
|
|
|
CreateTupleQueueReader(shm_mq_handle *handle, TupleDesc tupledesc)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
TupleQueueReader *reader = palloc0(sizeof(TupleQueueReader));
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
reader->queue = handle;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
reader->mycontext = CurrentMemoryContext;
|
|
|
|
|
reader->typmodmap = NULL;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
reader->mode = TUPLE_QUEUE_MODE_DATA;
|
|
|
|
|
reader->tupledesc = tupledesc;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
reader->field_remapinfo = BuildFieldRemapInfo(tupledesc, reader->mycontext);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
return reader;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* Destroy a tuple queue reader.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
*/
|
|
|
|
|
void
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
DestroyTupleQueueReader(TupleQueueReader *reader)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
shm_mq_detach(shm_mq_get_queue(reader->queue));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (reader->typmodmap != NULL)
|
|
|
|
|
hash_destroy(reader->typmodmap);
|
|
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (reader->field_remapinfo != NULL)
|
|
|
|
|
pfree(reader->field_remapinfo);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
pfree(reader);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Fetch a tuple from a tuple queue reader.
|
|
|
|
|
*
|
2016-07-29 19:31:06 -04:00
|
|
|
* The return value is NULL if there are no remaining tuples or if
|
|
|
|
|
* nowait = true and no tuple is ready to return. *done, if not NULL,
|
|
|
|
|
* is set to true when there are no remaining tuples and otherwise to false.
|
|
|
|
|
*
|
|
|
|
|
* The returned tuple, if any, is allocated in CurrentMemoryContext.
|
|
|
|
|
* That should be a short-lived (tuple-lifespan) context, because we are
|
|
|
|
|
* pretty cavalier about leaking memory in that context if we have to do
|
|
|
|
|
* tuple remapping.
|
|
|
|
|
*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* Even when shm_mq_receive() returns SHM_MQ_WOULD_BLOCK, this can still
|
|
|
|
|
* accumulate bytes from a partially-read message, so it's useful to call
|
|
|
|
|
* this with nowait = true even if nothing is returned.
|
|
|
|
|
*/
|
|
|
|
|
HeapTuple
|
|
|
|
|
TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
|
|
|
|
|
{
|
|
|
|
|
shm_mq_result result;
|
|
|
|
|
|
|
|
|
|
if (done != NULL)
|
|
|
|
|
*done = false;
|
|
|
|
|
|
|
|
|
|
for (;;)
|
|
|
|
|
{
|
|
|
|
|
Size nbytes;
|
|
|
|
|
void *data;
|
|
|
|
|
|
|
|
|
|
/* Attempt to read a message. */
|
2015-12-18 12:37:43 -05:00
|
|
|
result = shm_mq_receive(reader->queue, &nbytes, &data, nowait);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* If queue is detached, set *done and return NULL. */
|
|
|
|
|
if (result == SHM_MQ_DETACHED)
|
|
|
|
|
{
|
|
|
|
|
if (done != NULL)
|
|
|
|
|
*done = true;
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* In non-blocking mode, bail out if no message ready yet. */
|
|
|
|
|
if (result == SHM_MQ_WOULD_BLOCK)
|
|
|
|
|
return NULL;
|
|
|
|
|
Assert(result == SHM_MQ_SUCCESS);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* We got a message (see message spec at top of file). Process it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
if (nbytes == 1)
|
|
|
|
|
{
|
|
|
|
|
/* Mode switch message. */
|
|
|
|
|
reader->mode = ((char *) data)[0];
|
|
|
|
|
}
|
|
|
|
|
else if (reader->mode == TUPLE_QUEUE_MODE_DATA)
|
|
|
|
|
{
|
|
|
|
|
/* Tuple data. */
|
|
|
|
|
return TupleQueueHandleDataMessage(reader, nbytes, data);
|
|
|
|
|
}
|
|
|
|
|
else if (reader->mode == TUPLE_QUEUE_MODE_CONTROL)
|
|
|
|
|
{
|
|
|
|
|
/* Control message, describing a transient record type. */
|
|
|
|
|
TupleQueueHandleControlMessage(reader, nbytes, data);
|
|
|
|
|
}
|
|
|
|
|
else
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
elog(ERROR, "unrecognized tqueue mode: %d", (int) reader->mode);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
* Handle a data message - that is, a tuple - from the remote side.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
static HeapTuple
|
|
|
|
|
TupleQueueHandleDataMessage(TupleQueueReader *reader,
|
|
|
|
|
Size nbytes,
|
|
|
|
|
HeapTupleHeader data)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
HeapTupleData htup;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/*
|
|
|
|
|
* Set up a dummy HeapTupleData pointing to the data from the shm_mq
|
|
|
|
|
* (which had better be sufficiently aligned).
|
|
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
ItemPointerSetInvalid(&htup.t_self);
|
|
|
|
|
htup.t_tableOid = InvalidOid;
|
|
|
|
|
htup.t_len = nbytes;
|
|
|
|
|
htup.t_data = data;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/*
|
|
|
|
|
* Either just copy the data into a regular palloc'd tuple, or remap it,
|
|
|
|
|
* as required.
|
|
|
|
|
*/
|
|
|
|
|
return TQRemapTuple(reader,
|
|
|
|
|
reader->tupledesc,
|
|
|
|
|
reader->field_remapinfo,
|
|
|
|
|
&htup);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Copy the given tuple, remapping any transient typmods contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static HeapTuple
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQRemapTuple(TupleQueueReader *reader,
|
|
|
|
|
TupleDesc tupledesc,
|
|
|
|
|
TupleRemapInfo **field_remapinfo,
|
|
|
|
|
HeapTuple tuple)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
Datum *values;
|
|
|
|
|
bool *isnull;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
bool changed = false;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If no remapping is necessary, just copy the tuple into a single
|
|
|
|
|
* palloc'd chunk, as caller will expect.
|
|
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (field_remapinfo == NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
return heap_copytuple(tuple);
|
|
|
|
|
|
|
|
|
|
/* Deform tuple so we can remap record typmods for individual attrs. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
values = (Datum *) palloc(tupledesc->natts * sizeof(Datum));
|
|
|
|
|
isnull = (bool *) palloc(tupledesc->natts * sizeof(bool));
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
heap_deform_tuple(tuple, tupledesc, values, isnull);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Recursively process each interesting non-NULL attribute. */
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (isnull[i] || field_remapinfo[i] == NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
continue;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
values[i] = TQRemap(reader, field_remapinfo[i], values[i], &changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Reconstruct the modified tuple, if anything was modified. */
|
|
|
|
|
if (changed)
|
|
|
|
|
return heap_form_tuple(tupledesc, values, isnull);
|
|
|
|
|
else
|
|
|
|
|
return heap_copytuple(tuple);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Process the given datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
|
|
|
|
*
|
|
|
|
|
* remapinfo is previously-computed remapping info about the datum's type.
|
|
|
|
|
*
|
|
|
|
|
* This function just dispatches based on the remap class.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQRemap(TupleQueueReader *reader, TupleRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* This is recursive, so it could be driven to stack overflow. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
check_stack_depth();
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
switch (remapinfo->remapclass)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
case TQUEUE_REMAP_ARRAY:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
return TQRemapArray(reader, &remapinfo->u.arr, value, changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
case TQUEUE_REMAP_RANGE:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
return TQRemapRange(reader, &remapinfo->u.rng, value, changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
case TQUEUE_REMAP_RECORD:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
return TQRemapRecord(reader, &remapinfo->u.rec, value, changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
2015-11-06 23:04:21 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
elog(ERROR, "unrecognized tqueue remap class: %d",
|
|
|
|
|
(int) remapinfo->remapclass);
|
2015-11-09 10:45:32 -05:00
|
|
|
return (Datum) 0;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Process the given array datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQRemapArray(TupleQueueReader *reader, ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
ArrayType *arr = DatumGetArrayTypeP(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
Oid typid = ARR_ELEMTYPE(arr);
|
|
|
|
|
bool element_changed = false;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
Datum *elem_values;
|
|
|
|
|
bool *elem_nulls;
|
|
|
|
|
int num_elems;
|
|
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
/* Deconstruct the array. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
deconstruct_array(arr, typid, remapinfo->typlen,
|
|
|
|
|
remapinfo->typbyval, remapinfo->typalign,
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
&elem_values, &elem_nulls, &num_elems);
|
|
|
|
|
|
|
|
|
|
/* Remap each element. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
for (i = 0; i < num_elems; i++)
|
|
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
if (!elem_nulls[i])
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
elem_values[i] = TQRemap(reader,
|
|
|
|
|
remapinfo->element_remap,
|
|
|
|
|
elem_values[i],
|
|
|
|
|
&element_changed);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (element_changed)
|
|
|
|
|
{
|
|
|
|
|
/* Reconstruct and return the array. */
|
|
|
|
|
*changed = true;
|
|
|
|
|
arr = construct_md_array(elem_values, elem_nulls,
|
|
|
|
|
ARR_NDIM(arr), ARR_DIMS(arr), ARR_LBOUND(arr),
|
|
|
|
|
typid, remapinfo->typlen,
|
|
|
|
|
remapinfo->typbyval, remapinfo->typalign);
|
|
|
|
|
return PointerGetDatum(arr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Else just return the value as-is. */
|
|
|
|
|
return value;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Process the given range datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQRemapRange(TupleQueueReader *reader, RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
RangeType *range = DatumGetRangeType(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
bool bound_changed = false;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
RangeBound lower;
|
|
|
|
|
RangeBound upper;
|
|
|
|
|
bool empty;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Extract the lower and upper bounds. */
|
|
|
|
|
range_deserialize(remapinfo->typcache, range, &lower, &upper, &empty);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Nothing to do for an empty range. */
|
|
|
|
|
if (empty)
|
|
|
|
|
return value;
|
|
|
|
|
|
|
|
|
|
/* Remap each bound, if present. */
|
|
|
|
|
if (!upper.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
upper.val = TQRemap(reader, remapinfo->bound_remap,
|
|
|
|
|
upper.val, &bound_changed);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
if (!lower.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
lower.val = TQRemap(reader, remapinfo->bound_remap,
|
|
|
|
|
lower.val, &bound_changed);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (bound_changed)
|
|
|
|
|
{
|
|
|
|
|
/* Reserialize. */
|
|
|
|
|
*changed = true;
|
|
|
|
|
range = range_serialize(remapinfo->typcache, &lower, &upper, empty);
|
|
|
|
|
return RangeTypeGetDatum(range);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Else just return the value as-is. */
|
|
|
|
|
return value;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Process the given record datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
|
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TQRemapRecord(TupleQueueReader *reader, RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
|
|
|
|
HeapTupleHeader tup;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
Oid typid;
|
|
|
|
|
int32 typmod;
|
|
|
|
|
bool changed_typmod;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
TupleDesc tupledesc;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Extract type OID and typmod from tuple. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
tup = DatumGetHeapTupleHeader(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
typid = HeapTupleHeaderGetTypeId(tup);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
typmod = HeapTupleHeaderGetTypMod(tup);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/*
|
|
|
|
|
* If first time through, or if this isn't the same composite type as last
|
|
|
|
|
* time, identify the required typmod mapping, and then look up the
|
|
|
|
|
* necessary information for processing the fields.
|
|
|
|
|
*/
|
|
|
|
|
if (typid != remapinfo->rectypid || typmod != remapinfo->rectypmod)
|
|
|
|
|
{
|
|
|
|
|
/* Free any old data. */
|
|
|
|
|
if (remapinfo->tupledesc != NULL)
|
|
|
|
|
FreeTupleDesc(remapinfo->tupledesc);
|
|
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
pfree(remapinfo->field_remap);
|
|
|
|
|
|
|
|
|
|
/* If transient record type, look up matching local typmod. */
|
|
|
|
|
if (typid == RECORDOID)
|
|
|
|
|
{
|
|
|
|
|
RecordTypmodMap *mapent;
|
|
|
|
|
|
|
|
|
|
Assert(reader->typmodmap != NULL);
|
|
|
|
|
mapent = hash_search(reader->typmodmap, &typmod,
|
|
|
|
|
HASH_FIND, NULL);
|
|
|
|
|
if (mapent == NULL)
|
|
|
|
|
elog(ERROR, "tqueue received unrecognized remote typmod %d",
|
|
|
|
|
typmod);
|
|
|
|
|
remapinfo->localtypmod = mapent->localtypmod;
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
remapinfo->localtypmod = -1;
|
|
|
|
|
|
|
|
|
|
/* Look up tuple descriptor in typcache. */
|
|
|
|
|
tupledesc = lookup_rowtype_tupdesc(typid, remapinfo->localtypmod);
|
|
|
|
|
|
|
|
|
|
/* Figure out whether fields need recursive processing. */
|
|
|
|
|
remapinfo->field_remap = BuildFieldRemapInfo(tupledesc,
|
|
|
|
|
reader->mycontext);
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
{
|
|
|
|
|
/*
|
|
|
|
|
* We need to inspect the record contents, so save a copy of the
|
|
|
|
|
* tupdesc. (We could possibly just reference the typcache's
|
|
|
|
|
* copy, but then it's problematic when to release the refcount.)
|
|
|
|
|
*/
|
|
|
|
|
MemoryContext oldcontext = MemoryContextSwitchTo(reader->mycontext);
|
|
|
|
|
|
|
|
|
|
remapinfo->tupledesc = CreateTupleDescCopy(tupledesc);
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
/* No fields of the record require remapping. */
|
|
|
|
|
remapinfo->tupledesc = NULL;
|
|
|
|
|
}
|
|
|
|
|
remapinfo->rectypid = typid;
|
|
|
|
|
remapinfo->rectypmod = typmod;
|
|
|
|
|
|
|
|
|
|
/* Release reference count acquired by lookup_rowtype_tupdesc. */
|
|
|
|
|
DecrTupleDescRefCount(tupledesc);
|
|
|
|
|
}
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/* If transient record, replace remote typmod with local typmod. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (typid == RECORDOID && typmod != remapinfo->localtypmod)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
typmod = remapinfo->localtypmod;
|
|
|
|
|
changed_typmod = true;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
else
|
|
|
|
|
changed_typmod = false;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* If we need to change the typmod, or if there are any potentially
|
|
|
|
|
* remappable fields, replace the tuple.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (changed_typmod || remapinfo->field_remap != NULL)
|
|
|
|
|
{
|
|
|
|
|
HeapTupleData htup;
|
|
|
|
|
HeapTuple atup;
|
|
|
|
|
|
|
|
|
|
/* For now, assume we always need to change the tuple in this case. */
|
|
|
|
|
*changed = true;
|
|
|
|
|
|
|
|
|
|
/* Copy tuple, possibly remapping contained fields. */
|
|
|
|
|
ItemPointerSetInvalid(&htup.t_self);
|
|
|
|
|
htup.t_tableOid = InvalidOid;
|
|
|
|
|
htup.t_len = HeapTupleHeaderGetDatumLength(tup);
|
|
|
|
|
htup.t_data = tup;
|
|
|
|
|
atup = TQRemapTuple(reader,
|
|
|
|
|
remapinfo->tupledesc,
|
|
|
|
|
remapinfo->field_remap,
|
|
|
|
|
&htup);
|
|
|
|
|
|
|
|
|
|
/* Apply the correct labeling for a local Datum. */
|
|
|
|
|
HeapTupleHeaderSetTypeId(atup->t_data, typid);
|
|
|
|
|
HeapTupleHeaderSetTypMod(atup->t_data, typmod);
|
|
|
|
|
HeapTupleHeaderSetDatumLength(atup->t_data, htup.t_len);
|
|
|
|
|
|
|
|
|
|
/* And return the results. */
|
|
|
|
|
return HeapTupleHeaderGetDatum(atup->t_data);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Else just return the value as-is. */
|
|
|
|
|
return value;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/*
|
|
|
|
|
* Handle a control message from the tuple queue reader.
|
|
|
|
|
*
|
|
|
|
|
* Control messages are sent when the remote side is sending tuples that
|
|
|
|
|
* contain transient record types. We need to arrange to bless those
|
|
|
|
|
* record types locally and translate between remote and local typmods.
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
TupleQueueHandleControlMessage(TupleQueueReader *reader, Size nbytes,
|
|
|
|
|
char *data)
|
|
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
int32 remotetypmod;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
int natts;
|
|
|
|
|
bool hasoid;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
Size offset = 0;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
Form_pg_attribute *attrs;
|
|
|
|
|
TupleDesc tupledesc;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
RecordTypmodMap *mapent;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
bool found;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Extract remote typmod. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
memcpy(&remotetypmod, &data[offset], sizeof(int32));
|
|
|
|
|
offset += sizeof(int32);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Extract attribute count. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
memcpy(&natts, &data[offset], sizeof(int));
|
|
|
|
|
offset += sizeof(int);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
|
|
|
|
/* Extract hasoid flag. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
memcpy(&hasoid, &data[offset], sizeof(bool));
|
|
|
|
|
offset += sizeof(bool);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Extract attribute details. The tupledesc made here is just transient. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
attrs = palloc(natts * sizeof(Form_pg_attribute));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
for (i = 0; i < natts; i++)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
attrs[i] = palloc(sizeof(FormData_pg_attribute));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
memcpy(attrs[i], &data[offset], sizeof(FormData_pg_attribute));
|
|
|
|
|
offset += sizeof(FormData_pg_attribute);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
/* We should have read the whole message. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
Assert(offset == nbytes);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Construct TupleDesc, and assign a local typmod. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
tupledesc = CreateTupleDesc(natts, hasoid, attrs);
|
|
|
|
|
tupledesc = BlessTupleDesc(tupledesc);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Create mapping hashtable if it doesn't exist already. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
if (reader->typmodmap == NULL)
|
|
|
|
|
{
|
|
|
|
|
HASHCTL ctl;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
MemSet(&ctl, 0, sizeof(ctl));
|
|
|
|
|
ctl.keysize = sizeof(int32);
|
|
|
|
|
ctl.entrysize = sizeof(RecordTypmodMap);
|
|
|
|
|
ctl.hcxt = reader->mycontext;
|
|
|
|
|
reader->typmodmap = hash_create("tqueue receiver record type hashtable",
|
2016-07-28 02:08:52 -04:00
|
|
|
100, &ctl,
|
|
|
|
|
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Create map entry. */
|
|
|
|
|
mapent = hash_search(reader->typmodmap, &remotetypmod, HASH_ENTER,
|
|
|
|
|
&found);
|
|
|
|
|
if (found)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
elog(ERROR, "duplicate tqueue control message for typmod %d",
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
remotetypmod);
|
|
|
|
|
mapent->localtypmod = tupledesc->tdtypmod;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
|
|
|
|
|
elog(DEBUG3, "tqueue mapping remote typmod %d to local typmod %d",
|
|
|
|
|
remotetypmod, mapent->localtypmod);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Build remap info for the specified data type, storing it in mycontext.
|
|
|
|
|
* Returns NULL if neither the type nor any subtype could require remapping.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
static TupleRemapInfo *
|
|
|
|
|
BuildTupleRemapInfo(Oid typid, MemoryContext mycontext)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
HeapTuple tup;
|
|
|
|
|
Form_pg_type typ;
|
|
|
|
|
|
|
|
|
|
/* This is recursive, so it could be driven to stack overflow. */
|
|
|
|
|
check_stack_depth();
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
restart:
|
|
|
|
|
tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
|
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
|
elog(ERROR, "cache lookup failed for type %u", typid);
|
|
|
|
|
typ = (Form_pg_type) GETSTRUCT(tup);
|
|
|
|
|
|
|
|
|
|
/* Look through domains to underlying base type. */
|
|
|
|
|
if (typ->typtype == TYPTYPE_DOMAIN)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
typid = typ->typbasetype;
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
goto restart;
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* If it's a true array type, deal with it that way. */
|
|
|
|
|
if (OidIsValid(typ->typelem) && typ->typlen == -1)
|
|
|
|
|
{
|
|
|
|
|
typid = typ->typelem;
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return BuildArrayRemapInfo(typid, mycontext);
|
|
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Similarly, deal with ranges appropriately. */
|
|
|
|
|
if (typ->typtype == TYPTYPE_RANGE)
|
|
|
|
|
{
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return BuildRangeRemapInfo(typid, mycontext);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/*
|
|
|
|
|
* If it's a composite type (including RECORD), set up for remapping. We
|
|
|
|
|
* don't attempt to determine the status of subfields here, since we do
|
|
|
|
|
* not have enough information yet; just mark everything invalid.
|
|
|
|
|
*/
|
|
|
|
|
if (typ->typtype == TYPTYPE_COMPOSITE || typid == RECORDOID)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TupleRemapInfo *remapinfo;
|
|
|
|
|
|
|
|
|
|
remapinfo = (TupleRemapInfo *)
|
|
|
|
|
MemoryContextAlloc(mycontext, sizeof(TupleRemapInfo));
|
|
|
|
|
remapinfo->remapclass = TQUEUE_REMAP_RECORD;
|
|
|
|
|
remapinfo->u.rec.rectypid = InvalidOid;
|
|
|
|
|
remapinfo->u.rec.rectypmod = -1;
|
|
|
|
|
remapinfo->u.rec.localtypmod = -1;
|
|
|
|
|
remapinfo->u.rec.tupledesc = NULL;
|
|
|
|
|
remapinfo->u.rec.field_remap = NULL;
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return remapinfo;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Nothing else can possibly need remapping attention. */
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static TupleRemapInfo *
|
|
|
|
|
BuildArrayRemapInfo(Oid elemtypid, MemoryContext mycontext)
|
|
|
|
|
{
|
|
|
|
|
TupleRemapInfo *remapinfo;
|
|
|
|
|
TupleRemapInfo *element_remapinfo;
|
|
|
|
|
|
|
|
|
|
/* See if element type requires remapping. */
|
|
|
|
|
element_remapinfo = BuildTupleRemapInfo(elemtypid, mycontext);
|
|
|
|
|
/* If not, the array doesn't either. */
|
|
|
|
|
if (element_remapinfo == NULL)
|
|
|
|
|
return NULL;
|
|
|
|
|
/* OK, set up to remap the array. */
|
|
|
|
|
remapinfo = (TupleRemapInfo *)
|
|
|
|
|
MemoryContextAlloc(mycontext, sizeof(TupleRemapInfo));
|
|
|
|
|
remapinfo->remapclass = TQUEUE_REMAP_ARRAY;
|
|
|
|
|
get_typlenbyvalalign(elemtypid,
|
|
|
|
|
&remapinfo->u.arr.typlen,
|
|
|
|
|
&remapinfo->u.arr.typbyval,
|
|
|
|
|
&remapinfo->u.arr.typalign);
|
|
|
|
|
remapinfo->u.arr.element_remap = element_remapinfo;
|
|
|
|
|
return remapinfo;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static TupleRemapInfo *
|
|
|
|
|
BuildRangeRemapInfo(Oid rngtypid, MemoryContext mycontext)
|
|
|
|
|
{
|
|
|
|
|
TupleRemapInfo *remapinfo;
|
|
|
|
|
TupleRemapInfo *bound_remapinfo;
|
|
|
|
|
TypeCacheEntry *typcache;
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Get range info from the typcache. We assume this pointer will stay
|
|
|
|
|
* valid for the duration of the query.
|
|
|
|
|
*/
|
|
|
|
|
typcache = lookup_type_cache(rngtypid, TYPECACHE_RANGE_INFO);
|
|
|
|
|
if (typcache->rngelemtype == NULL)
|
|
|
|
|
elog(ERROR, "type %u is not a range type", rngtypid);
|
|
|
|
|
|
|
|
|
|
/* See if range bound type requires remapping. */
|
|
|
|
|
bound_remapinfo = BuildTupleRemapInfo(typcache->rngelemtype->type_id,
|
|
|
|
|
mycontext);
|
|
|
|
|
/* If not, the range doesn't either. */
|
|
|
|
|
if (bound_remapinfo == NULL)
|
|
|
|
|
return NULL;
|
|
|
|
|
/* OK, set up to remap the range. */
|
|
|
|
|
remapinfo = (TupleRemapInfo *)
|
|
|
|
|
MemoryContextAlloc(mycontext, sizeof(TupleRemapInfo));
|
|
|
|
|
remapinfo->remapclass = TQUEUE_REMAP_RANGE;
|
|
|
|
|
remapinfo->u.rng.typcache = typcache;
|
|
|
|
|
remapinfo->u.rng.bound_remap = bound_remapinfo;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
return remapinfo;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
* Build remap info for fields of the type described by the given tupdesc.
|
|
|
|
|
* Returns an array of TupleRemapInfo pointers, or NULL if no field
|
|
|
|
|
* requires remapping. Data is allocated in mycontext.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
static TupleRemapInfo **
|
|
|
|
|
BuildFieldRemapInfo(TupleDesc tupledesc, MemoryContext mycontext)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
TupleRemapInfo **remapinfo;
|
|
|
|
|
bool noop = true;
|
|
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* Recursively determine the remapping status of each field. */
|
|
|
|
|
remapinfo = (TupleRemapInfo **)
|
|
|
|
|
MemoryContextAlloc(mycontext,
|
|
|
|
|
tupledesc->natts * sizeof(TupleRemapInfo *));
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
Form_pg_attribute attr = tupledesc->attrs[i];
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
if (attr->attisdropped)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
remapinfo[i] = NULL;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
continue;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
remapinfo[i] = BuildTupleRemapInfo(attr->atttypid, mycontext);
|
|
|
|
|
if (remapinfo[i] != NULL)
|
|
|
|
|
noop = false;
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
/* If no fields require remapping, report that by returning NULL. */
|
|
|
|
|
if (noop)
|
|
|
|
|
{
|
|
|
|
|
pfree(remapinfo);
|
|
|
|
|
remapinfo = NULL;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
2016-07-31 16:05:12 -04:00
|
|
|
return remapinfo;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:10:08 -04:00
|
|
|
}
|