Mirror of https://github.com/postgres/postgres.git (synced 2026-04-04 16:55:45 -04:00)
In the IANA timezone code, tzparse() always tries to load the zone
file named by TZDEFRULES ("posixrules"). Previously, we'd hacked
that logic to skip the load in the "lastditch" code path, which we use
only to initialize the default "GMT" zone during GUC initialization.
That's critical for a couple of reasons: since we do not support leap
seconds, we *must not* allow "GMT" to have leap seconds, and since this
case runs before the GUC subsystem is fully alive, we'd really rather
not take the risk of pg_open_tzfile() throwing any errors.

However, that still left the code reading TZDEFRULES on every other
call, something we'd noticed to the extent of having added code to cache
the result so that it was done only once per process rather than many times.
Andres Freund complained about the static data space used up for the
cache; but as long as the logic was like this, there was no point in
trying to get rid of that space.

We can improve matters by looking a bit more closely at what the IANA
code actually needs the TZDEFRULES data for. One thing it does is
that if "posixrules" is a leap-second-aware zone, the leap-second
behavior will be absorbed into every POSIX-style zone specification.
However, that's a behavior we'd really prefer to do without, since
for our purposes the end effect is to render every POSIX-style zone
name unsupported. Otherwise, the TZDEFRULES data is used only if
the POSIX zone name specifies DST but doesn't include a transition
date rule (e.g., "EST5EDT" rather than "EST5EDT,M3.2.0,M11.1.0").
That is a minority case for our purposes --- in particular, it
never happens when tzload() invokes tzparse() to interpret a
transition date rule string found in a tzdata zone file.

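To make the distinction between the two kinds of POSIX zone names concrete, here is a minimal sketch in C. It is not the tzparse() logic itself: the names PosixTzKind, skip_abbrev(), skip_offset() and classify_posix_tz() are hypothetical, and the offset check is deliberately loose. It only reports whether a string names a DST zone and whether a transition date rule follows.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification; not part of the IANA or PostgreSQL code. */
typedef enum
{
	POSIX_TZ_INVALID,			/* leading fields could not be parsed */
	POSIX_TZ_NO_DST,			/* e.g. "EST5": no DST abbreviation */
	POSIX_TZ_DST_WITH_RULE,		/* e.g. "EST5EDT,M3.2.0,M11.1.0" */
	POSIX_TZ_DST_NO_RULE		/* e.g. "EST5EDT": needs default rules */
} PosixTzKind;

/* Skip a zone abbreviation: either <...> quoted, or three or more letters. */
static const char *
skip_abbrev(const char *p)
{
	const char *start = p;

	if (*p == '<')
	{
		p = strchr(p, '>');
		return p ? p + 1 : NULL;
	}
	while (isalpha((unsigned char) *p))
		p++;
	return (p - start >= 3) ? p : NULL;
}

/* Skip a UTC offset of the form [+-]hh[:mm[:ss]] (loosely validated). */
static const char *
skip_offset(const char *p)
{
	const char *start;

	if (*p == '+' || *p == '-')
		p++;
	start = p;
	while (isdigit((unsigned char) *p) || *p == ':')
		p++;
	return (p > start) ? p : NULL;
}

PosixTzKind
classify_posix_tz(const char *tz)
{
	const char *p;

	assert(tz != NULL);
	p = skip_abbrev(tz);		/* standard-time abbreviation */
	if (p == NULL)
		return POSIX_TZ_INVALID;
	p = skip_offset(p);			/* standard-time offset is required */
	if (p == NULL)
		return POSIX_TZ_INVALID;
	if (*p == '\0')
		return POSIX_TZ_NO_DST;
	p = skip_abbrev(p);			/* DST abbreviation */
	if (p == NULL)
		return POSIX_TZ_INVALID;
	if (*p != '\0' && *p != ',')
	{
		p = skip_offset(p);		/* optional DST offset */
		if (p == NULL)
			return POSIX_TZ_INVALID;
	}
	if (*p == '\0')
		return POSIX_TZ_DST_NO_RULE;
	return (*p == ',') ? POSIX_TZ_DST_WITH_RULE : POSIX_TZ_INVALID;
}
```

Under this sketch, only strings classified as POSIX_TZ_DST_NO_RULE (such as "EST5EDT") correspond to the case where the TZDEFRULES data would be consulted.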
Hence, if we legislate that we're going to ignore leap-second data
from "posixrules", we can postpone the TZDEFRULES load into the path
where we actually need to substitute for a missing date rule string.
That means it will never happen at all in common scenarios, making it
reasonable to dynamically allocate the cache space when it does happen.
Even when the data is already loaded, this saves some cycles in the
common code path since we avoid a memcpy of 23KB or so. And, IMO at
least, this is a less ugly hack on the IANA logic than what we had
before, since it's not messing with the lastditch-vs-regular code paths.
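The load-on-demand idea can be sketched as follows. This is an illustration under stated assumptions, not the actual localtime.c code: ZoneState, load_default_rules(), get_default_rules() and rules_for_zone() are hypothetical names, the 23KB size is taken from the figure quoted above, and the loader is a stub that just counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the large parsed-zone struct (roughly 23KB). */
typedef struct ZoneState
{
	char		data[23 * 1024];
} ZoneState;

static ZoneState *default_rules_cache = NULL;	/* allocated on first use */
static int	default_rules_loads = 0;	/* counts loads, for illustration */

/* Hypothetical loader standing in for reading the "posixrules" zone file. */
static bool
load_default_rules(ZoneState *dst)
{
	memset(dst->data, 0, sizeof(dst->data));
	default_rules_loads++;
	return true;
}

/*
 * Return the cached default rules, loading them on the first call.
 * Returns NULL if allocation or loading fails; a later call retries.
 */
const ZoneState *
get_default_rules(void)
{
	if (default_rules_cache == NULL)
	{
		ZoneState  *zs = malloc(sizeof(ZoneState));

		if (zs == NULL)
			return NULL;
		if (!load_default_rules(zs))
		{
			free(zs);
			return NULL;
		}
		default_rules_cache = zs;
	}
	return default_rules_cache;
}

/* Only a zone string with DST but no date rule needs the default rules. */
const ZoneState *
rules_for_zone(bool has_dst, bool has_rule)
{
	assert(has_dst || !has_rule);	/* a rule without DST makes no sense */
	if (has_dst && !has_rule)
		return get_default_rules();
	return NULL;
}
```

In this sketch no memory is allocated and nothing is copied unless a rule-less DST zone is actually parsed, which is the property the change is after.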
Back-patch to all supported branches, not so much because this is a
critical change as that I want to keep all our copies of the IANA
timezone code in sync.

Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de
118 lines
5.3 KiB
Text
src/timezone/README

This is a PostgreSQL adapted version of the IANA timezone library from

    https://www.iana.org/time-zones

The latest version of the timezone data and library source code is
available right from that page. It's best to get the merged file
tzdb-NNNNX.tar.lz, since the other archive formats omit tzdata.zi.
Historical versions, as well as release announcements, can be found
elsewhere on the site.

Since time zone rules change frequently in some parts of the world,
we should endeavor to update the data files before each PostgreSQL
release. The code need not be updated as often, but we must track
changes that might affect interpretation of the data files.


Time Zone data
==============

We distribute the time zone source data as-is under src/timezone/data/.
Currently, we distribute just the abbreviated single-file format
"tzdata.zi", to reduce the size of our tarballs as well as churn
in our git repo. Feeding that file to zic produces the same compiled
output as feeding the bulkier individual data files would do.

While data/tzdata.zi can simply be copied over when updating, manual effort
is needed to update the time zone abbreviation lists under tznames/.
These need to be changed whenever new abbreviations are invented or the
UTC offset associated with an existing abbreviation changes. To detect
if this has happened, after installing new files under data/ do
    make abbrevs.txt
which will produce a file showing all abbreviations that are in current
use according to the data/ files. Compare this to known_abbrevs.txt,
which is the list that existed last time the tznames/ files were updated.
Update tznames/ as seems appropriate, then replace known_abbrevs.txt
in the same commit. Usually, if a known abbreviation has changed meaning,
the appropriate fix is to make it refer to a long-form zone name instead
of a fixed GMT offset.

The core regression test suite does some simple validation of the zone
data and abbreviations data (notably by checking that the pg_timezone_names
and pg_timezone_abbrevs views don't throw errors). It's worth running it
as a cross-check on proposed updates.

When there has been a new release of Windows (probably including Service
Packs), the list of matching timezones needs to be updated. Run the
script src/tools/win32tzlist.pl on a Windows machine running this new
release and apply any new timezones that it detects. Never remove any
mappings, even if they have been removed in Windows, since we still need
to match properly on the old version.


Time Zone code
==============

The code in this directory is currently synced with tzcode release 2018e.
There are many cosmetic (and not so cosmetic) differences from the
original tzcode library, but diffs in the upstream version should usually
be propagated to our version. Here are some notes about that.

For the most part we want to use the upstream code as-is, but there are
several considerations preventing an exact match:

* For readability/maintainability we reformat the code to match our own
  conventions; this includes pgindent'ing it and getting rid of upstream's
  overuse of "register" declarations. (It used to include conversion of
  old-style function declarations to C89 style, but thank goodness they
  fixed that.)

* We need the code to follow Postgres' portability conventions; this
  includes relying on configure's results rather than hand-hacked #defines,
  and not relying on <stdint.h> features that may not exist on old systems.
  (In particular this means using Postgres' definitions of the int32 and
  int64 typedefs, not int_fast32_t/int_fast64_t.)

* Since Postgres is typically built on a system that has its own copy
  of the <time.h> functions, we must avoid conflicting with those. This
  mandates renaming typedef time_t to pg_time_t, and similarly for most
  other exposed names.

* We have exposed the tzload() and tzparse() internal functions, and
  slightly modified the API of the former, in part because it now relies
  on our own pg_open_tzfile() rather than opening files for itself.

* tzparse() is adjusted to avoid loading the TZDEFRULES zone unless
  really necessary, and to ignore any leap-second data it may supply.
  We also cache the result of loading the TZDEFRULES zone, so that
  that's not repeated more than once per process.

* There's a fair amount of code we don't need and have removed,
  including all the nonstandard optional APIs. We have also added
  a few functions of our own at the bottom of localtime.c.

* In zic.c, we have added support for a -P (print_abbrevs) switch, which
  is used to create the "abbrevs.txt" summary of currently-in-use zone
  abbreviations that was described above.

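As an illustration of the renaming point in the list above, the following sketch shows prefixed names coexisting with the system's <time.h> declarations in one translation unit. It rests on assumptions: "long long" stands in for Postgres' int64 typedef, the struct is a reduced version of struct pg_tm showing only three of its fields, and pg_time_of_day() and system_now() are hypothetical helpers, not functions from this directory.

```c
#include <assert.h>
#include <time.h>

/* Assumption: long long stands in for Postgres' int64 typedef. */
typedef long long pg_time_t;

/* Reduced illustration; the real struct pg_tm has more fields. */
struct pg_tm
{
	int			tm_hour;
	int			tm_min;
	int			tm_sec;
};

/* Hypothetical helper: split seconds-since-midnight into a pg_tm. */
struct pg_tm
pg_time_of_day(pg_time_t secs)
{
	struct pg_tm result;

	assert(secs >= 0 && secs < 86400);
	result.tm_hour = (int) (secs / 3600);
	result.tm_min = (int) ((secs % 3600) / 60);
	result.tm_sec = (int) (secs % 60);
	return result;
}

/* The system's own time_t and time() remain usable alongside. */
time_t
system_now(void)
{
	return time(NULL);
}
```

Because nothing here is named time_t or struct tm, the platform's definitions and the prefixed ones never collide, whatever width the platform chooses for time_t.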

The most convenient way to compare a new tzcode release to our code is
to first run the tzcode source files through a sed filter like this:

    sed -r \
        -e 's/^([ \t]*)\*\*([ \t])/\1 *\2/' \
        -e 's/^([ \t]*)\*\*$/\1 */' \
        -e 's|^\*/| */|' \
        -e 's/\bregister[ \t]//g' \
        -e 's/int_fast32_t/int32/g' \
        -e 's/int_fast64_t/int64/g' \
        -e 's/struct[ \t]+tm\b/struct pg_tm/g' \
        -e 's/\btime_t\b/pg_time_t/g'

and then run them through pgindent. (The first three sed patterns deal
with conversion of their block comment style to something pgindent
won't make a hash of; the remainder address other points noted above.)
After that, the files can be diff'd directly against our corresponding
files.