Seek zone abbreviations in the IANA data before timezone_abbreviations.

If a time zone abbreviation used in datetime input is defined in the currently active timezone, use that definition in preference to looking in the timezone_abbreviations list. That allows us to correctly handle abbreviations that have different meanings in different timezones. Also, it eliminates an inconsistency between datetime input and datetime output: the non-ISO datestyles for timestamptz have always printed abbreviations taken from the IANA data, not from timezone_abbreviations. Before this fix, it was possible to demonstrate cases where casting a timestamp to text and back fails or changes the value significantly because of that inconsistency.

While this change removes the ability to override the IANA data about an abbreviation known in the current zone, it's not clear that there's any real use-case for doing so. But it is clear that this makes life a lot easier for dealing with abbreviations that have conflicts across different time zones.

Also update the pg_timezone_abbrevs view to report abbreviations that are recognized via the IANA data, and *not* report any timezone_abbreviations entries that are thereby overridden. Under the hood, there are now two SRFs, one that pulls the IANA data and one that pulls timezone_abbreviations entries. They're combined by logic in the view. This approach was useful for debugging (since the functions can be called on their own). While I don't intend to document the functions explicitly, they might be useful to call directly.

Also improve DecodeTimezoneAbbrev's caching logic so that it can cache zone abbreviations found in the IANA data. Without that, this patch would have caused a noticeable degradation of the runtime of timestamptz_in.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
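The lookup order described in this message can be illustrated with a small C sketch. It is not code from the patch: AbbrevDef, zone_defs, list_defs, resolve_abbrev and list_entry_is_overridden are hypothetical names, the two tables merely stand in for the active zone's IANA data and the timezone_abbreviations list, and the one-entry cache is only a simplified analogue of DecodeTimezoneAbbrev's caching. The sample data assumes Asia/Kolkata is the active zone, where IST is India Standard Time (+05:30), while the list defines IST as Israel Standard Time (+02:00).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one abbreviation definition. */
typedef struct
{
    const char *abbrev;
    long        gmtoff;     /* seconds east of UTC */
    int         isdst;
} AbbrevDef;

/* Illustrative data: what the active zone's IANA data defines. */
static const AbbrevDef zone_defs[] = {
    {"IST", 19800, 0},      /* +05:30 in Asia/Kolkata */
};

/* Illustrative data: entries from a timezone_abbreviations-style list. */
static const AbbrevDef list_defs[] = {
    {"IST", 7200, 0},       /* +02:00, a conflicting meaning */
    {"UTC", 0, 0},
};

#define NDEFS(a) (sizeof(a) / sizeof((a)[0]))

/* Linear search of one table; returns NULL if abbrev is absent. */
static const AbbrevDef *
find_def(const AbbrevDef *defs, size_t n, const char *abbrev)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(defs[i].abbrev, abbrev) == 0)
            return &defs[i];
    return NULL;
}

/*
 * Resolve an abbreviation: the active zone's data wins, and the list is
 * consulted only as a fallback.  A one-entry cache skips the search when
 * the same abbreviation is looked up again; a real cache would also have
 * to be invalidated whenever the active zone changes.
 */
static const AbbrevDef *
resolve_abbrev(const char *abbrev)
{
    static char cached_abbrev[16];
    static const AbbrevDef *cached_def = NULL;
    const AbbrevDef *def;

    /* This sketch assumes abbreviations are shorter than 16 bytes. */
    assert(strlen(abbrev) < sizeof(cached_abbrev));
    if (cached_def != NULL && strcmp(cached_abbrev, abbrev) == 0)
        return cached_def;

    def = find_def(zone_defs, NDEFS(zone_defs), abbrev);
    if (def == NULL)
        def = find_def(list_defs, NDEFS(list_defs), abbrev);
    if (def != NULL)
    {
        strcpy(cached_abbrev, abbrev);
        cached_def = def;
    }
    return def;
}

/*
 * View-style filtering: a list entry is hidden when the active zone
 * already defines the same abbreviation.
 */
static int
list_entry_is_overridden(const AbbrevDef *entry)
{
    return find_def(zone_defs, NDEFS(zone_defs), entry->abbrev) != NULL;
}
```

With this data, resolve_abbrev("IST") yields +05:30 from the zone data even though the list says +02:00, while "UTC" falls through to the list. The list's IST entry is the one a view combining both sources this way would omit.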
author: Tom Lane 2025-01-16 19:11:19 +0000
committer: Tom Lane 2025-01-16 19:11:19 +0000
commit: d7674c9fab09d5bab427ba3b9b7a20b169aba715 (patch)
tree: 081f69dbaf1d06de875389d6b898fcb279c70982 /src/timezone/localtime.c
parent: bc10219b9c8931ff4f872b3e799da2208101c574 (diff)
1 files changed, 114 insertions, 0 deletions
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 21516c65082..8eb02ef1469 100644
--- a/src/timezone/localtime.c
+++ b/src/timezone/localtime.c
@@ -1844,6 +1844,120 @@ pg_interpret_timezone_abbrev(const char *abbrev,
 }
 
 /*
+ * Detect whether a timezone abbreviation is defined within the given zone.
+ *
+ * This is similar to pg_interpret_timezone_abbrev() but is not concerned
+ * with a specific point in time.  We want to know if the abbreviation is
+ * known at all, and if so whether it has one meaning or several.
+ *
+ * Returns true if the abbreviation is known, false if not.
+ * If the abbreviation is known and has a single meaning (only one value
+ * of gmtoff/isdst), sets *isfixed = true and sets *gmtoff and *isdst.
+ * If there are multiple meanings, sets *isfixed = false.
+ *
+ * Note: abbrev is matched case-sensitively; it should be all-upper-case.
+ */
+bool
+pg_timezone_abbrev_is_known(const char *abbrev,
+							bool *isfixed,
+							long int *gmtoff,
+							int *isdst,
+							const pg_tz *tz)
+{
+	bool		result = false;
+	const struct state *sp = &tz->state;
+	const char *abbrs;
+	int			abbrind;
+
+	/*
+	 * Locate the abbreviation in the zone's abbreviation list.  We assume
+	 * there are not duplicates in the list.
+	 */
+	abbrs = sp->chars;
+	abbrind = 0;
+	while (abbrind < sp->charcnt)
+	{
+		if (strcmp(abbrev, abbrs + abbrind) == 0)
+			break;
+		while (abbrs[abbrind] != '\0')
+			abbrind++;
+		abbrind++;
+	}
+	if (abbrind >= sp->charcnt)
+		return false;			/* definitely not there */
+
+	/*
+	 * Scan the ttinfo array to find uses of the abbreviation.
+	 */
+	for (int i = 0; i < sp->typecnt; i++)
+	{
+		const struct ttinfo *ttisp = &sp->ttis[i];
+
+		if (ttisp->tt_desigidx == abbrind)
+		{
+			if (!result)
+			{
+				/* First usage */
+				*isfixed = true;	/* for the moment */
+				*gmtoff = ttisp->tt_utoff;
+				*isdst = ttisp->tt_isdst;
+				result = true;
+			}
+			else
+			{
+				/* Second or later usage, does it match? */
+				if (*gmtoff != ttisp->tt_utoff ||
+					*isdst != ttisp->tt_isdst)
+				{
+					*isfixed = false;
+					break;		/* no point in looking further */
+				}
+			}
+		}
+	}
+
+	return result;
+}
+
+/*
+ * Iteratively fetch all the abbreviations used in the given time zone.
+ *
+ * *indx is a state counter that the caller must initialize to zero
+ * before the first call, and not touch between calls.
+ *
+ * Returns the next known abbreviation, or NULL if there are no more.
+ *
+ * Note: the caller typically applies pg_interpret_timezone_abbrev()
+ * to each result.  While that nominally results in O(N^2) time spent
+ * searching the sp->chars[] array, we don't expect any zone to have
+ * enough abbreviations to make that meaningful.
+ */
+const char *
+pg_get_next_timezone_abbrev(int *indx,
+							const pg_tz *tz)
+{
+	const char *result;
+	const struct state *sp = &tz->state;
+	const char *abbrs;
+	int			abbrind;
+
+	/* If we're still in range, the result is the current abbrev. */
+	abbrs = sp->chars;
+	abbrind = *indx;
+	if (abbrind < 0 || abbrind >= sp->charcnt)
+		return NULL;
+	result = abbrs + abbrind;
+
+	/* Advance *indx past this abbrev and its trailing null. */
+	while (abbrs[abbrind] != '\0')
+		abbrind++;
+	abbrind++;
+	*indx = abbrind;
+
+	return result;
+}
+
+/*
  * If the given timezone uses only one GMT offset, store that offset
  * into *gmtoff and return true, else return false.
  */
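To see how the two new functions walk a zone's data, here is a standalone re-creation against mock structures. It assumes a simplified layout: mock_state and mock_ttinfo are hypothetical stand-ins that keep only the fields the patch reads (typecnt, charcnt, ttis, chars, tt_utoff, tt_isdst, tt_desigidx), and abbrev_is_known and next_abbrev are illustrative counterparts of pg_timezone_abbrev_is_known() and pg_get_next_timezone_abbrev(), not the real functions. The sample data is loosely modeled on Europe/Moscow, where MSK has meant both +03:00 and +04:00, and is not copied from tzdata.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the tzcode structures. */
struct mock_ttinfo
{
    long tt_utoff;       /* UT offset in seconds */
    int  tt_isdst;
    int  tt_desigidx;    /* byte offset of the abbreviation in chars[] */
};

struct mock_state
{
    int         typecnt;
    int         charcnt;  /* total bytes in chars[], including NULs */
    const struct mock_ttinfo *ttis;
    const char *chars;    /* NUL-separated abbreviations */
};

/* Illustrative data; "MSK" is used with two different offsets. */
static const struct mock_ttinfo moscow_ttis[] = {
    {9017, 0, 0},        /* LMT */
    {10800, 0, 4},       /* MSK, +03:00 */
    {14400, 1, 8},       /* MSD, +04:00 daylight */
    {14400, 0, 4},       /* MSK, +04:00 standard */
};
static const struct mock_state moscow = {
    4, (int) sizeof("LMT\0MSK\0MSD"), moscow_ttis, "LMT\0MSK\0MSD"
};

/* Same two-phase logic as the patch: find the offset, then scan ttinfos. */
static bool
abbrev_is_known(const char *abbrev, bool *isfixed, long *gmtoff,
                int *isdst, const struct mock_state *sp)
{
    bool result = false;
    int  abbrind = 0;

    /* Step through the packed list one NUL-terminated string at a time. */
    while (abbrind < sp->charcnt)
    {
        if (strcmp(abbrev, sp->chars + abbrind) == 0)
            break;
        abbrind += (int) strlen(sp->chars + abbrind) + 1;
    }
    if (abbrind >= sp->charcnt)
        return false;

    for (int i = 0; i < sp->typecnt; i++)
    {
        const struct mock_ttinfo *t = &sp->ttis[i];

        if (t->tt_desigidx != abbrind)
            continue;
        if (!result)
        {
            /* First use: tentatively treat the meaning as fixed. */
            *isfixed = true;
            *gmtoff = t->tt_utoff;
            *isdst = t->tt_isdst;
            result = true;
        }
        else if (*gmtoff != t->tt_utoff || *isdst != t->tt_isdst)
        {
            *isfixed = false;   /* a second, different meaning */
            break;
        }
    }
    return result;
}

/* Return the abbreviation at *indx and advance past its trailing NUL. */
static const char *
next_abbrev(int *indx, const struct mock_state *sp)
{
    const char *result;

    assert(indx != NULL);
    if (*indx < 0 || *indx >= sp->charcnt)
        return NULL;
    result = sp->chars + *indx;
    *indx += (int) strlen(result) + 1;
    return result;
}
```

Given that data, MSD resolves to a single fixed meaning (+04:00, daylight), MSK is reported as known but not fixed, EST is unknown, and iterating from an index of zero yields LMT, MSK and MSD before returning NULL. Advancing with strlen() is equivalent to the patch's byte-by-byte loop over abbrs[].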