Wed 21 February 2024

Filed under Python

Tags python datetime

DateTime gotcha-s

This post reworks some examples from https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls/ that show how counter-intuitive some of Python's default date-time handling is in the corner cases.

I will also look at how Pandas handles date-times, as I do most of my data wrangling with Pandas.


Environment

Import the packages we use

In [1]:
from zoneinfo import ZoneInfo
from datetime import datetime, timedelta, date, timezone, UTC

import pandas as pd
import dateutil.tz
In [19]:
%load_ext watermark

Time jumps due to daylight saving

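Europe moves its clocks forward one hour on the last Sunday in March. So if we go to bed at 10:00 pm on the Saturday, and wake up at 7:00 am, we have really only slept 8 hours.

However, the datetime library appears to ignore the clock jump: subtracting two date-times that share the same tzinfo is done on the wall-clock values.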

In [2]:
paris = ZoneInfo('Europe/Paris')

# last Sunday in March in Paris, so clock should jump forward
bedtime = datetime(2023,3,25,22, tzinfo=paris)
wake_up = datetime(2023, 3, 26, 7, tzinfo=paris)

sleep= wake_up - bedtime
print(f'{sleep=}')

hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
sleep=datetime.timedelta(seconds=32400)
Hours slept = 9.0

If we print out the two datetime variables in question, we can see that they have a different UTC offset. This shows us that the clocks were changed while we were sleeping.

In [3]:
print(f' {bedtime.utcoffset()=}, {wake_up.utcoffset()=}')
 bedtime.utcoffset()=datetime.timedelta(seconds=3600), wake_up.utcoffset()=datetime.timedelta(seconds=7200)

The change in offset is exactly one hour.

In [4]:
wake_up.utcoffset() - bedtime.utcoffset()
Out[4]:
datetime.timedelta(seconds=3600)

So now we can correct our time calculation by subtracting the change in UTC offset (if any).

In [5]:
sleep= wake_up - bedtime -(wake_up.utcoffset() - bedtime.utcoffset())

hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
Hours slept = 8.0

We also get the correct answer if there is no clock jumping involved

In [6]:
bedtime = datetime(2023,2,21,22, tzinfo=paris)
wake_up = datetime(2023, 2, 22, 7, tzinfo=paris)

sleep= wake_up - bedtime -(wake_up.utcoffset() - bedtime.utcoffset())

hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
Hours slept = 9.0
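
Another way to reach the same answer, sketched here on the assumption that elapsed time is what we are after, is to convert both values to UTC before subtracting. The helper name `elapsed` is made up for this example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo('Europe/Paris')


def elapsed(start: datetime, end: datetime) -> timedelta:
    """Elapsed time between two aware date-times, measured in UTC (hypothetical helper)."""
    # After astimezone() both values carry the fixed-offset UTC tzinfo,
    # so the subtraction is no longer affected by DST changes.
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)


# night with the clock change
dst_night = elapsed(datetime(2023, 3, 25, 22, tzinfo=paris),
                    datetime(2023, 3, 26, 7, tzinfo=paris))

# ordinary night
normal_night = elapsed(datetime(2023, 2, 21, 22, tzinfo=paris),
                       datetime(2023, 2, 22, 7, tzinfo=paris))

print(dst_night.total_seconds() / 3600, normal_night.total_seconds() / 3600)
```

This should give 8 hours for the night of the clock change and 9 hours for the ordinary night, without any explicit offset arithmetic.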

pandas gets it right

Pandas has extensive date-time support, and seems to get it right, with no coding from us required.

First the case of a clock change while we sleep.

In [7]:
bedtime = pd.to_datetime(datetime(2023,3,25,22, tzinfo=paris))
wake_up = pd.to_datetime(datetime(2023, 3, 26, 7, tzinfo=paris))

sleep= wake_up - bedtime
print(f'{sleep=}')

hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
sleep=Timedelta('0 days 08:00:00')
Hours slept = 8.0

And now a case with no clock change:

In [9]:
bedtime = pd.to_datetime(datetime(2023,2,21,22, tzinfo=paris))
wake_up = pd.to_datetime(datetime(2023, 2, 22, 7, tzinfo=paris))

sleep= wake_up - bedtime
print(f'{sleep=}')

hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
sleep=Timedelta('0 days 09:00:00')
Hours slept = 9.0

Non-existent times

If we have a clock change where the clock is moved forward, then some times become non-existent. For example, consider the case where what would normally be 2:00 AM becomes 3:00 AM.

This makes 2:30 AM an impossible wall-clock time.

By default, Python will happily create these date-times!

In [60]:
# ⚠️ This time does not exist on this date
d = datetime(2023, 3, 26, 2, 30, tzinfo=paris)
d
Out[60]:
datetime.datetime(2023, 3, 26, 2, 30, tzinfo=zoneinfo.ZoneInfo(key='Europe/Paris'))

Converting this Python datetime to a Pandas Timestamp results in a legal date-time, half an hour past what is now 3:00 AM.

In [61]:
pd.to_datetime(d)
Out[61]:
Timestamp('2023-03-26 03:30:00+0200', tz='Europe/Paris')
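
Plain Python can also be made to flag such a value. The following is a minimal sketch, assuming that a round trip through UTC is an acceptable test: a wall-clock time that does not exist comes back changed. The helper name `wall_time_exists` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo('Europe/Paris')


def wall_time_exists(dt: datetime) -> bool:
    """True if the wall-clock reading of an aware date-time survives a UTC round trip."""
    round_trip = dt.astimezone(timezone.utc).astimezone(dt.tzinfo)
    # compare wall-clock fields only, by dropping tzinfo from both sides
    return round_trip.replace(tzinfo=None) == dt.replace(tzinfo=None)


missing = datetime(2023, 3, 26, 2, 30, tzinfo=paris)   # inside the spring gap
present = datetime(2023, 3, 26, 3, 30, tzinfo=paris)   # just after the gap

print(wall_time_exists(missing), wall_time_exists(present))
```

A check like this could be run before values are stored, so that impossible wall-clock times are rejected rather than silently accepted.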

Another way to handle this is to create in Python a date-time that has no time zone specified (naive), and then tell Pandas to localize it to the time zone required. We can specify the behaviour if the date-time is non-existent, e.g. we can raise an exception if any such date-time is seen.

In [13]:
pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='raise')
---------------------------------------------------------------------------
NonExistentTimeError                      Traceback (most recent call last)
Cell In[13], line 1
----> 1 pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='raise')

File timestamps.pyx:2327, in pandas._libs.tslibs.timestamps.Timestamp.tz_localize()

File tzconversion.pyx:180, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc_single()

File tzconversion.pyx:426, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc()

NonExistentTimeError: 2023-03-26 02:30:00

The other behaviours are to slide the input date-time backwards or forwards to the closest legal time.

In [65]:
pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='shift_forward')
Out[65]:
Timestamp('2023-03-26 03:00:00+0200', tz='Europe/Paris')
In [67]:
pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='shift_backward')
Out[67]:
Timestamp('2023-03-26 01:59:59.999999999+0100', tz='Europe/Paris')

Duplicated times

When daylight saving ends and clocks go back, a given wall-clock time can occur twice in a 24 hour period. Standard Python has a parameter fold that lets you specify the first (fold=0) or the second (fold=1) such time. The default is the first such wall-clock time.

In [71]:
d = datetime(2023,10,29,2,30,tzinfo=paris)
d
Out[71]:
datetime.datetime(2023, 10, 29, 2, 30, tzinfo=zoneinfo.ZoneInfo(key='Europe/Paris'))

If we create an ambiguous date-time, by default we get the first one (larger of the two UTC Offsets)

In [72]:
d.utcoffset()
Out[72]:
datetime.timedelta(seconds=7200)

If we specify the second wall-clock time we get the smaller UTC Offset.

In [74]:
d2 = datetime(2023,10,29,2,30,tzinfo=paris, fold=1, )
d2
Out[74]:
datetime.datetime(2023, 10, 29, 2, 30, fold=1, tzinfo=zoneinfo.ZoneInfo(key='Europe/Paris'))
In [75]:
d2.utcoffset()
Out[75]:
datetime.timedelta(seconds=3600)
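
The two offsets above suggest a simple test for ambiguity in plain Python. This is only a sketch, and the helper name `is_ambiguous` is made up for this example: it compares the UTC offsets obtained with fold=0 and fold=1.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

paris = ZoneInfo('Europe/Paris')


def is_ambiguous(dt: datetime) -> bool:
    """True if this wall-clock time occurs twice in dt's time zone (hypothetical helper)."""
    # In a repeated interval the first occurrence (fold=0) has the larger
    # UTC offset. In a spring-forward gap the ordering is reversed, so
    # non-existent times are not flagged here.
    return dt.replace(fold=0).utcoffset() > dt.replace(fold=1).utcoffset()


autumn = datetime(2023, 10, 29, 2, 30, tzinfo=paris)   # clocks go back this night
summer = datetime(2023, 7, 1, 2, 30, tzinfo=paris)     # nothing special

print(is_ambiguous(autumn), is_ambiguous(summer))
```

Only the autumn value should be reported as ambiguous.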

Again, with pandas we can specify the behaviour we want, including raising an exception if such a date-time is seen. The error message is a little confusing.

In [77]:
pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous='raise')
---------------------------------------------------------------------------
AmbiguousTimeError                        Traceback (most recent call last)
Cell In[77], line 1
----> 1 pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous='raise')

File timestamps.pyx:2327, in pandas._libs.tslibs.timestamps.Timestamp.tz_localize()

File tzconversion.pyx:180, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc_single()

File tzconversion.pyx:371, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc()

AmbiguousTimeError: Cannot infer dst time from 2023-10-29 02:30:00, try using the 'ambiguous' argument

We can also specify a boolean value that indicates whether the time is to be read as a DST time (True) or a standard time (False).

In [10]:
dst_bool = False
d1 = pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous=dst_bool)
d1
Out[10]:
Timestamp('2023-10-29 02:30:00+0100', tz='Europe/Paris')
In [11]:
dst_bool = True
d2 = pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous=dst_bool)
d2
Out[11]:
Timestamp('2023-10-29 02:30:00+0200', tz='Europe/Paris')

pandas also respects the standard Python fold parameter.

In [14]:
d1 = datetime(2023,10,29,2,30,tzinfo=paris)
pd.to_datetime(d1)
Out[14]:
Timestamp('2023-10-29 02:30:00+0200', tz='Europe/Paris')
In [15]:
d2 = datetime(2023,10,29,2,30,tzinfo=paris, fold=1, )
pd.to_datetime(d2)
Out[15]:
Timestamp('2023-10-29 02:30:00+0100', tz='Europe/Paris')

Comparisons

Comparison of date-times can be confusing. As an example, below are six ways of asking "what is the date and time now" (one deprecated).

In [17]:
print(f'{date.today()=}')
print(f'{datetime.today()=}')
print(f'{datetime.now()=}')
print(f'{datetime.utcnow()=}')
print(f'{datetime.now(timezone.utc)=}')
print(f'{datetime.now(UTC)=}')
date.today()=datetime.date(2024, 2, 19)
datetime.today()=datetime.datetime(2024, 2, 19, 17, 24, 59, 687248)
datetime.now()=datetime.datetime(2024, 2, 19, 17, 24, 59, 687248)
datetime.utcnow()=datetime.datetime(2024, 2, 19, 7, 24, 59, 687248)
datetime.now(timezone.utc)=datetime.datetime(2024, 2, 19, 7, 24, 59, 687248, tzinfo=datetime.timezone.utc)
datetime.now(UTC)=datetime.datetime(2024, 2, 19, 7, 24, 59, 687248, tzinfo=datetime.timezone.utc)
C:\Users\donrc\AppData\Local\Temp\ipykernel_24512\1724687794.py:4: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
  print(f'{datetime.utcnow()=}')
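
One practical consequence of mixing these is that naive and aware values do not compare cleanly. A small sketch (the variable names are only for illustration):

```python
from datetime import datetime, timezone

naive_now = datetime.now()               # no tzinfo attached
aware_now = datetime.now(timezone.utc)   # tzinfo is UTC

# equality between a naive and an aware value is always False
mixed_equal = naive_now == aware_now
print(mixed_equal)

# ordering comparisons between the two are refused outright
ordering_failed = False
try:
    naive_now < aware_now
except TypeError as err:
    ordering_failed = True
    print(f'TypeError: {err}')
```

So a naive value such as the one returned by the deprecated utcnow() cannot be ordered against the aware value from now(UTC).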

From Arie Bovenberg's blog post, below is an example where we create two apparently different date-times from an ambiguous date-time, only to find they test equal! Apparently, because both share the same tzinfo, the test compares wall-clock digits only.

In [112]:
# two times one hour apart (due to DST transition)
earlier = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=0)
later = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)

print(earlier, later)
2023-10-29 02:30:00+02:00 2023-10-29 02:30:00+01:00
In [114]:
earlier.timestamp(), later.timestamp()
Out[114]:
(1698539400.0, 1698543000.0)
In [115]:
earlier==later
Out[115]:
True
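
If the intent is to compare instants rather than wall-clock digits, one option in plain Python is to convert both sides to UTC first. A minimal sketch, reusing the two values from above:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo('Europe/Paris')
earlier = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=0)
later = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)

# same tzinfo on both sides: only the wall-clock fields are compared
same_wall_clock = earlier == later

# converting to UTC first compares the actual instants
same_instant = earlier.astimezone(timezone.utc) == later.astimezone(timezone.utc)

print(same_wall_clock, same_instant)
```

The first comparison is the surprising one shown above; the second should report that the two instants differ, in line with the timestamps.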

Once again, pandas does the expected thing (no, these date-times are not equal).

In [125]:
t1 = pd.to_datetime(earlier)
t2 = pd.to_datetime(later)
In [120]:
t1 == t2
Out[120]:
False

Note that if we change the time zone information, even to an equivalent set of information, the date-times will test not-equal!

In [122]:
later2 = later.replace(tzinfo=dateutil.tz.gettz("Europe/Paris"))
In [123]:
later == later2
Out[123]:
False
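
As far as I can tell from PEP 495, this is deliberate: when the two tzinfo objects differ and one of the values lies in a repeated interval, equality is always False. The sketch below shows the same effect using only the standard library, with UTC standing in for the second time zone object (an assumption made to avoid the dateutil dependency).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo('Europe/Paris')
later = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)

# the same instant, expressed with a different tzinfo object
later_utc = later.astimezone(timezone.utc)

same_timestamp = later.timestamp() == later_utc.timestamp()
tests_equal = later == later_utc
print(same_timestamp, tests_equal)

# for contrast, an unambiguous time on the same day
noon = datetime(2023, 10, 29, 12, tzinfo=paris)
noon_equal = noon == noon.astimezone(timezone.utc)
print(noon_equal)
```

The ambiguous value has the same timestamp as its UTC conversion yet does not test equal to it, while the unambiguous noon value does.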

pandas again seems to do the correct thing, even if the time zone information came from objects of two different types.

In [126]:
t3 = pd.to_datetime(later)
t4 = pd.to_datetime(later2)
print(t3,t4)
2023-10-29 02:30:00+01:00 2023-10-29 02:30:00+01:00
In [127]:
t3==t4
Out[127]:
True
In [130]:
t3.tzinfo, t4.tzinfo
Out[130]:
(zoneinfo.ZoneInfo(key='Europe/Paris'), tzfile('Europe/Paris'))

Conclusion

The various datetime pitfalls are certainly something to be aware of, and should be considered in any code reviews of Python apps that deal with dates or times.

It is slightly reassuring that pandas seems to be more reliable in this regard.


Reproducibility

In [20]:
%watermark -co
conda environment: c:\Users\donrc\Documents\VisualCodeProjects\DateTimeProject\.conda

In [21]:
%watermark -iv
dateutil: 2.8.2
pandas  : 2.1.4

In [22]:
%watermark
Last updated: 2024-02-19T17:39:41.908759+10:00

Python implementation: CPython
Python version       : 3.12.1
IPython version      : 8.21.0

Compiler    : MSC v.1916 64 bit (AMD64)
OS          : Windows
Release     : 11
Machine     : AMD64
Processor   : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores   : 8
Architecture: 64bit

net-analysis.com Data Analysis Blog © Don Cameron