Accessing an Access Database with Pandas

Fri 26 May 2023

Interfacing Pandas to an Access Database¶

Introduction¶

This post is intended to be a guide for Python users, who wish to process a Microsoft (MS) Access database. The assumption is that pandas will be the primary analysis tool.

One option is to read the Access datatables into a MS Excel spreadsheet. This allows for an initial exploration, and access to Excel spreadsheets is well supported by pandas, but the two step translation loses the meta-data (column types, etc), so direct interfacing was chosen for implementation.

Background¶

Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The extensive support for ODBC in the Python ecosystem makes it the obvious choice for interfacing to MS Access.

As shown below, the appropriate drivers from MS must be installed to support the interfacing operations.

`pyodbc` Package¶

The pyodbc package was the first choice for an implementation of code to read the databasde

Initial Implementation¶

Imports¶

In this section, we import the Python packages we will use.

In [21]:

import pandas as pd
import pyodbc

# used for environment documentation
import sys
import os
import platform
import datetime

In [22]:

%load_ext watermark

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark

In [23]:

%load_ext lab_black

The lab_black extension is already loaded. To reload it, use:
  %reload_ext lab_black

Driver environment¶

As discussed above, the appropriate software drivers from MS must be installed to support the intergace to Access databases. We can determine the drivers available by the following code snippet, which create a list of the drivers available (as seen by the pyodbc package).

In [24]:

[x for x in pyodbc.drivers()]

Out[24]:

['SQL Server',
 'SQL Server Native Client 11.0',
 'Microsoft Access Driver (*.mdb, *.accdb)',
 'Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)',
 'Microsoft Access dBASE Driver (*.dbf, *.ndx, *.mdx)',
 'Microsoft Access Text Driver (*.txt, *.csv)']

In the initial run of this snippet, the only drivers available were the SQL Server drivers. Going to the link

https://www.microsoft.com/en-us/download/details.aspx?id=54920

and clicking the "Download" button will install the Microsoft Access Driver, as shown above.

Testing the Interface¶

To test the interface, we create a connection string that specifies the interfacing driver we need, and the database location (in this case, local to the machine running Python). Then we create a connection.

In [25]:

connStr = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=D:\WaterQualityDB2022\wq2022.accdb;"
)
cnxn = pyodbc.connect(connStr)

We know from the exploration via Excel, that there is a table Equipment. We attempt to read the table into pandas.

In [26]:

sql_qry = 'select * from Equipment'
df = pd.read_sql(sql_qry, cnxn)
df.head()

D:\Anaconda2022\lib\site-packages\pandas\io\sql.py:762: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
  warnings.warn(

Out[26]:

	Equipment ID	Equipment Name	Model
0	1	Maroochy 1	Horiba U52
1	2	Maroochy 2	Horiba U52
2	3	Maroochy 3	Horiba U52
3	4	Maroochy 4	Horiba U52
4	5	Maroochy 5	Horiba U52

The problem here is that although this works, we have a warning message from pandas that the ODC access we are using is not supported:

UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy

Now in general, application or package warning messages have to considered seriously, especially in software intended to have an extended lifetime, because subsequent updates to pandas could totally remove support for non-SQLAlchemy access methods.

SQLAlchemy Implementation¶

Investigation of integration of the SQLAlchemy package, pandas, and MS Access revealed one potential item of concern: MS Access is not a supported database that the SQLAlchemy project supports. I suspect that this is because Python (especially in the machine learning domain) is oriented towards large to very large databases, and not the relatively small databases that MS Access is typically used for.

However, there is an independent Python package sqlalchemy_access, that provides the interface between SQLAlchemy and MS Access. Note that we still have to import the pyodbc package. sqlalchemy_access did not appear to be in the Anaconda ecosystem, so I just downloaded it from GitHub GitHub link

Implementation¶

Early SQLAlchemy¶

Again, we define the packages we wish to import.

In [27]:

import pandas as pd
import sqlalchemy as sa
import sqlalchemy_access

import pyodbc

Next (as before), we define a connection specification, and establish a connection to the database.

In [28]:

import sqlalchemy_access

connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=D:\WaterQualityDB2022\wq2022.accdb;"
)
connection_url = sa.engine.URL.create(
    "access+pyodbc",
    query={"odbc_connect": connection_string},
)
engine = sa.create_engine(connection_url)

Now we can retrieve the Equipment table, and display the contents, without generating warning messages.

In [29]:

qry = 'select * from Equipment;'
df = pd.read_sql_query(qry, engine)
df.head(1).T

Out[29]:

	0
Equipment ID	1
Equipment Name	Maroochy 1
Serial
Probe Serial
Model	Horiba U52

Later SQLAlchemy¶

There is a problem: at sqlalchemy 2.0.11, the code above gives :

AttributeError: 'OptionEngine' object has no attribute 'execute'

Refer to https://levelup.gitconnected.com/how-to-fix-attributeerror-optionengine-object-has-no-attribute-execute-in-pandas-eb635fbb89e4

The suggested fix is

import pandas as pd
from sqlalchemy import create_engine, text


engine = create_engine(connection_url)
query = 'SELECT * FROM my_table'

df = pd.read_sql_query(sql=text(query), con=engine.connect())

This fix also works fine at sqlalchemy 1.4.39

In [30]:

df = pd.read_sql_query(
    sql=sa.text(qry),
    con=engine.connect(),
)
df.head(1).T

Out[30]:

	0
Equipment ID	1
Equipment Name	Maroochy 1
Serial
Probe Serial
Model	Horiba U52

Database Structure¶

We can confirm that SQLAlchemy 'sees' the same database tables, as can be seen via Excel.

In [31]:

from sqlalchemy import inspect

insp = inspect(engine)
insp.get_table_names()

Out[31]:

['Equipment',
 'Maintenance',
 'Parameter Standards',
 'People',
 'Results_Kit_Calibration',
 'Results_Sites_Water_Testing',
 'Sites']

In [ ]:

Reproducibility¶

This sections contains information to support the reproducibility of the software.

In [32]:

import sys
import os
import platform
import datetime

In [33]:

# | warn: false
%reload_ext watermark

In [34]:

# | warn: false
%reload_ext lab_black

Python execution environment¶

In [35]:

theNotebook = 'InterfacingToWQDatabase.ipynb'

print(f'Notebook: {theNotebook}')
print('python version : ' + sys.version)
print('pandas version : ' + pd.__version__)
print('current wkg dir: ' + os.getcwd())
print(
    'Notebook run at: '
    + str(datetime.datetime.now())
    + ' local time'
)
print(
    'Notebook run at: '
    + str(datetime.datetime.utcnow())
    + ' UTC'
)
print('Notebook run on: ' + platform.platform())

Notebook: InterfacingToWQDatabase.ipynb
python version : 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)]
pandas version : 1.4.4
current wkg dir: C:\Users\donrc\Documents\JupyterNotebooks\PythonNotebookProject\develop
Notebook run at: 2023-05-24 20:30:00.363071 local time
Notebook run at: 2023-05-24 10:30:00.364066 UTC
Notebook run on: Windows-10-10.0.19045-SP0

`watermark` data¶

Execution environment as reported by the watermark utility

In [36]:

%watermark

Last updated: 2023-05-24T20:30:00.391079+10:00

Python implementation: CPython
Python version       : 3.9.13
IPython version      : 7.31.1

Compiler    : MSC v.1916 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
CPU cores   : 8
Architecture: 64bit

In [37]:

%watermark -h -iv -co

conda environment: D:\Anaconda2022

Hostname: DESKTOP-SODFUN6

sys              : 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)]
pandas           : 1.4.4
pyodbc           : 4.0.34
sqlalchemy_access: 1.1.4.dev0
sqlalchemy       : 1.4.39
platform         : 1.0.8

In [ ]:

Accessing an Access Database with Pandas

Interfacing Pandas to an Access Database¶

Introduction¶

Background¶

`pyodbc` Package¶

Initial Implementation¶

Imports¶

Driver environment¶

Testing the Interface¶

SQLAlchemy Implementation¶

Implementation¶

Early SQLAlchemy¶

Later SQLAlchemy¶

Database Structure¶

Reproducibility¶

Python execution environment¶

`watermark` data¶

Comments

About

Accessing an Access Database with Pandas

Interfacing Pandas to an Access Database¶

Introduction¶

Background¶

pyodbc Package¶

Initial Implementation¶

Imports¶

Driver environment¶

Testing the Interface¶

SQLAlchemy Implementation¶

Implementation¶

Early SQLAlchemy¶

Later SQLAlchemy¶

Database Structure¶

Reproducibility¶

Python execution environment¶

watermark data¶

Comments

About

`pyodbc` Package¶

`watermark` data¶