Sun 07 June 2020

Filed under scipy

Tags scipy

ADC Analysis and Scipy

Introduction

In an effort to beat the Lockdown Blues, I thought I would dabble in the world of embedded computers. This was sparked by a recommendation on Hacker News for the edX course "Embedded Systems - Shape The World: Microcontroller Input/Output". It has proved quite an interesting exercise.

The course uses a TI ARM microcontroller TM4C123GH6PM to illustrate aspects of embedded design and programming. Even though it is by now about five years old, there are still cheap Evaluation Kits from TI readily available. In fact, you can buy all the components needed for the course as a bundle, even here in Australia.

Culture Shock

It is quite a change to go from a high-level software engineering viewpoint, where you try to build abstractions of real-world systems, to a world that demands a deep understanding of the architecture of the TM4C123GH6PM. You are expected to turn modules on and off to save power, and poking bits into arcane registers to modify many aspects of system behaviour simultaneously is a way of life. You can certainly come up with very concise C programs, where what would be a series of function calls in my world can be compactly expressed as a bit pattern written into some control register.

ADC

One of the facilities of the TM4C123GH6PM is a very flexible Analog-to-Digital Converter (ADC). One of my exercises was to input an analog voltage, convert it to a 12 bit value, and send it to a laptop (via the USB serial line) for display. I took a set of observations of the input voltage (via a multimeter) and the corresponding uploaded ADC value. In doing this I noted that the output was quite noisy, with jitter in the last decimal digit of the ADC value.

I then repeated the experiment, but this time I configured the TM4C123GH6PM ADC to do hardware averaging of the values read. The jitter seemed to be markedly reduced.

Second Thoughts

Then I had second thoughts. Maybe it wasn't noise I was seeing, but (say) 50 Hz ripple in the reference voltage (unlikely, I know), sampled at random times. In that case, the averaged ADC values would still drift up and down with the ripple.
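
As a quick sanity check on that worry, here is a small simulation sketch (not part of the original notebook; the 125 kHz sample rate and the 2 LSB noise and ripple amplitudes are assumed purely for illustration). Averaging 64 independent noisy samples shrinks the spread by roughly sqrt(64) = 8, but a 50 Hz ripple is essentially constant across 64 back-to-back samples, so averaging leaves it almost untouched.

import numpy as np

np.random.seed(42)

n_readings = 1000   # number of reported ADC readings to simulate
n_avg = 64          # samples averaged per reading
f_sample = 125000   # assumed ADC sample rate in Hz (illustrative only)

# Case 1: independent noise on every sample - averaging helps by ~sqrt(64)
noise = np.random.normal(0, 2.0, size=(n_readings, n_avg))
print(noise[:, 0].std(), noise.mean(axis=1).std())

# Case 2: 50 Hz ripple, nearly constant across the 64 back-to-back samples
t0 = np.random.uniform(0, 1, size=n_readings)   # random start times (s)
dt = np.arange(n_avg) / f_sample                # sample offsets within a burst (s)
ripple = 2.0 * np.sin(2 * np.pi * 50 * (t0[:, None] + dt[None, :]))
print(ripple[:, 0].std(), ripple.mean(axis=1).std())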

Scipy to the Rescue

I decided to use scipy to tell whether the jitter really was smaller in the hardware-averaged case: fit a straight line to each dataset, and then test whether the spread of the residuals differs between the two.


Notebook Setup

In [1]:
%load_ext lab_black
In [2]:
%load_ext watermark
In [3]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import numpy as np

from statsmodels.formula.api import ols

import statsmodels.api as sm

import scipy.stats as stats

import warnings

import sys
import os
import subprocess
import datetime
import platform
D:\Anaconda3\envs\ac5-py37\lib\site-packages\statsmodels\compat\pandas.py:49: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  data_klasses = (pandas.Series, pandas.DataFrame, pandas.Panel)

Hardware Layout

The diagram below shows what must be the simplest Fritzing diagram ever.

In [4]:
from IPython.display import Image

Image("images/TM4C123 ADC_bb.png")
Out[4]:
[image output: images/TM4C123 ADC_bb.png]

Single Sample ADC

We read the single-sample ADC data via pandas, do some data cleanup, and use Seaborn to show the results

In [5]:
data = pd.read_csv('ADCRaw.txt')

data.head()
Out[5]:
Voltage ADC
0 0.52 576
1 1.15 1423
2 1.75 2159
3 2.66 3348
4 2.13 2764
In [6]:
data.columns
Out[6]:
Index(['Voltage', ' ADC '], dtype='object')
In [7]:
data.columns = ['Voltage', 'ADC']
data.sort_values(by='Voltage', inplace=True)

We use different colors for the raw data and the fitted line. I have no idea why the legend doesn't show all the labels. To make a point (and to make the band easier to see), I set the Confidence Interval to the 99% range

In [8]:
fig, ax = plt.subplots(figsize=(10, 10))

sns.regplot(
    'Voltage',
    'ADC',
    data=data,
    ci=99,
    scatter_kws={"color": "black", "label": "ADC Reading"},
    line_kws={"color": "red", "label": "OLS Fit"},
)
plt.legend(loc='best')
ax.set_title('Regplot: TM4C123GXL ADC Performance')
D:\Anaconda3\envs\ac5-py37\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
Out[8]:
Text(0.5, 1.0, 'Regplot: TM4C123GXL ADC Performance')

Linear Regression

We now perform a linear regression, and plot the Observation Confidence Interval (not the mean CI): the interval expected to contain a new individual reading, rather than the narrower interval around the fitted mean

In [9]:
res1 = ols('ADC ~ Voltage', data=data).fit()
In [10]:
res1.summary()
D:\Anaconda3\envs\ac5-py37\lib\site-packages\scipy\stats\stats.py:1394: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=12
  "anyway, n=%i" % int(n))
Out[10]:
OLS Regression Results
Dep. Variable: ADC R-squared: 0.998
Model: OLS Adj. R-squared: 0.997
Method: Least Squares F-statistic: 4326.
Date: Sun, 07 Jun 2020 Prob (F-statistic): 1.61e-14
Time: 09:27:23 Log-Likelihood: -63.930
No. Observations: 12 AIC: 131.9
Df Residuals: 10 BIC: 132.8
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -37.8856 46.917 -0.807 0.438 -142.424 66.653
Voltage 1262.8033 19.200 65.770 0.000 1220.023 1305.584
Omnibus: 4.851 Durbin-Watson: 2.086
Prob(Omnibus): 0.088 Jarque-Bera (JB): 1.740
Skew: -0.197 Prob(JB): 0.419
Kurtosis: 4.823 Cond. No. 8.38


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Note that the linear model is a very good one, explaining almost all of the variation in the data (R-squared of 0.998)
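
As a rough cross-check (my own back-of-envelope figure, not from the original notebook, and assuming the nominal 3.3 V ADC reference): a 12 bit converter spanning 3.3 V would ideally give 4095 / 3.3, or about 1241 counts per volt, which is in the same ballpark as the fitted slope of roughly 1263.

# back-of-envelope check of the fitted slope (assumes a 3.3 V reference)
ideal_counts_per_volt = 4095 / 3.3
print(ideal_counts_per_volt, res1.params['Voltage'])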

Now, we plot the data, with CI lines (using the default alpha)

In [11]:
gp = res1.get_prediction({'Voltage': data['Voltage']})


pred_df = gp.summary_frame()
pred_df.head(1)
Out[11]:
mean mean_se mean_ci_lower mean_ci_upper obs_ci_lower obs_ci_upper
0 618.772083 37.662607 534.854564 702.689602 471.013716 766.530449
In [12]:
fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(
    data['Voltage'], data['ADC'], 'ko', label='ADC Readings'
)
ax.plot(
    data['Voltage'], pred_df['mean'], 'g-', label='OLS Fit'
)
ax.plot(
    data['Voltage'],
    pred_df['obs_ci_upper'],
    'y:',
    label='Upper CI',
)
ax.plot(
    data['Voltage'],
    pred_df['obs_ci_lower'],
    'y:',
    label='Lower CI',
)
ax.fill_between(
    data['Voltage'],
    pred_df['obs_ci_lower'],
    pred_df['obs_ci_upper'],
    alpha=0.2,
)
ax.legend(loc='best')

ax.set_title('TM4C123GXL ADC Performance')
ax.set_xlabel('Volts')
ax.set_ylabel('ADC Reading')
Out[12]:
Text(0, 0.5, 'ADC Reading')

Hardware Averaged Readings

Again, we read the dataset of hardware averaged values, and clean it up

In [13]:
data2 = pd.read_csv('ADCAv64.txt')
data2.head(2)
Out[13]:
Voltage ADC
0 0.53 658
1 1.15 1439
In [14]:
data2.columns = ['Voltage', 'ADC']
data2.sort_values(by='Voltage', inplace=True)

Perform Linear Regression

In [15]:
res2 = ols('ADC ~ Voltage', data=data2).fit()
In [16]:
res2.summary()
D:\Anaconda3\envs\ac5-py37\lib\site-packages\scipy\stats\stats.py:1394: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=12
  "anyway, n=%i" % int(n))
Out[16]:
OLS Regression Results
Dep. Variable: ADC R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 4.377e+05
Date: Sun, 07 Jun 2020 Prob (F-statistic): 1.53e-24
Time: 09:27:24 Log-Likelihood: -38.370
No. Observations: 12 AIC: 80.74
Df Residuals: 10 BIC: 81.71
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -12.7591 4.278 -2.983 0.014 -22.291 -3.227
Voltage 1254.0907 1.896 661.611 0.000 1249.867 1258.314
Omnibus: 0.128 Durbin-Watson: 2.721
Prob(Omnibus): 0.938 Jarque-Bera (JB): 0.344
Skew: -0.034 Prob(JB): 0.842
Kurtosis: 2.173 Cond. No. 6.00


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Plot Regression Results

We plot it twice, the second time with a wider CI band, but either way it is visually clear that the data points lie much closer to the best-fit line than in the single-sample case

In [17]:
gp = res2.get_prediction({'Voltage': data2['Voltage']})


pred_df = gp.summary_frame()
pred_df.head(1)
Out[17]:
mean mean_se mean_ci_lower mean_ci_upper obs_ci_lower obs_ci_upper
0 651.908963 3.403163 644.326243 659.491682 635.587551 668.230375
In [18]:
fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(
    data2['Voltage'],
    data2['ADC'],
    'ko',
    label='ADC Readings',
)
ax.plot(
    data2['Voltage'], pred_df['mean'], 'g-', label='OLS Fit'
)
ax.plot(
    data2['Voltage'],
    pred_df['obs_ci_upper'],
    'y:',
    label='Upper CI  (alpha=0.05)',
)
ax.plot(
    data2['Voltage'],
    pred_df['obs_ci_lower'],
    'y:',
    label='Lower CI',
)
ax.fill_between(
    data2['Voltage'],
    pred_df['obs_ci_lower'],
    pred_df['obs_ci_upper'],
    alpha=0.2,
)
ax.legend(loc='best')
ax.set_title(
    'TM4C123GXL ADC Performance (HW Average of 64 Samples)'
)
ax.set_xlabel('Volts')
ax.set_ylabel('ADC Reading')
Out[18]:
Text(0, 0.5, 'ADC Reading')
In [19]:
gp = res2.get_prediction({'Voltage': data2['Voltage']})


pred_df = gp.summary_frame(alpha=0.005)
In [20]:
pred_df.head(1)
Out[20]:
mean mean_se mean_ci_lower mean_ci_upper obs_ci_lower obs_ci_upper
0 651.908963 3.403163 639.720854 664.097071 625.674688 678.143237
In [21]:
fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(
    data2['Voltage'],
    data2['ADC'],
    'ko',
    label='ADC Readings',
)
ax.plot(
    data2['Voltage'], pred_df['mean'], 'g-', label='OLS Fit'
)
ax.plot(
    data2['Voltage'],
    pred_df['obs_ci_upper'],
    'y:',
    label='Upper CI (alpha=0.005)',
)
ax.plot(
    data2['Voltage'],
    pred_df['obs_ci_lower'],
    'y:',
    label='Lower CI',
)
ax.fill_between(
    data2['Voltage'],
    pred_df['obs_ci_lower'],
    pred_df['obs_ci_upper'],
    alpha=0.2,
)
ax.legend(loc='best')

ax.set_title(
    'TM4C123GXL ADC Performance (HW Average of 64 Samples)'
)
ax.set_xlabel('Volts')
ax.set_ylabel('ADC Reading')
Out[21]:
Text(0, 0.5, 'ADC Reading')

Plot Residuals

We now plot the residuals for the linear regressions on the two datasets

In [22]:
fig, ax = plt.subplots(figsize=(10, 10))

ax.plot(data['Voltage'], res1.resid, 'o', label='Raw ADC')
ax.plot(
    data2['Voltage'],
    res2.resid,
    'r+',
    label='ADC, Av of 64',
    markersize=15,
)
ax.legend(loc='best')
ax.axhline(0, color='grey', alpha=0.6)

ax.set_title('TM4C123GXL ADC Residuals against Linear Fit')
ax.set_xlabel('Volts')
ax.set_ylabel('Residual')
Out[22]:
Text(0, 0.5, 'Residual')

Statistical Tests

We now apply a set of tests to check that the smaller residuals in the Hardware Averaged dataset regression are unlikely to have arisen by chance.
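
Before the formal tests, it helps to look at the raw spread of the two sets of residuals directly (a small sketch added here for orientation; it reuses the res1 and res2 fits from above).

# sample standard deviations of the residuals from the two regressions
sd_raw = res1.resid.std(ddof=1)
sd_avg = res2.resid.std(ddof=1)
print(sd_raw, sd_avg, sd_raw / sd_avg)

The question for the tests below is whether a difference of this size could plausibly have arisen by chance.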

Bartlett Test

"In statistics, Bartlett's test (see Snedecor and Cochran, 1989) is used to test if k samples are from populations with equal variances."
In [23]:
stats.bartlett(res1.resid, res2.resid)
Out[23]:
BartlettResult(statistic=30.530672983835455, pvalue=3.286294124784678e-08)
In [24]:
f_crit = stats.chi2.ppf(0.95, 1)

f_crit
Out[24]:
3.841458820694124
In [25]:
fstat = 30.53

x = np.linspace(0.1, 40, 100)

_ = plt.plot(
    x, stats.chi2.pdf(x, 1), '-', label='Chi^2(1)'
)

_ = plt.fill_between(x, 0, stats.chi2.pdf(x, 1), alpha=0.5)

_ = plt.axvline(f_crit, color='green', label='95% cutoff')
_ = plt.axvline(
    fstat, color='red', label='Observed Chi^2 statistic'
)
_ = plt.axhline(0, color='gray')
_ = plt.legend(loc='best')
_ = plt.title('Bartlett Test')

Levene Test

Every statistical test has underlying assumptions: let us drop the assumption of normality of the residuals

"The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test bartlett in the case where there are significant deviations from normality."
In [26]:
stats.levene(res1.resid, res2.resid, center='mean')
Out[26]:
LeveneResult(statistic=4.355289527556775, pvalue=0.048693834861916425)
"The test version using the mean was proposed in the original article of Levene ([2]) while the median and trimmed mean have been studied by Brown and Forsythe ([3]), sometimes also referred to as Brown-Forsythe test."
In [27]:
stats.levene(res1.resid, res2.resid, center='trimmed')
D:\Anaconda3\envs\ac5-py37\lib\site-packages\scipy\stats\stats.py:2708: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return atmp[sl]
Out[27]:
LeveneResult(statistic=4.355289527556775, pvalue=0.048693834861916425)
In [28]:
stats.levene(res1.resid, res2.resid, center='median')
Out[28]:
LeveneResult(statistic=4.402475639327327, pvalue=0.04759093688854821)

Plot the result

In [29]:
f_crit = stats.f.ppf(
    0.95, 1, len(res1.resid) + len(res2.resid - 1)
)
f_crit
Out[29]:
4.259677272690231
In [30]:
fstat = 4.355

x = np.linspace(0.1, 10, 100)

_ = plt.plot(
    x,
    stats.f.pdf(
        x, 1, len(res1.resid) + len(res2.resid) - 1
    ),
    '-',
    label=f'F(1, {len(res1.resid) + len(res2.resid)-1})',
)

_ = plt.fill_between(
    x,
    0,
    stats.f.pdf(
        x, 1, len(res1.resid) + len(res2.resid) - 1
    ),
    alpha=0.5,
)

_ = plt.axvline(f_crit, color='green', label='95% cutoff')
_ = plt.axvline(
    fstat, color='red', label='Observed F statistic'
)
_ = plt.axhline(0, color='gray')
_ = plt.legend(loc='best')
_ = plt.title('Levene Test')

Fligner Test

Apply another test with different assumptions

"Fligner’s test tests the null hypothesis that all input samples are from populations with equal variances. Fligner-Killeen’s test is distribution free when populations are identical [2]."
In [31]:
stats.fligner(res1.resid, res2.resid)
Out[31]:
FlignerResult(statistic=5.669609800545504, pvalue=0.01726129518976522)

F Test

Finally, we apply the F Test, although:

 ... test is so sensitive to the assumption of normality
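
Given that sensitivity, one extra check worth running (not part of the original analysis) is a Shapiro-Wilk normality test on each set of residuals; a small p-value here would be a reason to lean on the Levene and Fligner results rather than on the F Test.

# Shapiro-Wilk normality check on each set of regression residuals
for label, resid in [('raw ADC', res1.resid), ('averaged ADC', res2.resid)]:
    w, p = stats.shapiro(resid)
    print(label, w, p)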

Compute F statistic

In [32]:
a, b = (
    np.var(res1.resid, ddof=1),
    np.var(res2.resid, ddof=1),
)
a, b
fstat = a / b
fstat
Out[32]:
70.80436557004255

Probability of seeing this value or greater

In [33]:
fdist = stats.f(len(res1.resid) - 1, len(res2.resid))
p_value = 1 - fdist.cdf(fstat)
p_value
Out[33]:
3.6332373776204463e-09
In [34]:
f_crit = stats.f.ppf(
    0.99, len(res1.resid) - 1, len(res2.resid)
)
f_crit
Out[34]:
4.219819998398951

Plot the results

In [35]:
x = np.linspace(0.1, 80, 100)

_ = plt.plot(
    x,
    stats.f.pdf(
        x, len(res1.resid) - 1, len(res2.resid) - 1
    ),
    '-',
    label=f'F({len(res1.resid) - 1}, {len(res2.resid)-1})',
)

_ = plt.fill_between(
    x,
    0,
    stats.f.pdf(
        x, len(res1.resid) - 1, len(res2.resid) - 1
    ),
    alpha=0.5,
)

_ = plt.axvline(f_crit, color='green', label='99% cutoff')
_ = plt.axvline(
    fstat, color='red', label='Observed F statistic'
)
_ = plt.axhline(0, color='gray')
_ = plt.legend(loc='best')

Conclusions

From our test results (at the 5% level), we can be confident that the reduced jitter did not occur by chance, and that Hardware Averaging in the ADC does improve the stability of the readings.
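
As a final back-of-envelope note (my own rough calculation, assuming the per-sample noise is essentially independent): averaging 64 samples should reduce the noise standard deviation by about sqrt(64) = 8, and the observed residual variance ratio of roughly 71 corresponds to a factor of about 8.4, so the hardware averaging appears to deliver close to its theoretical benefit.

# expected vs observed improvement factor from 64-sample averaging
expected_factor = np.sqrt(64)
observed_factor = np.sqrt(
    np.var(res1.resid, ddof=1) / np.var(res2.resid, ddof=1)
)
print(expected_factor, observed_factor)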


Environment

In [36]:
%watermark

%watermark -h -iv
2020-06-07T09:27:29+10:00

CPython 3.7.1
IPython 7.2.0

compiler   : MSC v.1915 64 bit (AMD64)
system     : Windows
release    : 10
machine    : AMD64
processor  : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
CPU cores  : 8
interpreter: 64bit
platform    1.0.8
statsmodels 0.9.0
numpy       1.15.4
seaborn     0.9.0
scipy       1.1.0
matplotlib  3.0.2
pandas      1.0.0
host name: DESKTOP-SODFUN6
In [37]:
# show info to support reproducibility

theNotebook = 'ADCAnalysis.ipynb'


def python_env_name():
    envs = subprocess.check_output(
        'conda env list'
    ).splitlines()
    # get unicode version of binary subprocess output
    envu = [x.decode('ascii') for x in envs]
    active_env = list(
        filter(lambda s: '*' in str(s), envu)
    )[0]
    env_name = str(active_env).split()[0]
    return env_name


# end python_env_name

print('python version     : ' + sys.version)
print('python environment :', python_env_name())

print('current wkg dir    : ' + os.getcwd())
print('Notebook name      : ' + theNotebook)
print(
    'Notebook run at    : '
    + str(datetime.datetime.now())
    + ' local time'
)
print(
    'Notebook run at    : '
    + str(datetime.datetime.utcnow())
    + ' UTC'
)
print('Notebook run on    : ' + platform.platform())
python version     : 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
python environment : ac5-py37
current wkg dir    : C:\Users\donrc\Documents\JupyterNotebooks\PythonNotebookProject\develop
Notebook name      : ADCAnalysis.ipynb
Notebook run at    : 2020-06-07 09:27:38.618146 local time
Notebook run at    : 2020-06-06 23:27:38.619144 UTC
Notebook run on    : Windows-10-10.0.18362-SP0