Fri 26 June 2020

Filed under Machine Learning

Tags fastai

Machine Learning Fail

Introduction

A little while ago, I created a machine learning app to recognize Australian urban birds. I took the Australian Museum list of the 30 most common small urban birds, and used the fastai methodology and libraries to train a recognizer for each bird species.

I had a unique and refreshing approach to input data quality: I had none at all!

I used the common name of the bird, did a Google image search, and used the top 200 thumbnail images returned, with no vetting at all. This was part of an experiment to see how far you could get with the lazy person's approach to Machine Learning.

I was close to astounded at how well it worked.

Live Test Images

Then just the other day, my brother gave me a wildlife camera to position next to my birdbath (which is very popular with the local wildlife). I was able to take a number of close-ups of the local bird species, so I decided to run these images through my recognizer.

Success - (Pride before the Fall)

The first few were OK

Noisy Miner

I was a little surprised that the Rainbow Lorikeet was not identified with more certainty

Rainbow Lorikeet

It is hard to mistake a Pied Currawong up close

Pied Currawong

Fail

The last one was crushing.

Butcherbird

Whiskey Tango Foxtrot! My recognizer got the Grey Butcherbird completely wrong! The shame!

The Reason Why (I think)

I think I know what happened.

It turns out that there are two common butcherbird species (and both are common where I live).

There is the Grey Butcherbird (Cracticus torquatus),

Grey Butcherbird

and the Pied Butcherbird (Cracticus nigrogularis).

Pied Butcherbird

I suspect that my naive search for "Butcherbird" got images of both species. When I repeated the search just now, it certainly got images of both species. My bird recognizer really just matches by textures, and these two birds have completely different textures, so it is no wonder that recognition failed.
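A check along the following lines might have caught the problem before training. This is only a minimal sketch: the function name `find_ambiguous_terms` is made up for illustration, and the species list contains just the birds mentioned in this post, not the full Australian Museum list. It flags any search term that appears in the common name of more than one species.

```python
from collections import defaultdict


def find_ambiguous_terms(search_terms, known_species):
    """Return the search terms that match more than one known species.

    A term "matches" a species when it appears (case-insensitively)
    inside that species' full common name.
    """
    matches = defaultdict(list)
    for term in search_terms:
        for species in known_species:
            if term.lower() in species.lower():
                matches[term].append(species)
    # Keep only the terms that are shared by several species
    return {term: names for term, names in matches.items() if len(names) > 1}


# Illustrative data only: birds named in this post
known_species = [
    "Grey Butcherbird",
    "Pied Butcherbird",
    "Pied Currawong",
    "Rainbow Lorikeet",
    "Noisy Miner",
]

# The labels used as image search queries (assumed for this example)
search_terms = ["Butcherbird", "Pied Currawong", "Rainbow Lorikeet", "Noisy Miner"]

ambiguous = find_ambiguous_terms(search_terms, known_species)
print(ambiguous)
# {'Butcherbird': ['Grey Butcherbird', 'Pied Butcherbird']}
```

Any term flagged this way is a candidate for a more specific query, such as the full common name or the scientific name.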

Conclusions

So the moral of the story is: Data Quality Matters! Ignoring input data quality might look like the easy way to start, but it will come back to bite you later, in production.


net-analysis.com Data Analysis Blog © Don Cameron