Common Machine Learning Programming Errors in Python

Common Python Errors in Machine Learning


In this post I'll go over some of the most common errors I come across in Python during the model building and development process. For demonstration purposes we'll use height/weight data, which can be found on Kaggle. The data contains gender, height in inches and weight in pounds.

The most common errors I've come across in my experience are the following:

IMPORTS

  1. NameError
  2. ModuleNotFoundError
  3. AttributeError
  4. ImportError

READING DATA

  5. FileNotFoundError

SELECTING COLUMNS

  6. KeyError

DATA PROCESSING

  7. ValueError

We'll build a simple linear regression model and modify the code to show how the above errors arise in practice.

First let's import the data using pandas and print the first five rows.
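As a minimal sketch of this step, the snippet below reads a small in-memory CSV instead of the Kaggle file, whose name is not given here. The rows are made-up stand-ins, and the column names `Gender`, `Height` and `Weight` are assumptions based on the description of the data.

```python
import io

import pandas as pd

# Stand-in for the Kaggle CSV: made-up rows and assumed column names.
csv_text = """Gender,Height,Weight
Male,73.8,241.9
Male,68.8,162.3
Male,74.1,212.7
Female,63.5,127.8
Female,65.2,142.1
Female,61.9,118.6
"""

# With the real download, pass the path of the CSV file to pd.read_csv instead.
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default.
print(df.head())
```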

The data set is very simple, with gender, height and weight columns. The next thing we can do is visualize the data using matplotlib and seaborn.
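A rough sketch of such a plot is shown here with matplotlib alone; seaborn styling could be layered on top. The data is synthetic, generated with an assumed linear trend plus noise, so it only imitates the shape of the real data set.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data: heights in inches, weights with a linear trend plus noise.
rng = np.random.default_rng(0)
height = rng.normal(66, 4, size=200)
weight = 6.0 * height - 235 + rng.normal(0, 10, size=200)

# Scatter plot of weight against height with labeled axes.
fig, ax = plt.subplots()
ax.scatter(weight, height)
ax.set_xlabel("Weight")
ax.set_ylabel("Height")
ax.set_title("Weight vs Height")
```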

Looking at the scatter plot of weight vs height, we see that the relationship is linear.

Next let's define our input (X) and output (y) and split the data for training and testing.
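The split might look like the following sketch. Using weight as the input and height as the output is an assumption made for illustration, as are the synthetic values, `test_size=0.25` and `random_state=42`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with assumed column names.
rng = np.random.default_rng(0)
weight = rng.normal(160, 30, size=100)
height = 0.12 * weight + 47 + rng.normal(0, 1.5, size=100)
df = pd.DataFrame({"Weight": weight, "Height": height})

# sklearn expects a 2-D feature matrix, hence the reshape to (n_samples, 1).
X = np.array(df["Weight"]).reshape(-1, 1)
y = np.array(df["Height"])

# The returned order is X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```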

We can then define a linear regression model, fit it to our training data, make predictions on the test set, and evaluate the performance of the model.
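Here is one way these four steps could look. The data is synthetic, and using `r2_score` as the evaluation metric is an assumption, since the post does not say which metric was used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: one feature with a linear relationship plus noise.
rng = np.random.default_rng(0)
X = rng.normal(160, 30, size=(200, 1))
y = 0.12 * X[:, 0] + 47 + rng.normal(0, 1.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Define the model, fit on the training data, predict on the test set.
reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Evaluate with R^2 (assumed metric for this sketch).
score = r2_score(y_test, y_pred)
print("R^2:", score)
```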

The first error I'll discuss is NameError, which occurs if, for example, I forget to import a package. Suppose I have removed "import numpy as np" from the script.

If I attempt to run the script with that line of code missing, I get a NameError saying that the name 'np' is not defined.
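The missing import can be reproduced in a few lines. The error is caught here only so the message can be printed; the array values are arbitrary.

```python
# "import numpy as np" has been left out on purpose.
try:
    X = np.array([63.5, 68.0, 71.2])
except NameError as err:
    message = str(err)
    print("NameError:", message)
```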

I'd receive similar messages for leaving out the import statements for seaborn, matplotlib and pandas.

Another issue is accidentally trying to import a package that doesn't exist due to a misspelling, which results in a ModuleNotFoundError, for example if I misspell 'pandas' as 'pandnas'.
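A short sketch of the misspelled import, with the exception caught so the script keeps running:

```python
try:
    import pandnas as pd  # typo: the package is called "pandas"
except ModuleNotFoundError as err:
    error = err
    print("ModuleNotFoundError:", err)
```

ModuleNotFoundError is a subclass of ImportError, so an `except ImportError` clause would catch it as well.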

Or if I forget 'pyplot' in the matplotlib import, calling the scatter plot function gives an AttributeError.
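The sketch below assumes the mistake is importing the top-level `matplotlib` package under the alias `plt`, which is one plausible way of forgetting 'pyplot'. The plotted values are arbitrary.

```python
import matplotlib as plt  # should have been: import matplotlib.pyplot as plt

try:
    # scatter lives in matplotlib.pyplot, not in the top-level package.
    plt.scatter([1, 2, 3], [4, 5, 6])
except AttributeError as err:
    error = err
    print("AttributeError:", err)
```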

Similarly, if I forget the linear_model and model_selection submodules in the sklearn imports, I'll get an ImportError.
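To illustrate, this sketch tries to import `LinearRegression` straight from the top-level `sklearn` package and then shows the corrected imports:

```python
try:
    # Missing the linear_model submodule in the import path.
    from sklearn import LinearRegression
except ImportError as err:
    error = err
    print("ImportError:", err)

# Corrected imports: each name comes from its submodule.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
```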

In terms of reading files, if I misspell the name of the file I get a FileNotFoundError.
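A minimal reproduction, using a hypothetical misspelled file name that is assumed not to exist in the working directory:

```python
import pandas as pd

try:
    # Hypothetical misspelled file name, assumed not to exist on disk.
    df = pd.read_csv("wieght-hieght-misspelled.csv")
except FileNotFoundError as err:
    error = err
    print("FileNotFoundError:", err)
```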

Additionally, if I try to select a column that doesn't exist from a pandas data frame I get a KeyError.
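For example, with a tiny made-up data frame and a misspelled column name (both are illustrative assumptions):

```python
import pandas as pd

# Made-up rows with the assumed column names.
df = pd.DataFrame(
    {
        "Gender": ["Male", "Female"],
        "Height": [73.8, 63.5],
        "Weight": [241.9, 127.8],
    }
)

try:
    heights = df["Hieght"]  # typo: the column is "Height"
except KeyError as err:
    error = err
    print("KeyError:", err)
    # Listing the real column names usually reveals the typo.
    print("Available columns:", list(df.columns))
```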

If I forget to convert the pandas series for "Weight" and "Height" into numpy arrays of the right shape, I get a ValueError. This is actually very common: sklearn accepts pandas objects, but the feature input must be two-dimensional, and a single pandas series is one-dimensional. I frequently find myself forgetting this simple conversion step.
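The sketch below passes a single series as the input to `fit`. The values are made up, and using weight as the input is an assumption. The exact wording of the message depends on the sklearn version, so only its first line is printed.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up rows with the assumed column names.
df = pd.DataFrame(
    {
        "Weight": [241.9, 162.3, 212.7, 127.8, 142.1],
        "Height": [73.8, 68.8, 74.1, 63.5, 65.2],
    }
)

X = df["Weight"]  # a 1-D series, not a 2-D feature matrix
y = df["Height"]

try:
    LinearRegression().fit(X, y)
except ValueError as err:
    error = err
    print("ValueError:", str(err).splitlines()[0])
```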

Or if I forget to reshape the numpy array into a two-dimensional array I also get a ValueError.
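A small sketch of the same failure with a one-dimensional numpy array, followed by the `reshape(-1, 1)` fix. The values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up values for illustration.
weight = np.array([241.9, 162.3, 212.7, 127.8, 142.1])
height = np.array([73.8, 68.8, 74.1, 63.5, 65.2])

try:
    LinearRegression().fit(weight, height)  # weight has shape (5,)
except ValueError as err:
    error = err
    print("ValueError:", str(err).splitlines()[0])

# Fix: one column per feature, one row per sample.
X = weight.reshape(-1, 1)
reg = LinearRegression().fit(X, height)
print(X.shape)
```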

Another common cause of a ValueError is carrying out the train test split. I often forget the order of the X and y arrays and swap X_test and y_train when unpacking the result, which gives a ValueError upon fitting.
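The swap can be sketched as follows, on synthetic data with an assumed `test_size=0.25`. Because the array bound to `y_train` is really the test inputs, it has a different number of rows than `X_train`, and fitting fails.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 samples, one feature.
rng = np.random.default_rng(0)
X = rng.normal(160, 30, size=(20, 1))
y = 0.12 * X[:, 0] + 47

# Wrong order: train_test_split returns X_train, X_test, y_train, y_test,
# so the name y_train below is actually bound to X_test.
X_train, y_train, X_test, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

try:
    LinearRegression().fit(X_train, y_train)
except ValueError as err:
    error = err
    print("ValueError:", err)
```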

When trying to fit a model to data corresponding to a specific category or population, I often come across the issue of not having enough data. Let's filter our dataframe to replicate this issue. Let's filter the data to only include records where 'Weight' = 241.893563. This will result in exactly one row of data.

If we try to build our model, we get a ValueError in the line where we split our data.
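A sketch of the one-row case, using made-up rows and a rounded weight value as the filter rather than the exact figure from the real data set:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows; the filter value below is an illustrative stand-in.
df = pd.DataFrame(
    {
        "Weight": [241.9, 162.3, 212.7],
        "Height": [73.8, 68.8, 74.1],
    }
)

# Keep only the records with one specific weight: exactly one row remains.
df_one = df[df["Weight"] == 241.9]
X = df_one[["Weight"]].to_numpy()
y = df_one["Height"].to_numpy()

try:
    # With a single sample, the default split leaves the training set empty.
    train_test_split(X, y)
except ValueError as err:
    error = err
    print("ValueError:", err)
```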

And if we try to fit, we also get an error.

Lastly, if the data has missing or infinite values, the fitting will throw an error. Let's redefine the weight column with 'nan' (Not a Number) values to generate this error.

We would get a similar error message with infinite values.
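Both cases can be reproduced in one sketch. The values are made up, and the exact message text varies between sklearn versions, so the messages are collected rather than asserted verbatim.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up values for illustration.
height = np.array([73.8, 68.8, 74.1, 63.5, 65.2])

errors = {}
for label, bad_value in [("nan", np.nan), ("inf", np.inf)]:
    weight = np.array([241.9, 162.3, 212.7, 127.8, 142.1])
    weight[2] = bad_value  # inject one missing or infinite value
    try:
        LinearRegression().fit(weight.reshape(-1, 1), height)
    except ValueError as err:
        errors[label] = str(err)
        print(label, "->", str(err).splitlines()[0])

# One possible guard (an assumption, not the post's method):
# keep only rows where the feature is finite before fitting.
mask = np.isfinite(weight)
reg = LinearRegression().fit(weight[mask].reshape(-1, 1), height[mask])
```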

In this post we reviewed different errors that arise when developing models in Python. We reviewed errors related to importing packages, reading files, selecting columns and processing data. Having solid knowledge of the different types of errors that arise when developing machine learning models can be useful when productionizing machine learning code. This knowledge can help prevent errors from occurring as well as inform the logic used to catch these errors when they occur.

There are many more errors that I come across on a daily basis, but the errors I've listed in this post are the most common in my experience. I hope this post was useful. The code from this post is available on GitHub. Thanks for reading!
