Best Numpy Data Analytics Tutorial | Array Operations Explained with Code

Aniruddha Chaudhari
Updated: Oct 25, 2019

Python for data science

If you work with Python or data science, you have probably heard a lot about Numpy. It is one of the most widely used Python libraries for data analytics and data science.

In this Numpy data analytics tutorial, I explain the Numpy Python module and how it benefits data science. The programming examples I share here will make it more practical.

Whether you are new to data science or have been a data scientist for a long time, this tutorial will add value to your knowledge.

Here is a brief outline of this tutorial.

Table of Contents

Why Numpy for Data Science? (Use)
How to Install the Numpy Python Module
How to Create Numpy Array?
Basic Arithmetic Operations on Numpy Array
Multi-dimensional Arrays (Matrix) in Numpy
Selecting Elements from Numpy Array/Matrix
Indexing and Slicing in Numpy Array
How to Reshape Numpy Array?
Creating Auto Initialized Numpy Arrays
Mathematical Operations on Numpy Matrix
Initialize Numpy Array with Random Numbers
How to Explore Numpy? [Your Action]

So let’s begin.

Why is Numpy so useful for Data Science?

A few days back, I attended a TEDx talk about Data Science, and one point really struck me:

The Amount of data we analyze is less than 1%.

This makes my brain spin.

A huge amount of data is being generated. Data Science plays a crucial role in analyzing this data and turning it into information.

Python leads the way by providing some essential libraries for Data Science, and Numpy is at the top of the list.

Rather than answering this question directly, I demonstrate different use cases and examples in this tutorial. By the end, you will be able to answer it yourself.

Installing the Numpy Python Module

I assume you have Python installed on your system. (If not, check this guide for installing Python.)

Run the following command to install the Numpy module:

pip install numpy

This will install the latest version of the Numpy module.

To check that it installed successfully, list all the installed Python packages using the freeze command. Numpy should appear in the list.

pip freeze
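As an alternative check, here is a minimal sketch that imports the module and prints its version. If the import works, the interpreter you are running can see the package. The variable name installed_version is only for illustration, and the version string you see depends on your installation.

```python
import numpy as np

# If this import succeeds, Numpy is installed for this interpreter.
installed_version = np.__version__
print(installed_version)
```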

The core strength of the Numpy Python module is its array, which can be one-dimensional or multi-dimensional. Almost all the operations provided by the Numpy module revolve around the Numpy array.

So let's look at the Numpy array.

Numpy Data Analytics Tutorial | Creating a Numpy Array

import numpy as np
numArr = np.array([21,22,23,24,25])
print(numArr,len(numArr))

Output:

[21 22 23 24 25] 5

You can also specify the data type of the elements in the Numpy array.

import numpy as np
intArr = np.array([21,22,23,24,25],'int')
floatArr = np.array([21,22,23,24,25],'float')
strArr = np.array([21,22,23,24,25],'str')
print(intArr,floatArr,strArr,sep='\n')

Numpy casts each element in the array to the specified data type.

Output:

[21 22 23 24 25]
[21. 22. 23. 24. 25.]
['21' '22' '23' '24' '25']
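If the array already exists, you can inspect its type through the dtype attribute and convert it with astype(). The sketch below reuses the sample numbers from above. Note that the exact integer width (for example int32 or int64) can depend on your platform, so I only print the kind of the type.

```python
import numpy as np

intArr = np.array([21, 22, 23, 24, 25], dtype='int')
print(intArr.dtype.kind)   # 'i' means a signed integer type

# astype() returns a new array; the original array is left unchanged.
floatArr = intArr.astype('float')
print(floatArr)            # [21. 22. 23. 24. 25.]
print(floatArr.dtype)      # float64
```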

From now on, we will explore different Python code examples using Numpy. Typing each example into a text editor, saving it, and then running it quickly becomes tedious.

Instead, install Jupyter and start using it now. Thank me later :). It is one of the most popular tools for Data Science, and it lets you run your Python code inside the browser.

Basic Arithmetic Operations on Numpy Array

Numpy supports numerous arithmetic and statistical operations on arrays.

The following program finds the sum, mean, and standard deviation of the numbers in a Numpy array.

import numpy as np
floatArr = np.array([21,22,23,24,25,21.2,147.6,23.7],'float')
print(floatArr.sum(),floatArr.mean(),floatArr.std(),sep='\n')

Output:

307.5
38.4375
41.2800780492237

This one-line code for performing arithmetic operations saves you a lot of time. On large arrays, it is also typically faster than doing the same mathematical operation on a Python list.
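If you want to check the speed claim on your own machine, here is a rough sketch using the standard timeit module. The sample data (values) is hypothetical, and the timings will vary from system to system, so treat the numbers as an indication only.

```python
import timeit
import numpy as np

values = list(range(10_000))   # hypothetical sample data
valuesArr = np.array(values)

# Both approaches compute the same total.
list_total = sum(values)
array_total = int(valuesArr.sum())
print(list_total == array_total)   # True

# Time 100 repetitions of each approach (results depend on your machine).
list_time = timeit.timeit(lambda: sum(values), number=100)
array_time = timeit.timeit(lambda: valuesArr.sum(), number=100)
print(list_time, array_time)
```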

Multi-dimensional Arrays (Matrix) in Numpy

Data science is often about finding relations between entities in a dataset. Whether you need to plot a graph or compare elements in two datasets, you need a multi-dimensional array.

Numpy supports multi-dimensional arrays.

import numpy as np
arr=[[0,1,2,3,4],[10,11,12,13,14],[20,21,22,23,24]]
intArr=np.array(arr)
print(intArr)

Output:

[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]]

Numpy also provides various functions for evaluating data elements in a multi-dimensional array. I will share some of them as we go forward in this Numpy data analytics tutorial.
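As a small taste of such functions, the sketch below sums the same matrix along each axis. Here axis=0 works down the columns and axis=1 works across the rows.

```python
import numpy as np

intArr = np.array([[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]])

print(intArr.sum(axis=0))   # column totals: [30 33 36 39 42]
print(intArr.sum(axis=1))   # row totals: [ 10  60 110]
print(intArr.max())         # largest element overall: 24
```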

Selecting Elements from Numpy Array/Matrix

What if you want to filter elements from a Numpy array?

Let's take an example:

Write a program to find all the elements in the matrix that are divisible by 2.

Mark each element in the Numpy array as 1 if it is even, otherwise mark it as 0 (odd).

You can do it simply by passing your filter expression to where().

import numpy as np
numArr=np.array([[0,1,2,3,4],[10,11,12,13,14],[20,21,22,23,24]])
print(np.where(numArr%2==0,1,0))

Output:

[[1 0 1 0 1]
 [1 0 1 0 1]
 [1 0 1 0 1]]

As the above example shows, you don't need to loop over each element in the array. This is what makes the Numpy array different from, and more efficient than, the Python list.
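If you want the even values themselves rather than 1/0 markers, one common approach is to index the array with the boolean condition. This is a minimal sketch; the names evenMask and evenValues are only for illustration.

```python
import numpy as np

numArr = np.array([[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]])

evenMask = numArr % 2 == 0      # boolean array with the same shape as numArr
evenValues = numArr[evenMask]   # 1-D array of the elements where the mask is True
print(evenValues)               # [ 0  2  4 10 12 14 20 22 24]
```

Note that the result is flattened to one dimension, because the selected elements no longer form a rectangle.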

Indexing and Slicing in Numpy Array

Indexing retrieves the element at a particular position in the array.

Slicing retrieves a range of elements from a start position up to, but not including, an end position.

import numpy as np
arr=[[0,1,2,3,4],[10,11,12,13,14],[20,21,22,23,24]]
intArr=np.array(arr)
print(intArr[1,3]) #indexing
print(intArr[1:2,1:3]) #slicing

Output:

13
[[11 12]]
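A few more indexing and slicing patterns on the same matrix are sketched below: a whole row, a whole column, negative indices that count from the end, and a step value.

```python
import numpy as np

intArr = np.array([[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]])

print(intArr[0])          # first row: [0 1 2 3 4]
print(intArr[:, 2])       # third column: [ 2 12 22]
print(intArr[-1, -1])     # last element of the last row: 24
print(intArr[::2, ::2])   # every second row and every second column
```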

Reshaping Numpy Array

You can change the dimensions (shape) of a Numpy array.

Program to change the shape of the Numpy array from (3,5) to (5,3):

import numpy as np
arr=np.array([[0,1,2,3,4],[10,11,12,13,14],[20,21,22,23,24]])
print(arr.shape)
print(arr.reshape(5,3))

Output:

(3, 5)
[[ 0  1  2]
 [ 3  4 10]
 [11 12 13]
 [14 20 21]
 [22 23 24]]

Note: You cannot reshape an array to arbitrary dimensions. The new shape must hold the same total number of elements as the original array.
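The sketch below illustrates this note. Passing -1 lets Numpy work out one dimension from the total size, while a shape that does not match the element count raises a ValueError. The array here is built with arange(15) just to get 15 elements.

```python
import numpy as np

arr = np.arange(15)   # 15 elements, the same count as a (3, 5) array

# -1 asks Numpy to infer that dimension from the total size.
inferred = arr.reshape(5, -1)
print(inferred.shape)   # (5, 3)

# 15 elements cannot fill a (5, 5) array, so reshape raises ValueError.
try:
    arr.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)
```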

Creating Auto Initialized Numpy Arrays

Program to create a Numpy array holding the numbers from 0 to 9.

import numpy as np
iniArr = np.arange(10)
print(iniArr)

#creating multidimensional initialized array
iniArr2 = np.array([np.arange(10),np.arange(10)])
print(iniArr2)

Output:

[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]]

We can auto-initialize both single and multi-dimensional Numpy arrays.

Initializing a Numpy array with ones:

import numpy as np
iniArr = np.ones(10)
print(iniArr)

Output:

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Here, the default data type of the Numpy array is float.

Creating a Numpy array of squares by raising each value to the power of 2:
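The float default can be overridden with the dtype argument, and a tuple gives a multi-dimensional shape. Here is a short sketch that also uses zeros() and full(), two related Numpy functions. The variable names are only for illustration.

```python
import numpy as np

intOnes = np.ones(5, dtype='int')   # integer ones instead of the float default
print(intOnes)                      # [1 1 1 1 1]

zeroMatrix = np.zeros((2, 3))       # a tuple gives a multi-dimensional shape
print(zeroMatrix.shape)             # (2, 3)

filled = np.full((2, 2), 7)         # every element set to the same value
print(filled.tolist())              # [[7, 7], [7, 7]]
```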

import numpy as np
iniArr = np.arange(10)**2
print(iniArr)

Output:

[ 0  1  4  9 16 25 36 49 64 81]

Mathematical Operations on Numpy Matrix

The Numpy Python module has special functions for matrix operations. They are very useful in data science for processing data.

If you are familiar with matrices, you might be aware of how complicated matrix operations are. If you use Numpy arrays for matrix operations, you don't need to write the logic for them yourself.

Just think about what you need to do with the matrix, and Numpy will do it for you.

It makes your job pretty easy now, doesn’t it?

Creating Identity Matrix Array in Numpy

What is an Identity Matrix?

An identity matrix is a square matrix in which all the elements on the main diagonal are one and all other elements are zero.

Characteristics of Identity Matrix:

If we set all the diagonal elements of a square matrix to one and all other elements to zero, it becomes an identity matrix.
Multiplying any matrix by an identity matrix of compatible size gives back the same matrix.

Numpy has a built-in function to create an identity matrix.

Syntax:

identity(size)

Program:

import numpy as np
print(np.identity(10))

Output:

[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
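To see the multiplication property in action, here is a minimal sketch with a small 2x2 matrix. The sample values are arbitrary, and the product uses the dot() function covered later in this tutorial.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])      # arbitrary sample matrix
identity = np.identity(2, dtype='int')   # 2x2 identity with integer elements

product = np.dot(matrix, identity)
print(np.array_equal(product, matrix))   # True
```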

Scalar Matrix multiplication in Numpy

In scalar multiplication, each element in the matrix is multiplied by a constant value.

import numpy as np
xArr = np.arange(10)
yArr = np.array([xArr,xArr])
#Scalar multiplication
print(yArr*2)

Output:

[[ 0  2  4  6  8 10 12 14 16 18]
 [ 0  2  4  6  8 10 12 14 16 18]]
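The same element-by-element behavior applies to other operators too. As the sketch below shows, using * between two arrays of the same shape multiplies matching elements, which is not the same as a matrix product.

```python
import numpy as np

xArr = np.arange(5)
print(xArr * 2)      # [0 2 4 6 8]
print(xArr + 10)     # [10 11 12 13 14]
print(xArr * xArr)   # element-wise product: [ 0  1  4  9 16]
```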

Dot Matrix Multiplication in Numpy

In dot multiplication (the matrix product), two matrices are multiplied.

A(x,y) * B(y,z) = C(x,z)

Note: For dot matrix multiplication, the number of columns in the first matrix must be the same as the number of rows in the second matrix.

Unlike scalar multiplication, which uses the * operator, dot matrix multiplication in Numpy uses the special function dot().

Program for Dot Matrix Multiplication:

import numpy as np
xArr = np.arange(10)
yArr = np.array([xArr,xArr]) #same 2x10 matrix as in the scalar example
newArr = np.dot(yArr,yArr.reshape(10,2)) #Dot product of (2,10) and (10,2)
print(newArr)

Output:

[[220 265]
[220 265]]
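Here is a smaller sketch of the shape rule that is easy to verify by hand. For 2-D arrays, the @ operator gives the same result as dot(), and shapes that do not line up raise a ValueError. The matrices a and b are arbitrary samples.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0 1 2], [3 4 5]]
b = np.arange(6).reshape(3, 2)   # shape (3, 2): [[0 1], [2 3], [4 5]]

c = a @ b                        # same result as np.dot(a, b) for 2-D arrays
print(c.shape)                   # (2, 2)
print(c.tolist())                # [[10, 13], [28, 40]]

# (2, 3) times (2, 3) breaks the rule: 3 columns do not match 2 rows.
try:
    np.dot(a, a)
except ValueError as err:
    print("shapes do not line up:", err)
```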

Initialize Numpy Array with Random Numbers

Numpy can initialize an array with random numbers. You can also provide the range of values for the random numbers; in randint(), the lower bound is inclusive and the upper bound is exclusive.

import numpy as np
np.random.randint(-10,10,size=(9,9))

Output (the values will differ on each run):

[[ 5, 5, -5, 4, -10, -8, 7, 9, 8],
[ 0, -4, -8, -1, -1, -8, 6, 9, -7],
[ 7, -5, 2, 0, 7, -8, 6, -3, -4],
[ 5, 3, 0, 4, 4, 3, 0, -5, 0],
[ -5, 4, -2, 3, 2, 6, -4, -1, 6],
[ -4, -10, -10, 0, -6, -2, -9, 4, 3],
[ -4, 8, 8, -8, 8, -2, -8, 4, 4],
[ -4, -9, -4, -4, 1, 9, -1, 9, 7],
[ -2, -9, -4, -2, -6, 7, -5, 8, 7]]
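If you need the same random numbers on every run, for example to reproduce an experiment, you can seed the generator. The sketch below uses default_rng(), which assumes Numpy 1.17 or newer. The seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # 42 is an arbitrary seed
randArr = rng.integers(-10, 10, size=(3, 3))   # upper bound 10 is exclusive
print(randArr)

# A generator created with the same seed reproduces the same array.
repeat = np.random.default_rng(42).integers(-10, 10, size=(3, 3))
print(np.array_equal(randArr, repeat))   # True
```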

Your Action…

That is all for this Numpy tutorial for data science. I have tried to explain it with the required programming examples.

There are many functions you can use with Numpy array for data analytics.

Here is a simple program to list all the attributes and methods associated with a Numpy array.

import numpy as np
arr = np.array([23, 45, 24, 55])
print(dir(arr))

Output (the exact list depends on your Numpy version):

['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmatmul__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
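The full list is long because it includes the special double-underscore names. A minimal sketch to keep only the public names is shown below. The variable publicNames is only for illustration, and the count you get depends on your Numpy version.

```python
import numpy as np

arr = np.array([23, 45, 24, 55])

# Keep only the names that do not start with an underscore.
publicNames = [name for name in dir(arr) if not name.startswith('_')]
print(len(publicNames))
print(publicNames[:5])
```

From there, calling help() on any name, such as help(arr.argsort), prints its documentation.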

Try to implement these Numpy functions in your actual data science project.

Like Numpy, there are some more essential Python libraries for Data Science. Explore these libraries to excel in Data Science.

In the next tutorial, we will explore one more Data Science library called Pandas. You can always check all the updated posts on my complete Python tutorial page. Stay Tuned!

If you like this Numpy data analytics tutorial and find it informative, please do share it with your friends.

Happy Pythoning!

Aniruddha Chaudhari

I am a Python enthusiast who loves Linux and Vim. I hold a Master of Computer Science degree from NIT Trichy and have 10 years of experience in the IT industry, focusing on the Software Development Lifecycle from Requirements Gathering, Design, Development to Deployment. I have worked at IBM, Ericsson, and NetApp, and I share my knowledge on CSEstack.org.