If you work with Python or data science, you have probably heard a lot about NumPy. It is one of the most widely used Python libraries for data analytics and data science.

In this NumPy data analytics tutorial, I explain the NumPy Python module and how it benefits data science. The programming examples I share here make it more practical.

Whether you are new to data science or have been a data scientist for a long time, this tutorial will add value to your knowledge.

Here is a brief outline of this tutorial.

**Table of Contents**

- Why Numpy for Data Science? (Use)
- How to Install the Numpy Python Module
- How to Create Numpy Array?
- Basic Arithmetic Operations on Numpy Array
- Multi-dimensional Arrays (Matrix) in Numpy
- Selecting Elements from Numpy Array/Matrix
- Indexing and Slicing in Numpy Array
- How to Reshape Numpy Array?
- Creating Auto Initialized Numpy Arrays
- Mathematical Operations on Numpy Matrix
- Initialize Numpy Array with Random Numbers
- How to Explore Numpy? [Your Action]

So let’s begin.

A few days back, I attended a TEDx talk about data science, and one point really struck me:

The amount of data we analyze is less than 1%.

This makes my brain spin.

A huge amount of data is being generated. Data science plays a crucial role in analyzing this data and turning it into information.

Python leads the way by providing some essential libraries for data science, and NumPy is at the top of the list.

Rather than answering the question “Why NumPy for data science?” directly, I demonstrate different use cases and examples in this tutorial. By the end, you will be able to answer it yourself.

Hope you have Python installed on your system. (If not, check this guide for installing Python.)

Run the following command to install the NumPy module:

    pip install numpy

This installs the latest version of NumPy.

To check whether it installed successfully, you can list all the installed Python packages using the freeze command:

    pip freeze
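As a quicker check, you can also ask NumPy for its version from inside Python. This is a minimal sketch; the version printed depends on your own installation.

```python
import numpy as np

# Print the installed NumPy version; the exact value depends on your environment.
print(np.__version__)
```

If the import fails with an error, NumPy is not installed in the Python environment you are running.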

The core strength of the NumPy module is its array, which can be one-dimensional or multi-dimensional. Almost all the operations provided by NumPy revolve around the NumPy array.

So let’s look at the NumPy array.

    import numpy as np

    numArr = np.array([21, 22, 23, 24, 25])
    print(numArr, len(numArr))

**Output:**

    [21 22 23 24 25] 5

You can also specify the **type of Numpy array**.

    import numpy as np

    intArr = np.array([21, 22, 23, 24, 25], 'int')
    floatArr = np.array([21, 22, 23, 24, 25], 'float')
    strArr = np.array([21, 22, 23, 24, 25], 'str')
    print(intArr, floatArr, strArr, sep='\n')

It typecasts each element in the Numpy array with the specified data type.

**Output:**

    [21 22 23 24 25]
    [21. 22. 23. 24. 25.]
    ['21' '22' '23' '24' '25']
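To see which type an array actually holds, or to convert an existing array, a small sketch like the following may help. The variable names are only illustrative. The exact integer type (for example `int64`) can vary by platform, so no output is shown for it.

```python
import numpy as np

int_arr = np.array([21, 22, 23, 24, 25], dtype='int')

# Every array stores its element type in the dtype attribute.
print(int_arr.dtype)

# astype() returns a new array converted to the requested type;
# the original array is left unchanged.
float_arr = int_arr.astype('float')
print(float_arr)        # [21. 22. 23. 24. 25.]
print(float_arr.dtype)  # float64
```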

From here on, we will explore different pieces of Python code that use NumPy. Typing each example into a text editor, saving it, and then running it would be exhausting.

Instead, install Jupyter and start using it now. Thank me later :). It is one of the best tools for data science, letting you run your Python code inside the browser.

NumPy provides numerous arithmetic operations.

The following program finds the sum, mean, and standard deviation of the numbers in a NumPy array.

    import numpy as np

    floatArr = np.array([21, 22, 23, 24, 25, 21.2, 147.6, 23.7], 'float')
    print(floatArr.sum(), floatArr.mean(), floatArr.std(), sep='\n')

**Output:**

307.5

38.4375

41.2800780492237

This one-line code for performing arithmetic operations saves you a lot of time. It is also typically faster than doing the same mathematical operation on a Python list with an explicit loop, especially for large arrays.
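To make the comparison concrete, here is a minimal sketch that computes the same mean twice: once with an explicit loop over a Python list and once with a single NumPy call. The variable names are illustrative. It only shows that the results agree and does not measure speed.

```python
import numpy as np

values = [21, 22, 23, 24, 25, 21.2, 147.6, 23.7]

# Plain Python: loop over the list explicitly to compute the mean.
total = 0.0
for v in values:
    total += v
list_mean = total / len(values)

# NumPy: the same computation as a single vectorized call.
array_mean = np.array(values, dtype='float').mean()

print(list_mean, array_mean)
print(np.isclose(list_mean, array_mean))  # True
```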

Data science is largely about finding relations between entities in a dataset. Whether you need to plot a graph or want to compare elements in two datasets, you need a multi-dimensional array.

NumPy supports multi-dimensional arrays.

    import numpy as np

    arr = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]]
    intArr = np.array(arr)
    print(intArr)

**Output:**

    [[ 0  1  2  3  4]
     [10 11 12 13 14]
     [20 21 22 23 24]]

NumPy also provides various functions for evaluating the data elements in a multi-dimensional array. I will share some of them as we go forward with this NumPy data analytics tutorial.
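As a small taste of such functions, the sketch below inspects the shape of the same matrix and sums it along each axis. The variable name `int_arr` is illustrative.

```python
import numpy as np

int_arr = np.array([[0, 1, 2, 3, 4],
                    [10, 11, 12, 13, 14],
                    [20, 21, 22, 23, 24]])

print(int_arr.shape)  # (3, 5): 3 rows, 5 columns
print(int_arr.ndim)   # 2

# axis=0 collapses the rows, giving one sum per column: 30, 33, 36, 39, 42
col_sums = int_arr.sum(axis=0)
print(col_sums)

# axis=1 collapses the columns, giving one sum per row: 10, 60, 110
row_sums = int_arr.sum(axis=1)
print(row_sums)
```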

What if you want to filter elements from a NumPy array?

Let’s take an example:

**Write a Program to Mark all the elements in the Matrix which are divisible by 2.**

Mark each element in the NumPy array as 1 if it is even; otherwise, mark it as 0 (odd).

You can do this simply by passing your filter expression to np.where().

    import numpy as np

    numArr = np.array([[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]])
    print(np.where(numArr % 2 == 0, 1, 0))

**Output:**

    [[1 0 1 0 1]
     [1 0 1 0 1]
     [1 0 1 0 1]]

As the above example shows, you don’t need to loop over each element in the array. This is what makes the NumPy array different from, and more efficient than, the Python list.
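If you want the even values themselves rather than a 1/0 marker, one common approach is boolean mask indexing. The following is a minimal sketch on the same matrix; the names `mask` and `evens` are illustrative.

```python
import numpy as np

num_arr = np.array([[0, 1, 2, 3, 4],
                    [10, 11, 12, 13, 14],
                    [20, 21, 22, 23, 24]])

# The comparison produces a boolean array of the same shape.
mask = num_arr % 2 == 0

# Indexing with the mask keeps only the elements where it is True.
# The result is a flat one-dimensional array: 0, 2, 4, 10, 12, 14, 20, 22, 24
evens = num_arr[mask]
print(evens)
```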

**Indexing** finds the element at a particular position in the array.

**Slicing** extracts a set of elements between a start position and an end position (the end position itself is excluded).

    import numpy as np

    arr = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]]
    intArr = np.array(arr)
    print(intArr[1, 3])      # indexing
    print(intArr[1:2, 1:3])  # slicing

**Output:**

    13
    [[11 12]]
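A few more indexing and slicing patterns on the same matrix are sketched below: selecting a whole row, a whole column, counting from the end with negative indices, and using a step. The variable names are illustrative.

```python
import numpy as np

int_arr = np.array([[0, 1, 2, 3, 4],
                    [10, 11, 12, 13, 14],
                    [20, 21, 22, 23, 24]])

first_row = int_arr[0]         # whole first row: 0, 1, 2, 3, 4
third_col = int_arr[:, 2]      # whole third column: 2, 12, 22
last_item = int_arr[-1, -1]    # negative indices count from the end: 24
stepped = int_arr[:2, ::2]     # first two rows, every second column

print(first_row)
print(third_col)
print(last_item)
print(stepped)
```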

You can change the shape of a NumPy array.

**Program to Change the Shape of the Numpy Array from (3,5) to (5,3):**

    import numpy as np

    arr = np.array([[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]])
    print(arr.shape)
    print(arr.reshape(5, 3))

**Output:**

    (3, 5)
    [[ 0  1  2]
     [ 3  4 10]
     [11 12 13]
     [14 20 21]
     [22 23 24]]

Note: You cannot reshape an array to just any dimensions. The total number of elements in the new shape must match the total number of elements in the original array (here, 3 × 5 = 5 × 3 = 15).
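The sketch below illustrates this rule on a hypothetical 15-element array. Passing -1 for one dimension asks NumPy to infer it from the element count, while a shape that does not fit raises a ValueError.

```python
import numpy as np

arr = np.arange(15)  # 15 elements: 0 to 14

# -1 lets NumPy work out the missing dimension: 15 / 3 = 5 columns.
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 5)

# 4 * 4 = 16 does not match 15 elements, so this reshape is rejected.
try:
    arr.reshape(4, 4)
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("reshape failed:", err)
```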

**Program to create Numpy array having numbers from 0 to 9.**

    import numpy as np

    iniArr = np.arange(10)
    print(iniArr)

    # creating a multi-dimensional initialized array
    iniArr2 = np.array([np.arange(10), np.arange(10)])
    print(iniArr2)

**Output:**

    [0 1 2 3 4 5 6 7 8 9]
    [[0 1 2 3 4 5 6 7 8 9]
     [0 1 2 3 4 5 6 7 8 9]]

We can auto-initialize both one-dimensional and multi-dimensional NumPy arrays.

**Initializing Numpy array with ones.**

    import numpy as np

    iniArr = np.ones(10)
    print(iniArr)

**Output:**

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Here, by default, the type of the NumPy array is float (float64).

**Creating a Numpy Array of Squares by Raising Each Element to a Power.**
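Related initializers work the same way. The sketch below uses `np.zeros`, `np.ones` with an explicit `dtype`, and `np.full` to build arrays of a chosen shape and fill value. The variable names are illustrative.

```python
import numpy as np

# A 2x3 matrix of zeros (float by default, like np.ones).
zeros_2d = np.zeros((2, 3))

# Pass dtype to get integer ones instead of floats.
ones_int = np.ones(5, dtype='int')

# np.full fills an array of the given shape with any constant.
sevens = np.full((2, 2), 7)

print(zeros_2d)
print(ones_int)
print(sevens)
```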

    import numpy as np

    iniArr = np.arange(10) ** 2
    print(iniArr)

**Output:**

    [ 0  1  4  9 16 25 36 49 64 81]

The NumPy module has special functions for matrix operations, which are very useful in data science for processing data.

If you are familiar with matrices, you might be aware of how complicated matrix operations are. If you use NumPy arrays for matrix operations, you don’t need to write the logic yourself.

Just think about what you need to do with the matrix, and NumPy will do it for you.

It makes your job pretty easy now, doesn’t it?

**What is an Identity Matrix?**

A square matrix whose main diagonal elements are all one and whose other elements are all zero is called an identity matrix.

**Characteristics of Identity Matrix:**

- All the diagonal elements are one, and all the other elements are zero.
- Multiplying any matrix by an identity matrix of compatible size gives back the same matrix.

Numpy has a built-in function to create an identity matrix.

**Syntax:**

    np.identity(size)

**Program:**

    import numpy as np

    print(np.identity(10))

**Output:**

[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
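To check the multiplication property of the identity matrix on a concrete case, here is a minimal sketch with a hypothetical 2x2 matrix `a`. It confirms that multiplying by the identity returns the same values.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
eye = np.identity(2)

# Matrix product of the identity with a; the values of a come back unchanged
# (as floats, because np.identity creates a float matrix by default).
product = np.dot(eye, a)
print(product)
print(np.array_equal(product, a))  # True
```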

In scalar multiplication, each element of the matrix is multiplied by a constant value.

    import numpy as np

    xArr = np.arange(10)
    yArr = np.array([xArr, xArr])

    # Scalar multiplication
    print(yArr * 2)

**Output:**

    [[ 0  2  4  6  8 10 12 14 16 18]
     [ 0  2  4  6  8 10 12 14 16 18]]

In dot multiplication (the matrix product), two matrices are multiplied.

A(x,y) * B(y,z) = C(x,z)

**Note:** For dot matrix multiplication, the number of columns in the first matrix must equal the number of rows in the second matrix.

Just as with scalar multiplication, NumPy makes dot matrix multiplication easy: it has a special function, dot().

**Program for Dot Matrix:**

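    import numpy as np

    # Rebuild the (2, 10) matrix from the scalar multiplication example
    xArr = np.arange(10)
    yArr = np.array([xArr, xArr])

    # Dot product: (2, 10) x (10, 2) gives a (2, 2) matrix
    newArr = np.dot(yArr, yArr.reshape(10, 2))
    print(newArr)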

**Output:**

    [[220 265]
     [220 265]]
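For two-dimensional arrays, Python's `@` operator performs the same matrix product as `np.dot()`. The sketch below uses two small hypothetical matrices so the shape rule A(x,y) * B(y,z) = C(x,z) is easy to follow by hand.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # shape (2, 3): [[0 1 2], [3 4 5]]
b = np.arange(6).reshape(3, 2)  # shape (3, 2): [[0 1], [2 3], [4 5]]

# (2, 3) @ (3, 2) gives a (2, 2) result.
c = a @ b
print(c.shape)  # (2, 2)
print(c)

# For 2-D arrays, @ and np.dot give the same answer.
print(np.array_equal(c, np.dot(a, b)))  # True
```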

NumPy can initialize an array with random numbers. You can also provide the range of values for the random numbers; in randint(), the lower bound is inclusive and the upper bound is exclusive.

    import numpy as np

    print(np.random.randint(-10, 10, size=(9, 9)))

**Output (your values will differ on each run):**

    [[  5   5  -5   4 -10  -8   7   9   8]
     [  0  -4  -8  -1  -1  -8   6   9  -7]
     [  7  -5   2   0   7  -8   6  -3  -4]
     [  5   3   0   4   4   3   0  -5   0]
     [ -5   4  -2   3   2   6  -4  -1   6]
     [ -4 -10 -10   0  -6  -2  -9   4   3]
     [ -4   8   8  -8   8  -2  -8   4   4]
     [ -4  -9  -4  -4   1   9  -1   9   7]
     [ -2  -9  -4  -2  -6   7  -5   8   7]]
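If you need the same random numbers on every run, for example to make an analysis reproducible, one option is a seeded generator. This sketch assumes NumPy 1.17 or later, where `np.random.default_rng` is available. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng = np.random.default_rng(seed=42)
rand_arr = rng.integers(-10, 10, size=(3, 3))  # upper bound 10 is excluded

rng_again = np.random.default_rng(seed=42)
rand_arr_again = rng_again.integers(-10, 10, size=(3, 3))

print(rand_arr)
print(np.array_equal(rand_arr, rand_arr_again))  # True
```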

That is all for this NumPy tutorial for data science. I have tried to explain it with the required programming examples.

There are many more functions you can use with a NumPy array for data analytics.

Here is a simple program to list all the attributes and methods associated with a NumPy array. The exact list depends on your NumPy version.

    import numpy as np

    arr = np.array([23, 45, 24, 55])
    print(dir(arr))

**Output:**

['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmatmul__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
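The full list is long and mostly made up of internal double-underscore names. A minimal sketch for narrowing it down to the public names, and then trying two of them, could look like this. The variable names are illustrative.

```python
import numpy as np

arr = np.array([23, 45, 24, 55])

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(arr) if not name.startswith('_')]
print(public_names)

# Try a couple of the listed methods.
largest_index = arr.argmax()  # position of the largest value (55): 3
sort_order = arr.argsort()    # indices that would sort the array: 0, 2, 1, 3
print(largest_index)
print(sort_order)
```

In Jupyter, you can also run `help(arr.argsort)` to read the documentation of any method from the list.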

Try to implement these Numpy functions in your actual data science project.

Like NumPy, there are some more essential Python libraries for data science. Explore these libraries to excel in data science.

In the next tutorial, we will explore one more data science library called Pandas. You can always check all the updated posts on my complete Python tutorial page. Stay tuned!

If you liked this NumPy data analytics tutorial and found it informative, please share it with your friends.

Happy Pythoning!