[Solved] Find Duplicate in Array in O(n) Linear Time

Aniruddha Chaudhari
Updated: Jun 29, 2022

Problem Statement:

An array contains n numbers, each in the range 0 to n-1. Some numbers are duplicated in the array.

It is not known how many numbers are duplicated or how many times each number is duplicated.

How do you find a duplicated number in the array?

Example:

If an array of length 7 contains the numbers {2, 3, 1, 0, 2, 5, 3}, the implemented function (or method) should return either 2 or 3.

Method 1: Using Sorting

A simple solution to the above problem is to sort the elements of the array. After sorting, equal numbers sit next to each other, so if a number is the same as the number located next to it in the array, that number is a duplicate.

Python Program:

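def findDup(liArr):
    # Sort a copy so that the caller's list is left unchanged.
    liSorted = sorted(liArr)

    liDuplicate = []
    for i in range(len(liSorted) - 1):
        # Equal neighbours in sorted order mean the value is duplicated.
        isDup = liSorted[i] == liSorted[i + 1]
        # Record each duplicated value once, even if it occurs three or more times.
        if isDup and (not liDuplicate or liDuplicate[-1] != liSorted[i]):
            liDuplicate.append(liSorted[i])

    return liDuplicate

print(findDup([2, 3, 1, 0, 2, 5, 3]))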

Output:

[2, 3]

Complexity:

A comparison-based sort such as merge sort takes O(n log n) time to sort n elements. (Python's built-in list sort, Timsort, is also O(n log n) in the worst case.) After sorting, we traverse the sorted array once more, which takes O(n) time.

So the total complexity of this algorithm is O(n log n + n), i.e. O(n log n).

Let’s look at a more efficient solution with lower time complexity.


Method 2: Using Hashing

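A hash table of size n is used. Because every number lies in the range 0 to n-1, the number itself can serve as the index, so there is one hash table entry for each possible value. The value in the hash table can be either 0 (not seen yet) or 1 (already seen).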

Algorithm:

Take a hash table of size n (say hashIndex) and initialize each value in the hash table to zero.
Traverse over each element in the array.
For each element (i) in the array
- if hashIndex[i]==0, set hashIndex[i]=1
- if hashIndex[i]==1, the element is a duplicate.

Let’s implement this logic in code.

Python Program:

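def findDuplicate(arr):
    liDuplicate = []
    # hashIndex[v] is 0 until the value v has been seen, then 1.
    # Every value must lie in 0..len(arr)-1, otherwise indexing fails.
    hashIndex = [0] * len(arr)
    for i in arr:
        if hashIndex[i] == 0:
            # First time we see this value: mark it as seen.
            hashIndex[i] = 1
        elif hashIndex[i] == 1:
            # Seen before, so this occurrence is a duplicate.
            # A value that occurs k times is appended k-1 times.
            liDuplicate.append(i)

    return liDuplicate

arr = [4, 5, 2, 1, 4, 6, 6]
print(findDuplicate(arr))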

Output:

[4, 6]

Complexity:

Here we are using the hashing technique. hashIndex acts as a simple hash table (a direct-address table) where the key is an element from the actual array and the value is 0 or 1.

Each element in the array is visited exactly once, so the time complexity of this algorithm is O(n). The table needs O(n) extra space.

This question on finding duplicates in an array was asked in an NVIDIA interview coding round. You can solve this problem in any programming language, such as Python, C/C++, or Java.
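The same idea can be sketched with Python's built-in set, which removes the requirement that values lie between 0 and n-1. This is only an illustrative variant, not part of the original solution; the function name find_duplicates and the helper sets seen and reported are assumptions made for this example. It reports each duplicated value once, in the order the repeats are found.

```python
def find_duplicates(items):
    """Return each duplicated item once, in order of first repeat."""
    seen = set()       # items encountered so far
    reported = set()   # duplicates already added to the result
    duplicates = []
    for item in items:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates


print(find_duplicates([4, 5, 2, 1, 4, 6, 6]))
print(find_duplicates(["asha", "ravi", "asha", "asha"]))
```

Set membership tests and insertions take O(1) time on average, so this sketch is still O(n) in the typical case and works for any hashable items, such as strings.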

FAQ (MCQ) question:

Vijay is given a problem to solve. He is given an array of names and asked to find duplicates among the names. Vijay builds a hash table with all the names and uses it to find duplicates. Which of the following statements are true? Pick all that apply.

(Unless otherwise stated, assume that the hash function and hash table are working well and doing a good job.)

1. In most cases, the program will run in O(n) time.
2. In most cases, the program will run in O(n log n) time.
3. In most cases, the program will run in O(n^2) time.
4. If the hash function is doing a bad job, and there are lots of collisions, the program will run in O(n log n) time.
5. If the hash function is doing a bad job, and there are lots of collisions, the program will run in O(n^2) time.

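Answer: 1 and 5. With a well-behaved hash function, each insertion and lookup takes O(1) time on average, so the program runs in O(n) time in most cases. With a bad hash function and lots of collisions, each operation can degrade to O(n), so the program runs in O(n^2) time.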
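The effect of a bad hash function can be observed with a small experiment. The sketch below is an illustration only: the Name wrapper class and the count_comparisons helper are hypothetical names invented for this example, and the exact counts depend on the Python implementation (the reasoning here assumes CPython, which compares keys only when their hash values match). It inserts 200 distinct names into a set twice, once with the built-in string hash and once with a hash function that returns 0 for every name, and counts how many equality checks the set performs.

```python
class Name:
    """Hypothetical wrapper that lets us plug in a hash function and count comparisons."""
    comparisons = 0

    def __init__(self, text, hash_fn):
        self.text = text
        self.hash_fn = hash_fn

    def __hash__(self):
        return self.hash_fn(self.text)

    def __eq__(self, other):
        # Every call here is one key comparison caused by a hash collision.
        Name.comparisons += 1
        return self.text == other.text


def count_comparisons(names, hash_fn):
    """Insert every name into a set and return how many equality checks were needed."""
    Name.comparisons = 0
    seen = set()
    for text in names:
        seen.add(Name(text, hash_fn))
    return Name.comparisons


names = ["name%d" % i for i in range(200)]
good = count_comparisons(names, hash)             # built-in string hash
bad = count_comparisons(names, lambda text: 0)    # every name collides
print(good, bad)
```

With the constant hash, every new name has to be compared against the names already stored, so the number of comparisons grows roughly with n^2 rather than with n.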

Are You Ready for the Next Challenge?

You can use a similar technique to solve the coding challenges below.

Write a program to print all the unique numbers present in the array.
Write a program to find out the number of times each number is present in the array.

To solve the above coding challenges, you just have to tweak a few lines of code in the above programs.

If you find an optimized solution to these problems, share it with me by writing in the comment section. You are free to use any programming language, such as C/C++, Java, or Python.
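As one possible starting point, the table from Method 2 can hold counts instead of 0/1 flags. This is a rough sketch rather than a reference solution: the function name count_occurrences is an assumption, the values are still assumed to lie in the range 0 to n-1, and "unique" is interpreted here as the distinct values that appear at least once.

```python
def count_occurrences(arr):
    """Return a list where counts[v] is how many times v appears in arr."""
    counts = [0] * len(arr)   # one slot per possible value 0..n-1
    for value in arr:
        counts[value] += 1
    return counts


arr = [2, 3, 1, 0, 2, 5, 3]
counts = count_occurrences(arr)

# Distinct values are the indexes whose count is non-zero.
distinct = [value for value, count in enumerate(counts) if count > 0]

print(distinct)
for value in distinct:
    print(value, "appears", counts[value], "time(s)")
```

If "unique" should instead mean values that appear exactly once, the condition count > 0 can be changed to count == 1.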

Happy Coding!
