Searching Optimisation on Python

Spread the love

Well, it’s me again, after a long disappearing from the world, I am back to update this blog again. I will be sharing some of the lessons I have learnt in my real life.

Searching

Searching is one of the important parts in our code. We use a lot of different techniques to do a huge amount of searching in implementing our function in the real world. When searching comes into the data of big sizes, the efficiency of searching algorithms becomes matter.

words = ["this", "is", "a", "list", "of", "words"]

This is a list in python, consisting of strings. When we want to check whether something is in the list, we use a searching method. Usually we would run this to check whether something is in the list or not.

Using list to find existence

"of" in words # return True


This returns boolean status that tells us whether it is valid, or not. What python does behind the scene is linear search. Which is looking one by one and checking whether it is exists. In the worst case, it will have a time complexity of O(n). What it does underneath is:

for word in words:
    if word == "of":
        return True
return False

This could be slow, if we had a long list of words, i.e: 10k words. So how could we shorten this?
You know, we as human only live 22,075,000 seconds in average, if we spend 5,000 seconds on searching in long list, while we can do it in 5 seconds, we are wasting that 5,000 seconds in our life, and it can’t be recovered, thus it is crucial for us to always trying to improve the performance of our code instead of wasting everyone life.

Using set to find existence

Instead of using list, set() can save human life by shortening the time needed for our searching algorithm, especially for the one that searches whether one exists.

All-hail the life saver – set(), but how should we use set to reduce the time complexity? Most importantly, what is the time complexity for this? Is it better than O(n) ?

words = set(words) # this convert the list into set
"of" in words # return True

It’s easy, one line of a simple code, save your life! You just had to convert it to a set() with set(). Moreover, this is a O(1) search, which means you only have to do one operation instead of n times if your list has size of n. The larger the size of the data, the greater the performance gap between set and list.

Some thoughts

So, why is set() faster?
set() is implemented using a hash table. When adding an element into the set, the hash will be calculated and added to a position in memory. When we want to look for an element, it just checks over the hash table for that position. That’s why it only does one operation, therefore, the speed of searching in a set doesn’t depend on the size of the data.

This is a simple concept, and most people might already know, but I still share it as I just faced this issue last week when doing some word searching on a long list of positive and negative words in doing sentence sentiment code. This does reduce the time from 1s to 0.04s (25x) for my case, which is very significant for my use case. If you found this blog post has saved your life, you may share it so that it can save more life.

Leave a Reply