And you don’t need to memorize how binary search works
Categories:
#data structures
Tags:
#python
#trees
#recursion
I always get anxious when I see a problem that involves a sorted list because I know that it’ll inevitably involve using binary search.
This fear is completely irrational, and I always kick myself for feeling this way because binary search is beautiful. Given a sorted list you can find the index of any value in log(N) time because you get to cut your search space in half with every lookup.
Precisely this operation allows sorted lists to be very space and time efficient at solving problems requiring lookups as well as determining how many entries are greater or less than your target.
Now if you want to take advantage of your sorted lists you may be tempted to write your own binary search and insert functions like I’ve been doing so far and inevitably it takes a few tries to actually get it working right. Thankfully there’s a built in library that already does the same thing, in a single line and a lot faster too.
The bisect library is a tiny one, but it does two things and it does them well; search and insert.
For search we have bisect_left and bisect_right which perform binary search to find where to insert a value right before our target on either the left or right. In the case of bisect_right it will overshoot by one index to show you where you should insert your new value to maintain order.
list: [1,2,3,4,5], target = 4
For insertion we have insort_left and insort_right which perform binary insert to actually insert our value on either the left or the right of our target and manipulates our list inplace.
Apart from bisect and insort which are equivalent in function to bisect_right and insort_right respectivly, these are the only functions available in this library. However they completely replace the need to write your own functions for lookups and inserts.
1def binary_search(lst, target):
2 start, end = 0, len(lst) - 1
3 while start <= end:
4 middle = (end+start)//2
5 if target > lst[middle]:
6 start = middle + 1
7 elif target < lst[middle]:
8 end = middle - 1
9 else:
10 return middle
11 return -1
And here’s a binary race between it and bisect. We’ll be looking for value 0 because ironically it will take the longest to find with binary search since 0 is never in the middle of anything.
1#list and target
2input_list = list(range(1000000))
3target = 0
4
5#my binary search
6start = time.time()
7index = binary_search(input_list, target)
8duration = time.time() - start
9print("search found {} at index {} in {} seconds".format(target,index,duration))
10
11#bisect binary search
12start = time.time()
13index = bisect.bisect_left(input_list, target)
14if index == len(input_list) or input_list[index] != target:
15 index = -1
16duration = time.time() - start
17print("bisect found {} at index {} in {} seconds".format(target,index,duration))
Here we can see that bisect is 6 times faster than my implementation
Because bisect is implemented in C it is much faster, though both functions use the exact same algorithm. Python is just a lot slower at adding numbers than C because all numbers in Python are actually Integer classes which in turn take time to call their addition functions whereas in C integers are just 4 byte data types and are processed very efficiently by the CPU. The difference in time can start to compound into a tangible boost in performance when you start doing many lookups or insertions sequentially.
Bisect is a small but useful library that helps maintain sorted order in lists as well as providing lookups in a quick and efficient way. Knowing about its existence can save on both implementation and runtime and it is an indespensible tool for many coding interviews where questions about sorted lists tend to crop up from time to time.
Launching my AI Web Security Lab — Here’s Why I’m Building It