Sweep Line
Get startedজয় শ্রী রাম
🕉At the core of the powerful concept of Sweep Line is a vertical line (and/or, a horizontal line, in some cases) that is conceptually “swept” across the plane.
Sweep Line technique is often used to find intersections but can be extended to other scenarios like finding areas etc.
We will learn Sweep Line algorithm by discussion how to solve a basic geometric problem: Orthogonal Line Segment Intersection search.
Suppose we have a large number of vertical and horizontal line segments. We are talking in the magnitude of hundreds of millions of line segments here. We will be finding all the intersections wherever a vertical line cut a horizontal line.
Let's consider the picture above. We have 6 intersections here.
A naive solution will be quadratic in time (O(n2), where n = total number of line segments present), where for each line segment we iterate over all other line segments to check if there is an intersection. But such algorithm is not practical for millions of line segments. This is not desirable when we are talking in millions of data points.
Instead using Sweep Line algorithm we get an optimized O(nlogn + R) solution (where n = total number of given vertical and horizontal line segments, and R = total number of intersections present):
Let's imagine that there is a vertical line which sweeps the data points (given line segments) from left to right. In the diagram above, it is the yellow vertical line (SW) that is swept across the plane from left to right. Let's call this vertical line, which is being swept across the plane, SWEEP LINE. Again, as already mentioned, in the above diagram vertical line SW is the sweep line. While sweeping the plane with the vertical line, the vertical line will hit all the given line segments. Each of these incidents of hitting the line segments will be considered as EVENTS. Identifying what would constitute as events is super important while designing a Sweep Line algorithm based solution. For each event we would need to take some action.
The idea is, as we sweep the vertical line and we encounter a vertical line, we look for: if we have gotten any horizontal line segment(s) so far, which:
- are located in between the y-coordinates of the the bottom-end and that of the top-end of the vertical line segment. So, in our diagram, for the vertical line segment MN, these two y-coordinates would be the y-coordinate of the M (top-end of MN)and N (bottom-end of MN).
-
have started but not ended yet. We call these horizontal lines active events.
When sweep line scans MN, the sweep line has already scanned the point E and G but has not yet reached at the points F and H, which means horizontal lines EF and GH are active because they haven't ended when sweep line scans vertical line MN.
Now, how can we keep track of horizontal lines that has started and not ended when we encounter a vertical line segment? We will call these line segments active events and at any point of time we keep track of for all active events at that time.
How do we do this? Whenever the sweep line hits the start of a horizontal line segment (like, point A of horizontal line AB, in our diagram), we push an event for that in the active events list, and when the sweep line hits the end of a horizontal line (example, point B of the horizontal line AB), we remove the corresponding event from active events. In our case, remove active event for point A from active events when sweep line hits point B.
So, whenever we encounter a vertical line segment, we iterate over the active events data structure and do a Range Query or Range Search to find if there is any active event(s) (i.e, horizontal line segments) which have y-coordinate which is in between the y-coordinate of the top end and the y-coordinate of the bottom end of the vertical lines. These are the horizontal lines with which this vertical line segment intersects. Note that the left-end and the right-end of a horizontal line have the same y-coordinate. Now just get the x-coordinate of the vertical line (since this is a vertical line, the two ends would have the same x-coordinate) and the y-coordinate of the horizontal line segment and there you got the intersection point. Take the example of MN and GH in the above diagram and see how you get get their intersection point applying the logic we just discussed.
Range Search:
In the above discussion we see that we are having to do a range search: searching for active events (horizontal line segments) which are location in between the y-coordinates of the top end and the bottom end of a vertical line. For the range search api we would be sending the y-coordinate of the top-end and bottom-end of a vertical line segment and the range search api would respond with all the horizontal line segments present in that range which are present in the activeEvents list. So let's see what would be a good data structure for keeping the list of active events. If we use an ArrayList this would be a linear time operation. We can do better by having a Binary Search Tree instead and the range query is now a logarithmic time operation which is a huge improvement. Please see the Range Search chapter to read more about Range Search and how it works.So we are sweeping the vertical line along the x-axis from left to right and it hits the line segments. How do you think we could represent the events ? The answer is, by the x-coordinates of the line segments.
X-coordinates define events.
So, in summary:
below are the different kinds of events involved in the Orthogonal Line Segment Intersection search problem we are discussing:
Event data structure should be able to keep track of:
below are the different kinds of events involved in the Orthogonal Line Segment Intersection search problem we are discussing:
- the start of a horizontal line segment
- the end of a horizontal line segment
- vertical line
- Whenever the sweep line hits the start of a horizontal line push an event for it in the active events data structure.
- Whenever the sweep line hits the end point (right-end point) of a horizontal line remove the event you put in the active events data structure when you encountered the start (left-end point) of this horizontal line.
- Whenever the sweep line hits a vertical line do a Range Search to find all the intersection this vertical line makes with any horizontal line(s) present in the activeEvents data structure. For the problem we are discussing we are using Binary Search Tree for the active events data structure, so that we can do an efficient Range Search.
Event data structure should be able to keep track of:
- the x-coordinate of the start and end of a line segment,
- the y-coordinate of the start and end of a line segment,
- whether the end is start or end of the line segment,
- whether the line segment is vertical or horizontal.
Worth noting, we would need to sort the given line segments on their x-coordinates if they are not already sorted, since the sweep line scans the given line segments from left to right. We can either use an efficient sorting algorithm like quick sort or merge sort, or we can use priority queue or min heap.
Time Complexity: O(nlogn + R)
- Having the events in sorted order will be O(nlogn).
- Inserting the events in BST when they become active will be O(nlogn), because: Insertion in BST is a O(logn) operation. We would need to do O(n) insertion operations. Overall: O(nlogn).
- Deleting all the active events when they are no more active will be O(nlogn). Because: Deletion from BST is O(logn) operation. We would need to do O(n) deletion operations. So overall O(nlogn).
-
As we will see in Range Search chapter
that a BST Range Search operation is O(logn + R) operation where R is the total number of elements
in the range.
We will have to do range search every time encounter a vertical line while doing the line sweep. On average we will have O(n) vertical lines, so we will end up doing range search O(n) times.
So it is O(nlogn) for o(n) times range search, and O(R) for a total of R intersections. Overall: O(nlogn + R).
Sweep Line implementation will have three important components:
-
Sorted Events:
You add all events (example: both beginning and end of horizontal lines) in the events list and sort them.
-
Active Events:
Depending on the problem you are solving, you add only certain events in the activeEvents list. For example: for some problems you will find yourself iterating over sorted events list and adding only the beginning of a horizontal line in the activeEvents list, and when you reach the end of aa horizontal line you do some required computation. -
Event class:
We will often find ourselves creating an Event class depending on the problem we are solving. For example: while solving problems concerning sweeping across a plane with horizontal lines, we may find ourselves creating an Event class with properties (eventValue, isStart, begin, end) where eventValue will be the coordinate value of the begin or end of the horizontal line, isStart will indicate if it is begin or end of the line, begin will coordinate value for beginning of the line and end will have coordinate value for end of the line.
Now let's try to have a good grasp on Sweep Line Algorithm by solving few simple problems before jumping on to solving some interesting use cases in next few chapters.
While looking at the below examples notice how we are using sorted events list, activeEvents list and Event class.
Problem #1: We are given a list of intervals. Remove all intervals that are contained in another interval in the list. Interval [a,b) is contained in interval [c,d) if and only if c <= a and b <= d. Example: for a given list of intervals [[2,5],[0,10]], [2,5] is contained in [0,10] and should be removed. Any overlapping intervals which are not contained are not removed, example [[0,5],[3,8]] are overlapping but not contained. After the removal, return the number of remaining intervals.
Example 1:
Input: intervals = [[0,10],[50,80],[20,100],[90,100]]
Output: 2
Explanation: Interval [50,80] is removed because it is contained in [20,100].
Solution:
Imagine that all the given intervals are plotted in a 2-D plane as horizontal lines. For two intervals having same start value can be plotted in two different y-coordinate to distinguish among themselves. For example, an interval [5,10] can be represented as a horizontal line segment from x = 5 to x = 10 at any y-coordinate.
Now we sweep a vertical line from left to right starting from x = 0. For a list of intervals [[0,9], [1,10], [2,5], [7,11], [14,16]] the plot will like below:
| | | __ [14,16] | ____ [7,11] | ___ [2,5] | _________ [1,10] |_________ [0,9] |_____________________________At every start point of a horizontal line segment we need to add an event for that horizontal line segment, and when the sweep line hits the end of this line segment, remove the event we had added.
- If none of the intervals are overlapping then at any point of time there will be only one active event. Of course no horizontal line segment will be contained in any other line segment in this case.
-
What happens for overlapping non-contained intervals? There will be more than one active events for overlapping
events but they will be removed in the order in which they were added. If they are not removed in order,
which means if the sweep line hits the end of second line segment (say, [1,10]) before reaching the first line segment
(say, [0,9]), that would mean
we encountered the start of the first line segment starting first and have not reached its end point yet,
but by now we have already reached the end of second horizontal line segment which started after the first line segment
did. This clearly means second line segment is contained in first line segment.
So for the second line segment to be overlapping with first line segment but not contained, first line segment needs to end before second line segment does. - If an interval is contained in one or more other intervals, then this interval will end before the interval(s) in which it is contained in, does.
So, at every end-point of an interval we need to check if the event
for this interval happens to be first active event in the list or not.
Login to Access Content
Problem #2: Given the availability time slots arrays slots1 and slots2 of two people and a meeting duration, return the earliest time slot that works for both of them. If there is no common availability time slot that satisfies the requirements, return an empty array. Each time slot is an array of two elements [start, end] representing an inclusive time range from start to end. No two slots of the same person intersect with each other.
Example 1:
Input: slots1 = [[2,5],[7,10],[11,12]], slots2 = [[6,10],[11,12]], duration = 2
Output: [7,9]
Example 2:
Input: slots1 = [[1,5],[6,7],[11,12]], slots2 = [[5, 8],[10,12]], duration = 3
Output: []
Self-explainatory solution using Sweep Line Technique:
Login to Access Content
Problem #3: We are given an array of intervals in the form intervals[i] = [a, b]. We need to remove the intersections between any interval in the given array and an given specific interval notAllowed. Return a sorted list of intervals after all such removals.
Example 1:
Input: intervals = [[0,5],[8,12],[15,20]], notAllowed = [10,16]
Output: [[0,5],[8,10],[16,20]]
Example 2:
Input: intervals = [[2,10]], notAllowed = [5,8]
Output: [[2,5],[8,10]]
Self-explainatory solution using Sweep Line Technique:
Login to Access Content
Optional Reading:
Template for Code Implementation:
From the discussion we had on Orthogonal Line Segment Intersection search and all other discussion so far in this chapter, we see that to solve a problem using Sweep Line Algorithm we are most likely to see the below commonalities among the implementation:- One of the most crucial part would be to identify what constitutes the Events, as we have seen while discussing Sweep Line Algorithm, in a given problem.
- Next big thing would be to identify on what attribute or parameter you are going to sort the Events.
-
So having a sorted data structures for the Events is definitely a characteristics of Sweep Line implementations.
For Orthogonal Line Segment Intersection search problem we are having a sorted data structure to keep track of the start and end of the horizontal and vertical line segments in sorted order.
In a later chapter, in Skyline problem we will have a sorted array of vertical lines called sweepLineEvents. -
Next you need to figure out what
- condition would contribute to the result(s)
- and, what triggers it
-
For the Orthogonal Line Segment Intersection search problem, discussed above,
-
condition that contributes to the result(s) is whether or not
there is one or more points found in the range of the
beginning and ending y-coordinate of a vertical line.
The range search result would contribute to the result. If we find any point while doing the range search then that point is one of the intersection points.
Let's call it deciding factor. - Encountering a vertical line is what triggers it. Note that vertical line is an event.
-
condition that contributes to the result(s) is whether or not
there is one or more points found in the range of the
beginning and ending y-coordinate of a vertical line.
-
In one of the problems discussed in a later chapter: Skyline Problem, we will see that
- Whether or not a building becomes part of the skyline depends on its height and the height of the tallest building so far. A boundary line (start line or end line) becomes part of skyline only if the building is the tallest one among all the active buildings at that point of time.
- Encountering either a start line or an end line triggers the necessity to check if we got a result. Note that both start line and end line are events.
-
So, from the above discussion you see that it is almost always an Event that triggers the scenario of checking if we have a result.
We just need to identify what kind of event triggers that if there are more than one events. In Orthogonal Line Segment Intersection search problem
there are two types of events: horizontal line, vertical line. While horizontal does not trigger it, vertical line does.
For Skyline problem we would see that we have two kinds of events: start line and end line, and both of them triggers the necessity to check if we have a result. -
You might also need to have a data structure just to check if you got a valid result or
if the condition to get a valid result is satisfied. In Orthogonal Line Segment Intersection search problem
we are having a BST to do the range search query.
In Skyline problem we have max heap to keep track of the heights, just for this purpose. -
You need to know when to add or remove the deciding factor to/from its data structure.
In Orthogonal Line Segment Intersection search problem we are adding the y-coordinate of a horizontal line when we first encounter it (i.e, when we find its start point), and remove it when the line ends.
For Skyline problem we will see that we add the height of the building we are processing when we encounter the building's start line and remove when we reach the end line of that building.
So, overall to implement a Sweep Line Solution, in most cases, we are looking at having the following data tructure
-
a sorted data structure for Events.
For Orthogonal Line Segment Intersection search problem it is sorted data array or list of horizontal and vertical events.
For Skyline problem it is a sorted array 'sweepLineEvents'. -
A data structure that would have the information that would help us deciding whether we got a valid result.
For Orthogonal Line Segment Intersection search problem it is BST to be able to query efficient range search. For Skyline problem it is Max Heap.
Related Must-Read Topics:
- 1d Range Search
- Closest Pair
- Convex Hull
- Dual Line Sweep & Rectangles Union
- 2D Intervals Union & Skyline Problem
- Overlapping 1D Intervals
- Merging Overlapping 1D Intervals
- Separating Overlapping 1D Intervals