Neeraj
Member • Aug 2, 2013
Oracle SQL: Union vs Union All? Union All wins it.
Hey all,
I am going to discuss one of the most asked interview questions on Oracle SQl and a basic one which many freshers are unaware of. The question is:
What is the difference between Union and Union All?
The best way to answer this question is going from the basic definitions to a practical example. It impresses the interviewer a lot. To start with, Union operator is just used to perform a union operation (similar to Mathematics) on columns of two or more tables provided the the columns in each section of the union operation is same. In other words, a union operation clubs the common values as one and also returns all the unique values.
On the other hand, Union All just clubs all the values one after another without worrying about the common values i.e. it just piles up the data one after the another. Its quite understood now that union all will be faster than union as union all has no intelligence of finding the common values and overlapping them.
As a common example:
select count(1) from
(
select job_id from employees
union
select job_id from jobs
);
count(1)
---------
19
select count(1) from
(
select job_id from employees
union all
select job_id from jobs
);
count(1)
---------
126
Now we can see quite a difference between the two outputs as union all involves no intelligence. Then, what is the use of union all being fast if it can't solve my requirement? Here is how we use it to replace union.
Union All as a substitute to Union
We can tweak union all to work like union and in a much faster way. This can be done by using a group by with union all. Modifying the same query given above as follows:
select count(1) from
(
select job_id from(select job_id from employees
union all
select job_id from jobs)
group by job_id
);
count(1)
---------
19
returns the same result as is the case with a union operation.
How is Union All + Group By faster than Union?
Here is a performance analysis Union when compared to Union All + Group by
For Union operation to perform the intelligence of finding and overlapping the common values, two intermediate steps of finding the unique values among the lot and sorting the data is required which is not the case with union all as it just heaps the data and the final group by clubs the common data.
Following example will make my statement crystal clear. Consider the following queries
select job_id from employees
union
select job_id from jobs;
The explain plan for the above query is
As seen from the explain plan, we have an intermediate SORT UNIQUE operation just as I stated above. Also have a look at the number of Bytes, cost and CPU % utilization.
When I try to get the same functionality by using Union All,
select job_id from(select job_id from employees
union all
select job_id from jobs)
group by job_id;
the explain plan is
When you look at this plan you can see that we have got rid of the intermediate sort and unique operation. Also, have a look at the Rows processed, Bytes processed and Cost of CPU and CPU% Utilization. It is enough to prove that Union All with group by works much better than Union operation.
One, can argue that the Time column shows same for both plans. Well, I am running these queries on tables having very less rows so the performance improvement will be of some milliseconds but when you query on real world production systems where you store terabytes of data, this trick proves to be very very handy.
Do let me know in case you need clarifications with any of the points stated above or you want to add someone to this.
Hope it helps you understand the difference. This is the kind of answer that will prompt the interviewer to get up and hug you. 😛
I am going to discuss one of the most asked interview questions on Oracle SQl and a basic one which many freshers are unaware of. The question is:
What is the difference between Union and Union All?
The best way to answer this question is going from the basic definitions to a practical example. It impresses the interviewer a lot. To start with, Union operator is just used to perform a union operation (similar to Mathematics) on columns of two or more tables provided the the columns in each section of the union operation is same. In other words, a union operation clubs the common values as one and also returns all the unique values.
On the other hand, Union All just clubs all the values one after another without worrying about the common values i.e. it just piles up the data one after the another. Its quite understood now that union all will be faster than union as union all has no intelligence of finding the common values and overlapping them.
As a common example:
select count(1) from
(
select job_id from employees
union
select job_id from jobs
);
count(1)
---------
19
select count(1) from
(
select job_id from employees
union all
select job_id from jobs
);
count(1)
---------
126
Now we can see quite a difference between the two outputs as union all involves no intelligence. Then, what is the use of union all being fast if it can't solve my requirement? Here is how we use it to replace union.
Union All as a substitute to Union
We can tweak union all to work like union and in a much faster way. This can be done by using a group by with union all. Modifying the same query given above as follows:
select count(1) from
(
select job_id from(select job_id from employees
union all
select job_id from jobs)
group by job_id
);
count(1)
---------
19
returns the same result as is the case with a union operation.
How is Union All + Group By faster than Union?
Here is a performance analysis Union when compared to Union All + Group by
For Union operation to perform the intelligence of finding and overlapping the common values, two intermediate steps of finding the unique values among the lot and sorting the data is required which is not the case with union all as it just heaps the data and the final group by clubs the common data.
Following example will make my statement crystal clear. Consider the following queries
select job_id from employees
union
select job_id from jobs;
The explain plan for the above query is
As seen from the explain plan, we have an intermediate SORT UNIQUE operation just as I stated above. Also have a look at the number of Bytes, cost and CPU % utilization.
When I try to get the same functionality by using Union All,
select job_id from(select job_id from employees
union all
select job_id from jobs)
group by job_id;
the explain plan is
When you look at this plan you can see that we have got rid of the intermediate sort and unique operation. Also, have a look at the Rows processed, Bytes processed and Cost of CPU and CPU% Utilization. It is enough to prove that Union All with group by works much better than Union operation.
One, can argue that the Time column shows same for both plans. Well, I am running these queries on tables having very less rows so the performance improvement will be of some milliseconds but when you query on real world production systems where you store terabytes of data, this trick proves to be very very handy.
Do let me know in case you need clarifications with any of the points stated above or you want to add someone to this.
Hope it helps you understand the difference. This is the kind of answer that will prompt the interviewer to get up and hug you. 😛