[support] [development] SQL Query Doubt

Ivan Sergio Borgonovo mail at webthatworks.it
Sat Apr 16 22:37:35 UTC 2011


On Sat, 16 Apr 2011 13:21:24 -0700 (PDT)
nan wich <nan_wich at bellsouth.net> wrote:

> <:avglsdata id=avglsdata> 
> Please, this should not be in development; it belongs in support.
> 
> Actually, Ivan's query won't work because each skill is on a

It happens it worked with real data on a real DB (no pun on
MySQL)
The exists method works as well but there were some typo
in David's example.
I'd guess the join method should be faster on most DB engines but I
didn't take the time to actually test it on more than one DB
(postgres) on more than a tens of rows.
On a slower box remotely connected the exists, 2 condition version
was 10% slower (order of magnitude for complete execution 200ms).
On a faster box on my lan the join and exists versions with 3
conditions were indistinguishable (12ms).

The test was so ridiculous I could just say both version works.
I did test without the index (auto created by the pk).
With such a small data set it changes the query plan but it doesn't
change the timing.

> separate row. Your query, AFAICT, is the only way the way to
> actually find multiple skill combinations. It also has the
> advantage of scaling to handle even more skills. But let's make it
> truly Drupalized:

Automatically generating a query for more than 2 skills with the
JOIN method may look harder, but not that much (you've to name
differently the tables aliases, and add some additional logic to
write the correct join).
If he has to work with a very large data set it would be advisable
to really test which method runs really faster on *his* DB.
I'd suspect that unless he is working on a very slow connection PHP
overhead and difference in verbosity of the 2 queries won't make the
difference.

create table test.employee (eid int, skill varchar (15), primary key
(eid, skill));

insert into test.employee values(1, 'PHP');
insert into test.employee values(1, 'SQL');
insert into test.employee values(1, 'Java');
insert into test.employee values(1, 'VoIP');
insert into test.employee values(2, 'Shell Scripting');
insert into test.employee values(2, 'PHP');
insert into test.employee values(2, 'Java');
insert into test.employee values(2, 'SQL');
insert into test.employee values(3, 'C++');
insert into test.employee values(3, 'PHP');
insert into test.employee values(3, 'Java');
insert into test.employee values(3, 'Pearl');

select t1.eid from test.employee t1
  join test.employee t2 on t1.eid=t2.eid
  join test.employee t3 on t2.eid=t3.eid
  where t1.skill='Java' and t2.skill='PHP' and t3.skill='SQL'
  group by t1.eid
;

select t1.eid from test.employee t1
  join test.employee t2 on t1.eid=t2.eid and t2.skill='PHP'
  join test.employee t3 on t2.eid=t3.eid and t3.skill='Java'
  where t1.skill='SQL'
  group by t1.eid
;

select eid from test.employee e where
  exists (select 1 from test.employee s where s.eid=e.eid and
s.skill = 'Java') and
  exists(select 1 from test.employee s where s.eid=e.eid and
s.skill='PHP') and
  exists(select 1 from test.employee s where s.eid=e.eid and
s.skill='SQL')
group by eid
;

BTW the group by in this case is a synonym of distinct and
anticipating conditions in the join has no effect on the query plan
of a reasonably smart DB.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it



More information about the support mailing list