Saturday, February 25, 2012

Matching Columns

All

I have a table which contains 4 columns each of which are NULL or contain a 6 digit code.
Here is a sample of the table content:

COL1 COL2 COL3 COL4
-- -- -- --
452359 NULL NULL 347406
NULL NULL 347406 347406
592319 NULL 347406 347406
592319 150009 347406 347406
592319 150010 347406 347406

Through out the table any number of the columns can be null.

I wish to remove rows from this table where the columns values are contained in another row i.e Row 2 above is contained within row 3. Similarly, row 3 is contained within row 4.
So, the only rows I want from the sample data above are rows 1, 4 and 5.

I hope I have explained my query adequately, and any help would be great appreciated. (Before I go mad...!)

Regards,
KatherineYou'll have to test this carefully, but I'd use something like:DELETE FROM foo
WHERE (Col1 IS NULL OR Col2 IS NULL OR Col3 IS NULL OR Col4 IS NULL)
AND EXISTS (SELECT *
FROM foo AS b
WHERE b.Col1 = Coalesce(foo.Col1, b.Col1)
AND b.Col2 = Coalesce(foo.Col2, b.Col2)
AND b.Col3 = Coalesce(foo.Col3, b.Col3)
AND b.Col4 = Coalesce(foo.Col4, b.Col4))-PatP|||pat: sorry: :)
insert...
select 123, null, null, null
union
select null, 123, null, null
union
select 123, 123, null, null

gives:
col1 col2 col3 col4
---- ---- ---- ----
NULL 123 NULL NULL
123 NULL NULL NULL
123 123 NULL NULL

Turns out it's a nice challenge, at least for me...|||I know there *has* to be a better way to do this... but here goes.

create table test (a int null, b int null, c int null, d int null)

insert into test(a,b,c,d)
select 123, null, null, null
union
select null, 123, null, null
union
select 123, 123, null, null

delete from t1
from test t1
inner join test t2
on (t1.a is null or t1.a=t2.a)
and (t1.b is null or t1.b=t2.b)
and (t1.c is null or t1.c=t2.c)
and (t1.d is null or t1.d=t2.d)
where
(
((t1.a!=t2.a and (t1.a is not null and t2.a is not null)) or (t1.a is null and t2.a is not null) or (t1.a is not null and t2.a is null))
or ((t1.b!=t2.b and (t1.b is not null and t2.b is not null)) or (t1.b is null and t2.b is not null) or (t1.b is not null and t2.b is null))
or ((t1.c!=t2.c and (t1.c is not null and t2.c is not null)) or (t1.c is null and t2.c is not null) or (t1.c is not null and t2.c is null))
or ((t1.d!=t2.d and (t1.d is not null and t2.d is not null)) or (t1.d is null and t2.d is not null) or (t1.d is not null and t2.d is null))
)

You'd have to run a dedup after this if you've got duplicate rows.
That ugly WHERE is just killin' me, but dealing with the NULLs is a real PITA.|||I tried it by creating a new table, inserting those with the least nulls then winding up to the most nulls skipping those with matching values. The way I have it now is a real pain and doesn't work properly.. so...|||Like a lot of tricky SQL problems, the issue here is not the question you are asking, but the fact that your schema is not normalized. Fix it, if at all possible, or you are sure to run into more problems in the future.

This method solves your problem but normalizing your data into a temporary table, and then searching the temporary table for record groups that are a subset of other record groups. It presupposes that you have a unique primary key in your original table.

DECLARE @.TEMPNORMAL TABLE
(PKEY INT,
COLVALUE VARCHAR(6))

DECLARE @.PKEYITEMCOUNT TABLE
(PKEY INT,
ITEMCOUNT INT)

INSERT INTO @.TEMPNORMAL
(PKEY,
COLVALUE)
SELECT DISTINCT
PKEY,
COLVALUE
FROM
(Select PKEY, COL1 as COLVALUE from YOURTABLE
UNION
Select PKEY, COL2 as COLVALUE from YOURTABLE
UNION
Select PKEY, COL3 as COLVALUE from YOURTABLE
UNION
Select PKEY, COL4 as COLVALUE from YOURTABLE) NORMALTRANSFORM

INSERT INTO @.PKEYITEMCOUNT
(PKEY,
ITEMCOUNT)
SELECT PKEY,
COUNT(*)
FROM @.TEMPNORMAL
GROUP BY PKEY

DELETE
FROM YOURTABLE
INNER JOIN
(SELECT PKEY
FROM @.PKEYITEMCOUNT PKEYITEMCOUNT1
INNER JOIN
(SELECT PKEY1,
PKEY2,
COUNT(*) MATCHCOUNT
FROM (SELECT DISTINCT
TEMPNORMAL1.PKEY PKEY1,
TEMPNORMAL2.PKEY PKEY2,
TEMPNORMAL1.COLVALUE
FROM @.TEMPNORMAL TEMPNORMAL1
INNER JOIN @.TEMPNORMAL TEMPNORMAL2
ON TEMPNORMAL1.COLVALUE = TEMPNORMAL2.COLVALUE
AND TEMPNORMAL1.PKEY <> TEMPNORMAL2.PKEY) MATCHQUERY
GROUP BY PKEY1,
PKEY2) PKEYMATCHES
ON PKEYITEMCOUNT.PKEY = PKEYMATCHES.PKEY
AND PKEYITEMCOUNT.ITEMCOUNT = PKEYMATCHES.MATCHCOUNT
INNER JOIN @.PKEYITEMCOUNT PKEYITEMCOUNT2
ON PKEYMATCHES.PKEY2 = PKEYITEMCOUNT2.PKEY
AND PKEYITEMCOUNT1.ITEMCOUNT < PKEYITEMCOUNT2.ITEMCOUNT) SUPERFLUOUSRECORDS
ON YOURTABLE.PKEY = SUPERFLUOUSRECORDS.PKEY

blindman|||Oops...insert statement should read like this:

INSERT INTO @.TEMPNORMAL
(PKEY,
COLVALUE)
SELECT DISTINCT
PKEY,
COLVALUE
FROM
(Select PKEY, COL1 as COLVALUE from YOURTABLE where COL1 is not null
UNION
Select PKEY, COL2 as COLVALUE from YOURTABLE where COL1 is not null
UNION
Select PKEY, COL3 as COLVALUE from YOURTABLE where COL1 is not null
UNION
Select PKEY, COL4 as COLVALUE from YOURTABLE where COL1 is not null) NORMALTRANSFORM

No comments:

Post a Comment