Update fields that contains certain prefix by lookup table

Question

I have a temporary table #NAMEOLD_NAMENEW which contains a column NAMEOLD and a column NAMENEW where every row contains the old and new value of a value to be updated. This table consists of roughly 2.000 rows. I created a clustered index on NAMEOLD.

The temporary table will be used to update certain fields in the database. I know which columns in which tables need to be updated (roughly 25 columns over 20 tables).

Now, the fields that need to be updated contain the string DOMAIN\ as a prefix and after that a NAMEOLD value of the temporary table.

What I want to do is: use the temporary table to replace the full value of the string DOMAIN\ plus the string after that with the appropriate value in the #NAMEOLD_NAMENEW table.

Example:

Example data in #NAMEOLD_NAMENEW:

NAME_OLD NAMENEW
user1 user5
user2 user6

Example data in TABLE1:

ColumnWithName1
DOMAIN\user1
DOMAIN\user2

Now running the query should update TABLE1 to:

ColumnWithName1
user5
user6

I think that scanning the full table multiple times will not perform well enough (SELECT on certain tables already takes minutes).

What I thought of is typing out all the columns that need to be updated, and then updating them with a regex with a capture group. Something like this (non-working code):

UPDATE [dbo].[Table1]
SET [ColumnWithName1] = (SELECT [NAMENEW] from new_old WHERE [NAMEOLD] = [/1]) -- first CAPTUREGROUP??
FROM #NAMEOLD_NAMENEW new_old
WHERE [ColumnWithName1] like '%DOMAIN\(.*)'

Here the capture group does not work since it does not seem to exist in T-SQL. Furthermore, using the nested SELECT is I believe a bad practice for some reason.

Is it possible to do something like this in T-sql? Or is there perhaps even a better approach?

Do all the values that need update start with `DOMAIN\\ `? It would help if you posted some sample rows with actual data, of both a table (to be updated) and of the temp table. — ypercubeᵀᴹ, Dec 29 '15 at 11:16
Yes. They all start with `DOMAIN\ `. I think I have an entrance using `fn_Split`. I will add some example data in the question. — Erwin Rooijakkers, Dec 29 '15 at 11:19
Then Aaron's answer is OK. Just change the `ON` to: `ON t.ColumnWithName1 = 'DOMAIN\\' + new_old.NAMEOLD` and the `SET` to `SET t.ColumnWithName1 = new_old.NAMENEW` (and remove the `WHERE` completely). — ypercubeᵀᴹ, Dec 29 '15 at 11:25
@ErwinRooijakkers Your description of the desired outcome does not match the example output. You said twice that you want the string _after_ `DOMAIN\ ` to be substituted with the new value, which means keeping the `DOMAIN\ ` prefix in the resulting string. Yet the example output has the `DOMAIN\ ` prefix removed. Which way do you really want? — Solomon Rutzky, Dec 29 '15 at 11:37
@srutzky You're right. I have not specified the question clearly. I want the prefix to be removed. Sorry! — Erwin Rooijakkers, Dec 29 '15 at 11:39

score 5 · Answer 1 · edited Dec 29 '15 at 11:56

5

Typically the pattern would be something like this:

UPDATE t
  SET t.ColumnWithName1 = 'DOMAIN\\' + new_old.NAMENEW
  FROM dbo.Table1 AS t
  INNER JOIN #NAMEOLD_NAMENEW AS new_old
  ON t.ColumnWithName1 = new_old.NAMEOLD
  WHERE <some filter on a pattern>;

But your question doesn't really explain how to identify the specific columns that match. Is [/1] a real string value, a pattern, or something else? Is \(.*) supposed to be a regular expression (which doesn't exist in native T-SQL)? Is the User column a typo or do you have multiple columns that have this domain prefix?

edited Dec 29 '15 at 11:56

ypercubeᵀᴹ

92,106
13
189
284

answered Dec 28 '15 at 17:08

Aaron Bertrand

175,645
25
380
589

Thanks for your `INNER JOIN` example, it is cleaner. It should be `#NAMEOLD_NAMENEW ON t` I believe. The `User` column was a typo, I fixed it. The `[1]` was meant to be a [backreference](http://www.regular-expressions.info/backref.html) (use the captured group in the WHERE clause). The problem I have is that I need to find the value that is captured in the `WHERE` clause (after the identifier) as the `NAMEOLD` in the `#NAMEOLD_NAMENEW` table and substitute it with `NAMENEW`. Your pattern does not accomplish this. But since you say a regex does not exist in T-SQL it is perhaps not possible. – Erwin Rooijakkers Dec 29 '15 at 10:56

Solomon Rutzky · Accepted Answer · 2015-12-29T12:30:21.643

3

If the prefix is always DOMAIN\ then you need neither RegEx nor a string splitter. You just need to use SUBSTRING as follows:

UPDATE tbl 
SET    tbl.[ColumnWithName1] = new_old.[NAMENEW] -- to keep the prefix, add "'DOMAIN\' + "
FROM   [dbo].[Table1] tbl
INNER JOIN #NAMEOLD_NAMENEW new_old
        ON new_old.[NAMEOLD] = SUBSTRING(tbl.[ColumnWithName1], 8, 1000);

Just be sure to set the 3rd parameter of the SUBSTRING to be the longest max length of all of the VARCHAR fields being updated (which should already be the length of the NAMENEW field anyway).

Better yet: You could probably improve the efficiency of the operation and reduce the complexity at the same time by approaching this from the opposite direction. Meaning, rather than trying to deal with removing the prefix from the field being updated, simply add in that prefix on the other side. This is similar to what @YperSillyCubeᵀᴹ suggested in a comment on the Question, but given that this is just a temporary table and it is not millions of rows, just prefix all of the values in the NAMEOLD field such that the JOIN does not have any expressions in it at all: it is just a straight equality operation:

-- one-time update
UPDATE tmp
SET    tmp.[NAMEOLD] = 'DOMAIN\' + tmp.[NAMEOLD];


UPDATE tbl 
SET    tbl.[ColumnWithName1] = new_old.[NAMENEW] -- to keep the prefix, add "'DOMAIN\' + "
FROM   [dbo].[Table1] tbl
INNER JOIN #NAMEOLD_NAMENEW new_old
        ON new_old.[NAMEOLD] = tbl.[ColumnWithName1];

^{Please note that the "pretty-print" formatting for SQL does not handle the \' sequence of characters correctly when formatting a string literal, and does not recognize it as being the end of the string literal, hence the red continues beyond where it should.}

edited Dec 29 '15 at 12:30

answered Dec 29 '15 at 11:25

Solomon Rutzky

65,256
6
135
267

Check this out, solves everything regarding COLLATION: http://stackoverflow.com/questions/1607560/cannot-resolve-the-collation-conflict-between-sql-latin1-general-cp1-ci-as-and#1607725 So the `ON` clause can read: `ON t.[ColumnWithName1] COLLATE DATABASE_DEFAULT = new_old.[NAMEOLD] COLLATE DATABASE_DEFAULT` – Erwin Rooijakkers Dec 29 '15 at 12:02
1

@ErwinRooijakkers Yikes. Um, no, I would strongly recommend _against_ using `DATABASE_DEFAULT` on both sides. You should never change both sides unless you have a darn good reason _and_ fully understand the ramifications. That answer _looks_ great, and appears to work in many cases where the two collations are similar enough. However, it is actually a bit irresponsible and could easily lead to "odd" behavior that is hard to debug. But I do appreciate you pointing it out to me as I can now add it to me growing list of topics I am writing about related to collations and Unicode :-). – Solomon Rutzky Dec 29 '15 at 12:09
@ErwinRooijakkers I have not written the article yet ;-). But collations do many things. And doing equality comparisons will have less opportunity to show "odd" behavior, but which characters equate to which other characters, how certain combinations of characters equate to other combinations (e.g. e + accent vs e with the accent already), which order the characters are in, which order lower-case vs upper-case are in, etc, are all determined by collation. The `DATABASE_DEFAULT` value could easily be different than the collation used on both sides of the comparison. – Solomon Rutzky Dec 29 '15 at 12:27
Okay @srutzky please post the link to the article here if you remember this request when it is finished. – Erwin Rooijakkers Dec 29 '15 at 13:32
@ErwinRooijakkers Will do. I have made a note of it :-) – Solomon Rutzky Dec 29 '15 at 13:34

Update fields that contains certain prefix by lookup table

2 Answers2