NVARCHAR - VARCHAR conversions
This topic has been discussed heavily in a previous question on DBA.SE
Some parts of the answers on that question:
which characters can be stored in an 8-bit / non-Unicode encoding
depends on the code page, which is determined by the Collation.
&
VARCHAR can no longer be referred to as "non-Unicode". So, starting
with the first public beta of SQL Server 2019 in September 2018, we
should refer to VARCHAR as an "8-bit datatype", even when speaking in
terms of versions prior to SQL Server 2019. This terminology holds
true for all 4 types of encodings that can be used with VARCHAR:
Extended ASCII Double-Byte Character Sets (DBCS) EBCDIC UTF-8
(Unicode)
Solomon Rutzky
From the comments, I agree "Extended ASCII" is really bad term that
actually means a code page that maps characters/code points in the
128-255 range, beyond the standard 0-127 code point range defined by
ASCII.
Dan Guzman
Depending on your collation, some characters will be able to be stored in varchar / 8-bit datatype fields, some not.
My values do not transform to '?' when casting to varchar, your fiddle
uses sql 2017 could that be the reason?
This can depend on collation, and should be due to converting of certain characters
I am using SQL Server 2017, collation Latin1_General_CI_AS
On my test, the Ƒ character was changed to ƒ instead of put as ? when converting nvarchar --> varchar.
Other than that, there are no different characters when comparing both, on my collation / version.
An example:
SELECT CAST([name] AS VARCHAR(1000)), [name]
FROM dbo.Bla
WHERE [name]
NOT LIKE '%'+CAST([name] AS VARCHAR(1000))+'%';
With the small difference on the F's.
(No column name) name
ORRISON CPA LIMITED E¦¬Ç¦®EŒÆŒƒÈ¨ˆÅ¸«Äº‹Å‹™Æ‰€ÆŒ‰E™Å…¬Å¸ ORRISON CPA LIMITED E¦¬Ç¦®EŒÆŒƑȨˆÅ¸«Äº‹Å‹™Æ‰€ÆŒ‰E™Å…¬Å¸
You could compare the text here
Which makes the solution below not useful for you.
Some workarounds on values that fail to convert would be to
- Cast the values as
varchar(1000), and then remove the ? after validating that no ?'s exist.
- Only update name values that do not have regular
?'s.
- Replacing the
? to a value that cannot exist in your data and then
replacing it back.
On the premise that the unicode is changed to a ? when casting them to varchar
#1 Cast the values as varchar(1000), and then remove the ?.
SELECT [name] FROM [dbo].[Table] WHERE [name] like '%?%';
Update if none found
UPDATE [dbo].[Table] SET [name] = REPLACE(CAST([name] AS VARCHAR(1000)),'?','');
#2 Only update name values that do not have regular ?'s.
UPDATE [dbo].[Table]
SET [name] = REPLACE(CAST([name] AS VARCHAR(1000)),'?','')
WHERE [name] NOT LIKE '%?%';
DB<>Fiddle
#3 Replacing the ? to a value that cannot exist in your data and then
replacing it back.
With you guessed it, another REPLACE.
UPDATE [dbo].[Table]
SET [name] = REPLACE(REPLACE(CAST(REPLACE([name],'?','THISVALUECANNOTOCCURINMYDATA') AS VARCHAR(1000)),'?',''),'THISVALUECANNOTOCCURINMYDATA','?')
ReplaceCeption.
DB<>Fiddle
If there are no other, more foolproof, get all unicode values and replace these workarounds. I would use #3 just to be sure that no original ?s are replaced.