12

I have this small CLR that does a RegEX function on a string in columns.

When running on SQL Server 2014 (12.0.2000) on Windows Server 2012R2 the process crashes with

Msg 0, Level 11, State 0, Line 0 A severe error occurred on the current command. The results, if any, should be discarded.

and gives a stack dump if I do

select count (*) from table where (CLRREGEX,'Regex')

but when I do

select * from table where (CLRREGEX,'Regex') 

it returns the rows.

Works perfectly on same SQL Server build running on Windows 8.1 .

Any ideas?

-- Edit It is as simple as it can be

using System;
using System.Collections.Generic;
using System.Text;
using System.Data.SqlTypes;           //SqlString, SqlInt32, SqlBoolean
using System.Text.RegularExpressions; //Match, Regex
using Microsoft.SqlServer.Server;     //SqlFunctionAttribute
public partial class UserDefinedFunctions
{
    public static readonly RegexOptions Options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline;
    [SqlFunction]
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean RegExMatch(SqlString input, SqlString pattern)
    {
        if (input.IsNull || pattern.IsNull) //nulls dont qualify for a match
            return SqlBoolean.False;
    return Regex.IsMatch(input.Value, pattern.Value, RegexOptions.IgnoreCase);
    }
}

So by little changes this works now: Main lesson in C# seems to be the same as in TSQL beware of implicit data conversion.

using System;
using System.Text;
using System.Data.SqlTypes;           //SqlString, SqlInt32, SqlBoolean
using System.Text.RegularExpressions; //Match, Regex
using Microsoft.SqlServer.Server;     //SqlFunctionAttribute
public partial class UserDefinedFunctions
{
public static readonly RegexOptions Options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.Read)]
    public static SqlBoolean RegExMatch(SqlString input, SqlString pattern)
{
    if (input.IsNull || pattern.IsNull) //nulls dont qualify for a match
        return SqlBoolean.False;
    string sqldata = input.ToString();
    string regex = pattern.ToString();
    return Regex.IsMatch(sqldata, regex);
 }
Spörri
  • 4,309
  • 11
  • 28
  • Does this happen for all patterns or just this one? It could be an inefficient pattern (i.e. excessive [backtracking](https://msdn.microsoft.com/en-us/library/dsy130b4.aspx) or unnecessary Captures). You should look into setting the [MatchTimeout](https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.matchtimeout.aspx) property (new in .NET Framework 4.5). Did you code the RegEx function yourself? If so, are you using static or instance RegEx methods? Is the `SqlFunction` method marked as `IsDeterministic=true`? Is the assembly marked as `SAFE`? – Solomon Rutzky Apr 13 '15 at 20:21
  • This happens for all patterns, even simple ones like [0-9].* The CLR is compiled for 2.0 and we have tried to go to 3.0 and 3.5 with the same results. The RegEx is static, the function is deterministic and the assembly safe. I have used this on multiple servers/platforms and it will only show this behaviour on certain tables/columns on SQL2014 on Win2012R2 – Spörri Apr 14 '15 at 09:13
  • 2
    The code looks fine, except for the duplicate `[SqlFunction]` attribute. Is that the exact code? I don't think that would compile. The Framework version 2.0 / 3.0 / 3.5 distinction is a non-issue as you are using 4.0 / 4.5 / 4.5.x / etc or whatever is on that server since you are on SQL Server 2014 which is bound to CLR version 4. Is the server showing the problem 32-bit? How much memory does it have compared to the other servers? And have you checked the SQL Server logs just after getting that error? – Solomon Rutzky Apr 14 '15 at 17:04
  • 2
    How big are these tables? Also, could you check if the estimated plan for the problem statements has a parallel operator? If yes, could you check if the issue occurs without parallelism i.e. with a MAXDOP = 1 hint. – Amit Banerjee Apr 14 '15 at 11:18
  • The code is not exact there are other functions which I have removed but this one shows the symptoms. I dont have access to the server to check the exact .net version there but the servers are well provisioned. I'll investigate further when the server admin stops trout fishing for his birthday. What if troubling me is that this happens in a database which I have used the function on in a different version of SQL server and only on some colums the one i'm investigating is varchar (20) with only max 5 chars in each row, I'll keep investigating. – Spörri Apr 15 '15 at 01:06
  • 2
    The exact version of .NET is not related to the problem, though it would be nice to know if all of the servers are on at least 4.5 since that would mean you can use the new `MatchTimeout` property. But I don't think that is really the issue either if you are only passing in 5 chars max. It _is_ possible that this one machine has a corrupted install of the .NET Framework, and that can be repaired once trout fishing activities have ceased ;-). Also, `[0-9].*` is simple but also inefficient since it matches all chars, if any, after the first digit; using just `[0-9]` for an `IsMatch` is better. – Solomon Rutzky Apr 15 '15 at 02:36
  • Thanks for all the comments - This now seems to work after some work - Question updated. – Spörri May 29 '15 at 16:08
  • 1
    Why did you change `DataAccessKind` to `Read`? That just slows it down and you aren't doing any data access. Also, I do realize that it seems to be working now, but I would be cautious with using the `ToString()` method as opposed to the `Value` property as I don't think ToString handles encodings properly, or something like that. What is your databases collation set to? Of course, I just re-read one of your comments above and see that the column is VARCHAR instead of NVARCHAR. Does that field have a different collation than the database? – Solomon Rutzky May 29 '15 at 18:35
  • Along with the questions I just asked in the comment above, I am curious what your language settings are between the systems, both at the OS / Windows level and in SQL Server. I suspect that on the Windows Server 2012R2 machine (where it was crashing), there is a different language set between the OS and SQL Server. – Solomon Rutzky May 29 '15 at 18:47
  • @struzky that's an interesting thought, I will disable data access, the servers are in English and I'll have to check with the server admin the exact language settings. The database and the engine is in icelandic_ci_as which is a latin1 subset – Spörri May 29 '15 at 19:00
  • Ok, so you mentioned the SQL Server server-level and database-level collations, but what is the "Language" setting of the server, and what is the collation of the column? I have a feeling that the LCIDs between the OS and SQL Server do not match. ALSO, I just noticed that you have a static variable for RegexOptions but are not using it, is that correct? I ask because it has the `CultureInvariant` option in it that could play a role IF that option variable is being used, but it does not appear to be in the call to `Regex.IsMatch`. – Solomon Rutzky May 29 '15 at 19:17

2 Answers2

4

The problem is a locale conflict between the Windows OS and SQL Server (specifically the database where the Assembly is loaded). You can run the following query to see what they are both set to:

SELECT os_language_version,
       DATABASEPROPERTYEX(N'{name of DB where Assembly exists}', 'LCID') AS 'DatabaseLCID'
FROM   sys.dm_os_windows_info;

If they are different then you can definitely get some "odd" behavior, such as what you are seeing. The issue is that:

  • SqlString includes more than just the text itself: it includes the default collation of the database in which the assembly exists. The collation is comprised of two pieces of information: the locale info (i.e. LCID), and the comparison options (i.e. SqlCompareOptions) which detail the sensitivity to case, accents, kana, width, or everything (binary and binary2).
  • String operations in .NET, unless explicitly given a locale, use the locale info of the current thread, which is set in Windows (i.e. the Operating System / OS).

The conflict usually occurs when referencing a SqlString parameter without using .Value or .ToString() such that it does an implicit conversion to SqlString. In that case it would cause an exception saying that the LCIDs do not match.

There are apparently other scenarios, such as performing (some / all?) string comparisons, including when using Regex as this case shows (though so far I have not been able to reproduce this).

Some ideas for fixes:

Ideal (expectations will always be met regarding how the comparisons work):

  • Change either the Windows or SQL Server LCID (default language) so that both match

Less than ideal (the behavior of the Windows locale might not be the same rules for equality and sorting and so there could be unexpected results):

  • Use the .ToString method or .Value property, which both return the string without the SQL Server LCID so the operations will all be using the OS LCID.

Might help:

  • Maybe use SqlChars instead of SqlString as it does not bring along the LCID and collation info from SQL Server
  • Specify that culture doesn't matter via StringComparison.InvariantCulture:
    • String.Compare(string, string, StringComparison.InvariantCulture) or String.Compare(string, string, StringComparison.InvariantCultureIgnoreCase)
    • For Regex, specify RegexOptions.CultureInvariant
Solomon Rutzky
  • 65,256
  • 6
  • 135
  • 267
1

Updated..

The localization is different between the SQL Engine and the window Server as @srutzky points out:

os_language_version SqlServerLCID
1033                1039

The following change to the code - setting the option RegexOptions.CultureInvariant gets around the error. The unchanged code will not crash SQL Server 2012 on Windows Server 2012R2 with the same language settings but does so on SQL Server 2014.

using System;
using System.Text;
using System.Data.SqlTypes;           //SqlString, SqlInt32, SqlBoolean
using System.Text.RegularExpressions; //Match, Regex
using Microsoft.SqlServer.Server;     //SqlFunctionAttribute
public partial class UserDefinedFunctions
{
public static readonly RegexOptions Options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean RegExMatch(SqlString input, SqlString pattern)
{
    if (input.IsNull || pattern.IsNull) //nulls dont qualify for a match
        return SqlBoolean.False;
    string sqldata = input.ToString();
    string regex = pattern.ToString();
    return Regex.IsMatch(sqldata, regex);
 }
Spörri
  • 4,309
  • 11
  • 28
  • Can you please run the following on the server that was crashing: `SELECT os_language_version, SERVERPROPERTY('LCID') AS 'SqlServerLCID' FROM sys.dm_os_windows_info;`. It is quite possible that the issue was a conflict in language settings. Your solution might still be the best way to go, but generally there shouldn't be a need to use `ToString()` instead of the `Value` property on `SqlString`s. So it would just be nice to confirm the situation. – Solomon Rutzky Jun 08 '15 at 20:52
  • I posted an answer to clarify, but the issue _shouldn't_ be solved by setting `RegexOptions.CultureInvariant` since you don't pass the `Options` variable into `Regex.IsMatch(sqldata, regex)`. The thing that changed between your original code and the new, working code is you went from using `SqlString.Value` to `SqlString.ToString()`. I suspect you would see the same fixed behavior if you switched to using `SqlChars`. But I would just do that as a test. The best approach is to change the LCID of either Windows or SQL Server to match the other. You can also remove the Options static variable. – Solomon Rutzky Jun 10 '15 at 18:04
  • Hi there. Thanks for accepting my answer :). Just to mention, I did further research and, if understood what I was seeing, then while I am correct about the root cause being a different LCID between the OS and SQL Server, it is not, or should not be, related to the `.Value` property of a `SqlString` as that apparently returns the same internal value as the `.ToString()` method. I am still investigating and will update my answer with whatever I find :). – Solomon Rutzky Sep 29 '15 at 16:30
  • I adjusted my answer in light of new info. I can't reproduce this scenario. Is the code in the Question really what you were/are using? The only real difference between them is that the one that errors uses `RegexOptions.IgnoreCase` while the other does not. I have set up a similar environment: Windows (8.0) using LCID of 1033, SQL Server DB has LCID of 1039, using the same RegEx that you posted, doing a `COUNT(*)` on a `VARCHAR` field filled with GUIDs, using a pattern of `'[0-3â].*'`, on a table with 10 million rows. It is SQL Server 2012, not 2014, though I don't think that should matter. – Solomon Rutzky Sep 30 '15 at 04:52
  • 1
    Thanks for all the answers. The code in the question is what I was using. I had a really complicated regex but managed to crash this using a very simple one. Changing the RegexOptions.CultureInvariant settings stopped the behaviour – Spörri Sep 30 '15 at 21:22
  • Thanks for that info. What I don't get is, `RegexOptions.CultureInvariant` isn't being used in either code sample. There is a static variable declared to hold it (and some other), but in neither code block is it passed into the `IsMatch` method. In a comment on the question you mention this is a "varchar(20) with only max 5 chars in each row". How many rows was it? Did the error happen on just a single row or only if you queried the whole table? What types of characters are in the field? I really want to reproduce this if at all possible :-) – Solomon Rutzky Sep 30 '15 at 21:36