7

I'm trying to join a few rows from a remote view to a local table. The view has about 300 million rows, so I want to use the REMOTE query hint so that all 3 million rows don't have to be transferred to my computer.

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
    SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
    ) R
ON L.ID = R.L_ID

This returns 100 rows, as I expected, and takes basically no time, as I expected.

However,

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER REMOTE JOIN (
    SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
    ) R
ON L.ID = R.L_ID

starts to return thousands of rows. I quit it after a few seconds, but it was in the tens - hundreds of thousands.

How could a query hint change my result set?

paparazzo
  • 4,957
  • 1
  • 15
  • 31
xyzzy
  • 71
  • 2
  • 1) What's the size of the local table? 1b) Are you sure you're on the right server? 2) What's the relationship like; 1-1, 1-many, many-many? 3) As pointed out in one of the answers, `TOP` without `ORDER BY` is meaningless, but it's not much better to grab the top n rows if they're not actually in the set you're interested in - what was the point of grabbing the top 100 rows here? 4) – Clockwork-Muse Jul 27 '16 at 08:18

2 Answers2

18

TOP 100 with no ORDER BY means it is undeterministic which 100 rows from the remote table end up participating in the join. This is execution plan dependant and can vary.

If it is a one to many relationship it may be the case that one batch of 100 rows has more matches on the other side of the join than another different batch of 100 rows.

You should specify an ORDER BY (inside the derived table) on some unique column or combination of columns to ensure deterministic results.

Martin Smith
  • 77,689
  • 15
  • 224
  • 316
  • `TOP` without an `ORDER BY` is meaningless, yes, but I'd consider `TOP` before the join condition (which relevant rows!?) to be pretty meaningless anyways. You'd need to be able to get the same search criteria inside, which probably amounts to normalization violation. – Clockwork-Muse Jul 27 '16 at 08:14
  • @Clockwork-Muse - Not at all. It would be entirely possible to want the most recent 100 rows from remote table (`TOP 100 ... FROM RemoteTable ORDER BY DateInserted DESC`) and their associated details from local table. e.g. there is nothing meaningless about requesting the latest 100 order headers and their line items. (Though this would be unlikely to be split over servers) – Martin Smith Jul 27 '16 at 08:22
1

You can try forcing the remote query to run remotely:

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
        SELECT TOP 100 Something, L_ID 
        FROM OPENQUERY([RemoteServer], 'SELECT Something, L_ID
                                        FROM [RemoteDB].[dbo].[RemoteTable]'
                      )
               ) R
ON L.ID = R.L_ID

Or (if you want the 100 limiter to be in the remote query):

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
        SELECT Something, L_ID 
        FROM OPENQUERY([RemoteServer], 'SELECT TOP 100 Something, L_ID
                                        FROM [RemoteDB].[dbo].[RemoteTable]'
                      )
               ) R
ON L.ID = R.L_ID
SQLDevDBA
  • 2,203
  • 7
  • 12