Distributed queries in SQL Server are a blessing and a curse. What can I say, they are fascinating. For those of you who are not familiar with distributed queries, here is the crash course.
What is a distributed query?
So here is the deal, a distributed query is query that relies on data that lives on another data source. The classic example of distributed queries is the Linked Servers in SQL Server. You can also use the OpenQuery and OpenRowset stored procedures to get to the remote data. The difference between OpenQuery and OpenRowset is that the OpenQuery uses linked servers (linked servers store the connection string data); whereas OpenRowset requires the entire connection string to be passed.
For sanity purposes I like the OpenQuery it is cleaner and is more abstract.
Example:
—What not to do!
SELECT RTRIM(name), id FROM OPENQUERY(linkedserver, ‘SELECT name,id FROM foo’) WHERE id = 24234
–Much better!
SELECT name, id FROM OPENQUERY(linkedserver, ‘SELECT RTRIM(name),id FROM foo WHERE id = 24234’)
In the above example something might be popping in your head, and you’re right, it is a nested query! The theory that I use often is this: All you are doing is creating a temporary table from one server to use on another. All of the processing inside the OpenQuery occurs on the linked server first and then fills in a temporary table on the SQL Server from whence you are executing the distributed query. SQL Server then takes that temporary table (results of the OpenQuery) and runs the remainder of the query.
Here is the order of execution, prepare to have your mind blown:
- OPENQUERY
- FROM
- WHERE
- SELECT
Here is the mind blowing part, this also relates to performance. Let me explain. The more criteria (restricting the number of rows and fields) you impose in step 1, makes all the remaining steps faster. Likewise any data manipulation you do in step 1 will speed up the entire query. If you have more than one OPENQUERY linking to eachother then join them if possible (they have to be from the same data source), see the example below. Try to get to one OPENQUERY. Remember what I said from earlier, it downloads all that data and puts it into temporary tables. You could take 5 Mb of data transferred from a couple OPENQUERY’s and combine them to be under 1 Mb, this has a huge difference in network traffic and computing time on the SQL Server.
–This is really bad.
SELECT * FROM OPENQUERY(linkedserver1,’SELECT name, id from foo’) AS foo INNER JOIN OPENQUERY(linkedserver1,’SELECT name2, id from goo’) AS goo
ON foo.id = goo.id
–This is much better.
SELECT * FROM OPENQUERY(linkedserver1,’SELECT name, name2 FROM foo INNER JOIN goo ON foo.id = goo.id’)
Gotcha’s with Eloquence and the SQL/R ODBC
Only inner joins are accepted in the very old style: FROM foo, goo WHERE foo.id = goo.id
You can only link 5 tables per OPENQUERY
If you have any questions, just post a comment.