Wednesday, August 12, 2009
ISRO unveils mapping application 'Bhuvan'
However, the new mapping service -- Bhuvan, which means earth in Sanskrit -- will allow users to have a closer look at any part of the subcontinent barring sensitive locations such as military and nuclear installations.
ISRO Chairman G Madhavan Nair will unveil the Bhuvan webportal at a function here tomorrow in the presence of Minister of State in the PMO Prithviraj Chavan.
Bhuvan will use images taken by ISRO's seven remote sensing satellites at least a year ago. These eyes in the sky can capture images of objects as small as a car on the road to build three-dimensional map of the world.
http://bhuvan2.nrsc.gov.in
Thursday, July 9, 2009
Google Chrome Operating System
Google's blog announces a natural extension of the Chrome project: an operating system for netbooks. "Google Chrome OS is an open source, lightweight operating system that will initially be targeted at netbooks. Later this year we will open-source its code, and netbooks running Google Chrome OS will be available for consumers in the second half of 2010. (...) Google Chrome OS will run on both x86 as well as ARM chips and we are working with multiple OEMs to bring a number of netbooks to market next year. The software architecture is simple — Google Chrome running within a new windowing system on top of a Linux kernel."
As people use more and more web applications, the operating system becomes less important and it makes no sense to pay for it. The desktop mail client could be replaced by Gmail, the calendaring application could be replaced by Google Calendar, the office suite has lightweight alternatives: Google Docs and Zoho, it makes more sense to use an online feed reader like Google Reader, your scientific calculator is less powerful than Wolfram Alpha and you'll rarely need a video player when you have YouTube, Hulu and other video sites.
This idea is not new and there are already operating systems optimized for the browser. For example, Good OS announced last year Cloud, an operating system that "integrates a web browser with a compressed Linux operating system kernel for immediate access to Internet, integration of browser and rich client applications, and full control of the computer from inside the browser". If Google manages to create a great user interface, the new operating system could be very successful.
http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html
Tuesday, March 31, 2009
Gmail in 5 Indian Languages Hindi, Tamil, Telugu, Kannada and Malayalam
When you compose a new mail in Gmail, you should now see an icon with an Indian character, as the screenshot below shows. This feature is enabled by default for Gmail users in India. If you do not see this function enabled by default, you will need to go the "Settings" page and enable this option in the "Language" section.
When you click the Indian languages icon, you can type words the way they sound in English and Gmail will automatically convert the word to its Indian local language equivalent. For example, if a Hindi speaker types "namaste" we will transliterate this to "नमस्ते." Similarly, "vanakkam" in Tamil will become "வணக்கம்." We currently support five Indian languages -- Hindi, Tamil, Telugu, Kannada and Malayalam -- and you can select the language of your choice from the drop-down list next to the icon.
They built this new feature using Google's transliteration technology, which is also available on Google India Labs, Orkut, Blogger and iGoogle. I hope you find this feature useful to communicate with those of your friends and family who prefer to write in their native language, and it will be available soon to businesses and schools using Google Apps. Now back to replying to all those Hindi emails I got from my family and friends today!
Sunday, March 29, 2009
How to Minimize SQL Server Blocking
When the LOCK_TIMEOUT setting is used to set a maximum amount of time that a connection can wait for a blocking lock to go away, the connection with the lock that is causing the blocking problem is not affected, but the connection waiting for the block is halted and an error message is received. When this happens, error message 1222, "Lock request time-out period exceeded," is sent to the application.
This means that the application needs to include the appropriate error-handling code to deal with this situation and take the appropriate action, which includes rolling back the transaction. If the application does not know how to deal with this error message, and the transaction is not rolled back, it is possible that the application can continue as if the transaction was not automatically cancelled. Because of this, you should not use the LOCK-TIMEOUT setting unless your application(s) that will be affected by it know what to do when they receive this message from SQL Server.
The syntax for the SET LOCK_TIMEOUT is:
SET LOCK_TIMEOUT timeout_period
Where timeout_period is the number of milliseconds that a connection waits for a blocking lock to go away before an error is returned from SQL Server to the application. A value of -1 is the default, which means to wait indefinitely. A value of 0 tells SQL Server not to wait at all, and to return the error immediately.
This command is based on a per connection basis, and stays with the connection until the connection is broken, or a new SET LOCK_TIMEOUT command is issued. [7.0, 2000]
A view runs on our development server in 30 seconds, but it takes over 10 minutes on the production server. Why?
Solution
Assuming that the servers and the databases are virtually the same, and the statistics have been updated on both of them, here's what is suggestted you to check. First, do an Execution Plan of the view that is causing the problem on the development server. Find out if this particular view is using parallelism as part of the execution plan. I am guessing that this is the case.
Now, do an Execution Plan of the view on the production server. Find out if this particular view is using parallelism as part of the execution plan. My guess is that it is not.
If I am write so far, then check the SQL Server Parallelism setting for each server. I bet that it is set to "Use All Available CPUs" on the development server, and set to use only one CPU on the production server. If I am right, then set the production server to "Use All Available CPUs," and the performance difference of the same view on both servers should be resolved, and performance should be virtually identical on both servers.
By default, SQL Server is set to use "Use All Available CPUs" on a server. If this is not your current setting, then this has been changed by someone.
Essentially, this setting tells SQL Server whether to attempt to execute queries in parallel using multiple processors. If the default "Use All Available CPUs" option is selected, then SQL Server will attempt to run queries in parallel, using multiple CPUs. This can sometimes result in dramatic performance differences in a query's performance. But if this option is turned off, then SQL Server will only use one CPU to execute a query, which can be much slower.
Keep in mind that if SQL Server is set to use parallelism, that it may not always use it. If the server is very busy, it may not be used, but when the server is less busy, it may be used. This means that sometimes the view will be fast, and other times, slow (depending if parallelism is being used or not).
On the other hand, sometimes parallelism can cause queries to take longer to execute than if parallelism is not used. In these cases, you can use a Query Hint to turn off parallelism for the odd-acting query.
SQL Server Transact-SQL DML
For example, you could accomplish this two different ways:
USE Northwind
UPDATE Products
SET UnitPrice = UnitPrice * 1.06
WHERE UnitPrice > 5
GO
USE Northwind
UPDATE Products
SET UnitPrice = ROUND(UnitPrice, 2)
WHERE UnitPrice > 5
GO
Or
USE Northwind
UPDATE Products
SET UnitPrice = ROUND(UnitPrice * 1.06, 2)
WHERE UnitPrice > 5
GO
As is obvious from this example, the first option requires two queries to accomplish the same task as the second query. Running one query instead of two or more usually produces the best performance. [6.5, 7.0, 2000, 2005]
Wednesday, March 25, 2009
How many logical and physical processors do you have?
cpu_count / hyperthread_ratio AS [Physical CPUs]
FROM sys.dm_os_sys_info
Sorting different dateformats correctly
(
SomeDate DATETIME
)
INSERT @Stats
SELECT 20000 + ABS(CHECKSUM(NEWID())) % 30000
FROM master..spt_values
DECLARE @Style INT
SET @Style = 100
WHILE @Style <= 113
BEGIN
-- Orders by ISO format but displays according to @Style parameter
SELECT TOP 10 @Style AS Style,
CONVERT(VARCHAR(40), SomeDate, @Style) as SomeDate
FROM @Stats
GROUP BY CONVERT(VARCHAR(40), SomeDate, @Style),
CONVERT(VARCHAR(8), SomeDate, 112)
ORDER BY CONVERT(VARCHAR(8), SomeDate, 112) DESC
SET @Style = @Style + 1
END
The general idea is to group by both ISO format and the style you wish and the sort by the ISO format and display the other format.
The reason this work is that the two lines in GROUP BY clause are deterministic.
Performance Tuning for SQL Server Backup and Restore
*****
If you are backing up your databases directly over a network (not really recommended as it hurts performance), one way to boost throughput is to perform the backup over a dedicated network which is devoted to backups and restores only. All devices should be on the same high-speed switch. Avoid going over a router when backing up over a network, as they can greatly slow down backup speed. [6.5, 7.0, 2000, 2005]
Friday, March 13, 2009
SQL Server Transact-SQL DML
If you are UPDATing a column of a row that has a unique index, try to update only one row at a time.
Try not to change the value of a column that is also the primary key.
When updating VARCHAR columns try to replace the contents with contents of the same length.
Try to minimize the UPDATing of tables that have UPDATE triggers.
Try to avoid UPDATing columns that will be replicated to other databases.
Try to avoid UPDATing heavily indexed columns.
Try to avoid UPDATing a column that has a reference in the WHERE clause to the column being updated.
Of course, you may have very little choice when UPDATing your data, but at least give the above suggestions a thought.
Thursday, March 12, 2009
How to Perform Multiple lookups to the same table
In a SQL Server Integration Service (SSIS) package, when lookups are performed on the tables which are inserted in the same package by a previous task, the error “could not match” is raised.
See the failed package layout. From the insert data, two records will be inserted and at the lookup those values are used to get the id. However, the package fails indicating that records are not found even though those records are in the table.
This is happening due to the cache setting in the lookup. There are three types of cache, Full, Partial and No caching.
Full pre-caching, is when the complete reference data set is read before the input is processed. This is the default caching type.
Partial caching, is when the Lookup transformation specifies the size of the cache that is loaded with reference data. This option is available only with connections that support keyed access.
No caching, the reference data set is accessed by each row in the rowset.
As the default caching type is full, before the start of the data flow, it will cache the lookup data to improve the performance. As the data is inserted after the caching, the lookup it will failed. Therefore, for this kind of implementation you need to set caching type to No Cachin
Wednesday, March 11, 2009
Action plan: how to get listed in Google's search results for many keywords
There are good ways and bad ways to target multiple keywords. If you choose the wrong way, search engines will think that you are a spammer. If you choose the right way, your website will get higher rankings. The following three steps will help you to get high rankings for as many keywords as possible.
Step 1: Remove stuffed keywords from your web pages
Some webmasters think that the best way to optimize a website for multiple keywords is to add them on every web page and separate them with a comma.
These websites usually have extremely long meta tags with dozens if not hundreds of keywords. You won't get high rankings on Google with that method. Add dozens of keywords on a web page is called keyword stuffing and all search engines consider this spam. You actually risk your search engine rankings if you use this method on your web pages.
To do: Check if your web pages contain long keywords lists. If yes, remove them from your pages.
Step 2: Optimize different pages of your website for different keywords
If you add many keyword on the same web page, then the web page will be somewhat relevant to many keywords. If you optimize a web page for one keyword/keyphrase, then your web page will be very relevant to this keyword.
It is much easier to get high rankings if the web page is very relevant to the searched keyword. For that reason, you should optimize as many pages of your website for as many keywords as possible.
The keywords for which you optimize your web pages should be related. If you have a shoe shop, optimize different pages of your website for the keywords sports shoes, sneakers, sandals and boots. By having optimized web pages for these keywords, you show Google that your website is relevant to the topic shoes.
To do: Find the best keywords for your website and then optimize different pages of your website for different keywords.
Step 3: Add keyword modifiers and combine these variations
In step two, you have optimized your web pages for keywords such as sneakers and sandals. The next step is to add modifiers to these keywords on your optimized pages. For example, the sneakers page could include these keyword variations: sneakers shop, nike sneakers, cool sneakers, etc.
It's usually a good idea to combine all of these keyword variations in one sentence (for example for your web page title): Buy cool Nike sneakers in our sneakers shop in London. With this single sentence, your web page can be found for many keyword combinations that are included in this sentence.
To do: Find keyword modifiers and add different keyword variations to your optimized pages.
If you optimize many pages of your website for different but related keywords and if you add variations of each keyword to each page than the content of your web site will be perfectly optimized for high rankings on Google.
If you combine optimized content with good inbound links, your website will get the best possible rankings on Google and other search engines.
Nine factors that affect your website usability
If you do it correctly, your website will be attractive to both web surfers and search engine spiders. The following list shows nine factors that can improve the usability of your website as well as your search engine rankings.
1. You should have fast loading web pages
Usability: Web surfers don't want to wait for web pages.
Search engine optimization: Search engines can index your web pages more easily.
2. Your web pages should be easy to read
Usability: It's easier for web surfers to read your web pages.
Search engine optimization: Near-white text on a white background and tiny text is considered spam by most search engines.
3. The contents of your web pages should be clearly arranged
Usability: Clear headings, paragraphs and bullet lists make your web pages easier to read.
Search engine optimization: Clear headings, paragraphs and bullet lists make it easier for search engines to find the topic of your web pages.
4. Your web page images should use the IMG ALT attribute
Usability: Web surfers with images turned off and visually impaired visitors will be able to see the content of your images.
Search engine optimization: Search engines cannot index the content of your images but they can index the content of the IMG ALT attribute.
5. You should use custom 404 not found error pages
Usability: If your 404 not found page contains links to other pages of your website or a search form then people might remain on your website.
Search engine optimization: Proper 404 error pages make sure that search engines index the right pages of your website.
6. Your website should be easy to navigate
Usability: Clear and concise navigation links that are easy to find help your website visitors to find content on your site.
Search engine optimization: Clear and concise navigation links that contain your keywords make it easy for search engines to index all of your web pages.
7. Important content is above the fold
Usability: Web surfers with small computer screens can quickly see what your web page is about.
Search engine optimization: The sooner your important content appears in the HTML code of your web pages, the more likely it is that it will be indexed by search engines.
8. Your web page titles are explanatory
Usability: If web surfers bookmark your web pages, a clear web page title will help them to find it again.
Search engine optimization: The web page title is one of the most important SEO elements. It should contain your keywords and it should look attractive so that web surfers click on it when they see your web page title in the search results.
9. The URLs of your web pages are meaningful and self-explanatory
Usability: It's much easier to remember a web page like www.example.com/support than a web page like www.example.com/123123-werwc.php?2342234.
Search engine optimization: If your URLs contain your keywords, this can have a positive effect on your search engine rankings. Dynamic URLs with many variables can lead to problems with search engine spiders.
There are many more factors that influence the position of your website in Google's search results.
Tuesday, March 10, 2009
What's new in SQL Server 2008 Reporting Services?
IIS is no longer required
The previous two releases relied on IIS to host the RS Web service and the Report Manager Web application. Many organizations have made a rule of not installing IIS on the same machine as SQL Server for security reasons. This dependency on IIS therefore became a showstopper for using SSRS for many potential users. Based on user feedback, Microsoft re-architected SSRS 2008 and removed this dependency.
SSRS now handles HTTP via HTTP.SYS, a native OS kernel that intercepts and handles HTTP requests just like IIS. If this sounds familiar, that's because it's the same mechanism used by the Native XML Web Services in SQL Server 2005 – a feature that, interestingly, was removed in SQL Server 2008. Microsoft also claims that this implementation provides better performance and scalability. The Reporting Services Configuration tool has been updated to provide management capabilities for Report Manager and Reporting Server services.
Better memory management
Users who have deployed SSRS in high-traffic environments may have noticed performance degradation when many concurrent users access reports with many pages. The reason for the slowness was that SSRS was holding rendered reports in memory, causing problems if memory demands became too high. SSRS now saves rendered pages to the file system to remove the load on server memory. For large reports, it also renders reports as the user is viewing them. This on-demand processing decreases the load on the server by not rendering pages the user might potentially not view at all. In addition to these memory enhancements, you can now also set a threshold on how much memory SQL Server Reporting Services should use.
Export into Microsoft Word
This feature has been requested pretty much since SSRS 2000 came out in 2004, mainly because there are more Word users than Excel users. For whatever reason, this feature did not make it to SQL Server 2005, but it's finally here and certainly a much welcome addition to all the other export options.
Improved charting components
Last year Microsoft acquired Dundas Data Visualization technology for SSRS. Dundas is a company specializing in developing powerful visual components for developers, such as chart, gauge, map and calendar controls. Purchasing the code base for the chart, gauge and calendar controls allowed Microsoft to provide better integration of these components with SSRS and to control the components' future direction. Dundas components have always been visually stunning, so it's no surprise that the new chart component is like night and day compared with chart control in the previous versions of SSRS.
In my opinion, if you do a lot of charting in your reports, this feature alone justifies the effort to upgrade SQL Server 2008 Reporting Services. The new chart region includes many additional chart types, such as cylinder, pyramid, radar and funnel. Other useful enhancements are interval labeling for avoiding label collisions, combining small slices into a single slice in a pie chart and more 2-D and 2-D visual effects.
Tablix data region
This new data region combines the functionality of both table and matrix data regions. This region allows you to combine multiple column groups and row groups, essentially allowing you to create reports that have both dynamic row and dynamic column output. You can now more easily create nested and recursive groups and make them adjacent to each other. There is a lot more to the Tablix data region than I can describe in this space. For more details, read the Understanding the Tablix Data Region section of Microsoft Books Online.
Enhanced CSV export
Comma-separated value (CSV) export can be very useful for automating report processing if you are interested in extracting data elements from the report. SSRS CSV export in previous versions included both the data and the layout in the export. But the inclusion of report layout elements in the CSV output created additional work and complications. The new output format is much cleaner and easier to use for automating programmatic processing.
There are plenty of significant reasons for you to give SSRS 2008 serious consideration. And remember, just like Integration Services, SSRS is a standalone tool you can install on a dedicated server. This means, if your organization is not ready to migrate to SQL Server 2008, you can upgrade just your reporting server and use these new features, using your 2000 or 2005 databases as data sources.
History of SQL Server Integration Services
The first release of DTS shipped with SQL Server 7.0 to receptive users. The alternatives at the time were either difficult to work with, expensive, or both. Many DBAs were forced to write custom transformation software, which was inflexible and difficult to maintain. Some tools had limitations, such as the need for source and destination schemas to match exactly, direct dependence on a particular database product, and/or no transformation capabilities. Many wrote custom parsing and transformation applications. For example, many companies are only now converting from hand-coded flat file parsers, SQL scripts, and transformation code to a standard platform such as Integration Services.
The first release of DTS addressed several of these issues and simplified life for a lot of people. By using OLEDB for its data access layer, DTS could access various data sources with little or no custom coding. DTS was also affordable because it shipped "in the box" with SQL Server. Users had access to all the power of more expensive products, yet incurred no additional cost for their ETL tools. This was obviously a benefit to IT shops trying to stretch their budgets. DTS was a flexible product that was easy to use. There were also a number of standard tasks in the box, including the Transform Data, Execute Process, Active X Script, Execute SQL, and Bulk Insert Tasks.
SQL Server 8.0 added even more functionality by adding more tasks. The Execute Package, FTP, and MSMQ Tasks added incremental improvements across the product. However, users experienced some frustration with DTS when attempting to work with large datasets and some of the other limitations inherent in a script-based tool. The time was ripe to create a truly enterprise-ready integration tool.
In 2000, SQL Server decided to make a substantial investment in the ETL and Integration space and brought together some talented folks who formulated the ideas behind the Data Flow Task and the next version of Integration Services. Over a period of five years, the development time frame for SQL Server 2005, the DTS team completely redesigned and rewrote DTS to become Integration Services 2005.
Integration Services 2008 is the next incarnation of that release. While not as revolutionary a release as in 2005, Integration Services 2008 brings some new capabilities and incremental improvements across the product.
What Is SQL Server Integration Services?
Data Import/Export Wizard
One of the most popular features of Integration Services is the Import/Export Wizard, which makes it easy to move data from a source location such as a flat file or database table to a flat file, table, or other destination. The Import/Export Wizard was the first utility developed back in the SQL Server 7.0 time frame and continues today as an important utility in the database administrator's (DBA) toolbox.
ETL tool
ETL is an acronym for Extract, Transform, and Load and describes the processes that take place in data warehousing environments for extracting data from source transaction systems; transforming, cleaning, deduplicating, and conforming the data; and finally loading it into cubes or other analysis destinations. Although Data Transformation Services (DTS), Integration Services' predecessor application, was considered a valuable tool for doing ETL, Integration Services is where true Enterprise ETL became available in SQL Server.
Control flow engine
The processes involved in moving data from location to location and transforming it along the way are not restricted to only processing data. Integration Services provides a control flow for performing work that is tangentially related to the actual processing that happens in data flow, including downloading and renaming files, dropping and creating tables, rebuilding indexes, performing backups, and any other number of tasks. Integration Services provides a full-featured control flow to support such activities.
Application platform
Developers can create applications that use Integration Services as a platform, embedding the engines within their application using the provided object models. As a developer, you can embed the Integration Services engines and components within your applications using the object models.
High performance data transformation data pipeline
That's a mouthful and really incorporates two ideas: high performance and data pipelining. The Data Flow Task is a high-performance tool because you can use it to perform complex data transformations on very large datasets for incredibly performant processing. The pipeline concept means that you can process data from multiple heterogeneous data sources, through multiple parallel sequential transformations, into multiple heterogeneous data destinations, making it possible to process data found in differing formats and on differing media in one common "sandbox" location.
Performance Tuning SQL Server Backup and Restore
What can sometimes happen to the DBA is that a database may be of reasonable size now, given their current server hardware, and backups and restores may take a reasonable amount of time. But as time goes by, the database grows, and at some point the amount of time to backup and restore the database becomes too long given the backup and restore window available. The moral to this story, keep a close eye on backup and restore times, and factor this in when projecting future hardware needs for your SQL Server databases. [6.5, 7.0, 2000, 2005]
Saturday, March 7, 2009
Reasons Why You May Not Want to Use a Heap
As I compiled the list, in some cases I have had the opportunity to verify that they are true, but in other cases, I have not had the time to verify them. I would like your input on the list. Is there anything on the list that is not true, or only partially true (under some circumstances, but not others), and what other reason are there to avoid using heaps that have I left out?
I look forward to your feedback.
If non-clustered indexes are not added to a heap, then all queries against a heap will require table scans to retrieve the requested data. If the heap is large, then these queries will be very resource intensive and hurt SQL Server's overall performance.
Since the data in a heap is unordered, performing a table scan on a heap can cause a lot of extra I/O activity because inefficient random reads, not efficient sequential reads, are more the norm.
While a non-clustered index can be added to a heap to speed up some queries, in almost all cases, to make use of a non-clustered index, the use of a bookmark lookup is required. A bookmark lookup means that once the record(s) to be returned by the query are identified in the non-clustered index, additional reads (the bookmark lookup) must be made of the related rows in the heap, so that all of the data requested by the query is returned. This is not very I/O efficient, especially if many rows are returned. At some point, it may be faster for SQL Server to do a table scan than it is to use a non-clustered index when returning many rows.
Heaps cannot be replicated using SQL Server replication.
If you want to create an XML index on an XML data column, a clustered index must exist on the table.
If you want to create a spatial index on a spatial data column (GEOMETRY or GEOGRAPHY), a clustered index must exist on that table.
If a heap has a non-clustered index on it (as the primary key), and data is inserted into the table, two writes have to occur. One write for inserting the row, and one write for updating the non-clustered index. On the other hand, if a table has a clustered index as the primary key, inserts take only one write, not two writes. This is because a clustered index, and its data, are one in the same. Because of this, it is faster to insert rows into a table with a clustered index as the primary key than it is to insert the same data into a heap that has a non-clustered index as its primary key. This is true whether or not the primary key is monotonically increasing or not.
When data is updated in a heap, and the updated row is larger than the old row and can't fit into the old space, a forwarding record is inserted into the original location that points to the new location of the page. If this happens a lot, then there is a lot of space wasted in a database maintaining the forwarding records. This also contributes to additional I/O activity as both the pointer, and the row, have to be read.
Even if data updated in a heap is not larger than the old row (the updated data is smaller or the same size than the original data), updating a heap with a non-clustered primary key is slower than updating the same table that has a clustered index as the primary key. This is because updating a table with a clustered index is less write intensive than updating a heap with a non-clustered index as its primary key.
If a row is deleted from a heap with a non-clustered index as its primary key, it is slower than deleting the same row from the same table with a clustered index as its primary key. This is because it takes more I/O to perform this task on a heap than on a clustered index.
When data is deleted from a heap, the data on the page is not compressed (reclaimed). And should all of the rows of a heap page are deleted, often the entire page cannot be reclaimed. This not only wastes space, it contributes to fragmentation of the data pages within a database.
If you take two identical tables, one that is a heap with a non-clustered index as its primary key, and a table that has a clustered index as its primary key, the heap with the non-clustered index will be substantially larger, wasting valuable space and increasing disk I/O.
The ALTER INDEX rebuild and reorganize options cannot be used to defragment and reclaim space in a heap (but they can used to defragment non-clustered indexes on a heap). If you want to defragment a heap, you have three options: 1) create a clustered index on the heap, then drop the clustered index; 2) Use SELECT INTO to copy the old table to a new table; or 3) use BCP or SSIS to move the data from the old table to a new table.
Thursday, March 5, 2009
Importing and exporting bulk data with SQL Server's bcp utility
This tip discusses several examples of bcp commands to bulk-copy data in and out of SQL Server tables. The examples have been tested on SQL Server 2005 and SQL Server 2008, and for all examples I used the AdventureWorks sample database.
Exporting data with the bcp utility
One of the simplest operations that you can perform with the bcp utility is to bulk-copy data out of a SQL Server table into a text file. By entering the command at a Windows command prompt, you can run a bcp command. The following command, for example, copies the data from the Sales.vSalesPerson view in the AdventureWorks database to the C:\Data\SalesPerson.txt file:
bcp AdventureWorks.Sales.vSalesPerson out C:\Data\SalesPerson.txt -c –T
As you can see, the bcp command begins with the utility name, followed by the fully qualified name of the table (database.schema.table). The out keyword is next, which tells the bcp utility that the data will be exported from the table. The path and file name of the target text file follows the out keyword. Note that the command examples in this tip might wrap to multiple lines. All the examples, however, should be run as a single command.
In addition to these basic arguments, the bcp utility supports several switches that control the utility's behavior. In the example above, the -c switch indicates that all data is treated as character data, regardless of how it is stored in the source table. If you do not specify the -c switch or another type-related switch, you will be prompted to specify the type for each column after you enter the bcp command. You may notice that bcp command switches are case sensitive.
The next switch in the above example is -T, which tells the bcp utility to use a trusted connection to connect to the SQL Server instance. If you do not specify -T, you must provide your user name (the -U switch) and your password (the -P switch), or you will be prompted for this information.
Because no instance is specified in the preceding example, the utility uses the default instance on the local machine. To specify a SQL Server instance, use the -S switch, followed by the server name, as in the following example:
bcp AdventureWorks.Sales.vSalesPerson out C:\Data\SalesPerson.txt -c -T -S Server01
The bcp utility will now connect to the default instance on Server01. If you want to connect to an instance other than the default, you must specify the instance name along with the server name, as in Server01\SqlSrv.
By default, the bcp utility uses tabs to separate fields in the target file. You can, however, override that behavior by using the -t switch, as in the following example:
bcp AdventureWorks.Sales.vSalesPerson out C:\Data\SalesPerson.csv -c -T -t,
In this case, a comma is specified after the -t switch, which means that the data fields will now be separated by commas. This allows you to save the data to a .csv file, as I've done here, which can be handy if you want to view the data in a Microsoft Excel file.
These examples have been limited to exporting data from a table. But you can also use the bcp command to run a Transact-SQL query and export the query results. The following bcp command, for example, includes a SELECT statement that retrieves only the SalesPersonID, FirstName, and LastName columns from the vSalesPerson view:
bcp "SELECT SalesPersonID, FirstName, LastName FROM AdventureWorks.Sales.vSalesPerson"
queryout C:\Data\SalesPerson.csv -c -T -t,
In this scenario, the query, enclosed in quotes, is passed into the bcp command instead of the table name. In addition, the out keyword has been replaced with queryout. But the rest of the command is the same as the preceding example. As a result, the SalesPerson.csv file will now contain only the three specified columns. You can also refine your query further: You can include, for example, a WHERE clause that qualifies which rows are returned from the source table.
Importing data with the bcp utility
The bcp utility makes importing data as simple as exporting data. To run the examples in this section, first run the following T-SQL script to create the SalesPeople table in the AdventureWorks database:
USE AdventureWorks
GO
IF OBJECT_ID (N'SalesPeople', N'U') IS NOT NULL
DROP TABLE dbo.SalesPeople
GO
CREATE TABLE dbo.SalesPeople (
SalesPersonID INT IDENTITY PRIMARY KEY,
FirstName NVARCHAR(50) NOT NULL,
LastName NVARCHAR(50) NOT NULL
)
To import data, you need a source file from which to copy the data. For instance, the following command uses the recently created SalesPerson.csv file to load data into the SalesPeople table:
bcp AdventureWorks.dbo.SalesPeople in C:\Data\SalesPerson.csv -c -T -t,
First, you must specify the target table, followed by the in keyword that replaces out or queryout in this scenario. Next, you must specify the path and file name of the source file, followed by any applicable switches.
When you run this command and view the results, note that while the source file includes the salesperson IDs, these values are not inserted into the SalesPersonID column. The column is defined as an IDENTITY column, so the source data is ignored. To retain the original values, you must add the -E switch to the command, as in the following example:
bcp AdventureWorks.dbo.SalesPeople in C:\Data\SalesPerson.csv -c -T -t, -E
Your table will now include the desired data.
Using a format file
When importing or exporting data, you may find that the structure of your source data does not match the structure of the target data. The columns in a text file, for example, might not be in the same order as those in your target table, or the number of columns may be different. You can address this issue by creating a format file that maps the structures of the source and destination. Let's look at an example to demonstrate how this works.
Suppose you use the following command to export data from the vSalesPerson view to the SalesPeople.txt file:
bcp "SELECT LastName, FirstName, SalesPersonID FROM AdventureWorks.Sales.vSalesPerson"
queryout C:\Data\SalesPeople.txt -c -T -t,
This command uses the same arguments as those in previous examples. Note, however, the order in which the columns are retrieved from the view: LastName, then FirstName, and finally SalesPersonID.
Now suppose you plan to use this file to import data into the SalesPeople table. The columns in the table, however, are defined in a different order from that in the text file. To address this issue, you can create a format file to map the columns from the source to the destination. The following command demonstrates how to create a format file:
bcp AdventureWorks.dbo.SalesPeople format nul -f C:\Data\SalesPeople.fmt -c -T -t,
As previous examples show, the command first specifies the target table. This time, however, the table name is followed by the keywords format nul, which indicate that the bcp utility should create a format file. The -f argument is then used to specify a path and file name for the format file, followed by several switches. As a result, when you run this command, a format file is generated that contains the structure of the SalesPeople table.
The following data shows the content of the SalesPeople.fmt format file that is generated by the above command:
10.0
3
1 SQLCHAR 0 12 "," 1 SalesPersonID ""
2 SQLCHAR 0 100 "," 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\r\n" 3 LastName SQL_Latin1_General_CP1_CI_AS
The first row of the file (10.0) identifies the version of bcp that is being used. The second row (3) identifies the number of columns in the table, and the next three rows provide information about the columns themselves:
The first field indicates the order in which the columns should appear in the source document.
The second field shows the source file data type for each column. Because I specified the -c switch when I generated the file, a character data type is used for all fields as they are extracted from the data file. SQL Server will then convert this data into the correct types as the data is inserted.
The third field shows the prefix length for the field, which is used by SQL Server to provide the most compact file storage. A value of 0 is used automatically if you specify the -c switch when creating the format file.
The fourth field indicates the number of bytes for the data type of a particular field.
The fifth field shows how columns and rows are terminated. Because the -t switch was used when creating the format file, the field values of the source file must be terminated by commas.
The sixth field maps to the order in which the columns appear in the SQL Server table.
The seventh and final field provides the collation information for character columns in the SQL server table.
To use the format file to import data into the SalesPeople table, we must modify the file as follows:
10.0
3
1 SQLCHAR 0 100 "," 3 LastName SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "," 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 12 "\r\n" 1 SalesPersonID ""
As you can see, the order of the columns has been modified to reflect how they appear in the format file. The SalesPersonID column is now listed last and it is terminated with \r\n. The LastName column is now listed first and is terminated with a comma.
After you modify and save the format file, it is ready to use in your bcp commands. The following example demonstrates how to call the format file:
bcp AdventureWorks.dbo.SalesPeople in C:\Data\SalesPeople.txt -f C:\Data\SalesPeople.fmt –T
Note that, when you import data from the SalesPeople.txt file, you must also use the -f switch to call the format file. Also notice that you no longer need to include the -t and -c switches because the information is now included in the format file.
Friday, February 13, 2009
Passing a Table to a Stored Procedure
This article introduces the new feature added to SQL Server 2008, which supports passing a TABLE to a stored procedure or function.
This article is based on SQL Server 2008 CTP 3. Some of the information may change by the time the product is finally released.
Before we create a Function or Stored Procedure that accepts a TABLE variable, we need to define a User Defined TABLE Type. SQL Server 2008 introduced a new User defined TABLE type. A TABLE type represents the structure of a table that can be passed to a stored procedure or function.
So the first step is to create a User Defined TABLE type. The following TSQL code creates a User defined TABLE type named "ItemInfo".
CREATE TYPE ItemInfo AS TABLE (
ItemNumber VARCHAR(50),
Qty INT )
You can use the system view SYS.TYPES to see the type that you have just created. The following query returns all the types defined in the system.
SELECT * FROM SYS.TYPES
/* If you just need to find information about the TABLE types, you could find it from the following TSQL query.*/
SELECT * FROM SYS.TYPES WHERE is_table_type = 1
/* There is another view, which is handy to find information about TABLE types. */
SELECT * FROM SYS.TABLE_TYPES
We have created a TABLE type that we need. Now let us see how it works. Let us create a variable of type "ItemInfo" and try to insert a few records to it. Then lets query the table variable to see if the information is correctly inserted. [code]
/* Let us declare a variable of type ItemInfo which is a TABLE Type */
DECLARE @items AS ItemInfo
/* Insert values to the variable */
INSERT INTO @Items (ItemNumber, Qty)
SELECT '11000', 100 UNION ALL
SELECT '22000', 200 UNION ALL
SELECT '33000', 300
/* Lets check if the values are correctly inserted or not */
SELECT * FROM @Items
/* OUTPUT:
ItemNumber Qty
-------------------------------------------------- -----------
11000 100
22000 200
33000 300
*/
Now let us create a stored procedure that accepts a TABLE variable. Let us create a very simple stored procedure which accepts a TABLE variable and SELECTs contents of the table.
1 CREATE PROCEDURE TableParamDemo
2 (
3 @Items ItemInfo
4 )
5
6 AS
7
8 SELECT *
9 FROM @Items
Well, this would generate the following error:
1 /*
2 Msg 352, Level 15, State 1, Procedure TableParamDemo, Line 1
3 The table-valued parameter "@Items" must be declared with the READONLY option.
4 */
A table variable that is passed to a stored procedure or function should be marked as READONLY. The "callee" cannot modify the table being passed into it. Here is the correct code.
1 CREATE PROCEDURE TableParamDemo
2 (
3 @Items ItemInfo READONLY
4 )
5
6 AS
7
8 SELECT *
9 FROM @Items
Now let us execute the stored procedure we just created. Run the following code.
1 /*
2 declare the variable
3 */
4 DECLARE @items AS ItemInfo
5
6 /*
7 Insert values to the variable
8 */
9
10 INSERT INTO @Items (ItemNumber, Qty)
11 SELECT '11000', 100 UNION ALL
12 SELECT '22000', 200 UNION ALL
13 SELECT '33000', 300
14
15 /*
16 Execute the procedure
17 */
18 EXECUTE TableParamDemo @Items
19
20 /*
21 OUTPUT:
22
23 ItemNumber Qty
24 -------------------------------------------------- -----------
25 11000 100
26 22000 200
27 33000 300
28
29 */
You cannot modify the TABLE parameter passed into the stored procedure. If you try to do so, you will get an error as shown in the following example.
1 CREATE PROCEDURE TableParamDemo
2 (
3 @Items ItemInfo READONLY
4 )
5
6 AS
7
8 SELECT *
9 FROM @Items
10
11 INSERT INTO @Items (ItemNumber, Qty)
12 SELECT '1001', 20
13
14 /*
15 OUTPUT:
16
17 Msg 10700, Level 16, State 1, Procedure TableParamDemo, Line 11
18 The table-valued parameter "@Items" is READONLY and cannot be modified.
19 */
Conclusions
The support for TABLE variables is very interesting. While working with User Defined TABLE Type, please note that you cannot use it as a column of a table. Please also note that, once created, you cannot alter the structure of the TABLE.
SQL Server Federated Database Performance Tuning
Federated database servers works best for databases that can be naturally partitioned over multiple servers. For example, if the data can be segregated by product line or geographical location, then the data can easily be separated over multiple databases in a federation.
Here's a federated database might work. Let's say that a company has customers in North America, South America, Europe, and Asia, and that the data stored for every customer, no matter where they are located, is identical. One option would be to partition the data based horizontally on continent. In this case, four different SQL Servers would be needed in the federation, each storing its respective data.
In almost all cases, when a query is run against the customer data stored in the federated SQL Servers, the data will all be related to a specific continent. Because of this, only one SQL Server will be hit with the query, not all of the SQL Servers. And assuming that customers in all continents will be queried, the separation of customers by continent on different SQL Servers will spread the queries among all of the servers, allowing greater performance and scalability. [2000, 2005]
Sunday, February 8, 2009
String comparison: binary vs. dictionary
To get a feel for the extent of the performance difference, I ran several tests comparing two commonly-used collations:
· Latin1_General_BIN, and
· SQL_Latin1_General_CP1_CI_AS
More specifically, I ran the following T-SQL script in a loop for 100 times:
DECLARE @s1 varchar(max), @s2 varchar(max), @i int
DECLARE @dummy int, @dt datetime
DECLARE @length int
SET @length = 1000000 -- or 100, or 1000000
SELECT @s1 = REPLICATE(CAST('a' as varchar(max)), @length)
SELECT @s2 = REPLICATE(CAST('a' as varchar(max)), @length)
SELECT @i = 1, @dt = GETDATE()
WHILE @i < 1000
BEGIN
IF @s1 = @s2 COLLATE SQL_Latin1_General_CP1_CI_AS
--IF @s1 = @s2 COLLATE Latin1_General_BIN
SET @dummy =0
SET @i = @i + 1
END
SELECT DATEDIFF(ms, @dt, GETDATE())
The script was run in a number of scenarios:
· The @length variable was set to 100, 10,000, and 1,000,000 to see the impact of the string length on the comparison method,
· The string comparison was done in the IF clause with either the Latin1_General_BIN collation or the SQL_Latin1_General_CP1_CI_AS collation
The most salient point to note is that as the string length increases, the performance difference between the two comparison methods increases. While there was no performance difference when the strings were 100 bytes long, the binary comparison was about twice as fast as the dictionary comparison when the string length was 10,000 bytes. When the string length increased to 1,000,000, the binary comparison was about seven times faster.
So, if you are comparing two short strings, you probably shouldn’t bother to explicitly cast the comparison into a binary collation. But when the strings can be very long, it can result in a huge performance improvement to explicitly specify a binary collation for string comparison.
Note that string comparison in COLLATE Latin1_General_BIN and string comparison in COLLATE SQL_Latin1_General_CP1_CI_AS have different semantics. But there are many cases where you have control over your data, and the difference in comparison semantics does not matter (e.g. when you know your data is all in lower case).
Nothing is particularly new here. Just want to contribute some data points to the community.
Sunday, February 1, 2009
Indexed View Performance Tuning and Optimization
Once a unique clustered index has been created for a view, you can also create non-clustered indexes for the same view, which can be used by queries against the view to enhance performance.
As the underlying tables of the view are modified, the clustered index, and any non-clustered indexes of the view, is modified so that it is always up-to-date when it is accessed. And just like indexes on tables, indexes on views experience modification overhead. So only add an index to a view if the benefit of its speed increase when running exceeds the time it takes to update the view's index.
Indexed views can be used by SQL Server two different ways. First, the view can be called directly from a query, as conventional views are currently used. But instead of running the view's underlying SELECT statement and creating the view's result set on the fly, it uses the unique clustered index to display the results of the view almost immediately. Second, any query that is run on SQL Server 2000/2005 is automatically evaluated to see if any existing indexed views exist that would fulfill the query. If so, the Query Optimizer uses the indexed query, even though it has not been specified in the query, greatly speeding the query.
To get the most benefit out of indexed views, you need to use the SQL Server 2000/2005 Enterprise Edition. While you can create indexed views in the other editions of SQL Server 2000/2005, they will not be automatically considered by the query optimizer, and they require the use of the NOEXPAND hint to be used. [2000, 2005]
Friday, January 30, 2009
Good bye
Not sure any words or condolences make an announcement like this any less painful. Families and friends are directly impacted in a negative way when you hear about news like this and I consider all of you as friends and extended family members. I don't even want to think about the anger and disappointment everyone feels over there. To say I am shocked and confused is an understatement.
Playing a blame game of what went wrong and where did it go wrong leaves me with no answers or directions. Its easy to blame it on the current Economy of the USA, and perhaps I have been somewhat blinded by the fact it has not already hit our company. I assumed this was hitting companies much bigger than ourselves that may have been overstaffed to start with. Thinking it was more manufacturing based than educational based. Did not correlate the massive amounts of home forclosures tied to taxes which support the educational world. (90% comes from state taxes). Did not think about the companies and how they pay taxes on profits. Once again, companies all downsizing as there are no profits, hence no taxes paid by companies either. Here is a local story I found online today as well. http://www.waow.com/Global/Story.asp?S=9442775 I heard more about things like this on the news last night. (Keep in mind Wisconsin is only 1 state among many we are infiltrated with). Seems like every night I turn on the news, there is a new local company cutting jobs. Listen to the radio and massive layoffs for other companies forcasting billion dollar losses. Shutting down stores, factories.
I hope our time together would be classified as one of the best experiences in our lives. I hope we have allowed you to become better programmers than when you first started based on the programs you have developed. I hope Suresh and myself will be remembered as one of the best managers you have ever had the ability to work for. Perhaps this is not true with the most recent of events, but I am SO PROUD of the work you have done. As a team, you guys have shown so much growth thru our time together.
Anil, you have been a rock on this team with your programming practices and delivery builds. I think your fundamentals have improved greatly as well as your ability to take complex concepts and ideas (Program Selection on CMT, SystemInfo.dll, Object Oriented Class level project builds) and bring them to life. There is no doubt in my mind that you will go on to bigger and better things in the programming world. You have what it takes to be a great programmer in this world.
Its hard to talk about success when external factors force our working relationships to be closed and terminated. I will never consider the time we spent working together anything less than professional, productive and honorable. I was lucky to be able to work with a staff as great as what you have been. I really do hope you stay in touch and keep me up to date on personal as well as professional experiences in your lives. This is a sad day for Renaissance Learning and one I never want to be part of again. Several at RLI home offices have also been let go and over the next few months, more could happen. Its a scary time that nobody wants any part of. I hear from my grandparents about the Great Depression and how bad it was. Its a different time now, yet watching so many people lose their jobs, even outside of RLI and surrounding communities makes me fear that it will be a long and difficult journey back to where we need to be. I just hope we remember what it took to get us there and all the people that were sacrificed in the process.
Anil, if you ever need a reference, don't be afraid to email me. Currently I check my work email more often than my personal one, but as we have seen, things can change way to quick. So stay in touch to both accounts.
Thank you for everything you have developed for us as its been more than I could have asked for.
Tuesday, January 27, 2009
Restore a SQL Server 2008 Database into SQL Server 2005
Problem
Transferring databases from SQL Server 2008 to SQL Server 2005 is not possible with standard backup and restore facilities.
Solution
Leverage the scripting wizard in SQL Server 2008 to script data as well as schemas in SQL Server 2005 compatibility mode.
Using the "Generate SQL Server Scripts" Wizard
The Northwind database is no longer shipped as part of the SQL Server installation, but you can download it from go.microsoft.com. The data is scripted as INSERT statements.
To create the scripts, you have to run the "Generate SQL Server Scripts" wizard, which you can run within SQL Server Management Studio (once Object Explorer is connected to the appropriate instance) by right clicking on the database and selecting "Tasks –> Generate Scripts."
1. Click "Script all objects in the selected database" (see Figure 2), and then click "Next."
2. Amend the script options: Specifically, set "Script for Server Version" to "SQL Server 2005" and set "Script Data" to "True" (see Figure 3). (SQL Server 2000 is also supported.) If you are putting the database on a new instance for the first time, make sure the "Script Database Create" option is set to "True." Click "Next" when you are happy with the options.
3. Select "Script to file," select the file name, and choose "Single file" and Click "Next" for a summary
4. Now click on "Finish" to get progress messages while the script runs and completes.
If the generation process fails, then you can use the "Report" option to see why.
5. When the scripting is completed, look for the following lines:
CREATE DATABASE [Northwind] ON PRIMARY
(NAME = N'Northwind', FILENAME =
N'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\DATA\northwnd.mdf' ,
SIZE = 3328KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
LOG ON
(NAME = N'Northwind_log', FILENAME =
N'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\DATA\northwnd.ldf' ,
SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
You will need to amend the paths to a valid path for the computer on which you are running. You also will need to comment out the following lines like this:
--EXEC sys.sp_db_vardecimal_storage_format N'Northwind', N'ON'
--GO
Once you have made these changes, you can run the script in SQL Server 2005 Management Studio to recreate the database in your development environment. You can now test data against SQL Server 2008 and SQL Server 2005.
The Inevitable Limitations
Of course, this technique is not without its limitations. Here are a few to bear in mind:
* The data is insecure, as it is in clear readable text. So if you are using real data, you should delete the file created once you have loaded it into SQL Server 2005. You can regenerate the file from the SQL 2008 backup, if necessary.
* If you have a database with a large amount of data, the script file, of course, will be huge.
* SQL Server 2008 specifics in the source database will not be migrated.
SQL Server 2008 Management Studio Trick
Then I saw that the bar above it was a moveable bar. So I moved it and saw this:
Whoa! So then I tried a table, HumanResources.Department from AdventureWorks2008:
Which caused me to check a procedure:
Each line has a little icon on the side that lets you copy it, line by line. It’s really just a way to display the basic properties about the object in a readily accessible format. It’s not going to change my life, or yours, but it sure is handy to know a quick way to access some basic information.
F4 is also a lot more handy in SSMS these days than it used to be. My favorite is when you hit F4 after clicking on a node in a graphical execution plan. But it works for several other things as well, including tables, procedures, etc. And doesn’t make you keep object explorer details open (which causes some problems of its own, at least pre-CU3).
SQL Server Query Execution Plan Analysis
The Query Optimizer will not always use parallelism, even though it potentially can. This is because the Query Optimizer takes a variety of different things into consideration before it decides to use parallelism. For example, how many active concurrent connections are there, how busy is the CPU, is there enough available memory to run parallel queries, how many rows are being processed, and what is the type of query being run? Once the Query Optimizer collects all the facts, then it decides if parallelism is best for this particular run of the query. You may find that one time a query runs without parallelism, but later, the same query runs again, but this time, parallelism is used.
In some cases, the overhead of using multiple processors is greater than the resource savings of using them. While the query processor does try to weigh the pros and cons of using a parallel query, it doesn't always guess correctly.
If you suspect that parallelism might be hurting the performance of a particular query, you can turn off parallelism for this particular query by using the OPTION (MAXDOP 1) hint.
The only way to know for sure is to test the query both ways, and see what happens. [7.0, 2000, 2005]
Wednesday, January 21, 2009
How can I confirm whether a SQL Server installation is 32 or x64 bit?
You can obtain the relevant information using a TSQL statement or referring to the SQL installation directory on that Windows server. Each of these options are outlined below:
Transact-SQL statement:
Execute select @@version from Query Editor or Query Analyzer and the results will be :
SQL 90
Microsoft SQL Server 2005 - 9.00.2153.00 (X64) May 9 2006 13:58:37 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 1)
Microsoft SQL Server 2005 - 9.00.2047.00 (Intel X86) Apr 14 2006 01:12:25 Copyright (c) 1988-2005 Microsoft CorporationStandard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)
SQL 80
Microsoft SQL Server 2000 - 8.00.2148 (Intel X86) Jul 7 2005 20:33:10 Copyright (c) 1988-2003 Microsoft Corporation Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
If the installation is 32 bit then as highlighted above you will see 'Intel X86', if not then you will see 'X64' after the SQL Server version number.
SQL Server Installation Directory:
The default installation directory for SQL Server is \Program Files\ and the same can be referenced as below on a x64 operating system:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn
32-bit:
C:\Program Files (x86)\Microsoft SQL Server\MSSQL.1\MSSQL\Binn
For more information on the SQL Server 2005 x64 bit refer to
http://www.microsoft.com/sqlserver/2005/en/us/64-bit.aspx
How to import an Excel file which has columns with more than 255 characters to SQL Server using SSIS?
You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. By default it is 8 rows.
Uninstall and Remove Multiple Database Instances of Microsoft SQL Server 2005
When you run Add and Remove Programs or Programs and Features (for Windows Vista) in Control Panel, it actually executes the ARPWrapper.exe program that is installed by Microsoft SQL Server Setup Support Files component by using the /Remove option. However, when ARPWrapper.exe completes its removal task, the program also uninstall Microsoft SQL Server Setup Support Files component and hence removes itself or deletes the reference to the ARPWrapper.exe program. This is the cause that prevents administrators to run the uninstaller again to remove database instances or uninstall SQL Server 2005.
The resolution to remove multiple database engine instances from Microsoft SQL Server 2005 is to install ARPWrapper.exe again after removing each components. The following guide provides step-by-step instructions on how to uninstall stand-alone or multiple instance of SQL Server 2005 manually.
1. Login to the server with an administrator user account, or SQL Server 2005 service account.
2. Stop all active SQL Server services in Services applet in Control Panel as any active connections may prevent the uninstallation process from finishing successfully.
3. Go to Control Panel and then Add or Remove Programs applet.
4. Search and verify that Microsoft SQL Server Setup Support Files appears in the list of installed programs. If it’s found, proceed to next step. If Microsoft SQL Server Setup Support Files component is not listed as one of the installed program, install the component by running sqlsupport.msi file locates in Servers\setup\ folder of the original SQL Server 2005 installation media, or disc, folder and USB flash drive that contains SQL Server 2005 setup files.
5. Open command prompt (in Vista, use command prompt with administrator rights), and run the following command to uninstall the SQL Server components:
%ProgramFiles%\Microsoft SQL Server\90\Setup Bootstrap\ARPWrapper.exe /Remove
6. Uninstall the SQL Server instance components such as database engine one at a time.
Uninstall SQL Server 2005 Components
7. Once the removal completes, repeat all of the above steps again until all the SQL Server components are uninstalled.
8. Finally, uninstall the Microsoft SQL Server Setup Support Files component by using Add or Remove Programs (It should has been removed, but double check to ensure that).
Above guide assumes that you didn’t encounter any other problems, and simply want to get rid of all DB instances of the SQL Server 2005. If you have other issues, refer to KB909906 on how to manually uninstall SQL Server.
Tuesday, January 20, 2009
25 most dangerous programming errors
The impact of these errors is far reaching. Just two of them led to more than 1.5 million web site security breaches during 2008 - and those breaches cascaded onto the computers of people who visited those web sites, turning their computers into zombies.
The 25 errors, organized by type, are:
Insecure Interaction Between Components
* Improper Input Validation
* Improper Encoding or Escaping of Output
* Failure to Preserve SQL Query Structure
* Failure to Preserve Web Page Structure
* Failure to Preserve OS Command Structure
* Cleartext Transmission of Sensitive Information
* Cross-Site Request Forgery
* Race Condition
* Error Message Information Leak
Risky Resource Management
* Failure to Constrain Operations within the Bounds of a Memory Buffer
* External Control of Critical State Data
* External Control of File Name or Path
* Untrusted Search Path
* Failure to Control Generation of Code
* Download of Code Without Integrity Check
* Improper Resource Shutdown or Release
* Improper Initialization
* Incorrect Calculation
Porous Defenses
* Improper Access Control
* Use of a Broken or Risky Cryptographic Algorithm
* Hard-Coded Password
* Insecure Permission Assignment for Critical Resource
* Use of Insufficiently Random Values
* Execution with Unnecessary Privileges
* Client-Side Enforcement of Server-Side Security
Sunday, January 18, 2009
Seek or scan?
One of the more common reasons that this happens is because the index in question is not covering, and SQL has determined that the cost of doing the lookups to fetch the extra columns is higher than the cost of scanning the entire table.
If an index does not cover a query, then bookmark lookups are required to get the additional columns, bookmark lookups are run one row at a time, and are seeks on the clustered index. Hence it’s clear that bookmark lookups on a large number of rows are exceedingly expensive and that is why SQL will switch to a clustered index/table scan when lookups are required on a significant percentage of the rows in the table.
So, what consitutes a significant percentage of the rows in the table? 50%? 20%? 10%?
The answer’s often surprising. It’s under 1% of the total rows in the table.
Here’s some test code to demonstrate that.
Setup code:
# CREATE TABLE [dbo].[SeekOrScan](
# [ID] [int] IDENTITY(1,1) NOT NULL primary key,
# SomeNumber int,
# padding char(100)
# ) ON [PRIMARY]
#
# CREATE NonCLUSTERED INDEX [idx_SeekOrScan] ON [dbo].[SeekOrScan] (SomeNumber)
Test code:
1. insert into [SeekOrScan] (SomeNumber)
2. select top 1000000 0
3. from master..spt_values a cross join master..spt_values b
4. where a.name is null and b.name is null
5.
6. update [SeekOrScan] set SomeNumber = ID -- just so we've got sequential numbers for the between
7. GO
8.
9. dbcc freeproccache
10.
11. select * from [SeekOrScan]
12. where somenumber between 1 and 100000 -- 10% of table
13. -- Clustered index scan
14.
15. dbcc freeproccache
16.
17. select * from [SeekOrScan]
18. where somenumber between 1 and 50000 -- 5% of table
19. -- Clustered index scan
20.
21. dbcc freeproccache
22.
23. select * from [SeekOrScan]
24. where somenumber between 1 and 10000 -- 1%
25. -- Clustered index scan
26.
27. dbcc freeproccache
28.
29. select * from [SeekOrScan]
30. where somenumber between 1 and 5000 -- 0.5% of table
31. -- clustered index scan
32.
33. dbcc freeproccache
34.
35. select * from [SeekOrScan]
36. where somenumber between 1 and 3000 -- 0.3% of table
37. -- nonclustered index seek
insert into [SeekOrScan] (SomeNumber) select top 1000000 0 from master..spt_values a cross join master..spt_values b where a.name is null and b.name is null update [SeekOrScan] set SomeNumber = ID -- just so we've got sequential numbers for the between GO dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 100000 -- 10% of table -- Clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 50000 -- 5% of table -- Clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 10000 -- 1% -- Clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 5000 -- 0.5% of table -- clustered index scan dbcc freeproccache select * from [SeekOrScan] where somenumber between 1 and 3000 -- 0.3% of table -- nonclustered index seek
So somewhere around 0.4% of the table, the index seek with bookmark lookup became more expensive than a table scan.
Wednesday, January 14, 2009
Optimizing SQLServer 2008
Using parameterized queries is a well known SQL Server Best Practice. This technique ensures caching and reuse of existing query execution plans (instead of constantly compiling new plans), as well as avoiding SQL injection by mandating that input data be data type safe.
See more about SQL Server parameterization Best Practices here: http://blogs.msdn.com/sqlprogrammability/archive/2007/01/13/6-0-best-programming-practices.aspx
An application that I work with presented me with an interesting dilemma; It wanted to utilize the benefits of plan reuse but the parameter values that the application initially sends to SQL Server are not representative of the values passed in the subsequent re-execution of the statement. SQL Server compiled and cached a ‘good’ plan for the first parameter values. Unfortunately, this had the unintended side effect of caching a poor execution plan for all subsequent parameter values. To make this clearer let’s look at the following example query;
Select * from t where col1 > @P1 or col2 > @P2 order by col1;
Let’s assume for simplicities sake that col1 is unique and is ever increasing in value, col2 has 1000 distinct values and there are 10,000,000 rows in the table, and that the clustered index consists of col1, and a nonclustered index exists on col2.
Imagine the query execution plan created for the following initially passed parameters: @P1= 1 @P2=99
These values would result in an optimal queryplan for the following statement using the substituted parameters:
Select * from t where col1 > 1 or col2 > 99 order by col1;
Now, imagine the query execution plan if the initial parameter values were: @P1 = 6,000,000 and @P2 = 550.
As before, an optimal queryplan would be created after substituting the passed parameters:
Select * from t where col1 > 6000000 or col2 > 550 order by col1;
These two identical parameterized SQL Statements would potentially create and cache very different execution plans due to the difference of the initially passed parameter values. However, since SQL Server only caches one execution plan per query, chances are very high that in the first case the query execution plan will utilize a clustered index scan because of the ‘col1 > 1’ parameter substitution. Whereas, in the second case a query execution plan using index seek would most likely be created.
Unfortunately if the initial parameter values are similar to the first example above, then a ‘table scan’ execution plan gets created and cached, even though most of the following queries would rather use a plan that contains the index seek.
There are a number of ways to work-around this issue;
· Recompile every time the query is executed using the RECOMPILE hint - This can be very CPU intensive and effectively eliminates the benefits of caching queryplans.
· Unparameterize the query – Not a viable option in most cases due to SQL injection risk.
· Hint with specific parameters using the OPTIMIZE FOR hint (However, what value(s) should the app developer use?) This is a great option if the values in the rows are static, that is; not growing in number, etc. – However in my case the rows were not static.
· Forcing the use of a specific index
· Use a plan guide – Using any of the recommendations above.
SQL Server 2008 provides another alternative: OPTIMIZE FOR UNKNOWN
SQL Server 2008 provides a different alternative; the OPTIMIZE FOR UNKNOWN optimizer hint. This hint directs the query optimizer to use the standard algorithms it has always used if no parameters values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the queryplan should be, instead of looking at the specific parameter values that were passed to the query by the application.
Full documentation of optimizer hints can be found here:
http://msdn.microsoft.com/en-us/library/ms181714(SQL.100).aspx
Example:
@p1=1, @p2=9998,
Select * from t where col > @p1 or col2 > @p2 order by col1
option (OPTIMIZE FOR (@p1 UNKNOWN, @p2 UNKNOWN))
Using this new optimizer hint option has allowed the ISV to generate queries that result in the benefits of parameterization; such as plan reuse, while eliminating the problems caused by the caching of queryplans that were created using nontypical initially passed parameter values.
NOTE: This new optimizer hint option, like all optimizer hints, should be used only by experienced developers and database administrators in cases where SQL Server cannot create an optimal plan.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 4 Comments
Friday, October 31, 2008 1:49 PM
SQL Server 2008 : new binary – hex string conversion functionality can dramatically improve related query performance by orders of magnitude.
In previous SQL Server releases it wasn’t possible to convert binary data to string characters in hex format directly, because SQL Server did not have a built-in Transact-SQL command for converting binary data to a hexadecimal string. The Transact-SQL CONVERT command converted binary data to character data in a one byte to one character fashion. SQL Server would take each byte of the source binary data, convert it to an integer value, and then uses that integer value as the ASCII value for the destination character data. This behavior applied to the binary, varbinary, and timestamp datatypes.
The only workarounds were to use either a stored procedure as described in a Knowledge Base Article: "INFO: Converting Binary Data to Hexadecimal String" ( http://support.microsoft.com/kb/104829 ) or by writing a CLR function.
An ISV I work with doesn’t support CLR and therefore they implemented their own version of a custom convert function in form of a stored procedure. This one was even faster than everything else they found on the Internet.
NEW – IN SQL SERVER 2008 the convert function was extended to support binary data – hex string conversion. It looks like a tiny improvement almost not worth mentioning.
However, for the ISV it was a big step forward as some critical queries need this functionality. Besides the fact that they no longer have to ship and maintain their own stored procedure, a simple repro showed a tremendous performance improvement.
Repro:
=====
I transformed the procedure described in the KB article mentioned above into a simple function. The stored procedure below will create a simple test table with one varbinary column and insert some test rows in 10K packages ( e.g. nr_rows = 100 -> 1 million rows in the table ).
The repro shows two different test cases:
1. insert 0x0 two million times
2. insert 0x0123456789A12345 two million times
Depending on the length of the value the disadvantage of the stored procedure solution will be even bigger. On my test machine the results of the test queries below were:
(both tests were done with the same SQL Server 2008 instance - no change of any settings)
1. two million times value 0x0
a, using stored procedure : about 3460 logical reads, no disk IO, ~52 secs elapsed time
b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time
2. two million times value 0x0123456789A12345
a, using stored procedure : about 3460 logical reads, no disk IO, ~157 secs elapsed time
b, using new convert feature : about 5200 logical reads, no disk IO, < 1 sec elapsed time
Repro Script:
========
create function sp_hexadecimal ( @binvalue varbinary(255) )
returns varchar(255)
as
begin
declare @charvalue varchar(255)
declare @i int
declare @length int
declare @hexstring char(16)
select @charvalue = '0x'
select @i = 1
select @length = datalength(@binvalue)
select @hexstring = '0123456789abcdef'
while (@i <= @length)
begin
declare @tempint int
declare @firstint int
declare @secondint int
select @tempint = convert(int, substring(@binvalue,@i,1))
select @firstint = floor(@tempint/16)
select @secondint = @tempint - (@firstint*16)
select @charvalue = @charvalue +
substring(@hexstring, @firstint+1, 1) +
substring(@hexstring, @secondint+1, 1)
select @i = @i + 1
end
return ( @charvalue )
end
create procedure cr_conv_test_table ( @value varbinary(16), @nr_rows int )
as
begin
declare @exist int
declare @counter int
set NOCOUNT ON
set statistics time off
set statistics io off
set statistics profile off
set @exist = ( select count(*) from sys.objects
where name = 'conv_test_table' and
type = 'U' )
if( @exist = 1 )
drop table conv_test_table
set @exist = ( select count(*) from sys.objects
where name = 'conv_test_table_temp' and
type = 'U' )
if( @exist = 1 )
drop table conv_test_table_temp
create table conv_test_table ( varbincol varbinary(16) )
create table conv_test_table_temp ( varbincol varbinary(16) )
set @counter = 10000
while @counter > 0
begin
insert into conv_test_table_temp values ( @value )
set @counter = @counter - 1
end
set @counter = @nr_rows
while @counter > 0
begin
insert into conv_test_table select * from conv_test_table_temp
set @counter = @counter - 1
end
end
-- create 2 million test rows
execute cr_conv_test_table 0x0, 200
set statistics time on
set statistics io on
-- compare runtime of stored procedure with new convert feature
select count(*) from conv_test_table
where dbo.sp_hexadecimal(varbincol) = '0x00'
select count(*) from conv_test_table
where CONVERT(varchar(255),varbincol,1) = '0x00'
-- create 2 million test rows
execute cr_conv_test_table 0x0123456789A12345, 200
set statistics time on
set statistics io on
-- compare runtime of stored procedure with new convert feature
select count(*) from conv_test_table
where dbo.sp_hexadecimal(varbincol) = '0x0123456789A12345'
select count(*) from conv_test_table
where CONVERT(varchar(255),varbincol,1) = '0x0123456789A12345'
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 0 Comments
Friday, August 22, 2008 1:54 PM
How to create an autonomous transaction in SQL Server 2008
I have been asked by many customers and partners, especially those migrating from Oracle, this question: how to create an autonomous transaction in SQL Server? It turns out to be a tricky thing to do since SQL Server doesn't have built-in autonomous transaction support like Oracle.
An Autonomous transaction is essentially a nested transaction where the inner transaction is not affected by the state of the outer transaction. In other words, you can leave the context of current transaction (outer transaction) and call another transaction (autonomous transaction). Once you finish work in the autonomous transaction, you can come back to continue on within current transaction. What is done in the autonomous transaction is truly DONE and won't be changed no matter what happens to the outer transaction. To make it easier to understand, here is an example of the described scenario.
BEGIN TRAN OuterTran
INSERT TABLE1
BEGIN “AUTONOMOUS” TRAN InnerTran
INSERT TABLE2
COMMIT “AUTONOMOUS” TRAN InnerTran
ROLLBACK TRAN OuterTran
The above pseudo script is meant to preserve result of “INSERT TABLE2”. In SQL Server 2008 or prior versions, "ROLLBACK TRAN" would always rollback all inner transactions to the outermost "BEGIN TRAN" statement (without specifiying savepoint). So the "InnerTran" transaction would be rolled back as well, which is not the desired behavior for the particular scenario.
You could wonder why we need an autonomous transaction in the first place. Why can't we just implement two separate transactions so they don't interfere with each other? There are scenarios where people do need logic structured like this. Logging errors in database is one of the most common scenarios. Below is a TSQL script demonstrating a nested transaction where the inner transaction attempts to save the runtime errors in a table.
USE TEMPDB
GO
CREATE TABLE ErrorLogging (logTime DATETIME, msg VARCHAR(255))
CREATE TABLE TestAT (id INT PRIMARY KEY)
GO
CREATE PROCEDURE usp_ErrorLogging
@errNumber INT
AS
INSERT INTO ErrorLogging VALUES (GETDATE(), 'Error ' + CAST(@errNumber AS VARCHAR(8)) +' occurred.')
GO
DECLARE @ERROR AS INT
INSERT INTO TestAT VALUES (1)
BEGIN TRAN OuterTran
INSERT INTO TestAT VALUES (1) -- This will raise primary key constraint violation error
SELECT @ERROR = @@ERROR
IF @ERROR <> 0
BEGIN
BEGIN TRAN InnerTran
EXEC usp_ErrorLogging @ERROR
COMMIT TRAN InnerTran
ROLLBACK TRAN OuterTran
END
IF @@TRANCOUNT > 0
COMMIT TRAN OuterTran
GO
SELECT * FROM TestAT
SELECT * FROM ErrorLogging
GO
If you run above script against SQL Server, you would see no error message recorded in table "ErrorLogging" due to the "ROLLBACK TRAN OuterTran" statement. So, how can we make it work?
In SQL Server 2008, you can implement a loopback linked server to achieve the same goal. For more information about loopback linked server, check Books Online for details (http://msdn.microsoft.com/en-us/library/ms188716.aspx).
USE MASTER
GO
EXEC sp_addlinkedserver @server = N'loopback',@srvproduct = N' ',@provider = N'SQLNCLI', @datasrc = @@SERVERNAME
GO
EXEC sp_serveroption loopback,N'remote proc transaction promotion','FALSE'
Go
Note 'remote proc transaction promotion' is a new option on SQL Server 2008, which allows you to control whether or not you want to enlist remote stored procedure call in a distributed transaction. When this option is off (FALSE) as we set in the above example, the local transaction will not be promoted to distributed transaction. This is how we are able to separate outer and inner transactions in a "autonomous transaction" fashion.
The Inner transaction above can be replaced by:
BEGIN TRAN InnerTran
EXEC loopback.tempdb.dbo.usp_ErrorLogging @ERROR
COMMIT TRAN InnerTran
Full working script is in the appendix below. I want to point out that this method of using a loopback linked server might not scale well if it's executed very frequently. And it only works in SQL Server 2008 due to new server option of 'remote proc transaction promotion' as discussed above. As always, test before you use it.
If you are looking for alternative ways of creating autonomous transaction on SQL 2008 or 2005, you have these options:
Loopback connection from SQLCLR procedure to start a new transaction. Compared to more rigid structure need of loopback linked server, SQLCLR is more flexible and gives you more control over how you want to handle interaction with database. If the logic of the autonomous transaction includes computational tasks, it's one of SQLCLR's strengths to provide performance gain as extra benefit.
Using table variable to save data within transaction. Table variables are not affected by transaction rollback thus serve as temporary buffer for transaction data. Once transaction is done, you can dump data out of table variable to a permanent table. Table variables have limited scope and are less flexible. Usually they would also be slower due to lack of index/statistics. However, it does offer you a pure TSQL option with no need to create anything new.
Loopback connection from Extended Stored Procedures.
Extended Stored Procedure are on the SQL Server deprecation list and we strongly recommend NOT using it.
In a future blog, I'll provide sample SQLCLR code and a script using a table variable to create autonomous transactions. I will also compare their performance differences with loopback linked server in a scalability test. Stay tuned.
Appendix
USE MASTER
GO
EXEC sp_addlinkedserver @server = N'loopback',@srvproduct = N' ',@provider = N'SQLNCLI', @datasrc = @@SERVERNAME
GO
EXEC sp_serveroption loopback,N'remote proc transaction promotion','FALSE'
EXEC sp_serveroption loopback,N'RPC OUT','TRUE' -- Enable RPC to the given server.
Go
USE TEMPDB
GO
CREATE TABLE ErrorLogging (logTime DATETIME, msg VARCHAR(255))
CREATE TABLE TestAT (id INT PRIMARY KEY)
GO
CREATE PROCEDURE usp_ErrorLogging
@errNumber INT
AS
INSERT INTO ErrorLogging VALUES (GETDATE(), 'Error ' + CAST(@errNumber AS VARCHAR(8)) +' occurred.')
GO
DECLARE @ERROR AS INT
INSERT INTO TestAT VALUES (1)
BEGIN TRAN OuterTran
INSERT INTO TestAT VALUES (1) -- This will raise primary key constraint violation error
SELECT @ERROR = @@ERROR
IF @ERROR <> 0
BEGIN
BEGIN TRAN InnerTran
EXEC loopback.tempdb.dbo.usp_ErrorLogging @ERROR
COMMIT TRAN InnerTran
ROLLBACK TRAN OuterTran
END
IF @@TRANCOUNT > 0
COMMIT TRAN OuterTran
GO
SELECT * FROM TestAT
SELECT * FROM ErrorLogging
GO Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Friday, July 11, 2008 1:51 PM
UPDATE with OUTPUT clause – Triggers – and SQLMoreResults
NOTE: the code in this BLOG is TSQL instead of ODBC calls. Since ODBC can be hard to understand and other API’s will have the same basic issues, I decided to use the simpler and more concise TSQL, which should also appeal to a wider audience.
An ISV I work with recently ran into an interesting problem; here is the description and solution.
PROBLEM:
Adding an unexpected trigger caused application code to fail due to incomplete SQL Syntax, and not reading through all returned results.
The ISV wanted to utilize the OUTPUT Clause of the UPDATE statement in their ODBC (SNAC) based application. The OUTPUT clause is very useful in providing data back to the application regarding the row, or rows, which were updated (or: inserted / deleted). In the example I use below, the application is interested in knowing the date/time of the updated row(s).
This could be accomplished by issuing the following statement:
UPDATE T SET COL2 = @Pcol2, COL3 = getdate() OUTPUT CAST(INSERTED.COL3 AS varchar(30))WHERE COL1 = @Pcol1
The ISV coded up the application expecting a return value for number of rows affected, and if that value was greater than 0 then it also returned the value of the inserted date/time.
This worked well, until an external Partner application added a trigger to the table listed in the UPDATE statement.
Example: CREATE TRIGGER [dbo].[TTrigger1] on [dbo].[T] after update as update t2 set col3 = 0
Now the application failed on the UPDATE statement with the following error message:
[Microsoft][SQL Native Client][SQL Server]The target table 'T' of the DML statement cannot have any enabled triggers if the statement contains an OUTPUT clause without INTO clause.
The error message is self-explanatory, but was a surprise to the ISV application (and the application developer). The developer did not expect a trigger to ever be created on the table.
There are two different methods of getting OUTPUT data from an UPDATE statement;
· UPDATE with the OUTPUT clause only – this returns output results directly as part of the statement. This option cannot have a trigger defined on the table.
· UPDATE with OUTPUT and INTO clauses – this returns the output a specific table, or table variable. This option must be used if there is any possibility the table will have a trigger on it at any point.
· See the following website for complete the OUTPUT Clause documentation:
http://msdn.microsoft.com/en-us/library/ms177564.aspx
The developer then utilized the following syntax to send the same statement to SQL Server, and also to get the expected result back: declare @p165 table (col2 varchar(30));UPDATE T SET COL2 = ?, COL3 = getdate() OUTPUT CAST(INSERTED.COL3 AS varchar(30)) into @p165 WHERE COL1 = 1;select * from @p165
Now a subtlety occurred, can you guess what it was? If you guessed that additional results are returned you are correct.
The ODBC code returned data in a loop utilizing the following API calls: SQLFetch, SQLNumResultCols, SQLRowCount, SQLMoreResults:
· The first results returned were the number of rows affected by the trigger, not the number of rows affected by the UPDATE statement, which was what the application was actually expecting
· The second set of results were the number of rows affected by the UPDATE statement
· The third set of results were the number of rows returned by the SELECT statement reading the table variable
· And finally, the actual data from the updated row(s) – which is what we really wanted in the first place!
So, the lessons to be learned here are:
1. Be aware that triggers will affect your UPDATE statements if utilizing the OUTPUT clause
2. You should utilize the INTO clause to avoid the issue
3. Always use SQLMoreResults to read all of the result-sets that could be returned from SELECT, UPDATE, INSERT, or DELETE statements.
4. Triggers should include the ‘SET NOCOUNT ON’ statement to avoid returning the ‘affected number of rows’.
SOLUTION:
The application was changed to utilize the INTO clause, and SQLMoreResults was used to return all the resulting data. Using SET NOCOUNT ON in trigger logic is also a best practice that prevents additional results ‘Rows affected’ from being generated.
Here is a script to duplicate the issues I’ve described:
USE tempdb
GO
------You may want to run this script in steps from comment – to comment
------so you can follow along, instead of running the entire script at once
CREATE TABLE t(
[col1] [int] NOT NULL,
[col2] [varchar](30) NULL,
[col3] [datetime] NULL
) ON [PRIMARY]
GO
insert into t values (1,'abc', getdate())
insert into t values (1,'abc', getdate())
insert into t values (1,'abc', getdate())
GO
select * from t
GO
UPDATE t SET col2 = 'Peter', col3 = getdate()
OUTPUT CAST(INSERTED.col3 AS varchar(30))WHERE col1 = 1
GO
select * from t
GO
------So far everything is good, Now let’s add the new table and the trigger
CREATE TABLE t2(
[col1] [int] NULL,
[col2] [datetime] NULL
) ON [PRIMARY]
GO
insert into t2 values (2, getdate())
insert into t2 values (2, getdate())
GO
select * from t2
GO
------In this example, the trigger: ttr1 will update the rows
------of a second table: t2
CREATE TRIGGER ttr1 on t after update as update t2 set col1 = 0
GO
------OK, let’s try now with the trigger on
UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30))WHERE col1 = 1
GO
------Chances are good you got the following error message
--Msg 334, Level 16, State 1, Line 1
--The target table 't' of the DML statement cannot have any enabled triggers --if the statement contains an OUTPUT clause without INTO clause.
----- let’s fix that now.
declare @p1 varchar(30)
UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30))into @p1 WHERE col1 = 1
GO
------Notice this failed as well with the following error message
--Msg 1087, Level 16, State 1, Line 2
--Must declare the table variable "@p1".
------We need to use a table
------for this to work correctly we must use a table or
------a table variable where the ‘INTO’ data will reside,
------and be retrieved from
declare @p1 table (col2 varchar(30))
UPDATE t SET col2 = 'Peter', col3 = getdate() OUTPUT CAST(INSERTED.col3 AS varchar(30))into @p1 WHERE col1 = 1
select * from @p1
--Now you get what we were originally looking for
-- the date/times of the rows that were updated
--Look at the results under the 'Messages' tab as well...
--you will see the number of rows affected:
-- 2 for the rows inserted as part of the trigger
-- 3 for the rows Updated
-- and 3 for the rows we selected from the table variable
--Now, you can see that the application must utilize SQLMoreResults if it
--wants to return all the valid results.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Thursday, June 26, 2008 10:13 AM
Use SQL Server replay tools to reproduce and resolve customer issues
For many ISVs run that into issues at customer sites, it is sometimes difficult to isolate underlying problems, especially on a 24x7 production environment, where limitations apply to real time troubleshooting and live debugging. In situations like this, constructing a repro scenario in a separate environment would be ideal to minimize impact to live production system, and to speed up resolution process.
SQL Server Profiler
Allow me introduce SQL Profiler, which offers replay trace function. Well, it’s not something new. First shipped in SQL Server 7.0, the feature has gone through many improvements in later releases. You can use the tool to take a captured trace as input and replay it against test database(s). It helps identify issues that could be reproduced by replaying the events in the trace. Profiler itself uses ODBC only. In SQL Server 2005 and 2008, the replay function can be configured to use multiple threads (up to 64) to replay workloads.
Advantages:
1. SQL Server profiler is a built-in tool with full support of Microsoft product team. It works out of box.
2. Easy to set up and run. Capture a trace using predefined replay template with all required events, and replay it against original database(s) (target machine needs to meet certain requirements http://msdn2.microsoft.com/en-us/library/ms175525.aspx)
3. In addition to multi-threaded replay, it also provides option of step through to replay events in the order they were traced.
Disadvantages:
1. Certain events can’t be replayed including replication, events involving GUID, session binding events, operations on Blobs using bcp, full-text, READTEXT, WRITETEXT, and etc. See BOL for more details (http://msdn2.microsoft.com/en-us/library/ms188238.aspx)
2. The tool does not support multiple machine replay (running multiple instances of Profiler from multiple machines to replay the trace).
3. Profiler GUI tool is client side tracing and might be intrusive and generate significant performance overhead when capturing events for replay. Be careful of what events to capture and consider using server side tracing (sp_trace_create).
RML Utilities
Starting in SQL Server 2000, SQL Server Customer Support Services team (CSS) started a project of similar tool, called Ostress, with higher flexibility and scalability to help troubleshoot some of the challenging SQL problems. The latest version is packaged in “Replay Markup Language(RML) Utilities” supporting both SQL 2005 and SQL 2000 (http://support.microsoft.com/kb/944837). The tool can replay multi-threaded events as profiler does but with multiple machine replay support. It can simulate up to 1000 concurrent threads. The tool has a component called Ostress (just like old version), which takes a TSQL batch file and “stress” test it by opening arbitrary number of connections and iterate the TSQL batch in each connection configurable number of loops. This is useful when workload can be characterized as same or similar batch from various number of users (connections).
Advantages:
1. The tool offers both replay and stress test options.
2. It supports multiple machine replay (multiple instances of OStress) with up to 1000 concurrent threads.
3. OStress supports 7.0, 2000, and 2005 trace formats.
Disadvantages:
1. The tool is provided as is, no technical support from Microsoft. But you can submit questions via contact in readme of the tool.
2. Requires extra steps to process trace file and convert to RML format before being replayed.
3. Does not support MARS replay.
Recommendation
When to use SQL profiler and when to use RML Utilities? If you have a workload that can be replayed/reproduced with no or low concurrency requirement (<64> 64 threads) or an isolated batch that can be “stress” tested for simulation, use RML Utilities. Keep in mind, for concurrency replay, full sync of ordered events is very hard to replay and no tools exist today to exactly duplicate the original trace. So the issues that happened on traced source server might not be reproduced consistently afterwards even on same environment.
Both tools above are for database replay. For a simulation test of multi-tier application environment, consider load-test tool of Visual Studio (Team edition) or 3rd-party vendor products.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Tuesday, May 27, 2008 10:49 AM
SQL Server Intermittent Connectivity Issue
Recently many customers of an ISV I work with, reported intermittent connectivity issues when running the ISV application on SQL Server. Some customers reported the issue to be SQL Server 2005 specific. Others stated that they are experiencing the same issue on both SQL Server 2000 and 2005. Due to the intermittent nature, and the variation of the issue, it took us quite a while to collect all the data, (odbc trace, netmon trace, sql trace…), analyse it, and understand the exact cause.
SynAttackProtect
The first issue we found was a subtle Winsock behavior change in Window 2003 SP1. Windows 2003 SP1 introduces a configurable registry setting: SynAttackProtect, that protects the server from network Denial-Of-Service attacks. By default the protection is on. In a SQL Server environment, when the number of simultaneous client connection requests is more than the system can handle and SQL Server backlog queue is full, the client will receive a 'connection failed' error from SQL Server:
TCP Provider: An existing connection was forcibly closed by the remote host
The SQL Protocols team has a good Blog that explains the detailed interaction between SynAttackProtect setting and SQL Server. See http://blogs.msdn.com/sql_protocols/archive/2006/04/12/574608.aspx.
In Windows 2003, this issue could be worked-around by configuring the registry setting to disable SynAttackProtect.
1) Launch regedit.exe
2) Add DWORD value named SynAttackProtect under registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
3) Set data value to 0
See http://technet2.microsoft.com/windowsserver/en/library/8d3a9f4d-13d1-4280-ac57-30242504d8ba1033.mspx?mfr=true for additional information.
After applying the registry change, two customers reported the intermittent connection issue went away. Both customers had been stress-testing SQL Server. SynAttackProtect is more likely to become an issue in a lab environment where SQL Server is being stress-tested with extreme load. Customers running stress-testing in lab environments should turn off SynAttackProtect. I am not sure I’d recommend proactively turning it off in a production system given the potential security risk. If a production system ever runs into the SynAttackProtect issue, the where/why of the large number of connection requests should be examined first.
Windows “Scalable Networking Pack”
Windows Scalable Networking Pack was a second network stack change that was released as part of Windows 2003 SP1 + KB91222, or Windows 2003 SP2. With the Scalable Networking Pack, the TCP Chimney Offload feature is enabled by default to increase performance. However implementations on certain network cards are problematic when TCP Chimney Offload enabled, and can cause intermittent connection drop. When the connection is dropped due to incompatibility between the network card and Windows Scalable Networking Pack, typical error message is
[08S01] [Microsoft][SQL Native Client]Communication link failure
A workaround for this issue could be to disable to TCP Chimney Offload feature.
1) Launch regedit.exe
2) Edit DWORD EnableTCPChimney under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
3) Edit DWORD EnableRSS under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
4) Edit DWORD EnableTCPA under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
5) Restart the server
Amongst others, network cards using Broadcom 5708 chipsets are known to have compatibility issues w/ TCP Chimney Offload. Disabling TCP Chimney Offload would fix the incompatibility issue. However it would also remove the benefit of Windows TCP stack performance improvements. It should only be used as a workaround until a fix becomes available from the network card vendor.
See support article http://support.microsoft.com/kb/942861 for additional information.
By now, the majority of customers solved the intermittent connectivity issue after applying the SynAttackProtect and/or TCP Chimney Offload changes. Some customers connecting to SQL Server through Citrix have to turn off the TCP Chimney Offload feature on the Citrix server to fix the issue.
Query Timeout
The particular ISV application runs with a configurable Query timeout, by default the timeout is set to 1 second. When a query timeout happens, the application will retry the query for 16 times, if it still fails, the query will be submitted again with nolock hint. How would that affect the connection? At the first glance, it seems to be irrelevant, but it does. Here’s the sequence of what could happen.
1) The query is submitted to SNAC for execution
2) The execution of the query takes too long so the timeout expires
3) Once the timeout happens, client attempts to cancel the query and sends an ATTN packet to the server. After sending ATTN, the client then waits for the response from the server. A timeout is set for this wait, the value of the timeout is obtained through SQL_ATTR_CONNECT_TIMEOUT and if not set, the default to 120 seconds.
4) The wait for server to response also times out. This is treated as an error from the client and basically the connection is considered dead in such scenarios
5) The client marks the connection as dead and then returns the “Query timeout expired” error message.
6) The application code, on seeing a timeout expired message attempts to execute the request again on the same connection, but immediately hits the “communication link failure” message because the connection is deemed as dead by the client
During the investigation, we discovered a regression in SQL Server 2005. SQL Server 2005 may not response to a query cancel (timeout) request in a timely manner if the query requires index scan through a large number of pages. Checking for any attention requests is delayed when the SQL Server storage engine is busy bringing pages in from disk. Resulting in
[08S01] [Microsoft][SQL Native Client]Communication link failure
A hotfix is available to fix the regression, see details in
http://support.microsoft.com/kb/945442.
All the remaining customers had some long running batch queries that index scan a large table. This is just the last piece we needed to complete the puzzle. After applying the hotfix, all customers reported the problem solved.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Tuesday, March 18, 2008 4:25 PM
Using time zone data in SQL Server 2008
In SQL Server 2008 Microsoft has introduced a number of new date and time data types. One of these is the datetimeoffset data type. This data type includes an offset from UTC time as well as the datetime value and ensures that the datetime can be retrieved in UTC or a particular timezone based on this offset. There are also new functions to allow for conversions between different time zones using the new function SWITCHOFFSET().
An example from SQL Server 2008 Books On Line (BOL):
CREATE TABLE dbo.test
(
ColDatetimeoffset datetimeoffset
);
GO
INSERT INTO dbo.test
VALUES ('1998-09-20 7:45:50.71345 -5:00');
GO
SELECT SWITCHOFFSET (ColDatetimeoffset, '-08:00')
FROM dbo.test;
GO
--Returns: 1998-09-20 04:45:50.7134500 -08:00
SELECT ColDatetimeoffset
FROM dbo.test;
--Returns: 1998-09-20 07:45:50.7134500 -05:00
One of the most common questions we are asked is why we use the offset and not a timezone name. A timezone name is much easier to remember than an offset, and most people do not know an offset without looking it up, making queries more difficult.
Unfortunately, there is no current international standard authority for timezone names and values. Each system needs to use a system of their own choosing, and until there is an international standard, it is not feasible to try to have SQL Server provide one, and would ultimately cause more problems than it would solve. However, there are a couple of common systems that are well recognized around the world. One is the Dynamic timezone data that is stored in the Windows Vista registry. This data can be read from the registry into a file, which is then imported into SQL Server.
Another cross-platform standard is the public domain Olson Timezone database (http://www.twinsun.com/tz/tz-link.htm). There are many public domain programs for extracting a time zone from these files, but at this time most are programmatic solutions. So a programmatic function could be written in the CLR, but to provide full database functionality and query-ability, a table is needed.
The programmatic solutions take a date, then apply the many different rules that determine when a zone is in daylight savings time and when it is not. However, there are also historical changes. Daylight savings time changed in 2007, meaning that determining what the offset for a particular zone is at a particular time is different depending on the year. Then there are times when leap seconds need to be added. Therefore any data-driven solution must have rows that have valid time ranges as well.
The approach recommended here is to take one of the DLLs found on the web and instead of providing a programmatic solution around a specific date – to write all of the rows out into a database as an offset with the valid ranges. Currently this example uses the Windows standard naming conventions for timezones, with a mapping to the Olson timezone names, but you could easily add other names in other languages as well.
Working with timezones is very complex, and the following is a suggestion only for some ideas on how to use time zone data more effectively. This is an example program (no guarantees) that uses a .NET library and writes the data from the Olson tz files in table format, and which can then be imported into SQL Server. The .NET timezone library can be found at http://www.babiej.demon.nl/Tz4Net/main.htm and they request a small donation.
Here is some sample code to write to files the timezone data (no guarantees – does not include leap seconds):
StreamWriter sr = File.CreateText(@"D:\TZMapping.txt");
StreamWriter tr = File.CreateText(@"D:\TZZones.txt");
string[] zoneNames = OlsonTimeZone.AllNames;
sr.WriteLine("ID\tDaylightName\tStandardName\tRawUtcOffset\tOffsetSeconds\tWin32Id");
tr.WriteLine("ID\tTransitionStart\tTransitionEnd\tDeltaSeconds\tDST");
for (int i = 0; i < zoneNames.Length; i++)
{
OlsonTimeZone tz = OlsonTimeZone.GetInstanceFromOlsonName(zoneNames[i].ToString());
sr.Write(i.ToString() + "\t");
sr.Write(tz.DaylightName.Trim() + "\t");
sr.Write(tz.StandardName.Trim() + "\t");
sr.Write(tz.RawUtcOffset.ToString() + "\t");
sr.Write(tz.RawUtcOffset.TotalSeconds.ToString() + "\t");
sr.WriteLine(tz.Win32Id == null ? "" : tz.Win32Id.Trim());
DaylightTime[] times = tz.AllTimeChanges;
for (int j = 0; j < times.Length; j++)
{
tr.Write(i.ToString() + "\t");
tr.Write(times[j].Start.ToString("yyyy-MM-dd HH:mm:ss") + "\t");
tr.Write(times[j].End.ToString("yyyy-MM-dd HH:mm:ss") + "\t");
tr.Write(times[j] is StandardTime ? "0\t" :times[j].Delta.TotalSeconds.ToString() + "\t");
tr.WriteLine(times[j] is StandardTime ? false.ToString() : true.ToString() );
}
}
tr.WriteLine();
sr.WriteLine();
tr.Close();
sr.Close();
Import the TZMapping file, which will become the parent table, with the ID as the primary key. Your table structure might look like this:
Please note: If you use the Flat File Datasource in the Import Data Wizard in SQL Server 2008 Management Studio, you will need to open the Advanced Tab to set the source OutPutColumnWidth to greater than the default of 50. Then import the TZZones file, which will become the child table with the ID, TransitionStart, and TransitionEnd as the composite primary key with a foreign key reference to the TZMapping table. The TZZones table includes historical timezone data. Joining these new tables into your data into queries now allows for queries that include standard names, Windows IDs, etc.
For example, offsets can now be retrieved by a preferred name:
select UtcOffset from TZmapping where StandardName = 'US/Pacific (PST)'
The following two queries return different offset amounts for the same day in two different years. This is because the US changed daylight savings time, and the date in March now falls into daylight savings when it did not before.
select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600
from dbo.TZMapping
join dbo.TZZones
on dbo.TZMapping.id = dbo.TZZones.id
where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'
and '2006-03-15'
between dbo.TZZones.TransitionStart
and dbo.TZZones.TransitionEnd
select (dbo.TZMapping.OffsetSeconds + dbo.TZZones.DeltaSeconds)/3600
from dbo.TZMapping
join dbo.TZZones
on dbo.TZMapping.id = dbo.TZZones.id
where dbo.TZMapping.StandardName = 'America/Los_Angeles (PST)'
and '2007-03-15'
between dbo.TZZones.TransitionStart
and dbo.TZZones.TransitionEnd
Again, timezones are a complex area and each application will need to address how you are going to handle time zone data to make programs more user friendly. This is just one small example.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 5 Comments
Tuesday, March 18, 2008 3:49 PM
Increase your SQL Server performance by replacing cursors with set operations
You have probably heard many times, from different sources, that as a best practice; avoid using TSQL cursors.
During a recent visit to a partner we ran into a common cursor case, which I wanted to use as an example to demonstrate why you should avoid TSQL cursors in most cases, and how to convert cursor logic to simple set join operations. Now there are certain scenarios where using a cursor makes sense. For example, a cursor is ideal for row by row processing that can’t be accomplished by set based operations. A cursor is flexible in that it provides a window, or subset, of data and that allows manipulation of the data in various ways. Study carefully what you want to achieve on case by case basis before using a cursor. Keep in mind SQL Server, as a modern RDBMS system, performs much better with set operations.
Here is simplified version of a real cursor that was used to update a big table with over 200 million rows.
DECLARE @EntityId Varchar(16)
DECLARE @PerfId Varchar(16)
DECLARE @BaseId Varchar(16)
DECLARE @UpdateStatus Int
DECLARE outerCursor CURSOR FOR
SELECT EntityId, BaseId
FROM outerTable
--Returns 204,000 rows
OPEN outerCursor
FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE innerCursor CURSOR FOR
SELECT PRFMR_ID
FROM innerTable
WHERE ENTY_ID = @BaseId
OPEN innerCursor
FETCH NEXT FROM innerCursor INTO @PerfId
SET @UpdateStatus = @@FETCH_STATUS
WHILE @UpdateStatus = 0
BEGIN
UPDATE 200MilRowTable
SET ENTY_ID = @EntityId
WHERE PRFMR_ID = @PerfId
FETCH NEXT FROM innerCursor INTO @PerfId
SET @UpdateStatus = @@FETCH_STATUS
END
CLOSE innerCursor
DEALLOCATE innerCursor --clean up inner cursor
FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId
END
CLOSE outerCursor
DEALLOCATE outerCursor –cleanup outer cursor
You might notice that this is a nested cursor with 204,000 loops in total for outerCursor. The innerTable has 10 million rows but innerCursor varies in number of loops depending on @BaseId of outerCursor. When I arrived at the customer this cursor had been running for over a day. The developer was “hoping” that it would finish soon given another day or two. The problem was nobody knew for sure how much time this thing would need to complete. Well, we can find out how much progress it has made so far to make an educated guess:
SELECT execution_count, st.text
FROM sys.dm_exec_query_stats as qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
WHERE st.text like '%FETCH NEXT FROM outerCursor INTO%'
This would tell us how many times the outer cursor has looped already. It returned 107. That’s only around 0.5% (107/204,000) after 30 hours of running. If the trend were to continue, the cursor would need another 8+ months!!!
A rewrite of the batch to take advantage of set operations is as simple as this:
SELECT i.PRFMR_ID, o.EntityId INTO #tempTable
FROM innerTable i join outerTable o on i.ENTY_ID = o.BaseId
Go
UPDATE 200MilRowTable
SET m.ENTY_ID = t.EntityId
FROM 200MilRowTable m join #tempTable t on m.PRFMR_ID = t.PRFMR_ID
Go
--note this is only one of a few ways to rewrite.
In this particular case, “SELECT INTO” is minimally logged under simple recovery mode. The two statement approach makes it easier to understand the conversion logic.
This batch took approximately 17 hours to complete. Between the statement, I also put the database into simple recovery mode and added appropriate indexes to the temp table. I also dropped indexes from 200MilRowTable that touched “ENTY_ID” to speed this up. Adding indexes back took another 7 hours. The total time was approximately 24 hours, which is just a small fraction of the original cursor batch. I need to point out that the non-cursor batch uses more resources since the UPDATE now spawns multiple threads to process parallely. Remember our goal here is to make this finish faster not worrying about how much resources it consumes.
Note: this might not be a perfect example because the nested cursor is magnifying the slow performance.
However, the bottom line is; aviod cursors if possible and use joins / set operations whenever you can.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Thursday, March 06, 2008 11:33 AM
Appending Data Using SQL 2008 Filestream
SQL Server 2008 has a new feature called Filestream, which allows you to save large binary files in the file system, instead of in the database. This is targeted directly at the scenario that many document management and web applications have today where they save some metadata in the database, with a column holding the path to the actual file. However, there is no transactional context between the two; therefore, the data in the database can be changed to no longer point to the actual location, or the file moved without the pointer being updated. Users want to be able to have documents, pictures, video, etc. in the file system with the streaming support the file system provides, but have the metadata be consistent. This is exactly what the Filestream feature is targeted for.
When a large binary files gets ‘updated’ most users/applications will replace the file in its entirety (you do not usually update part of an image through the database). Filestream does not currently support in-place updates. Therefore an update to a column with the filestream attribute is implemented by creating a new zero-byte file, which then has the entire new data value written to it. When the update is committed, the file pointer is then changed to point to the new file, leaving the old file to be deleted at garbage collection time. This happens at a checkpoint for simple recovery, and at a backup or log backup.
Code that updates large value data types (varchar(max), nvarchar(max), and varbinary(max)) may use the .Write(offset) UPDATE clause; however since an in-place update of a column with the Filestream attribute is not supported, the .Write(offset) UPDATE clause will error out. Therefore, even though filestream is only a storage attribute, the update code does not work against both systems, requiring knowledge of underlying storage mechanism.
Many of the update actions taken against large data types are actually appends. And many logging scenarios continuously append to an ever-growing file. Filestream offers an option for this case, to avoid the scenario where the original data is pulled to the client, data appended, and then a new file written with the combined data – leaving the old file to be garbage collected later.
In order to avoid this round trip, a device FS control (FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT) can be issued to trigger a server-side copy of the old content. This avoids the data moving over the network. This control is not yet documented; note that the DeviceIoControl ControlCode constant that is needed is: 599392. This will be added to the Filestream API documentation at a future date.
Example code:
[DllImport(“kernel32.dll”, SetLastError = true)]
Static extern bool DeviceIoControl(
IntPtr hDevice,
Uint dwIoControlCode,
IntPter lpInBuffer,
Uint ninBufferSize,
[out] IntPtr lpOutBuffer,
Uiny nOutBufferSize,
Ref uint lpBytesReturned,
IntPtr lpOverlapped );
IntPtr hFile = IntPtr.Zero;
Uint bytesReturned;
// hFile = mySafeHafle.DangereousGetHandle(); or something similar
Bool result = DeviceIoControl ( hFile, 599392, IntPtr.Zero, 0, IntPtr.Zero, 0, ref lpBytesReturned, InPtr.Zero );
As Filestream is minimally logged, if there is no activity other than filestream activity, there is usually not enough activity to trigger a checkpoint, and the old files will not get deleted. In non-simple recovery mode, the log will not increase, but the db will grow until those files get garbage collected. This will occur in a background thread, and can impact the server through this activity.
Therefore, it is a best practice to manually issue that checkpoint if in simple recovery mode; and to maintain an optimal transaction log backup to avoid the file growth.
Other quick notes:
Filestream deletes are significantly faster than blob deletes.
If an application needs to commonly delete large data values, filestream will be more scalable as the metadata is changed, then the file garbage collected asynchronously. For instance, the delete of a 6GB nvarbinary(max) file that took 6 seconds, happens in milliseconds with a filestream delete.
Filestream maximum size is different for SNAC 9 and SNAC 10.
Files using SNAC9 have a maximum size of 2GB. SNAC 10 supports unlimited size.
Note: Altering an existing column without a filestream attribute to having the filestream attribute is not supported.
This means that moving data from a blob storage mechanism to filestream mechanism requires that the data be copied to the new datatype with the filestream attribute.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Monday, February 25, 2008 10:57 AM
Now is time to try SQL Server 2008
Did you know that SQL Server 2008 is just around the corner, and that the SQL development team has just publicly released the latest Community Technology Preview (CTP 6) version?
This latest beta version of SQL 2008 includes many exciting new features like filtered indexes, sparse columns, data compression, Microsoft sync, integrated full text search, Report rendering in Microsoft Word, Data visualization enhancements, extensive Books Online documentation, the first feature pack, and many, many more.
Now is great time to try it for yourself! Download SQL Server 2008 here .
Also, the SQLCAT teams (including us on the SQLCAT ISV team) have been busy working with our customers and partners implementing pre-release (CTP) builds. We have many lessons learned to share with you; these can be found here
Additionally, we have a number of great best practices papers coming up:
· Index Defragmentation Best Practices
· Best Practices for using Resource Governor with a mixed (OLTP and Reporting) workload
· Data Compression: Design and Implementation Best Practices, Performance Impact
· XEvents Toolkit
· SQL 2008 Usage-Based Optimization Efficacy (subject to name change)
· Scale-Out Querying with Analysis Services 2008
· Best practices for Data Warehousing in SQL Server 2008
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Friday, June 29, 2007 8:56 AM
Detecting Overlapping Indexes in SQL Server 2005
When SQL Server has an optimal index that satisfies the search predicates of a query the optimizer performs an index SEEK operation as opposed to an index (or table) scan to retrieve the required rows; this is desirable. Based on this, one may be led to believe that having an index for every possible query predicate set would result in all the queries executing optimally. While true, one has to keep in mind that the indexes need to be maintained when the underlying table data in the column included in the index changes, which amounts to overhead for the database engine. So as you may guess, there are advantages of having indexes, but having too many can result in excessive overhead. This implies that you need to carefully evaluate the pros and cons before creating indexes.
In your first evaluation scenario you clearly want to avoid the case of having overlapping indexes as there is no additional value that an overlapping index provides. For example, consider a table ‘TabA’ and its three associated indexes created with the following definitions.
CREATE TABLE TabA
( Col1 INT, Col2 INT, Col3 INT, Col4 INT );
GO
CREATE INDEX idx1 ON TabA ( Col1, Col2, Col3 );
CREATE INDEX idx2 ON TabA ( Col1, Col2 );
CREATE INDEX idx3 ON TabA ( Col1 DESC, Col2 DESC );
GO
In the table structure above, the index idx1 is a superset (overlap) of the index idx2 and therefore redundant. As can be expected any query that needs to perform a search on Col1 and Col2 could use index idx1 just as well as the index idx2 as seen in the graphical query plan below.
Such overlapping indexes are often a result of multiple developers working on the same product and not evaluating and understanding the existing schema before making additions. Once created, detecting such overlapping indexes in a database can often be a laborious task requiring detailed analysis. More importantly, most DBAs do not like to disable or drop indexes because they are not certain of the queries they were created to help with and fear the side effects the action may have.
The script below uses the new system catalog introduced in SQL Server 2005 to report all duplicate indexes in the current database context.
CREATE FUNCTION dbo.INDEX_COL_PROPERTIES (@TabName nvarchar(128), @IndexId INT, @ColId INT)
RETURNS INT
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @IsDescending INT;
SELECT @IsDescending = is_descending_key
FROM sys.index_columns SYSIDXCOLS
WHERE OBJECT_ID(@TabName) = SYSIDXCOLS.object_id
AND @IndexId = SYSIDXCOLS.index_id
AND @ColId = SYSIDXCOLS.key_ordinal;
-- Return the value of @IsDescending as the property
RETURN(@IsDescending);
END;
GO
-- Find Duplicate Indexes in SQL Server Database
CREATE VIEW IndexList_VW AS
SELECT
SYSSCH.[name] AS SchemaName,
SYSOBJ.[name] AS TableName,
SYSIDX.[name] AS IndexName,
SYSIDX.[is_unique] AS IndexIsUnique,
SYSIDX.[type_desc] AS IndexType,
SYSIDX.[is_disabled] AS IsDisabled,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 1 ) AS Column1,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 2 ) AS Column2,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 3 ) AS Column3,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 4 ) AS Column4,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 5 ) AS Column5,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 6 ) AS Column6,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 7 ) AS Column7,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 8 ) AS Column8,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 9 ) AS Column9,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 10 ) AS Column10,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 11 ) AS Column11,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 12 ) AS Column12,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 13 ) AS Column13,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 14 ) AS Column14,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 15 ) AS Column15,
INDEX_COL( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 16 ) AS Column16,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 1 ) AS Column1_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 2 ) AS Column2_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 3 ) AS Column3_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 4 ) AS Column4_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 5 ) AS Column5_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 6 ) AS Column6_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 7 ) AS Column7_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 8 ) AS Column8_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 9 ) AS Column9_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 10 ) AS Column10_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 11 ) AS Column11_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 12 ) AS Column12_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 13 ) AS Column13_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 14 ) AS Column14_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 15 ) AS Column15_Prop,
dbo.INDEX_COL_PROPERTIES( SCHEMA_NAME(SYSOBJ.SCHEMA_ID)+'.'+SYSOBJ.[name], SYSIDX.index_id, 16 ) AS Column16_Prop
FROM sys.indexes SYSIDX INNER JOIN sys.objects SYSOBJ ON SYSIDX.[object_id] = SYSOBJ.[object_id]
INNER JOIN sys.schemas SYSSCH ON SYSOBJ.schema_id = SYSSCH.schema_id
WHERE SYSIDX.index_id > 0
AND INDEXPROPERTY(SYSOBJ.[object_id], SYSIDX.[name], 'IsStatistics') = 0
GO
SELECT
vwA.SchemaName,
vwA.TableName,
vwA.IndexIsUnique,
vwA.IndexType,
vwA.IndexName,
vwB.IndexName AS OverlappingIndex,
vwA.Column1, vwA.Column2, vwA.Column3, vwA.Column4, vwA.Column5, vwA.Column6, vwA.Column7, vwA.Column8,
vwA.Column9, vwA.Column10, vwA.Column11, vwA.Column12, vwA.Column13, vwA.Column14, vwA.Column15, vwA.Column16
FROM IndexList_VW vwA INNER JOIN IndexList_VW vwB ON vwA.TableName=vwB.TableName
AND vwA.IndexName <> vwB.IndexName
AND vwA.IsDisabled = 0
AND vwB.IsDisabled = 0
AND (vwA.Column1=vwB.Column1 AND vwA.Column1_Prop=vwB.Column1_Prop)
AND ((vwA.Column2=vwB.Column2 AND vwA.Column2_Prop=vwB.Column2_Prop) OR vwA.Column2 IS NULL OR vwB.Column2 IS NULL)
AND ((vwA.Column3=vwB.Column3 AND vwA.Column3_Prop=vwB.Column3_Prop) OR vwA.Column3 IS NULL OR vwB.Column3 IS NULL)
AND ((vwA.Column4=vwB.Column4 AND vwA.Column4_Prop=vwB.Column4_Prop) OR vwA.Column4 IS NULL OR vwB.Column4 IS NULL)
AND ((vwA.Column5=vwB.Column5 AND vwA.Column5_Prop=vwB.Column5_Prop) OR vwA.Column5 IS NULL OR vwB.Column5 IS NULL)
AND ((vwA.Column6=vwB.Column6 AND vwA.Column6_Prop=vwB.Column6_Prop) OR vwA.Column6 IS NULL OR vwB.Column6 IS NULL)
AND ((vwA.Column7=vwB.Column7 AND vwA.Column7_Prop=vwB.Column7_Prop) OR vwA.Column7 IS NULL OR vwB.Column7 IS NULL)
AND ((vwA.Column8=vwB.Column8 AND vwA.Column8_Prop=vwB.Column8_Prop) OR vwA.Column8 IS NULL OR vwB.Column8 IS NULL)
AND ((vwA.Column9=vwB.Column9 AND vwA.Column9_Prop=vwB.Column9_Prop) OR vwA.Column9 IS NULL OR vwB.Column9 IS NULL)
AND ((vwA.Column10=vwB.Column10 AND vwA.Column10_Prop=vwB.Column10_Prop) OR vwA.Column10 IS NULL OR vwB.Column10 IS NULL)
AND ((vwA.Column11=vwB.Column11 AND vwA.Column11_Prop=vwB.Column11_Prop) OR vwA.Column11 IS NULL OR vwB.Column11 IS NULL)
AND ((vwA.Column12=vwB.Column12 AND vwA.Column12_Prop=vwB.Column12_Prop) OR vwA.Column12 IS NULL OR vwB.Column12 IS NULL)
AND ((vwA.Column13=vwB.Column13 AND vwA.Column13_Prop=vwB.Column13_Prop) OR vwA.Column13 IS NULL OR vwB.Column13 IS NULL)
AND ((vwA.Column14=vwB.Column14 AND vwA.Column14_Prop=vwB.Column14_Prop) OR vwA.Column14 IS NULL OR vwB.Column14 IS NULL)
AND ((vwA.Column15=vwB.Column15 AND vwA.Column15_Prop=vwB.Column15_Prop) OR vwA.Column15 IS NULL OR vwB.Column15 IS NULL)
AND ((vwA.Column16=vwB.Column16 AND vwA.Column16_Prop=vwB.Column16_Prop) OR vwA.Column16 IS NULL OR vwB.Column16 IS NULL)
ORDER BY
vwA.TableName, vwA.IndexName
GO
-- Drop function and view created above.
DROP FUNCTION dbo.INDEX_COL_PROPERTIES
GO
DROP VIEW IndexList_VW
GO
Executing the script in a test database with the table (TabA) mentioned above produces output similar to the following.
Indexes idx1 and idx2 are reported to have an overlap, and in this case the subset index that has fewer columns (idx2) can be disabled without any negative side-effects (provided there are no Index hints pointing to this index. This is explained in more detail below). This will result in the same performance for select statements but eliminate the overhead of the database engine having to maintain 2 indexes that serve the same general purpose. This is particularly important if the underlying columns (Col1 and Col2) are highly volatile.
A couple of additional points to keep in mind when using this script:
- Overlapping indexes are only those that have the columns in the same order. For example an index created on Col1, Col2, Col3 (in that order) does not overlap with an index created on Col2, Col1, Col3, even though the columns included are the same
- Indexes created on the same set of columns with the same column order, but with different sort order specifications (ASC, DESC) are distinct indexes and are not overlapping
- Indexes with included columns introduced in SQL Server 2005 are correctly reported as overlapping by the above script. For example, an index created as Col1, Col2 and Col3 as an included column overlaps with the index created on Col1, Col2, Col3. For overlapping indexes that involve included columns it is imperative to disable or drop the index with the higher number of included columns (refer to: http://msdn2.microsoft.com/en-us/library/ms188783(SQL.90).aspx for more information about indexes with included columns)
- Clustered indexes and non-clustered indexes are clearly marked in the script output. If a clustered index is reported as overlapped and you plan to drop it, you may want to take some additional steps to make sure that the overlapping index, or some other index, is created as clustered, because it is a SQL Server best practice to have a clustered index on every table (refer to: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/clusivsh.mspx for more information)
- It is recommended to either disable a duplicate index or save its definition before dropping it, in order to be able to recreate it if necessary
- Before disabling or dropping an index you should make sure that there are no index hints in your application that explicitly use the index being disabled, for example: SELECT * FROM TabA WITH (INDEX (idx2)) WHERE col1 = 28. Disabling or dropping an index that is explicitly used via an index hint will result in the query execution failing with an error message 308 or 315.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Tuesday, June 05, 2007 12:47 PM
New SQL Best Practice Articles now available
Please have a look at four new Best Practices Articles
SQL Server 2005 Predeployment I/O best practices http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx by SQLISVPM team member Mike Ruthruff
SQL Server 2005 Deployment Guidance for Web Hosting Environments http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/SQL2005DGWHE.mspxby Alex DeNeui
Resolving Common Connectivity Issues in SQL Server 2005 Analysis Services Connectivity Scenarioshttp://www.microsoft.com/technet/prodtechnol/sql/bestpractice/CISQL2005ASCS.mspxby Carl Rabeler
Implementing Application Failover with Database Mirroringhttp://www.microsoft.com/technet/prodtechnol/sql/bestpractice/implappfailover.mspxby Michael Thomassy, Sanjay Mishra
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 1 Comments
Monday, April 30, 2007 5:25 PM
Will 64-bit increase the performance of my SQL Server application?
With 64-bit servers becoming more mainstream and SQL Server customers running more deployments on native 64-bit Windows there is a common misperception that running on 64-bit will always provide an increase in performance over 32-bit. Many customers are surprised to find that some workloads actually run faster on 32-bit SQL Server than on the native 64-bit version. The assumption that 64-bit will always outperform 32-bit is not true and is determined largely by characteristics of the particular workload and hardware. Workloads that have a large volume of small uniform transactions using relatively simple query plans (OLTP-like in nature) which are not experiencing memory pressure may not benefit from running on native 64-bit SQL Server.
Differences between 32-bit and 64-bit
There are differences between 32-bit and 64-bit processes in general which apply to SQL Server. The working set of 64-bit SQL Server is larger than 32-bit both in code and data; mainly due to doubling the size for pointers, 64-bit instructions, structures that are cacheline aligned and more. This means 64-bit will be more sensitive to the size of onboard chip caches which may be used less efficiently when compared to 32-bit, resulting in a higher Cost per Instruction (CPI). The end result is applications which are not memory constrained on 32-bit (especially OLTP) may incur a performance penalty when running on 64-bit using the same hardware. We have observed approximately 10-15% degradation in most cases however the impact will depend on the characteristics of the workload and hardware being used.
Which applications will benefit most from 64-bit?
Many SQL Server applications will realize significant performance benefit running on native 64-bit. This platform will also provide much better scalability in the long term so it is important not to be short sighted when evaluating minor performance differences between the two platforms. Native 64-bit SQL Server typically benefits memory hungry applications the most and also applications requiring memory for items other than data pages. Some of these include:
• Procedure Cache: Lots of stored procedures and dynamic SQL batches (OLTP)
• Workspace Memory: Large number of concurrent hash joins or group by’s; large-scale sorts and index builds (Data Warehouse)
• Connection memory: Large numbers of connections
• Thread memory: High Concurrency OLTP, Complex query plans on multi-CPU machines
• CLR GC Heap memory: CLR SP allocations
• Lock Memory: Large scale OLTP
SQL Server 32-bit version allows use of memory beyond the 2 or 3 GB (32-bit limitation of user mode virtual address space) through the use of Address Windowing Extensions however there are restrictions on the use of AWE mapped memory. Specifically, the items above could not utilize AWE mapped memory (more on this below).
What to consider when deciding between 32-bit and 64-bit
When considering one platform over the other I would recommend thinking about the following:
1. Processor architecture plays a significant role. The processor used will impact the overhead introduced by running native 64-bit SQL Server on 64-bit Windows; specifically the amount of cache on the processor. Older generation processors with less L2/L3 cache will likely incur more of a penalty running native 64-bit SQL Server than will newer processors with larger amounts of cache. It is important to consider this and when possible favor chipsets that offer more cache. The performance gain offered by processors with larger cache sizes vary and can only be determined through testing. An exception to this rule would be AMD processors which have L2 cache but no L3 cache. The memory architecture of AMD processors is very different from Intel processors and the lack of L3 cache may be offset by the different architecture of the data transfer between processor and memory. It is important to note that we are not endorsing one processor over the other; we have observed excellent SQL Server performance on both Intel and AMD processors.
2. Memory latency will have a significant impact. The speed of data transfer between processors and memory will have a significant impact on performance. When accessing main memory latency is important for performance. Slower transfer speeds will result in higher latencies and a bigger performance penalty for cache misses. Faster transfer speed will have a positive, and often, significant impact on the performance of 64-bit SQL Server. Reducing memory latency is a key design point and strength of Non-Uniform Memory Access (NUMA) server architectures.
3. Consider the specifics of your particular workload. If your workload is running on 32-bit and is not experiencing memory pressure or does not have any of the requirements mentioned above then the workload will likely not benefit from 64-bit.
4. Look for unexpected behaviors when testing. We have seen a number of cases where moving to 64-bit resulted in unexpected behavior as a result of removing the virtual address space limitations. On 32-bit systems it is common to run with AWE enabled to allow SQL Server to utilize memory beyond 2 or 3 GB. This memory however can only be used for data cache. On 64-bit this restriction is lifted and non-data cache memory can grow much larger than was possible on 32-bit. This includes items such as procedure cache, lock memory, connection memory, workspace memory, etc… While beneficial in most cases there are also cases where this can expose issues with the application’s use of SQL Server. Some common scenarios include:
• Extensive growth of the procedure cache as a result of a high volume of non-parameterized SQL statements. Service Pack 2 for SQL 2005 has some optimizations to help address these issues but it is considered a best practice to parameterize queries when possible and the sp2 optimizations do not replace the need to follow this advice.
• Failure to put a limit on ‘max server memory’ resulting in system wide memory pressure. It is important on 64-bit systems to place an upper limit on ‘max server memory’ leaving sufficient room for the operating system and ensure the SQL Server service account is granted the ‘Lock pages in memory’ security privilege (On 64-bit SQL Server the lock pages in memory privilege is only valid for Enterprise edition). The goal of setting an upper limit is to ensure that there is sufficient memory for the OS, other applications on the machine and allocations within SQL Server that are satisfied from outside the buffer pool. SQL Server’s buffer pool is dynamic and will adjust to external memory pressure however being proactive is always a good strategy. It is not uncommon on systems with large amount of physical memory (>= 16GB) to leave several GB for these items. Refer to http://support.microsoft.com/kb/918483 for more information.
5. Think long term. Although there are times when a workload will perform slightly slower on native 64-bit the difference is most cases is not significant (usually 10% or less). It is important not to let a small impact to performance overshadow the long term benefits of 64-bit SQL Server. Native 64-bit will offer significantly better scalability than 32-bit and is the future of computing (especially for database applications). Even if a workload does not require or benefit from the additional virtual address space now it may in the future.
6. Monitor and Test. By monitoring current performance of your application you may be able to get an idea of whether or not your specific workload will benefit from 64-bit SQL Server. Some potential indicators of memory pressure which may indicate a scenario when 64-bit will help are:
• High rate of I/O coupled with low page life expectancy and/or constant lazy writer activity.
• Constant non-zero values for memory grants pending.
• High rate of temporary objects being created/destroyed.
• High value for stolen pages as a percentage of the total pages consumed by the SQL Server buffer pool.
Ultimately however the only way to definitely know whether or not your application will benefit from running native 64-bit is to test.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 3 Comments
Friday, March 23, 2007 8:36 AM
Upgrading to SQL Server 2005 and DEFAULT_SCHEMA setting.
Upgrading to SQL Server 2005 has many benefits . As we look back at the past year and the deployment of SQL Server 2005 with our ISV partners, one important but under-advertised feature in SQL Server 2005 became increasingly visible; the ability to control the dynamic name resolution behavior. This capability applies to workloads where object names are NOT fully-qualified with a two or a three part name (for example: “database.schema.object”). If you already fully qualify your object references, which is a SQL Server best practice, then this blog does not apply to you.
In this post, I will first explain the behavior in SQL Server 2000 and expand on what changed in SQL Server 2005. Then I’ll point out how to identify the issue in your workloads and how to change the DEFAULT_SCHEMA setting to take advantage of the benefits.
Dynamic Name Resolution - SQL Server 2000 vs. 2005
A query such as "select * from table1" in SQL Server 2000 goes through a set of steps to resolve and validate the object references before execution. The search first looks at the identity of the connection executing the query. Let’s assume we have a user connection through "user1". For “select * from table1”, SQL Server 2000 searches for the object "user1.table1" first. If object does not exist, the next step is to search for "dbo.table1". These searches complete very quickly but can still be visible to the naked eye in cases where you have a workload with many ad-hoc queries (with today’s hardware this typically means over 100 queries per second) that contain not-fully-qualified object references. There is also a chance that, with the new flexibilities introduced in SQL Server 2005 around separation of users and schemas (http://msdn2.microsoft.com/en-us/library/ms190387.aspx), you may see a higher overhead in name resolution compared to SQL Server 2000. In extreme cases, we have seen the impact of name resolution to be 10-15% on throughput.
However SQL Server 2005 provides a mechanism to allow finer control over name resolution to the administrators. By manipulatng the value of the default_schema_name columns in the sys.database_principals Dynamic Management View, you can change the way name resolution is performed. You can assign a default schema for each database principal (a.k.a user) to go directly to the correct schema if an object name is not fully qualified, and bypass the unnecessary steps.
Reusing the example from above; with DEFAULT_SCHEMA set to 'dbo' in a SQL Server 2005 database, user1 executing 'select * from table1' will directly resolve immediately to 'dbo.table1' without searching for 'user1.table1'.
In fact, if you use the new CREATE USER syntax from SQL server 2005, you will get ‘dbo’ as the DEFAULT_SCHEMA by default. So if you are developing in a new database under SQL Server 2005, you are already taking advantage of this name resolution shortcut. However, the behavior for upgraded databases is different . During upgrade from previous versions of SQL Server, to preserve the same behavior, SQL Server 2005 sets the DEFAULT_SCHEMA to the name of the user. Example: ‘user1’ gets DEFAULT_SCHEMA of 'user1' when you upgrade a database from an earlier version to SQL Server 2005 by attaching or any other method. The reality is, in most instances, applications don't use anything but the ‘dbo’ schema. Nevertheless, it is close to impossible to detect whether your database and queries contain references to only objects in a specific schema.
Why? Well, it may be that some of your objects and queries get generated dynamically and only parsed right before execution. Obviously preserving the backward compatible behavior also means that you still pay for the added cost of the late bound name resolution.
This was exactly the reason why a few of our partners saw higher CPU cycles after the upgrade and experienced slower throughput in SQL Server 2005. In those cases, DEFAULT_SCHEMA setting got us back the throughput we were expecting from SQL Server 2005. The improvements have been as much as 15% in throughput.
Obviously the best practice still stands: You should fully qualify all object names and not worry about the name resolution cost at all. The reality is, there are still many imperfect applications out there and this setting help great for those cases.
How do you know if this is an issue for your workload?
Lets take a look at ways we can tell if this setting could benefit your workload.
The brute force method is to look at your app code or SQL Server 2005 Profiler for the queries that you are executing. If you do not have fully qualified names and you are executing them through a database principal (user) that has a DEFAULT_SCHEMA value that does not have the intended schema name for the not-fully-qualified objects in your queries, then you have the problem.
Reiterating the example from above; user1, with DEFAULT_SCHEMA='user1', executing 'select * from table1' and resolving to 'dbo.table1' could benefit from setting DEFAULT_SCHEMA to 'dbo'.
The other option is to use sys.dm_exec_cached_plans or sys.dm_exec_query_stats to look at the cached plans. Again you can go after the text and look for name qualification. You can also look for the results from the following query.
You can use the following query to see the cached plans that may indicate an issue with not-fully-qualified object names;
SELECT * FROM
(
SELECT e.[text], e.[user_id], e.[dbid]
FROM (
SELECT b.text,c.attribute,c.value
FROM sys.dm_exec_cached_plans AS a
OUTER APPLY sys.dm_exec_sql_text(a.plan_handle) AS b
OUTER APPLY sys.dm_exec_plan_attributes(a.plan_handle) AS c
) AS d
PIVOT (MAX(d.value) FOR d.attribute IN ("user_id", "dbid")) AS e
) AS f
WHERE [dbid] not in (32767) -- leave out the resource database
AND [user_id] not in (-2,1,4) -- leave out the generic plans, plans that belong to dbo and sys
If the resultset is non-empty, your application may be a good candidate for the DEFAULT_SCHEMA setting consideration. This is a good shortcut but is not a 100% guarantee. Be aware that sys.dm_exec_cached_plans contains only the cached plans. The cache fluctuates under memory pressure so some queries may get pushed out of the cache due to aging and pressure or, in some cases, may not get cached at all due to cost or security restrictions.
One additional thing to watch for; If you do not see a range of user_ids in the resultset, the benefits from setting the DEFAULT_SCHEMA may not be significant due to other optimizations and shortcuts in the engine around plan caching.
How do you change the DEFAULT_SCHEMA?
Assuming your database is using objects only in the dbo schema here is what you would run for each user:
ALTER USER
You can validate the current DEFAULT_SCHEMA settings by looking at the default_schema_name column in the sys.database_principals Dynamic Management View.
Cross Posted from http://blogs.microsoft.com/mssqlisv
Posted by mssqlisv 2 Comments
Thursday, February 22, 2007 11:47 AM
why cannot i GRANT EXECUTE permission on assemblies anymore ?
Just a quick note that this is by design and no you don't need to use it.
What are we talking about ?
Books online talks about granting assembly permissions. You used to do
GRANT execute on ASSEMBLY ::
Your database holds your assemblies and as such you could grant / revoke limited set of permissions. We initially allowed you to grant execute permission on the assembly but now when you execute the same on a server with the SP2 service pack, you get the following :
Msg 102, Level 15, State 1, Line 0
Incorrect syntax near 'EXECUTE...'.
Why did this change ?
Execute permission gave a false sense of security and was really inappropriate. Here's why:
As we shipped SQL Server 2005, this permission set was supported in TSQL but never enforced. So you could say "deny execute on assembly" but nothing happened. With this fix to revert the support from TSQL, you will not be misled anymore and we wanted to get this fix to you in a service pack of the same product that introduced you this immensely easy way of expressing your logic in assemblies. Going forward, with database upgrades to next release, any lingering assemblies that display this permission will be automatically corrected. And so you don't have to worry about a thing.
So how can i secure assemblies registered to the server ?
The assembly is really a container for types, objects etc which are independently permissible. The assembly itself can be marked safe, external_access and unsafe. In addition, you can use the 'trustworthy' property of the database for similar effect. For a complete list with description of our security model, please refer CLR Integration security in Books Online.
Thanks,
Mayank Nagar