神刀安全网

Data Import Performance Comparison T-SQL vs SSIS for large import

By:Daniel Calbimonte |  |   Related Tips:More >Import and Export


Problem

I need to import an Excel file to SQL Server that has 1 million rows. Do you recommend using T-SQL or SSIS? Can you provide a comparison of the options? Which option is easier to code? Which option performs better?

Solution

We are going to try both options (T-SQL and SSIS) and you will be able to compare the results!

Requirements

  1. SQL Server relational database engine installed
  2. SSIS installed
  3. SQL Server Data Tools Installed (SSDT) or BIDS (Business Intelligence Development Studio) installed.
  4. You can use SQL Server 2005 or later versions. In this example, we are using SQL Server 2014.
  5. Microsoft Excel 2013 installed.

Import 1,000,000 Rows with SQL Server Integration Services

  1. In this fist example, we are going to import a million rows from Excel to SQL Server.
  2. In the SSDT, create a new SSIS project.
  3. Drag and drop the Data Flow Task to the design pane.
  4. Data Import Performance Comparison T-SQL vs SSIS for large import
  5. Double click on the task and drag and drop the Excel source and the SQL Server Destination and join the tasks.
  6. Data Import Performance Comparison T-SQL vs SSIS for large import
  7. In the Excel Source add a new connection to the Excel file.
  8. Data Import Performance Comparison T-SQL vs SSIS for large import
  9. Double click on the SQL Destination Task and select a SQL connection and in the use a table or view, press the new button to create a new table. Click on the Mappings page to match columns.
  10. Data Import Performance Comparison T-SQL vs SSIS for large import
  11. If you run the package, you will notice that only 65,535 rows were copied.
  12. Data Import Performance Comparison T-SQL vs SSIS for large import
  13. It took just 5 seconds, but we need to import a million rows!
  14. Data Import Performance Comparison T-SQL vs SSIS for large import
  15. The easiest way to solve this problem is to save the Excel file as a CSV file.
  16. Data Import Performance Comparison T-SQL vs SSIS for large import
  17. Let’s remove the other tasks and add the Flat File Source and a SQL Server Destination.
  18. Data Import Performance Comparison T-SQL vs SSIS for large import
  19. Press the browse button and select the CSV file.
  20. Data Import Performance Comparison T-SQL vs SSIS for large import
  21. Go to the columns page to match the columns.
  22. Data Import Performance Comparison T-SQL vs SSIS for large import
  23. If you run the package, you will see that the million rows were imported to SQL Server successfully.
  24. Data Import Performance Comparison T-SQL vs SSIS for large import
  25. If we check the time in the progress, we will notice that it took 15.959 seconds.
  26. Data Import Performance Comparison T-SQL vs SSIS for large import

Import 1,000,000 Rows with T-SQL

  1. Let’s try the same with T-SQL. We are going to import the data using Linked Servers.
  2. If you are not familiar with Linked servers, we strongly recommend you to check our tip about Linked Servers for Microsoft Excel.
  3. The following code will create the Linked Server:
  4. EXEC sp_addlinkedserver  @server = N'ExcelDataSource',  @srvproduct=N'ExcelData',  @provider=N'Microsoft.ACE.OLEDB.12.0',  @datasrc=N'C:/scripts/Excelfile.xlsx', @provstr=N'EXCEL 12.0' ;
  5. Try to expand the tables of the Linked Server:
  6. Data Import Performance Comparison T-SQL vs SSIS for large import
  7. A typical error is this: The OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "ExcelDataSource" reported and error. Access denied. Cannot obtain the required interface ("IID_IDBSchemaRowset") from OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "ExcelDataSource". (Microsoft SQL Server, Error: 7399)
  8. Data Import Performance Comparison T-SQL vs SSIS for large import
  9. One of the solutions is to install the following driver for MS Office . By default, this driver is not installed. You will need to restart the machine after installing this driver.
  10. If that does not work, run the following commands. The AllowInProcess is used to run the process as a SQL Server Process and the DynamicParameter option is used to run parameterized queries.
    USE[master]   GO   EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',N'AllowInProcess',1   GO   EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',N'DynamicParameters',1   GO
  11. Also, make sure that the query is running under an account with access to the Excel file.
  12. Once the Linked Server is working, run the following query to create a SQL Server table named MylinkedExcel.
  13. SELECT [BusinessEntityID] ,[PersonType] ,[NameStyle] ,[Title] ,[FirstName] ,[MiddleName] ,[LastName] INTO MylinkedExcel FROM [ExcelDataSource]...[Sheet1$]
  14. If we measure the time, we will notice that it takes more than 7 minutes and 47 seconds!
  15. Data Import Performance Comparison T-SQL vs SSIS for large import
  16. OK, now we know that using a Linked Server to import data from Excel is very slow. We are going to use the CSV we created above and then use the Bulk Insert command.
  17. To use the Bulk Insert command we will need to first create the table manually:
  18. CREATE TABLE [dbo].[excelcustomer2]( [BusinessEntityID] [varchar](50) NULL,  [PersonType] [varchar](50) NULL,  [NameStyle] [varchar](50) NULL,  [Title] [varchar](50) NULL,  [FirstName] [varchar](50) NULL,  [MiddleName] [varchar](50) NULL,  [LastName] [varchar](50) NULL,  [Column 7] [varchar](50) NULL,  [Column 8] [varchar](50) NULL,  [Column 9] [varchar](50) NULL ) ON [PRIMARY]  GO;
  19. This code will import the data from the CSV file to the table we just created.
  20. BULK INSERT excelcustomer2 FROM 'c:/scripts/excelfile.csv' WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '/n' ) GO;
  21. If we check the time, we will notice that it took 31 seconds to import the data.
  22. Data Import Performance Comparison T-SQL vs SSIS for large import

Conclusion

To summarize, SSIS has an Excel limitation of 65535 rows. We had to work with the flat file component and convert from Excel to a CSV file to import a million rows.

On the other hand, T-SQL had several problems to access the Linked Server file from Excel, but finally when it was working it took more than 7 minutes to import the data. We then tried again with the CSV file and used the Bulk Insert command to improve the performance.

Who is the winner?….the table summarizes the results:

Solution Time
SSIS using the Excel component 5 seconds, but it is limited to only 65535 rows
SSIS converting the Excel to CSV and using the flat file component 16 seconds
Linked Server to Excel 7 minutes, 47 seconds (but it worked)
The Bulk Insert statement converting the Excel file to a CSV file 31 seconds

As you can see, SSIS has very fast tools to import and export files. It also has the advantage that the destination table is created automatically while the Bulk Insert statement requires creating the table manually. In this example SSIS is the winner. Using SSIS can be hard at the beginning and you will find some limitations, but T-SQL has also limitations, so we recommend studying both options to find the best solution for your problem.

Next Steps

Here are several links that will be useful to you:

Last Update:

Data Import Performance Comparison T-SQL vs SSIS for large import

About the author

Data Import Performance Comparison T-SQL vs SSIS for large import Daniel Calbimonte is a Microsoft SQL Server MVP, Microsoft Certified Trainer and Microsoft Certified IT Professional.

View all my tips

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » Data Import Performance Comparison T-SQL vs SSIS for large import

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
分享按钮