The differences of SQL Server char, nchar, varchar and nvarchar are frequently discussed not just during interviews, but also by developers during discussions on database design. In this tip I would like to share not only the basic differences, but also what we need to know and be aware of when using each data type.
Char, nchar, varchar and nvarchar are all used to store text or string data in SQL Server databases.
- char – is the SQL-92 synonym for character. Data is padded with blanks/spaces to fill the field size. Fixed length data type.
- nchar – is the SQL-92 synonym for national char and national character. Fixed length data type.
- varchar – is the SQL-92 synonym for character varying. Variable length data type.
- nvarchar – is the SQL-92 synonym for national char varying and national character varying. Variable length data type.
You might wonder what the N stands for? N stands for National Language Character Set and is used to specify a Unicode string. When using Unicode data types, a column can store any character defined by the Unicode Standard, which includes all of the characters defined in the various character sets. Note that Unicode data types take twice as much storage space as non-Unicode data types.
Unicode is typically used in database applications which are designed to facilitate code pages which extend beyond the English and Western Europe code pages. It is designed so that extended character sets can still "fit" into database columns. What this means is that Unicode character data types are limited to half the space, because each byte actually takes two bytes to store the data (Unicode is sometimes referred to as "double-wide"). SQL Server has supported Unicode since SQL Server 7.0 by providing nchar/nvarchar/ntext data types. SQL Server doesn’t support UTF-8 encoding for Unicode data, but it does support UTF-16 encoding.
I made a table below that will serve as a quick reference.
Table of Differences
|Character Data Type||Non-Unicode fixed-length||Unicode fixed-length can store both non-Unicode and Unicode characters (i.e. Japanese, Korean etc.)||Non-Unicode variable length||Unicode variable length can store both non-Unicode and Unicode characters (i.e. Japanese, Korean etc.)|
|Maximum Length||up to 8,000 characters||up to 4,000 characters||up to 8,000 characters||up to 4,000 characters|
|Character Size||takes up 1 byte per character||takes up 2 bytes per Unicode/Non-Unicode character||takes up 1 byte per character||takes up 2 bytes per Unicode/Non-Unicode character|
|Storage Size||n bytes||2 times n bytes||Actual Length (in bytes)||2 times Actual Length (in bytes)|
|Usage||use when data length is constant or fixed length columns||use only if you need Unicode support such as the Japanese Kanji or Korean Hangul characters due to storage overhead||used when data length is variable or variable length columns and if actual data is always way less than capacity||use only if you need Unicode support such as the Japanese Kanji or Korean Hangul characters due to storage overhead|
|query that uses a varchar parameter does an index seek due to column collation sets||query that uses a nvarchar parameter does an index scan due to column collation sets|
Advantages and Disadvantages
Query performance is better since no need to move the column while updating.
No need to store the length of string in last two bytes.
If not properly used, it can take more space than varchar since it is fixed length and we don’t know the length of string to be stored.
It is not good for compression since it embeds space characters at the end.
Since it is variable length it takes less memory spaces.
Decreases the performance of some SQL queries.
Supports many client computers that are running different locales.
If not properly used it may use up a lot of extra storage space.
With the growth and innovation of web applications, it is even more important to support client computers that are running different locales. The easiest way to manage character data in international databases is to always use the Unicode nchar, nvarchar, and ntext data types, instead of their non-Unicode equivalents, char, varchar, and text.
Unicode is a standard for mapping code points to characters. Because it is designed to cover all the characters of all the languages of the world, there is no need for different code pages to handle different sets of characters. SQL Server supports the Unicode Standard, Version 3.2. If all the applications that work with international databases also use Unicode variables instead of non-Unicode variables, character translations do not have to be performed anywhere in the system. Clients will see the same characters in the data as all other clients.
SQL Server stores all textual system catalog data in columns having Unicode data types. The names of database objects, such as tables, views, and stored procedures, are stored in Unicode columns. This enables applications to be developed by using only Unicode, and helps avoid issues with code page conversions.
Remember when developing new applications to consider if it will be used globally because this will help you determine whether to use nchar and nvarchar to support different languages.
Learn more by reading and exploring the following:
- Take time to read this tip too which might help you in planning your database design Defining Data Types and Sizes
- Learn more about the importance of data type consistency SQL Server Data Type Consistency
- Comparing SQL Server and Oracle datatypes
- If you have an application you plan to take globally try exploring with global characters. Who knows if you are successful you might increase your sales and take your apps to the next level.
Last Update: 6/14/2016