SQL Data Types

 

Overview

SQL supports a number of data types in several categories: numeric types, date and time types, and string (character) types. This chapter first gives an overview of these data types, and then provides a more detailed description of the properties of the types in each category, and a summary of the data type storage requirements. The initial overview is intentionally brief. The more detailed descriptions later in the chapter should be consulted for additional information about particular data types, such as the allowable formats in which you can specify values.

Data type descriptions use these conventions:

M indicates the maximum display width for integer types. For floating-point and fixed-point types, M is the total number of digits that can be stored. For string types, M is the maximum length. The maximum allowable value of M depends on the data type.

D applies to floating-point and fixed-point types and indicates the number of digits following the decimal point. The maximum possible value is 30, but should be no greater than M-2.

Square brackets ("[" and "]") indicate optional parts of type definitions.

Overview of Numeric Types

A summary of the numeric data types follows. For additional information, see Section on "Numeric Types". Storage requirements are given in Section on "Data Type Storage Requirements".

M indicates the maximum display width for integer types. The maximum legal display width is 255. Display width is unrelated to the range of values a type can contain, as described in Section on "Numeric Types". For floating-point and fixed-point types, M is the total number of digits that can be stored.

If you specify ZEROFILL for a numeric column, SQL automatically adds the UNSIGNED attribute to the column.

Numeric data types that allow the UNSIGNED attribute also allow SIGNED. However, these data types are signed by default, so the SIGNED attribute has no effect.

SERIAL is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.

SERIAL DEFAULT VALUE in the definition of an integer column is an alias for NOT NULL AUTO_INCREMENT UNIQUE.

Warning: When you use subtraction between integer values where one is of type UNSIGNED, the result is unsigned unless the NO_UNSIGNED_SUBTRACTION SQL mode is enabled.

Overview of Date and Time Types

A summary of the temporal data types follows.

For the DATETIME and DATE range descriptions, "supported" means that although earlier values might work, there is no guarantee.

The SUM() and AVG() aggregate functions do not work with temporal values. (They convert the values to numbers, which loses the part after the first non-numeric character.) To work around this problem, you can convert to numeric units, perform the aggregate operation, and convert back to a temporal value. Examples:

(sql sqlHandle {SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROM tbl_name})
(sql sqlHandle {SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROM tbl_name})

Overview of String Types

A summary of the string data types follows.

In some cases, SQL may change a string column to a type different from that given in a CREATE TABLE or ALTER TABLE statement.

SQL interprets length specifications in character column definitions in character units. This applies to CHAR, VARCHAR, and the TEXT types.

Column definitions for many string data types can include attributes that specify the character set or collation of the column. These attributes apply to the CHAR, VARCHAR, the TEXT types, ENUM, and SET data types:

Character column sorting and comparison are based on the character set assigned to the column. For the CHAR, VARCHAR, TEXT, ENUM, and SET data types, you can declare a column with a binary collation or the BINARY attribute to cause sorting and comparison to use the underlying character code values rather than a lexical ordering.

Data Type Default Values

The DEFAULT value clause in a data type specification indicates a default value for a column. With one exception, the default value must be a constant; it cannot be a function or an expression. This means, for example, that you cannot set the default for a date column to be the value of a function such as NOW() or CURRENT_DATE. The exception is that you can specify CURRENT_TIMESTAMP as the default for a TIMESTAMP column.

BLOB and TEXT columns cannot be assigned a default value.

If a column definition includes no explicit DEFAULT value, SQL determines the default value as follows:

If the column can take NULL as a value, the column is defined with an explicit DEFAULT NULL clause.

If the column cannot take NULL as the value, SQL defines the column with no explicit DEFAULT clause. For data entry, if an INSERT or REPLACE statement includes no value for the column, SQL handles the column according to the SQL mode in effect at the time:

Suppose that a table t is defined as follows:

(sql sqlHandle {CREATE TABLE t (i INT NOT NULL)})

In this case, i has no explicit default, so in strict mode each of the following statements produce an error and no row is inserted. When not using strict mode, only the third statement produces an error; the implicit default is inserted for the first two statements, but the third fails because DEFAULT(i) cannot produce a value:

(sql sqlHandle {INSERT INTO t VALUES()})
(sql sqlHandle {INSERT INTO t VALUES(DEFAULT)})
(sql sqlHandle {INSERT INTO t VALUES(DEFAULT(i))})

For a given table, you can use the SHOW CREATE TABLE statement to see which columns have an explicit DEFAULT clause.

Implicit defaults are defined as follows:

SERIAL DEFAULT VALUE in the definition of an integer column is an alias for NOT NULL AUTO_INCREMENT UNIQUE.

Numeric Types

SQL supports all of the standard SQL numeric data types. These types include the exact numeric data types (INTEGER, SMALLINT, DECIMAL, and NUMERIC), as well as the approximate numeric data types (FLOAT, REAL, and DOUBLE PRECISION). The keyword INT is a synonym for INTEGER, and the keyword DEC is a synonym for DECIMAL.

The numeric types used for the results of calculations depends on the operations being performed and the numeric types of the operands.

The BIT data type stores bit-field values and is supported for MyISAM, MEMORY, InnoDB, and NDBCLUSTER tables.

As an extension to the SQL standard, SQL also supports the integer types TINYINT, MEDIUMINT, and BIGINT. The following table shows the required storage and range for each of the integer types.

Type Bytes Minimum Value Maximum Value
(Signed/Unsigned) (Signed/Unsigned)
TINYINT 1 -128 127
0 255
SMALLINT 2 -32768 32767
0 65535
MEDIUMINT 3 -8388608 8388607
0 16777215
INT 4 -2147483648 2147483647
0 4294967295
BIGINT 8 -9223372036854775808 9223372036854775807
0 18446744073709551615

Another extension is supported by SQL for optionally specifying the display width of integer data types in parentheses following the base keyword for the type (for example, INT(4)). This optional display width is used to display integer values having a width less than the width specified for the column by left-padding them with spaces.

The display width does not constrain the range of values that can be stored in the column, nor the number of digits that are displayed for values having a width exceeding that specified for the column. For example, a column specified as SMALLINT(3) has the usual SMALLINT range of -32768 to 32767, and values outside the range allowed by three characters are displayed using more than three characters.

When used in conjunction with the optional extension attribute ZEROFILL, the default padding of spaces is replaced with zeros. For example, for a column declared as INT(5) ZEROFILL, a value of 4 is retrieved as 00004. Note that if you store larger values than the display width in an integer column, you may experience problems when SQL generates temporary tables for some complicated joins, because in these cases SQL assumes that the data fits into the original column width.

Note: The ZEROFILL attribute is ignored when a column is involved in expressions or UNION queries.

All integer types can have an optional (non-standard) attribute UNSIGNED. Unsigned values can be used when you want to allow only non-negative numbers in a column and you need a larger upper numeric range for the column. For example, if an INT column is UNSIGNED, the size of the column's range is the same but its endpoints shift from -2147483648 and 2147483647 up to 0 and 4294967295.

Floating-point and fixed-point types also can be UNSIGNED. As with integer types, this attribute prevents negative values from being stored in the column. However, unlike the integer types, the upper range of column values remains the same.

If you specify ZEROFILL for a numeric column, SQL automatically adds the UNSIGNED attribute to the column.

Integer or floating-point data types can have the additional attribute AUTO_INCREMENT. When you insert a value of NULL (recommended) or 0 into an indexed AUTO_INCREMENT column, the column is set to the next sequence value. Typically this is value+1, where value is the largest value for the column currently in the table. AUTO_INCREMENT sequences begin with 1.

For floating-point data types, SQL uses four bytes for single-precision values and eight bytes for double-precision values.

The FLOAT and DOUBLE data types are used to represent approximate numeric data values. For FLOAT, the SQL standard allows an optional specification of the precision (but not the range of the exponent) in bits following the keyword FLOAT in parentheses. SQL also supports this optional precision specification, but the precision value is used only to determine storage size. A precision from 0 to 23 results in a four-byte single-precision FLOAT column. A precision from 24 to 53 results in an eight-byte double-precision DOUBLE column.

SQL allows a non-standard syntax: FLOAT(M,D) or REAL(M,D) or DOUBLE PRECISION(M,D). Here, "(M,D)" means than values can be stored with up to M digits in total, of which D digits may be after the decimal point. For example, a column defined as FLOAT(7,4) will look like -999.9999 when displayed. SQL performs rounding when storing values, so if you insert 999.00009 into a FLOAT(7,4) column, the approximate result is 999.0001.

SQL treats DOUBLE as a synonym for DOUBLE PRECISION (a non-standard extension). SQL also treats REAL as a synonym for DOUBLE PRECISION (a non-standard variation), unless the REAL_AS_FLOAT SQL mode is enabled.

For maximum portability, code requiring storage of approximate numeric data values should use FLOAT or DOUBLE PRECISION with no specification of precision or number of digits.

The DECIMAL and NUMERIC data types are used to store exact numeric data values. In SQL, NUMERIC is implemented as DECIMAL. These types are used to store values for which it is important to preserve exact precision, for example with monetary data.

SQL stores DECIMAL and NUMERIC values in binary format.

When declaring a DECIMAL or NUMERIC column, the precision and scale can be (and usually is) specified; for example:

salary DECIMAL(5,2)

In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point. If the scale is 0, DECIMAL and NUMERIC values contain no decimal point or fractional part.

Standard SQL requires that the salary column be able to store any value with five digits and two decimals. In this case, therefore, the range of values that can be stored in the salary column is from -999.99 to 999.99.

In standard SQL, the syntax DECIMAL(M) is equivalent to DECIMAL(M,0). Similarly, the syntax DECIMAL is equivalent to DECIMAL(M,0), where the implementation is allowed to decide the value of M. SQL supports both of these variant forms of the DECIMAL and NUMERIC syntax. The default value of M is 10.

The maximum number of digits for DECIMAL or NUMERIC is 65, but the actual range for a given DECIMAL or NUMERIC column can be constrained by the precision or scale for a given column. When such a column is assigned a value with more digits following the decimal point than are allowed by the specified scale, the value is converted to that scale. (The precise behavior is operating system-specific, but generally the effect is truncation to the allowable number of digits.)

The BIT data type is used to store bit-field values. A type of BIT(M) allows for storage of M-bit values. M can range from 1 to 64.

To specify bit values, b'value' notation can be used. value is a binary value written using zeros and ones. For example, b'111' and b'10000000' represent 7 and 128, respectively.

If you assign a value to a BIT(M) column that is less than M bits long, the value is padded on the left with zeros. For example, assigning a value of b'101' to a BIT(6) column is, in effect, the same as assigning b'000101'.

When asked to store a value in a numeric column that is outside the data type's allowable range, SQL's behavior depends on the SQL mode in effect at the time. For example, if no restrictive modes are enabled, SQL clips the value to the appropriate endpoint of the range and stores the resulting value instead. However, if the mode is set to TRADITIONAL, SQL rejects a value that is out of range with an error, and the insert fails, in accordance with the SQL standard.

In non-strict mode, when an out-of-range value is assigned to an integer column, SQL stores the value representing the corresponding endpoint of the column data type range. If you store 256 into a TINYINT or TINYINT UNSIGNED column, SQL stores 127 or 255, respectively. When a floating-point or fixed-point column is assigned a value that exceeds the range implied by the specified (or default) precision and scale, SQL stores the value representing the corresponding endpoint of that range.

Conversions that occur due to clipping when SQL is not operating in strict mode are reported as warnings for ALTER TABLE, LOAD DATA INFILE, UPDATE, and multiple-row INSERT statements. When SQL is operating in strict mode, these statements fail, and some or all of the values will not be inserted or changed, depending on whether the table is a transactional table and other factors.

Date and Time Types

The date and time types for representing temporal values are DATETIME, DATE, TIMESTAMP, TIME, and YEAR. Each temporal type has a range of legal values, as well as a "zero" value that may be used when you specify an illegal value that SQL cannot represent. The TIMESTAMP type has special automatic updating behavior, described later on.

SQL gives warnings or errors if you try to insert an illegal date. By setting the SQL mode to the appropriate value, you can specify more exactly what kind of dates you want SQL to support. You can get SQL to accept certain dates, such as '2009-11-31', by using the ALLOW_INVALID_DATES SQL mode. This is useful when you want to store a "possibly wrong" value which the user has specified (for example, in a web form) in the database for future processing. Under this mode, SQL verifies only that the month is in the range from 0 to 12 and that the day is in the range from 0 to 31. These ranges are defined to include zero because SQL allows you to store dates where the day or month and day are zero in a DATE or DATETIME column. This is extremely useful for applications that need to store a birthdate for which you do not know the exact date. In this case, you simply store the date as '2009-00-00' or '2009-01-00'. If you store dates such as these, you should not expect to get correct results for functions such as DATE_SUB() or DATE_ADD() that require complete dates. (If you do not want to allow zero in dates, you can use the NO_ZERO_IN_DATE SQL mode).

A DATE value is coerced to the DATETIME type by adding the time portion as '00:00:00'.

SQL also allows you to store '0000-00-00' as a "dummy date" (if you are not using the NO_ZERO_DATE SQL mode). This is in some cases more convenient (and uses less data and index space) than using NULL values.

Here are some general considerations to keep in mind when working with date and time types:

The DATETIME, DATE, and TIMESTAMP Types

The DATETIME, DATE, and TIMESTAMP types are related. This section describes their characteristics, how they are similar, and how they differ.

The DATETIME type is used when you need values that contain both date and time information. SQL retrieves and displays DATETIME values in 'YYYY-MM-DD HH:MM:SS' format. The supported range is '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.

The DATE type is used when you need only a date value, without a time part. SQL retrieves and displays DATE values in 'YYYY-MM-DD' format. The supported range is '1000-01-01' to '9999-12-31'.

For the DATETIME and DATE range descriptions, "supported" means that although earlier values might work, there is no guarantee.

The TIMESTAMP data type has a range of '1970-01-01 00:00:01' UTC to '2038-01-09 03:14:07' UTC. It has varying properties, depending on the SQL version and the SQL mode the server is running in. These properties are described later in this section.

You can specify DATETIME, DATE, and TIMESTAMP values using any of a common set of formats:

A microseconds part is allowable in temporal values in some contexts, such as in literal values, and in the arguments to or return values from some temporal functions. Microseconds are specified as a trailing .uuuuuu part in the value. Example:

(sql sqlHandle {SELECT MICROSECOND('2010-12-10 14:12:09.019473')}) Returns
#(rec(|MICROSECOND('2010-12-10 14:12:09.019473')|:Integer:1)| |MICROSECOND('2010-12-10 14:12:09.019473')|:19473)

However, microseconds cannot be stored into a column of any temporal data type. Any microseconds part is discarded.

Conversion of TIME or DATETIME values to numeric form (for example, by adding +0) results in a double value with a microseconds part of .000000:

(sql sqlHandle {SELECT CURTIME(), CURTIME()+0}) Returns
#(rec(|CURTIME()|:Date:1 |CURTIME()+0|:Number:1)| |CURTIME()|:#Jan,1,0:21:57:52 |CURTIME()+0|:215753.0)
(sql sqlHandle {SELECT NOW(), NOW()+0}) Returns
#(rec(|NOW()|:Date:1 |NOW()+0|:Number:1)| |NOW()|:#Nov,9,2008:21:59:11 |NOW()+0|:2.008110921591E+013)

Illegal DATETIME, DATE, or TIMESTAMP values are converted to the "zero" value of the appropriate type ('0000-00-00 00:00:00' or '0000-00-00').

For values specified as strings that include date part delimiters, it is not necessary to specify two digits for month or day values that are less than 10. '1979-6-9' is the same as '1979-06-09'. Similarly, for values specified as strings that include time part delimiters, it is not necessary to specify two digits for hour, minute, or second values that are less than 10. '1979-10-30 1:2:3' is the same as '1979-10-30 01:02:03'.

Values specified as numbers should be 6, 8, 12, or 14 digits long. If a number is 8 or 14 digits long, it is assumed to be in YYYYMMDD or YYYYMMDDHHMMSS format and that the year is given by the first 4 digits. If the number is 6 or 12 digits long, it is assumed to be in YYMMDD or YYMMDDHHMMSS format and that the year is given by the first 2 digits. Numbers that are not one of these lengths are interpreted as though padded with leading zeros to the closest length.

Values specified as non-delimited strings are interpreted using their length as given. If the string is 8 or 14 characters long, the year is assumed to be given by the first 4 characters. Otherwise, the year is assumed to be given by the first 2 characters. The string is interpreted from left to right to find year, month, day, hour, minute, and second values, for as many parts as are present in the string. This means you should not use strings that have fewer than 6 characters. For example, if you specify '9903', thinking that represents March, 1999, SQL inserts a "zero" date value into your table. This occurs because the year and month values are 99 and 03, but the day part is completely missing, so the value is not a legal date. However, you can explicitly specify a value of zero to represent missing month or day parts. For example, you can use '990300' to insert the value '1999-03-00'.

You can to some extent assign values of one date type to an object of a different date type. However, there may be some alteration of the value or loss of information:

Be aware of certain pitfalls when specifying date values:

TIMESTAMP Properties

TIMESTAMP columns are displayed in the same format as DATETIME columns. In other words, the display width is fixed at 19 characters, and the format is 'YYYY-MM-DD HH:MM:SS'.

TIMESTAMP values are converted from the current time zone to UTC for storage, and converted back from UTC to the current time zone for retrieval. (This occurs only for the TIMESTAMP data type, not for other types such as DATETIME.) By default, the current time zone for each connection is the server's time. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable.

The TIMESTAMP data type offers automatic initialization and updating. You can choose whether to use these properties and which column should have them:

TIMESTAMP columns are NOT NULL by default, cannot contain NULL values, and assigning NULL assigns the current timestamp. However, a TIMESTAMP column can be allowed to contain NULL by declaring it with the NULL attribute. In this case, the default value also becomes NULL unless overridden with a DEFAULT clause that specifies a different default value. DEFAULT NULL can be used to explicitly specify NULL as the default value. (For a TIMESTAMP column not declared with the NULL attribute, DEFAULT NULL is illegal.) If a TIMESTAMP column allows NULL values, assigning NULL sets it to NULL, not to the current timestamp.

The following table contains several TIMESTAMP columns that allow NULL values:

(sql sqlHandle {CREATE TABLE t (ts1 TIMESTAMP NULL DEFAULT NULL, ts2 TIMESTAMP NULL DEFAULT 0, ts3 TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP)})

Note that a TIMESTAMP column that allows NULL values will not take on the current timestamp except under one of the following conditions:

In other words, a TIMESTAMP column defined as NULL will auto-initialize only if it is created using a definition such as the following:

(sql sqlHandle {CREATE TABLE t (ts TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP)})

Otherwise - that is, if the TIMESTAMP column is defined to allow NULL values but not using DEFAULT CURRENT_TIMESTAMP, as shown here...

(sql sqlHandle {CREATE TABLE t1 (ts TIMESTAMP NULL DEFAULT NULL)})
(sql sqlHandle {CREATE TABLE t2 (ts TIMESTAMP NULL DEFAULT '0000-00-00 00:00:00')})

...then you must explicitly insert a value corresponding to the current date and time. For example:

(sql sqlHandle {INSERT INTO t1 VALUES (NOW())})
(sql sqlHandle {INSERT INTO t2 VALUES (CURRENT_TIMESTAMP)})

Note: The SQL server can be run with the MAXDB SQL mode enabled. When the server runs with this mode enabled, TIMESTAMP is identical with DATETIME. That is, if this mode is enabled at the time that a table is created, TIMESTAMP columns are created as DATETIME columns. As a result, such columns use DATETIME display format, have the same range of values, and there is no automatic initialization or updating to the current date and time.

To enable MAXDB mode, set the server SQL mode to MAXDB at startup using the --sql-mode=MAXDB server option or by setting the global sql_mode variable at runtime:

(sql sqlHandle {SET GLOBAL sql_mode=MAXDB})

A client can cause the server to run in MAXDB mode for its own connection as follows:

(sql sqlHandle {SET SESSION sql_mode=MAXDB})

The TIME Type

SQL retrieves and displays TIME values in 'HH:MM:SS' format (or 'HHH:MM:SS' format for large hours values). TIME values may range from '-838:59:59' to '838:59:59'. The hours part may be so large because the TIME type can be used not only to represent a time of day (which must be less than 24 hours), but also elapsed time or a time interval between two events (which may be much greater than 24 hours, or even negative).

You can specify TIME values in a variety of formats:

A trailing .uuuuuu microseconds part of TIME values is allowed under the same conditions as for other temporal values, "The DATETIME, DATE, and TIMESTAMP Types". This includes the property that any microseconds part is discarded from values stored into TIME columns.

For TIME values specified as strings that include a time part delimiter, it is not necessary to specify two digits for hours, minutes, or seconds values that are less than 10. '8:3:2' is the same as '08:03:02'.

Be careful about assigning abbreviated values to a TIME column. Without colons, SQL interprets values using the assumption that the two rightmost digits represent seconds. (SQL interprets TIME values as elapsed time rather than as time of day.) For example, you might think of '1112' and 1112 as meaning '11:12:00' (12 minutes after 11 o'clock), but SQL interprets them as '00:11:12' (11 minutes, 12 seconds). Similarly, '12' and 12 are interpreted as '00:00:12'. TIME values with colons, by contrast, are always treated as time of the day. That is, '11:12' mean '11:12:00', not '00:11:12'.

By default, values that lie outside the TIME range but are otherwise legal are clipped to the closest endpoint of the range. For example, '-850:00:00' and '850:00:00' are converted to '-838:59:59' and '838:59:59'. Illegal TIME values are converted to '00:00:00'. Note that because '00:00:00' is itself a legal TIME value, there is no way to tell, from a value of '00:00:00' stored in a table, whether the original value was specified as '00:00:00' or whether it was illegal.

For more restrictive treatment of invalid TIME values, enable strict SQL mode to cause errors to occur.

The YEAR Type

The YEAR type is a one-byte type used for representing years. It can be declared as YEAR(2) or YEAR(4) to specify a display width of two or four characters. The default is four characters if no width is given.

For four-digit format, SQL displays YEAR values in YYYY format, with a range of 1901 to 2155. For two-digit format, SQL displays values with a range of 70 (1970) to 69 (2069).

You can specify input YEAR values in a variety of formats:

Illegal YEAR values are converted to 0000.

Year 2000 Issues and Date Types

SQL Server itself has no problems with Year 2000 (Y2K) compliance:

Although SQL Server itself is Y2K-safe, you may run into problems if you use it with applications that are not Y2K-safe. For example, many old applications store or manipulate years using two-digit values (which are ambiguous) rather than four-digit values. This problem may be compounded by applications that use values such as 00 or 99 as "missing" value indicators. Unfortunately, these problems may be difficult to fix because different applications may be written by different programmers, each of whom may use a different set of conventions and date-handling functions.

Thus, even though SQL Server has no Y2K problems, it is the application's responsibility to provide unambiguous input. Any value containing a two-digit year is ambiguous, because the century is unknown. Such values must be interpreted into four-digit form because SQL stores years internally using four digits.

For DATETIME, DATE, TIMESTAMP, and YEAR types, SQL interprets dates with ambiguous year values using the following rules:

Remember that these rules are only heuristics that provide reasonable guesses as to what your data values mean. If the rules used by SQL do not produce the correct values, you should provide unambiguous input containing four-digit year values.

ORDER BY properly sorts YEAR values that have two-digit years.

Some functions like MIN() and MAX() convert a YEAR to a number. This means that a value with a two-digit year does not work properly with these functions. The fix in this case is to convert the TIMESTAMP or YEAR to four-digit year format.

String Types

The string types are CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET. This section describes how these types work and how to use them in your queries.

The CHAR and VARCHAR Types

The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained.

The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.

The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.

Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

If strict SQL mode is not enabled and you assign a value to a CHAR or VARCHAR column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of non-space characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode.

For VARCHAR columns, excess trailing spaces are truncated prior to insertion and a warning is generated, regardless of the SQL mode in use. For CHAR columns, truncation of excess trailing spaces from inserted values is performed silently regardless of the SQL mode.

VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.

The following table illustrates the differences between CHAR and VARCHAR by showing the result of storing various string values into CHAR(4) and VARCHAR(4) columns (assuming that the column uses a single-byte character set such as latin1):

Value CHAR(4) Storage Required VARCHAR(4) Storage Required
''
'    '
4 bytes
''
1 byte
'ab'
'ab  '
4 bytes
'ab'
3 bytes
'abcd'
'abcd'
4 bytes
'abcd'
5 bytes
'abcdefgh'
'abcd'
4 bytes
'abcd'
5 bytes

The values shown as stored in the last row of the table apply only when not using strict mode; if SQL is running in strict mode, values that exceed the column length are not stored, and an error results.

If a given value is stored into the CHAR(4) and VARCHAR(4) columns, the values retrieved from the columns are not always the same because trailing spaces are removed from CHAR columns upon retrieval. The following example illustrates this difference:

(sql sqlHandle {CREATE TABLE vc (v VARCHAR(4), c CHAR(4))}) Returns 0
(sql sqlHandle {INSERT INTO vc VALUES ('ab  ', 'ab  ')}) Returns 1
(sql sqlHandle {SELECT CONCAT('(', v, ')'), CONCAT('(', c, ')') FROM vc}) Returns
#(rec(|CONCAT('(', v, ')')|:Character:6 |CONCAT('(', c, ')')|:Character:6)| |CONCAT('(', v, ')')|:"(ab  )" |CONCAT('(', c, ')')|:"(ab)")

Values in CHAR and VARCHAR columns are sorted and compared according to the character set collation assigned to the column.

All SQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in SQL are compared without regard to any trailing spaces. For example:

(sql sqlHandle {CREATE TABLE names (myname CHAR(10), yourname VARCHAR(10))}) Returns 0
(sql sqlHandle {INSERT INTO names VALUES ('Monty ', 'Monty ')}) Returns 1
(sql sqlHandle {SELECT myname = 'Monty  ', yourname = 'Monty  ' FROM names}) Returns
#(rec(|myname = 'Monty  '|:Integer:1 |yourname = 'Monty  '|:Integer:1)| |myname = 'Monty  '|:1 |yourname = 'Monty  '|:1)

This is true for all SQL versions, and that it is not affected by the server SQL mode.

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.

The BINARY and VARBINARY Types

The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than non-binary strings. That is, they contain byte strings rather than character strings. This means that they have no character set, and sorting and comparison are based on the numeric values of the bytes in the values.

The allowable maximum length is the same for BINARY and VARBINARY as it is for CHAR and VARCHAR, except that the length for BINARY and VARBINARY is a length in bytes rather than in characters.

The BINARY and VARBINARY data types are distinct from the CHAR BINARY and VARCHAR BINARY data types. For the latter types, the BINARY attribute does not cause the column to be treated as a binary string column. Instead, it causes the binary collation for the column character set to be used, and the column itself contains non-binary character strings rather than binary byte strings. For example, CHAR(5) BINARY is treated as CHAR(5) CHARACTER SET latin1 COLLATE latin1_bin, assuming that the default character set is latin1. This differs from BINARY(5), which stores 5-bytes binary strings that have no character set or collation.

If strict SQL mode is not enabled and you assign a value to a BINARY or VARBINARY column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For cases of truncation, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode.

When BINARY values are stored, they are right-padded with the pad value to the specified length. The pad value is 0x00 (the zero byte). Values are right-padded with 0x00 on insert, and no trailing bytes are removed on select. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 bytes and spaces are different in comparisons, with 0x00 < space.

Example: For a BINARY(3) column, 'a ' becomes 'a \0' when inserted. 'a\0' becomes 'a\0\0' when inserted. Both inserted values remain unchanged when selected.

For VARBINARY, there is no padding on insert and no bytes are stripped on select. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 bytes and spaces are different in comparisons, with 0x00 < space.

For those cases where trailing pad bytes are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad bytes will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a\0' causes a duplicate-key error.

You should consider the preceding padding and stripping characteristics carefully if you plan to use the BINARY data type for storing binary data and you require that the value retrieved be exactly the same as the value stored. The following example illustrates how 0x00-padding of BINARY values affects column value comparisons:

(sql sqlHandle {CREATE TABLE t (c BINARY(3))}) Returns 0
(sql sqlHandle {INSERT INTO t SET c = 'a'}) Returns 1
(sql sqlHandle {SELECT HEX(c), c = 'a', c = 'a\0\0' from t}) Returns
               #(rec(|HEX(c)|:Character:6 |c = 'a'|:Integer:1 |c = 'a\0\0'|:Integer:1)| |HEX(c)|:"610000" |c = 'a'|:0 |c = 'a\0\0'|:1)

If the value retrieved must be the same as the value specified for storage with no padding, it might be preferable to use VARBINARY or one of the BLOB data types instead.

The BLOB and TEXT Types

A BLOB is a binary large object that can hold a variable amount of data. The four BLOB types are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. These differ only in the maximum length of the values they can hold. The four TEXT types are TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. These correspond to the four BLOB types and have the same maximum lengths and storage requirements.

BLOB columns are treated as binary strings (byte strings). TEXT columns are treated as non-binary strings (character strings). BLOB columns have no character set, and sorting and comparison are based on the numeric values of the bytes in column values. TEXT columns have a character set, and values are sorted and compared based on the collation of the character set.

If strict SQL mode is not enabled and you assign a value to a BLOB or TEXT column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of non-space characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode.

Truncation of excess trailing spaces from values to be inserted into TEXT columns always generates a warning, regardless of the SQL mode.

If a TEXT column is indexed, index entry comparisons are space-padded at the end. This means that, if the index requires unique values, duplicate-key errors will occur for values that differ only in the number of trailing spaces. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error. This is not true for BLOB columns.

In most respects, you can regard a BLOB column as a VARBINARY column that can be as large as you like. Similarly, you can regard a TEXT column as a VARCHAR column. BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways:

LONG and LONG VARCHAR map to the MEDIUMTEXT data type. This is a compatibility feature. If you use the BINARY attribute with a TEXT data type, the column is assigned the binary collation of the column character set.

Because BLOB and TEXT values can be extremely long, you might encounter some constraints in using them:

Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.

In some cases, it may be desirable to store binary data such as media files in BLOB or TEXT columns. You may find SQL's string handling functions useful for working with such data. For security and other reasons, it is usually preferable to do so using application code rather than allowing application users the FILE privilege.

The ENUM Type

An ENUM is a string object with a value chosen from a list of allowed values that are enumerated explicitly in the column specification at table creation time.

An enumeration value must be a quoted string literal; it may not be an expression, even one that evaluates to a string value. For example, you can create a table with an ENUM column like this:

(sql sqlHandle {CREATE TABLE sizes (name ENUM('small', 'medium', 'large'))})

However, this version of the previous CREATE TABLE statement does not work:

(sql sqlHandle {CREATE TABLE sizes (c1 ENUM('small', CONCAT('med','ium'), 'large'))})

You also may not employ a user variable as an enumeration value. This pair of statements do not work:

(sql sqlHandle {SET @mysize = 'medium'})
(sql sqlHandle {CREATE TABLE sizes (name ENUM('small', @mysize, 'large'))})

If you wish to use a number as an enumeration value, you must enclose it in quotes.

Duplicate values in the definition cause a warning, or an error if strict SQL mode is enabled.

The value may also be the empty string ('') or NULL under certain circumstances:

Each enumeration value has an index:

For example, a column specified as ENUM('one', 'two', 'three') can have any of the values shown here. The index of each value is also shown:

Value Index
NULL NULL
'' 0
'one' 1
'two' 2
'three' 3

An enumeration can have a maximum of 65,535 elements.

Trailing spaces are automatically deleted from ENUM member values in the table definition when a table is created.

When retrieved, values stored into an ENUM column are displayed using the lettercase that was used in the column definition. Note that ENUM columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column.

If you retrieve an ENUM value in a numeric context, the column value's index is returned. For example, you can retrieve numeric values from an ENUM column like this:

(sql sqlHandle {SELECT enum_col+0 FROM tbl_name})

If you store a number into an ENUM column, the number is treated as the index into the possible values, and the value stored is the enumeration member with that index. (However, this does not work with LOAD DATA, which treats all input as strings.) If the numeric value is quoted, it is still interpreted as an index if there is no matching string in the list of enumeration values. For these reasons, it is not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing. For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3:

numbers ENUM('0','1','2')

If you store 2, it is interpreted as an index value, and becomes '1' (the value with index 2). If you store '2', it matches an enumeration value, so it is stored as '2'. If you store '3', it does not match any enumeration value, so it is treated as an index and becomes '2' (the value with index 3).

(sql sqlHandle {INSERT INTO t (numbers) VALUES(2),('2'),('3')})
(sql sqlHandle {SELECT * FROM t}) Returns
+---------+
| numbers |
+---------+
| 1       | 
| 2       | 
| 2       | 
+---------+

ENUM values are sorted according to the order in which the enumeration members were listed in the column specification. (In other words, ENUM values are sorted according to their index numbers.) For example, 'a' sorts before 'b' for ENUM('a', 'b'), but 'b' sorts before 'a' for ENUM('b', 'a'). The empty string sorts before non-empty strings, and NULL values sort before all other enumeration values. To prevent unexpected results, specify the ENUM list in alphabetical order. You can also use GROUP BY CAST(col AS CHAR) or GROUP BY CONCAT(col) to make sure that the column is sorted lexically rather than by index number.

Functions such as SUM() or AVG() that expect a numeric argument cast the argument to a number if necessary. For ENUM values, the cast operation causes the index number to be used.

If you want to determine all possible values for an ENUM column, use SHOW COLUMNS FROM tbl_name LIKE enum_col and parse the ENUM definition in the Type column of the output.

The SET Type

A SET is a string object that can have zero or more values, each of which must be chosen from a list of allowed values specified when the table is created. SET column values that consist of multiple set members are specified with members separated by commas (","). A consequence of this is that SET member values should not themselves contain commas.

For example, a column specified as SET('one', 'two') NOT NULL can have any of these values:

''

'one'

'two'

'one,two'

A SET can have a maximum of 64 different members.

Duplicate values in the definition cause a warning, or an error if strict SQL mode is enabled.

Trailing spaces are automatically deleted from SET member values in the table definition when a table is created.

When retrieved, values stored in a SET column are displayed using the lettercase that was used in the column definition. Note that SET columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column.

SQL stores SET values numerically, with the low-order bit of the stored value corresponding to the first set member. If you retrieve a SET value in a numeric context, the value retrieved has bits set corresponding to the set members that make up the column value. For example, you can retrieve numeric values from a SET column like this:

(sql sqlHandle {SELECT set_col+0 FROM tbl_name})

If a number is stored into a SET column, the bits that are set in the binary representation of the number determine the set members in the column value. For a column specified as SET('a','b','c','d'), the members have the following decimal and binary values:

SET Member Decimal Value Binary Value
'a' 1 0001
'b' 2 0010
'c' 4 0100
'd' 8 1000

If you assign a value of 9 to this column, that is 1001 in binary, so the first and fourth SET value members 'a' and 'd' are selected and the resulting value is 'a,d'.

For a value containing more than one SET element, it does not matter what order the elements are listed in when you insert the value. It also does not matter how many times a given element is listed in the value. When the value is retrieved later, each element in the value appears once, with elements listed according to the order in which they were specified at table creation time. For example, suppose that a column is specified as SET('a','b','c','d'):

(sql sqlHandle {CREATE TABLE myset (col SET('a', 'b', 'c', 'd'))})

If you insert the values 'a,d', 'd,a', 'a,d,d', 'a,d,a', and 'd,a,d':

(sql sqlHandle {INSERT INTO myset (col) VALUES ('a,d'), ('d,a'), ('a,d,a'), ('a,d,d'), ('d,a,d')}) Returns 5

Then all of these values appear as 'a,d' when retrieved:

(sql sqlHandle {SELECT col FROM myset}) Returns
#(rec(5 col:Character:7)| [0](col:"a,d") [1](col:"a,d") [2](col:"a,d") [3](col:"a,d") [4](col:"a,d"))

If you set a SET column to an unsupported value, the value is ignored and a warning is issued:

(sql sqlHandle {INSERT INTO myset (col) VALUES ('a,d,d,s')}) Returns 1
(sql sqlHandle {SHOW WARNINGS}) Returns
#(rec(Level:Character:7 Code:Long:1 Message:Object:1)| Level:"Warning" Code:1265 Message:"Data truncated for column 'col' at row 1")
(sql sqlHandle {SELECT col FROM myset}) Returns
#(rec(6 col:Character:7)| [0](col:"a,d") [1](col:"a,d") [2](col:"a,d") [3](col:"a,d") [4](col:"a,d") [5](col:"a,d"))

If strict SQL mode is enabled, attempts to insert invalid SET values result in an error.

SET values are sorted numerically. NULL values sort before non-NULL SET values.

Functions such as SUM() or AVG() that expect a numeric argument cast the argument to a number if necessary. For SET values, the cast operation causes the numeric value to be used.

Normally, you search for SET values using the FIND_IN_SET() function or the LIKE operator:

(sql sqlHandle {SELECT * FROM tbl_name WHERE FIND_IN_SET('value',set_col)>0})
(sql sqlHandle {SELECT * FROM tbl_name WHERE set_col LIKE '%value%'})

The first statement finds rows where set_col contains the value set member. The second is similar, but not the same: It finds rows where set_col contains value anywhere, even as a substring of another set member.

The following statements also are legal:

(sql sqlHandle {SELECT * FROM tbl_name WHERE set_col & 1})
(sql sqlHandle {SELECT * FROM tbl_name WHERE set_col = 'val1,val2'})

The first of these statements looks for values containing the first set member. The second looks for an exact match. Be careful with comparisons of the second type. Comparing set values to 'val1,val2' returns different results than comparing values to 'val2,val1'. You should specify the values in the same order they are listed in the column definition.

If you want to determine all possible values for a SET column, use SHOW COLUMNS FROM tbl_name LIKE set_col and parse the SET definition in the Type column of the output.

Data Type Storage Requirements

The storage requirements for each of the data types supported by SQL are listed here by category.

The maximum size of a row in a MyISAM table is 65,535 bytes. (However, each BLOB or TEXT column contributes only 9-12 bytes toward this size.) This limitation may be shared by other storage engines as well.

Storage Requirements for Numeric Types

Data Type Storage Required
TINYINT 1 byte
SMALLINT 2 bytes
MEDIUMINT 3 bytes
INT, INTEGER 4 bytes
BIGINT 8 bytes
FLOAT(p) 4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53
FLOAT 4 bytes
DOUBLE [PRECISION], REAL 8 bytes
DECIMAL(M,D), NUMERIC(M,D) Varies; see following discussion
BIT(M) approximately (M+7)/8 bytes

Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the "leftover" digits require some fraction of four bytes. The storage required for excess digits is given by the following table:

Leftover Digits Number of Bytes
0 0
1 1
2 1
3 2
4 2
5 3
6 3
7 4
8 4

Storage Requirements for Date and Time Types

Data Type Storage Required
DATE 3 bytes
TIME 3 bytes
DATETIME 8 bytes
TIMESTAMP 4 bytes
YEAR 1 byte

The storage requirements shown in the table arise from the way that SQL represents temporal values:

Storage Requirements for String Types

In the following table, M represents the declared column length in characters for non-binary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.

Data Type Storage Required
CHAR(M) M * w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set
BINARY(M) M bytes, 0 <= M <= 255
VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 - 255 bytes, L + 2 bytes if values may require more than 255 bytes
TINYBLOB, TINYTEXT L + 1 bytes, where L < 28
BLOB, TEXT L + 2 bytes, where L < 216
MEDIUMBLOB, MEDIUMTEXT L + 3 bytes, where L < 224
LONGBLOB, LONGTEXT L + 4 bytes, where L < 232
ENUM('value1','value2',...) 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum)
SET('value1','value2',...) 1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum)

Variable-length string types are stored using a length prefix plus data. The length prefix requires from one to four bytes depending on the data type, and the value of the prefix is L (the byte length of the string). For example, storage for a MEDIUMTEXT value requires L bytes to store the value plus three bytes to store the length of the value.

To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes and can require up to three bytes per character.

VARCHAR, VARBINARY, and the BLOB and TEXT types are variable-length types. For each, the storage requirements depend on these factors:

For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).

Note: The effective maximum number of bytes that can be stored in a VARCHAR or VARBINARY column is subject to the maximum row size of 65,535 bytes, which is shared among all columns. For a VARCHAR column that stores multi-byte characters, the effective maximum number of characters is less. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters.

The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values.

The size of a SET object is determined by the number of different set members. If the set size is N, the object occupies (N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes. A SET can have a maximum of 64 members.

Choosing the Right Type for a Column

For optimum storage, you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. Of the types that represent all the required values, this type uses the least amount of storage.

All basic calculations (+, -, *, and /) with DECIMAL columns are done with precision of 65 decimal (base 10) digits.

If accuracy is not too important or if speed is the highest priority, the DOUBLE type may be good enough. For high precision, you can always convert to a fixed-point type stored in a BIGINT. This allows you to do all calculations with 64-bit integers and then convert results back to floating-point values as necessary.

PROCEDURE ANALYSE can be used to obtain suggestions for optimal column data types.

Using Data Types from Other Database Engines

To facilitate the use of code written for SQL implementations from other vendors, SQL maps data types as shown in the following table. These mappings make it easier to import table definitions from other database systems into SQL:

Other Vendor Type SQL Type
BOOL TINYINT
BOOLEAN TINYINT
CHARACTER VARYING(M) VARCHAR(M)
FIXED DECIMAL
FLOAT4 FLOAT
FLOAT8 DOUBLE
INT1 TINYINT
INT2 SMALLINT
INT3 MEDIUMINT
INT4 INT
INT8 BIGINT
LONG VARBINARY MEDIUMBLOB
LONG VARCHAR MEDIUMTEXT
LONG MEDIUMTEXT
MIDDLEINT MEDIUMINT
NUMERIC DECIMAL

Data type mapping occurs at table creation time, after which the original type specifications are discarded. If you create a table with types used by other vendors and then issue a DESCRIBE tbl_name statement, SQL reports the table structure using the equivalent SQL types. For example:

(sql sqlHandle {CREATE TABLE t (a BOOL, b FLOAT8, c LONG VARCHAR, d NUMERIC)}) Returns 0
(sql sqlHandle {DESCRIBE t}) Returns
#(rec(4 Field:Character:64 Type:Object:1 Null:Character:3 Key:Character:3 Default:Object:1 Extra:Character:27)| 
[0](Field:"a" Type:"tinyint(1)" Null:"YES" Key:"" Default:#void Extra:"") 
[1](Field:"b" Type:"double" Null:"YES" Key:"" Default:#void Extra:"") 
[2](Field:"c" Type:"mediumtext" Null:"YES" Key:"" Default:#void Extra:"") 
[3](Field:"d" Type:"decimal(10,0)" Null:"YES" Key:"" Default:#void Extra:""))