What does the undocumented uncompress function do – sqlservercentral electricity generation efficiency


Another thing that IntelliSense does, even though it was certainly not intentional, is show you the names of (some) undocumented functions. Many of these undocumented functions won’t let you execute them, so not much to investigate there 1 . But, several of these undocumented built-in functions can be executed. One of them is UNCOMPRESS. This function is not to be confused with DECOMPRESS, the companion to COMPRESS, which are actually GUnzip and GZip, respectively (and were introduced in SQL Server 2016, if you haven’t seen them before).

I did a bit of searching around, and all I could find were a couple of references to using it on the ctext column in sys.syscomments, but only if the status column (a bit-masked value) had the “2” bit set (i.e. status 2 = 2 ). The first two entries in the “ Mentions of UNCOMPRESS()” section at the end of this post are books that contain this same info. With only that one clue to go on, I used the following query electricity images cartoon to find some data that was meant to be passed into this function:

I tried in [master], [msdb], and even [MSSQLSystemResource] (in single-user mode) but no rows were ever returned (I tested in SQL Server 2017). Between being used for a deprecated (as of SQL Server 2005) system compatibility view, and that view not even returning rows that would make use of this function, it seems safe to conclude that this function is obsolete in addition to being undocumented. Why Document?

• because it’s undocumented we don’t have much (or any) info on it, AND because it shows up in SSMS IntelliSense people can find it, AND because it can be executed, people might attempt to use it in their code. Therefore, it’s important to understand how it works and why we shouldn’t use it (aside from it being “undocumented”, which means unsupported, which is enough to convince zyklon b gas canister for sale some folks, but not everyone).

With no clear indications of what the UNCOMPRESS function does, we can at least pass in some simple values to see what comes back, and see if we can make sense of the output. For the following tests, please keep in mind that “8-bit” refers power outage houston reliant to the VARCHAR, CHAR, and TEXT (deprecated) datatypes. And, “16-bit” refers to the NVARCHAR, NCHAR, NTEXT (deprecated),and XML datatypes. Single Character Tests

The first query passes a VARCHAR upper-case “A” (having a value of 0x41) into UNCOMPRESS , and gets back the same character, but with an extra byte of 0x00 added on. This should make sense since this function returns NVARCHAR, which is UTF-16 (characters are either 2 bytes or 4 bytes). The Unicode Code Point is actually U+0041, but SQL Server / Windows / .NET use Little Endian, so the bytes are in reverse order, hence 4100 2 . At this point, the UNCOMPRESS function is doing just what the CONVERT function does, so it seems a little redundant.

• Every single byte going into UNCOMPRESS comes back as UTF-16 LE (with the extra 0x00 byte added on). Hence, passing in a character that is already in UTF-16 LE encoding (e.g. “D” being the two bytes 0x44 and 0x00), will have each of its two bytes converted into UTF-16 LE, leaving us with 0x4400 and 0x0000, or 0x44000000 (as you can see in the “HexUncompressedDD” field).

Now that we know that we are dealing with single-byte characters, which single-byte characters specifically are they? Are they VARCHAR characters of various code pages? Are they VARCHAR characters from one particular code page? Are they NVARCHAR / UTF-16 characters in the U+0000 through U+00FF range that all have a trailing byte of 0x00? Something else perhaps?

Given that characters with values in the range of 0 – 127 (decimal) / 0x00 – 0x7F (hex) are the same across all code pages / encodings that can be represented in SQL Server, only gas efficient cars 2016 testing with those (i.e. US English, digits 0 – 9, and some punctuation) often hides / obscures important functional differences. So, we need to test values 128 – 255 / 0x80 – 0xFF across several different code pages / encodings. Create and Populate Table

The following queries will set up the test data that we need to see (or at least confirm) what is actually happening. Code page 1252 is Latin1 (we are looking at this because it’s used in several collations: anything with “Latin1_General” in the name, French, etc), and code page 1255 is Hebrew (which is distinctly different from 1252, so it will be easy to see differences). Finally, UTF-16 is the encoding used by NVARCHAR data. For each row, we are inserting a single byte in the range of 0x00 – 0xFF into each column. We can then easily compare the resulting character of each byte with the output of UNCOMPRESS .

The query below will show us the character that each byte represents in each of the three encodings. It also feeds that same byte to the UNCOMPRESS function, and shows the underlying byte representation of each character after that byte is stored in the NVARCHAR column and passed into the UNCOMPRESS function 3 gas laws. And, because the characters for each byte in the range of 0x00 – 0x7F o goshi are the same across the encodings, the query only returns the 0x80 – 0xFF range (you can easily comment out the WHERE clause to see the boring 0x00 – 0x7F range).

Mitch Schroeter suggested to me that perhaps the UNCOMPRESS function was intended to work on data coming directly from Access 2000 (or newer) and compressed via the WITH COMPRESSION option of the CREATE TABLE statement. The documentation for CREATE TABLE (for Microsoft Access, not SQL Server) states the following towards the end of the Remarks section:

The WITH COMPRESSION attribute was added for CHARACTER columns because of the change to the Unicode character representation format. Unicode characters uniformly require two bytes for each character. For existing Microsoft Jet databases that contain predominately character data, this could mean that the database file would nearly double in size when converted to the Microsoft Access database engine format. However, Unicode representation of many character sets, those formerly denoted electricity kwh to unit converter as Single-Byte Character Sets (SBCS), can easily be compressed to a single byte. If you define a CHARACTER column with this attribute, data will automatically be compressed as it is stored and uncompressed when retrieved from the column.

While this does sound similar, it is not the exact same compression that the UNCOMPRESS function expects. There is some overlap in the behavior, but the UNCOMPRESS function is more simplistic than Access’s “Unicode Compression” (that term is in quotes because it is not true Unicode Compression). If Access was doing nothing more than removing the “0x00” bytes, then there would be no way to determine when to add them back in upon uncompressing; very few of the 65,536 two-byte code points have 0x00 bytes, so any algorithm will need to deal with non-compressible code points. There needs to be an indicator of some sort to tell the parser when a byte should be prefixed with a 0x00, appended with 0x00, or left alone. For example, if it encounters two bytes — 0xD5E2 — should the next two bytes of output be: 0x00D5, 0xD500, or 0xD5E2? We need more info to figure this out.

The UNCOMPRESS function does nothing more than add a 0x00 byte to each byte passed in, the result of which is valid UTF-16 Little Endian (i.e. NVARCHAR ) data. Given the various limitations of this function, and the fact that you would have to write your own function to “compress” NVARCHAR data into this format (a simple CONVERT won’t work unless you can guarantee that none of the characters found in the 0x80 – 0x9F range exist in the input data):

Does this function do essentially the same 7 gas station thing as UTF-8? Not really. UTF-8 only has the first 128 characters / Code Points (i.e. 0x00 – 0x7F) as single-byte characters. And if that is all your data is, then you can simply convert to VARCHAR. And while UTF-8 is supported natively starting in SQL Server 2019, there are limited scenarios where you should UTF-8 (within SQL Server, that is). And, even when you do save space, you will most likely sacrifice performance (to varying degrees). For more info on UTF-8 support in SQL Server, please see:

The one potentially valid use for UNCOMPRESS is if you have ISO-8859-1-encoded data, that is already in binary format, and is no more 4000 bytes / characters. This would work due to the first 256 Unicode Code Points being the ISO-8859-1 character set. However, even if you have data that fits this description, it would be better to convert it to Unicode / UTF-16 Little Endian prior to importing it into SQL Server. Or, if the data does not contain any bytes in the range of 0x80 – 0x9F, then gas x directions just import it into a VARCHAR column that is using any of the various collations associated with code page 1252. Mentions of UNCOMPRESS()