COMPRESS and DECOMPRESS T-SQL Enhancement in SQL Server 2016

Learn how to implement native GZIP compression in SQL Server 2016 using COMPRESS and DECOMPRESS functions. This guide covers syntax, storage benefits, and performance trade-offs when handling large text, JSON, or XML data. Discover how to reduce storage footprints and minimize network I/O while maintaining high-performance data retrieval for your high-concurrency SQL Server environments.

What are the COMPRESS and DECOMPRESS functions in SQL Server?

The COMPRESS function in SQL Server 2016 and higher uses the GZIP algorithm to convert input data into a compressed VARBINARY(MAX) format, significantly reducing the storage footprint for large strings, JSON, and XML. The DECOMPRESS function reverses this process. Because DECOMPRESS returns binary data, you must use an explicit CAST or CONVERT (e.g., CAST(DECOMPRESS(col) AS VARCHAR(MAX))) to retrieve the original human-readable value.

Series

You are reading Part 7 of the T-SQL Enhancements in SQL Server 2016 for Developers and DBAs series.

COMPRESS and DECOMPRESS are native T-SQL functions introduced in SQL Server 2016 that let developers handle data compression directly within the database engine using the GZIP algorithm. Unlike Row or Page compression, which happens at the storage layer, these functions provide application-level compression: data is stored in a compact binary format and expanded only when explicitly requested.

This enhancement is a game-changer for environments dealing with massive amounts of semi-structured data like JSON, XML, or logs, as it significantly reduces the storage footprint and minimizes Network I/O during data transmission.

Quick Summary: Features and Benefits of T-SQL Compression

Using native compression in SQL Server 2016 provides a specific set of advantages depending on your data type.

Feature              | Primary Benefit                           | Best Use Case
GZIP Algorithm       | Reduces storage by 60-80% for text.       | Large JSON, XML, or log strings.
Column-Level Control | Granular control without locking tables.  | High-traffic OLTP tables.
Native T-SQL         | Reduces Network I/O and cloud costs.      | Azure SQL and hybrid environments.
CAST Requirement     | Ensures data integrity on retrieval.      | Standardizing application output.

COMPRESS vs. Row/Page Compression

How is this different from Row or Page Compression?

While SQL Server has offered Row and Page compression for years, the COMPRESS and DECOMPRESS functions are fundamentally different. Standard table compression is handled automatically by the Database Engine at the storage level. In contrast, these T-SQL functions are Application-Level compression tools.

Use COMPRESS when you want to save space on specific large text fields (like audit logs or JSON payloads) without the CPU overhead of compressing the entire table. It is also the only way to ensure data remains compressed while traveling over the network to your application, potentially reducing Network Latency in high-traffic environments.
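
As a minimal sketch of that pattern, consider an audit table where only the large JSON payload is compressed; the table and column names (dbo.AuditLog, Payload, and so on) are hypothetical, not from the original article:

-- Hypothetical audit table: only the large JSON payload is compressed.
CREATE TABLE dbo.AuditLog
(
    AuditID    INT IDENTITY(1,1) PRIMARY KEY,
    EventDate  DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    EventType  VARCHAR(50) NOT NULL,
    Payload    VARBINARY(MAX) NULL    -- holds COMPRESS(<JSON text>)
);
GO

-- Compress only the large payload; the other columns stay searchable.
INSERT INTO dbo.AuditLog (EventType, Payload)
VALUES ('OrderCreated', COMPRESS(N'{"OrderID":1001,"Items":[{"Sku":"A1","Qty":2}]}'));

-- Expand the payload only when it is explicitly needed.
SELECT AuditID, EventType,
       CAST(DECOMPRESS(Payload) AS NVARCHAR(MAX)) AS PayloadJson
FROM dbo.AuditLog;
GO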

How GZIP Compression Works in SQL Server

The COMPRESS function takes an input (string or binary) and transforms it into a VARBINARY(MAX) type using the GZIP algorithm. Conversely, DECOMPRESS takes that binary data and expands it back to its original state.
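
You can observe both directions of the transformation, along with the storage savings, using DATALENGTH. The repetitive test string below is an assumption chosen to compress well; real-world ratios depend entirely on how compressible your data is:

-- Compare the stored size of a value before and after COMPRESS.
-- REPLICATE builds a deliberately repetitive (highly compressible) test string.
DECLARE @Original NVARCHAR(MAX) = REPLICATE(CAST(N'SQL Server 2016 ' AS NVARCHAR(MAX)), 1000);

SELECT DATALENGTH(@Original)           AS [Original Bytes],
       DATALENGTH(COMPRESS(@Original)) AS [Compressed Bytes],
       CAST(DECOMPRESS(COMPRESS(@Original)) AS NVARCHAR(MAX)) AS [Round Tripped Value];
GO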

Because the compression happens at the column level, you have granular control over exactly what data gets shrunk. This is particularly useful for:

  • Cloud Cost Management: Reducing the size of data in Azure SQL Database to lower storage tier costs.
  • Offloading CPU: Allowing the SQL engine to store compressed data while the application handles the logic of when to expand it.

T-SQL COMPRESS and DECOMPRESS Syntax & Examples

To use these functions effectively, you must understand that DECOMPRESS always returns a VARBINARY(MAX) result. To see the original text, you must explicitly CAST or CONVERT the result back to its original data type (e.g., VARCHAR or NVARCHAR).

Example: The COMPRESS and DECOMPRESS Functions in SQL Server 2016

The following script demonstrates the lifecycle of a string through the compression engine:

-- Example: The COMPRESS and DECOMPRESS functions in SQL Server 2016
SELECT
    'MyTechMantra' AS [Input Value To Compress]
    ,COMPRESS('MyTechMantra') AS [Compressed Value]                 -- GZIP output as VARBINARY(MAX)
    ,DECOMPRESS(COMPRESS('MyTechMantra')) AS [Decompressed Value]   -- still VARBINARY(MAX)
    ,CAST(DECOMPRESS(COMPRESS('MyTechMantra')) AS VARCHAR(MAX)) AS [Derive Actual Input Value]
GO

Key Observation: In the example above, the [Decompressed Value] column still appears as hexadecimal “gibberish” to the human eye because it remains VARBINARY(MAX). The [Derive Actual Input Value] column shows the original string once CAST is applied.

⚠️ DBA WARNING: Performance Impact

While COMPRESS is excellent for storage, do not use it on columns that are frequently used in WHERE clauses or JOIN conditions. Because the data is stored as a binary blob, SQL Server must decompress the value for every single row to evaluate the filter. In large tables, this will lead to massive CPU spikes and slow query performance. Always decompress at the application layer or only on the final SELECT list.
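
A sketch of the anti-pattern versus the safer query shape, reusing the hypothetical dbo.AuditLog table from earlier:

-- ANTI-PATTERN: every row must be decompressed just to evaluate the filter.
SELECT AuditID
FROM dbo.AuditLog
WHERE CAST(DECOMPRESS(Payload) AS NVARCHAR(MAX)) LIKE N'%OrderID%';

-- BETTER: filter on ordinary (indexable) columns first, and decompress
-- only in the final SELECT list for the rows that remain.
SELECT AuditID,
       CAST(DECOMPRESS(Payload) AS NVARCHAR(MAX)) AS PayloadJson
FROM dbo.AuditLog
WHERE EventType = 'OrderCreated';
GO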

Dealing with “Binary Filtering” (The Workaround)

Can you filter by a compressed column?

A common challenge with COMPRESS is that you cannot search for a string like WHERE CompressedColumn LIKE '%SearchTerm%' because the data is stored as a binary GZIP blob.

To maintain performance while benefiting from compression, the best practice is to use a Shadow Column or a Persisted Computed Column. By storing a small, uncompressed “Key Field” (like a CategoryID or a short Hash) alongside the compressed blob, you can filter your data efficiently without forcing the SQL engine to decompress every row during a scan.
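
One possible shape for this pattern is sketched below; dbo.DocumentStore and its columns are illustrative, and CHECKSUM is used only as a simple example of a persisted computed hash:

-- Shadow-column pattern: small uncompressed keys travel alongside the blob.
CREATE TABLE dbo.DocumentStore
(
    DocumentID INT IDENTITY(1,1) PRIMARY KEY,
    CategoryID INT NOT NULL,                  -- small, searchable "shadow" key
    Title      NVARCHAR(200) NOT NULL,
    TitleHash  AS CHECKSUM(Title) PERSISTED,  -- persisted computed column for quick equality lookups
    Body       VARBINARY(MAX) NOT NULL        -- holds COMPRESS(<large document text>)
);
GO

CREATE INDEX IX_DocumentStore_CategoryID ON dbo.DocumentStore (CategoryID);
GO

-- The filter touches only the uncompressed key column; the blob is expanded
-- solely for the rows that survive the filter.
SELECT DocumentID, Title,
       CAST(DECOMPRESS(Body) AS NVARCHAR(MAX)) AS BodyText
FROM dbo.DocumentStore
WHERE CategoryID = 42;
GO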

Use Cases: T-SQL Compression vs. Page Compression

As noted above, Row and Page compression operate at the storage layer, while the COMPRESS function serves a different set of purposes:

  1. JSON and XML Blobs: Row and Page compression do not apply to off-row LOB (Large Object) data, so native COMPRESS can often shrink these payloads by 60-80%.
  2. Archive Tables: For data that is rarely accessed but must be kept for compliance, storing it in a compressed binary format saves massive amounts of disk space.
  3. Network Bandwidth: Compressed data travels faster over the network. If your application resides in a different data center than your SQL Server, sending compressed VARBINARY can improve query response times.

Performance Considerations: CPU vs. Storage

Before implementing COMPRESS across your entire database, consider the trade-offs:

  • CPU Overhead: Compression and decompression are CPU-intensive tasks. If your server is already running at 90% CPU utilization, adding these functions may cause bottlenecks.
  • Indexing Limitations: You cannot create a useful index on a VARBINARY(MAX) compressed column. If you need to search the data, you should store the searchable keys in standard columns and compress only the “payload” data.

Conclusion

Implementing the COMPRESS and DECOMPRESS functions in SQL Server 2016 is a strategic move for modernizing data storage. By shifting from plain-text storage to GZIP-compressed binary, you can significantly reduce storage costs and optimize I/O performance. However, always remember to cast your decompressed results to ensure data readability within your T-SQL queries.

Next Step: In the next part of this series, we will explore STRING_SPLIT and STRING_ESCAPE, two essential functions that simplify how we handle delimited strings and JSON formatting in T-SQL.


Frequently Asked Questions (FAQs) on COMPRESS and DECOMPRESS

1. What algorithm does SQL Server use for the COMPRESS function?

SQL Server uses the standard GZIP algorithm. This means that data compressed in SQL Server can technically be decompressed by other applications that support GZIP.

2. Does DECOMPRESS automatically return the original data type?

No. DECOMPRESS always returns VARBINARY(MAX). You must use CAST or CONVERT to transform it back to VARCHAR, NVARCHAR, or other types.

3. Can I compress an existing column using an ALTER TABLE command?

Not directly. You would typically add a new VARBINARY(MAX) column, update it using the COMPRESS function on the old column, and then drop the original column.
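
A minimal sketch of that migration, assuming a hypothetical dbo.Logs table with a large Message column:

-- 1. Add the new binary column.
ALTER TABLE dbo.Logs ADD MessageCompressed VARBINARY(MAX) NULL;
GO

-- 2. Back-fill it from the existing column (batch this on very large tables).
UPDATE dbo.Logs
SET MessageCompressed = COMPRESS(Message)
WHERE MessageCompressed IS NULL AND Message IS NOT NULL;
GO

-- 3. After verifying the new column, drop the original.
ALTER TABLE dbo.Logs DROP COLUMN Message;
GO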

4. Is there a limit to how much data I can compress?

The limit is the same as the VARBINARY(MAX) data type, which is 2GB.

5. How does this affect the Transaction Log?

Since the data is smaller, it results in smaller log records during INSERT and UPDATE operations, which can actually improve log throughput.

6. Can I use COMPRESS on numeric data types?

Yes, but you must first convert the numeric data to a string or binary format. However, the space savings on small integers are usually negligible.
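
For instance, the sketch below converts an INT before compressing it; the GZIP framing alone can make the result larger than the original value:

-- Small values often grow: the GZIP header outweighs any savings.
SELECT DATALENGTH(CAST(12345 AS VARCHAR(20)))           AS [Original Bytes],
       DATALENGTH(COMPRESS(CAST(12345 AS VARCHAR(20)))) AS [Compressed Bytes];
GO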

7. Does this function work in Azure SQL Database?

Yes, COMPRESS and DECOMPRESS are fully supported in Azure SQL Database and Managed Instance.

8. Is it better than Columnstore compression?

They serve different purposes. Columnstore is for analytical processing (OLAP), while COMPRESS/DECOMPRESS is ideal for row-level operations (OLTP) on large text fields.

9. Will COMPRESS help with memory usage?

Yes, because the data remains compressed in the Buffer Pool (memory), it allows you to fit more data into the same amount of RAM.

Ashish Kumar Mehta

Ashish Kumar Mehta is a database manager, trainer and technical author. He has more than a decade of IT experience in database administration, performance tuning, database development and technical training on Microsoft SQL Server from SQL Server 2000 to SQL Server 2014. Ashish has authored more than 325 technical articles on SQL Server across leading SQL Server technology portals. Over the last few years, he has also developed and delivered many successful projects in database infrastructure; data warehouse and business intelligence; database migration; and upgrade projects for companies such as Hewlett-Packard, Microsoft, Cognizant and Centrica PLC, UK. He holds an engineering degree in computer science and industry standard certifications from Microsoft including MCITP Database Administrator 2005/2008, MCDBA SQL Server 2000 and MCTS .NET Framework 2.0 Web Applications.
