Design tables using Synapse SQL in Azure Synapse Analytics

Understanding Different Types of Tables in Azure Synapse Analytics - Table persistence

In Azure Synapse Analytics, tables store data in different ways based on their types. These can either store data permanently, temporarily, or in an external data store. Here’s a breakdown of how tables manage data storage:

1. Regular Tables(Dedicated pool)

Internal tables store data within Azure Synapse and support Massively Parallel Processing (MPP). These tables are ideal for large-scale data warehousing and analytics. There are three types of internal table distributions:

  • Hash Distributed Tables: Data is distributed across nodes using a hash function applied to a specific column. This is suitable for large tables used in joins and aggregations.
  • Round-robin Distributed Tables: Data is evenly distributed across all nodes. These are used when no specific distribution logic is needed.
  • Replicated Tables: Data is copied to all nodes. This is best for small lookup tables used in frequent joins.

SQL Example:

    CREATE TABLE my_table (
        id INT,
        name NVARCHAR(100)
    )
    WITH (
            DISTRIBUTION = HASH(id),
            CLUSTERED COLUMNSTORE INDEX
    );

2. Temporary Tables

Temporary tables are session-specific tables that are automatically deleted when the session ends. They are useful for storing intermediate results or simplifying complex queries. There are two types:

  • Local Temporary Tables: Prefixed with #, these tables are visible only in the session that created them.
  • Global Temporary Tables: Prefixed with ##, these tables are visible to all sessions.

SQL Example:

    CREATE TABLE #temp_table (
        id INT,
        name NVARCHAR(100)
    );

3. External Tables(serverless SQL pool)

External tables reference data stored outside of Azure Synapse, such as in Azure Data Lake or Blob Storage. These tables are typically used in serverless SQL pools, allowing you to query large datasets without importing them into Synapse.

SQL Example:

   CREATE EXTERNAL TABLE external_table (
        id INT,
        name NVARCHAR(100)
    )
    WITH (
       DATA_SOURCE = my_data_source,
        LOCATION = '/path/to/data/',
        FILE_FORMAT = my_file_format
    );

4. Views

Views in Azure Synapse are virtual tables created from the results of a query. They don’t store data themselves but provide a way to simplify query logic. Materialized Views store query results and improve performance for repeated queries.

SQL Example:

    CREATE VIEW my_view AS
    SELECT id, name
    FROM my_table;

Conclusion

By leveraging these types of tables, Azure Synapse allows flexible data storage solutions depending on the specific use case, performance needs, and cost-efficiency considerations.