Azure SQL DW- Polybase


What is Polybase?

Polybase is a technology that accesses and combines both non-relational and relational data, all from within SQL Server. It allows you to run queries on external data in Hadoop or Azure blob storage.

  • It does not require to install additional software.
  • You can leverage T-SQL syntax for querying external data.
  • No knowledge about HADOOP or Azure is required.

PolyBase Scenarios:

  • Access data in Hadoop database with T-SQL.
  • Access Azure blob storage with T-SQL.
  • Import data from Hadoop or blobs as regular SQL Server tables.
  • Export data to Hadoop or Azure blob storage.
  • Integrate well BI tools – SSRS, SSAS, PowerPivot, PowerQuery, PowerView, Tableau, Microstrategy or Cognos.

PolyBase uses external tables to access data in Azure blob storage. Since data is not stored within Azure SQL DW, PolyBase handles authentication to external data by using a database-scoped credential.

Using PolyBase in Azure SQL DW Steps:

  • Create Master Key – to encrypt secret of your database scoped credential

    CREATE MASTER KEY;

  • Create Database Scoped Credential – to specify authentication information for your Azure storage account.

    CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
    WITH
    IDENTITY = ‘polybasepoc’,
    SECRET = ‘secretkeygenerated’
    ;

  • Create External Data Source – to specify the location of your Azure blob storage.

    CREATE EXTERNAL DATA SOURCE AzureStorage
    WITH (
    TYPE = HADOOP,
    LOCATION = ‘wasbs://sample@storage.blob.core.windows.net’,
    CREDENTIAL = AzureStorageCredential
    );

  • Create External File Format – to specify the format of your data.

    CREATE EXTERNAL FILE FORMAT DelimitedText
    WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ‘[||]’, STRING_DELIMITER = ‘[~~]’),
    DATA_COMPRESSION = ‘org.apache.hadoop.io.compress.GzipCodec’
    );

  • Create External Table – to specify the table definition and location of the data.

    CREATE EXTERNAL TABLE ExternalSource_CC (
    [CC_ID] [nvarchar](50),
    [CC_CODE] [nvarchar](100),
    [CC_NAME] [nvarchar](100)
    )
    WITH (
    LOCATION=’/SourceFiles/CC/’,
    DATA_SOURCE=AzureStorage,
    FILE_FORMAT=DelimitedText
    );

How to use?

select * from ExternalSource_CC

Limitations of Polybase:

  • PolyBase works with defined structure specified in External Table & File Format.
  • PolyBase does not allow to exclude Text Qualifiers.
  • PolyBase provides row number for failed records but does not give cell level failure information.
  • PolyBase does not recognize delimiter within data and in case delimiter is repeated in text, fails to read data.
  • Polybase only works with delimited text file, no other tabular format supported except Hadoop.
  • At present Polybase supports loading data files that have been UTF-8 encoded.

 

Leave a comment