Azure SQL DW- Polybase

What is Polybase?

Polybase is a technology that accesses and combines both non-relational and relational data, all from within SQL Server. It allows you to run queries on external data in Hadoop or Azure blob storage.

  • It does not require to install additional software.
  • You can leverage T-SQL syntax for querying external data.
  • No knowledge about HADOOP or Azure is required.

PolyBase Scenarios:

  • Access data in Hadoop database with T-SQL.
  • Access Azure blob storage with T-SQL.
  • Import data from Hadoop or blobs as regular SQL Server tables.
  • Export data to Hadoop or Azure blob storage.
  • Integrate well BI tools – SSRS, SSAS, PowerPivot, PowerQuery, PowerView, Tableau, Microstrategy or Cognos.

PolyBase uses external tables to access data in Azure blob storage. Since data is not stored within Azure SQL DW, PolyBase handles authentication to external data by using a database-scoped credential.

Using PolyBase in Azure SQL DW Steps:

  • Create Master Key – to encrypt secret of your database scoped credential

    CREATE MASTER KEY;

  • Create Database Scoped Credential – to specify authentication information for your Azure storage account.

    CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
    WITH
    IDENTITY = ‘polybasepoc’,
    SECRET = ‘secretkeygenerated’
    ;

  • Create External Data Source – to specify the location of your Azure blob storage.

    CREATE EXTERNAL DATA SOURCE AzureStorage
    WITH (
    TYPE = HADOOP,
    LOCATION = ‘wasbs://sample@storage.blob.core.windows.net’,
    CREDENTIAL = AzureStorageCredential
    );

  • Create External File Format – to specify the format of your data.

    CREATE EXTERNAL FILE FORMAT DelimitedText
    WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ‘[||]’, STRING_DELIMITER = ‘[~~]’),
    DATA_COMPRESSION = ‘org.apache.hadoop.io.compress.GzipCodec’
    );

  • Create External Table – to specify the table definition and location of the data.

    CREATE EXTERNAL TABLE ExternalSource_CC (
    [CC_ID] [nvarchar](50),
    [CC_CODE] [nvarchar](100),
    [CC_NAME] [nvarchar](100)
    )
    WITH (
    LOCATION=’/SourceFiles/CC/’,
    DATA_SOURCE=AzureStorage,
    FILE_FORMAT=DelimitedText
    );

How to use?

select * from ExternalSource_CC

Limitations of Polybase:

  • PolyBase works with defined structure specified in External Table & File Format.
  • PolyBase does not allow to exclude Text Qualifiers.
  • PolyBase provides row number for failed records but does not give cell level failure information.
  • PolyBase does not recognize delimiter within data and in case delimiter is repeated in text, fails to read data.
  • Polybase only works with delimited text file, no other tabular format supported except Hadoop.
  • At present Polybase supports loading data files that have been UTF-8 encoded.

 

Advertisements

Azure SQL Data Warehouse – Pros & Cons

What is Azure SQL Data Warehouse?
Azure SQL Data Warehouse is PaaS based cloud solution for data storage. It is a massively parallel processing (MPP) cloud-based, scale-out, relational database capable of processing massive volumes of data.

Why you should consider moving to SQL Data Warehouse?
– Massively parallel processing solution. This makes queries many times faster than SQL DB.
– Combine relational data with cloud scale-out capabilities.
– Handles large volume data with massive storage. No storage limit.
– Processing power (compute) can be increase / decrease processing on one click within few seconds.
– Pause or resume DW on-demand, helps in cost saving when not in use.
– Supports SQL Server T-SQL language
– Platform as a service (PaaS) cloud based solution.
– Manages, detect and mitigate security threats Azure authentication features.
– High availability, gives you 99.9% up-time SLA in regions available to public.
– Built-in automated database backups.
– Minimal maintenance, reduced dependency on IT team.
– Good for OLAP environment.

Why you should not consider moving to SQL Data Warehouse?
– Supports SQL Server T-SQL language (not everything though).
– Supports only 32 concurrent connections.
– Supports only 1024 active connections.
– Does not support in-memory OLTP.
– Does not support CROSS database queries.
– Migration from on-premise or IaaS to SQL DW is a challenge.
– Blocking issues may take down entire Azure SQL DW.
– INSERT BULK API is not supported thoroughly.
– Not good for OLTP environment due to frequent changes.

Capture

Reference Link :- https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is

Azure Cosmos DB

Azure Cosmos DB (“Project Florence”) revolutionary way for globally replicating your data, multi-model database service. Azure Cosmos DB lets you store your data globally-distributed and elastically scale throughput and storage across any number of geographical regions at anytime with single click. Addition to this, Azure Cosmos DB automatically indexes your data which helps in fast querying & also guarantees millisecond latencies.  On 10th May 2017 Microsoft announces general availability of Azure Cosmos DB.

 

Capture

Checkout msdn blog for detailed info : https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/

 

Query Hints – FORCE ORDER

SQL Query Optimizer basically determines most efficient way to execute given query by considering several query plans. Query Hints tell the optimizer to apply hints throughout execution of query. Here we will talk about FORCE ORDER query hint.

Example Script:

CREATE TABLE DIM_1(
ROWID INT IDENTITY(1, 1) NOT NULL,
CODE NVARCHAR(50) NULL
)
CREATE TABLE DIM_2(
ROWID INT IDENTITY(1, 1) NOT NULL,
CODE NVARCHAR(50) NULL
)

CREATE TABLE FACT(
ROWID INT IDENTITY(1, 1) NOT NULL,
VALUE_1 NVARCHAR(50) NOT NULL,
VALUE_2 NVARCHAR(50) NOT NULL,
DIM_1_ID INT,
DIM_2_ID INT
)

SELECT *
FROM FACT t1
JOIN DIM_1 t2 ON t1.DIM_1_ID = t2.ROWID
JOIN DIM_2 t3 ON t1.DIM_2_ID = t3.ROWID

Query without hint:

Capture

If you observed SQL Server optimizer rearranges joins order that it thinks will be optimal for your query to execute.

Query with hint:

SELECT *
FROM FACT t1
JOIN DIM_1 t2 ON t1.DIM_1_ID = t2.ROWID
JOIN DIM_2 t3 ON t1.DIM_2_ID = t3.ROWID
OPTION (FORCE ORDER)

Capture1.PNG

When you put this query hint on to your query, it tells SQL Server that when it executes the statement to not change the order of the joins in the query. It will join the tables in the exact order that is specified in the query.

The FORCE ORDER query hint is only used when you want to override the way that SQL Server wants to execute this query.