PolyBase Guide


PolyBase is a technology that accesses data outside of the database through the T-SQL language. In SQL Server 2016, it lets you run queries on external data in Hadoop, or import and export data to and from Azure Blob Storage. Queries are optimized to push computation to Hadoop. In Azure SQL Data Warehouse, you can import and export data to and from Azure Blob Storage and Azure Data Lake Store.

To use PolyBase, see Get started with PolyBase.

[Figure: PolyBase logical diagram]

Why use PolyBase?

To make good decisions, you want to analyze both relational data and other data that is not structured into tables, notably Hadoop data. This is difficult to do unless you have a way to transfer data among the different types of data stores. PolyBase bridges this gap by operating on data that is external to SQL Server.

To keep it simple, PolyBase does not require you to install additional software in your Hadoop environment. Querying external data uses the same syntax as querying a database table. This all happens transparently: PolyBase handles the details behind the scenes, and the end user needs no knowledge of Hadoop to query external tables.
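As a minimal sketch of what "the same syntax" means in practice, the query below joins a local table to an external table. The names dbo.Customers, dbo.ClickStream_ext, and their columns are hypothetical, and the external table is assumed to have been defined already over files in Hadoop.

```sql
-- Hypothetical objects: dbo.Customers is an ordinary local table;
-- dbo.ClickStream_ext is an external table assumed to already exist
-- and to point at files stored in Hadoop.
SELECT c.CustomerName,
       COUNT(*) AS ClickCount
FROM dbo.Customers AS c
JOIN dbo.ClickStream_ext AS e
    ON e.CustomerId = c.CustomerId
WHERE e.EventDate >= '2016-01-01'
GROUP BY c.CustomerName
ORDER BY ClickCount DESC;
```

Nothing in the statement marks dbo.ClickStream_ext as external; the distinction lives in the table definition, not in the query.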

PolyBase can:

  • Query data stored in Hadoop from SQL Server or PDW. Users are storing data in cost-effective distributed and scalable systems, such as Hadoop. PolyBase makes it easy to query the data by using T-SQL.

  • Query data stored in Azure Blob Storage. Azure Blob Storage is a convenient place to store data for use by Azure services. PolyBase makes it easy to access the data by using T-SQL.

  • Import data from Hadoop, Azure Blob Storage, or Azure Data Lake Store. Leverage the speed of Microsoft SQL's columnstore technology and analysis capabilities by importing data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.

  • Export data to Hadoop, Azure Blob Storage, or Azure Data Lake Store. Archive data to Hadoop, Azure Blob Storage, or Azure Data Lake Store to achieve cost-effective storage and keep it online for easy access.

  • Integrate with BI tools. Use PolyBase with Microsoft’s business intelligence and analysis stack, or use any third-party tools that are compatible with SQL Server.
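The capabilities listed above can be sketched end to end for SQL Server 2016 with a Hadoop source. Every object name, address, and path here (HadoopCluster, PipeDelimitedText, dbo.SensorData_ext, hdfs://10.10.10.10:8020, /data/sensors/, and so on) is a hypothetical placeholder, and the sketch assumes PolyBase is installed and Hadoop connectivity is already configured on the instance.

```sql
-- 1. External data source: where the data lives (hypothetical address).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020'
);

-- 2. External file format: how the files are laid out.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- 3. External table: a relational schema over the files.
CREATE EXTERNAL TABLE dbo.SensorData_ext (
    SensorId    INT       NOT NULL,
    ReadingTime DATETIME2 NOT NULL,
    Reading     FLOAT     NOT NULL
)
WITH (
    LOCATION    = '/data/sensors/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = PipeDelimitedText
);

-- Query: read the external data in place.
SELECT TOP (10) SensorId, ReadingTime, Reading
FROM dbo.SensorData_ext;

-- Import: copy into a local table, then add a columnstore index.
SELECT SensorId, ReadingTime, Reading
INTO dbo.SensorData_local
FROM dbo.SensorData_ext;

CREATE CLUSTERED COLUMNSTORE INDEX cci_SensorData_local
ON dbo.SensorData_local;

-- Export: writing to an external table must be enabled first.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

CREATE EXTERNAL TABLE dbo.SensorArchive_ext (
    SensorId    INT       NOT NULL,
    ReadingTime DATETIME2 NOT NULL,
    Reading     FLOAT     NOT NULL
)
WITH (
    LOCATION    = '/archive/sensors/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = PipeDelimitedText
);

INSERT INTO dbo.SensorArchive_ext (SensorId, ReadingTime, Reading)
SELECT SensorId, ReadingTime, Reading
FROM dbo.SensorData_local
WHERE ReadingTime < '2015-01-01';
```

An Azure Blob Storage source follows the same pattern, with a wasbs:// location and a database scoped credential on the data source. In Azure SQL Data Warehouse, import and export are typically written as CREATE TABLE AS SELECT and CREATE EXTERNAL TABLE AS SELECT rather than the SELECT INTO and INSERT statements shown here.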

Performance

  • Push computation to Hadoop. The query optimizer makes a cost-based decision to push computation to Hadoop when doing so will improve query performance. It bases that decision on statistics on external tables. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.

  • Scale compute resources. To improve query performance, you can use SQL Server PolyBase scale-out groups. This enables parallel data transfer between SQL Server instances and Hadoop nodes, and it adds compute resources for operating on the external data.
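The two performance levers above might look like this in T-SQL. This is a sketch under assumptions: dbo.SensorData_ext is a hypothetical external table that already exists, its data source is assumed to specify a resource manager location (needed for pushdown), and HEADNODE01 is a hypothetical head node name.

```sql
-- Give the optimizer statistics to base its cost-based decision on.
CREATE STATISTICS stats_SensorData_ReadingTime
ON dbo.SensorData_ext (ReadingTime)
WITH FULLSCAN;

-- Let the optimizer decide (default), or override it with a query hint.
SELECT SensorId, AVG(Reading) AS AvgReading
FROM dbo.SensorData_ext
WHERE ReadingTime >= '2016-01-01'
GROUP BY SensorId
OPTION (FORCE EXTERNALPUSHDOWN);   -- push the computation to Hadoop

SELECT SensorId, AVG(Reading) AS AvgReading
FROM dbo.SensorData_ext
WHERE ReadingTime >= '2016-01-01'
GROUP BY SensorId
OPTION (DISABLE EXTERNALPUSHDOWN); -- keep the computation in SQL Server

-- Scale-out group: run on each compute node to join it to the head node.
-- 16450 is the default DMS control channel port; the PolyBase services
-- on the compute node must be restarted afterwards.
EXEC sp_polybase_join_group 'HEADNODE01', 16450, 'MSSQLSERVER';

-- On the head node: list the nodes in the group.
SELECT * FROM sys.dm_exec_compute_nodes;
```

Comparing the elapsed time of the two hinted queries on the same data is a simple way to check whether pushdown actually helps for a given workload.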

PolyBase Guide Topics

This guide includes topics to help you use PolyBase efficiently and effectively.

   
Topic | Description
Get started with PolyBase | Basic steps to install and configure PolyBase. Shows how to create external objects that point to data in Hadoop or Azure Blob Storage, and gives query examples.
PolyBase Versioned Feature Summary | Describes which PolyBase features are supported on SQL Server, SQL Database, and SQL Data Warehouse.
PolyBase scale-out groups | Scale out parallelism between SQL Server and Hadoop by using SQL Server scale-out groups.
PolyBase installation | Reference and steps for installing PolyBase with the installation wizard or with a command-line tool.
PolyBase configuration | Configure SQL Server settings for PolyBase. For example, configure computation pushdown and Kerberos security.
PolyBase T-SQL objects | Create the T-SQL objects that PolyBase uses to define and access external data.
PolyBase Queries | Use T-SQL statements to query, import, or export external data.
PolyBase troubleshooting | Techniques to manage PolyBase queries. Use dynamic management views (DMVs) to monitor PolyBase queries, and learn to read a PolyBase query plan to find performance bottlenecks.
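To illustrate the DMV-based monitoring mentioned under troubleshooting, here is a minimal sketch that finds the longest-running PolyBase queries and then breaks one of them into steps. The execution ID 'QID4547' is a hypothetical placeholder; substitute a value returned by the first query.

```sql
-- Longest-running distributed (PolyBase) requests, with their SQL text.
SELECT dr.execution_id,
       st.text,
       dr.total_elapsed_time
FROM sys.dm_exec_distributed_requests AS dr
CROSS APPLY sys.dm_exec_sql_text(dr.sql_handle) AS st
ORDER BY dr.total_elapsed_time DESC;

-- Steps of one request, slowest first, to locate the bottleneck.
SELECT execution_id,
       step_index,
       operation_type,
       location_type,
       status,
       total_elapsed_time
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID4547'   -- hypothetical ID from the query above
ORDER BY total_elapsed_time DESC;
```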

