[MapReduce] Google三駕馬車:GFS、MapReduce和Bigtable


  聲明:此文轉載自博客開發團隊的博客,尊重原創工作。該文適合學分布式系統之前,作為背景介紹來讀。

  談到分布式系統,就不得不提Google的三駕馬車:Google FS[1],MapReduce[2],Bigtable[3]。

  雖然Google沒有公布這三個產品的源碼,但是他發布了這三個產品的詳細設計論文。而且,Yahoo資助的Hadoop也有按照這三篇論文的開源Java實現:Hadoop對應MapReduce, Hadoop Distributed File System (HDFS)對應Google FS,Hbase對應Bigtable。不過在性能上Hadoop比Google要差很多,參見表1。

  表1:Hbase和BigTable性能比較(來源於http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation)

Experiment

HBase20070916

BigTable

random reads

272

1212

random reads (mem)

Not implemented

10811

random writes

1460

8850

sequential reads

267

4425

sequential writes

1278

8547

Scans

3692

15385

  以下分別介紹這三個產品:

1. Google FS

  GFS是一個可擴展的分布式文件系統,用於大型的、分布式的、對大量數據進行訪問的應用。它運行於廉價的普通硬件上,提供容錯功能。

  分布式系統漫談一 <wbr>—— Google三駕馬車: <wbr>GFS,mapreduce,Bigtable

  圖1 GFS Architecture

  (1)GFS的結構

  1. GFS的結構圖見圖1,由一個master和大量的chunkserver構成,

  2. 不像Amazon Dynamo的沒有主的設計,Google設置一個主來保存目錄和索引信息,這是為了簡化系統結果,提高性能來考慮的,但是這就會造成主成為單點故障或者瓶頸。為了消除主的單點故障Google把每個chunk設置的很大(64M),這樣,由於代碼訪問數據的本地性,application端和master的交互會減少,而主要數據流量都是Application和chunkserver之間的訪問。

  3. 另外,master所有信息都存儲在內存里,啟動時信息從chunkserver中獲取。提高了master的性能和吞吐量,也有利於master當掉后,很容易把后備j機器切換成master。

  4. 客戶端和chunkserver都不對文件數據單獨做緩存,只是用linux文件系統自己的緩存

  “The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas.”

  “Having a single master vastly simplifies our design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. However,we must minimize its involvement in reads and writes so that it does not become a bottleneck. Clients never read and write file data through the master. Instead, a client asks the master which chunkservers it should contact. It caches this information for a limited time and interacts with the chunkservers directly for many subsequent operations.”

  “Neither the client nor the chunkserver caches file data.Client caches offer little benefit because most applications stream through huge files or have working sets too large to be cached. Not having them simplifies the client and the overall system by eliminating cache coherence issues.(Clients do cache metadata, however.) Chunkservers need not cache file data because chunks are stored as local files and so Linux’s buffer cache already keeps frequently accessed data in memory.”

  (2)GFS的復制

  GFS典型的復制到3台機器上,參看圖2

分布式系統漫談一 <wbr>—— Google三駕馬車: <wbr>GFS,mapreduce,Bigtable

  圖2 一次寫操作的控制流和數據流

  (3) 對外的接口

  和文件系統類似,GFS對外提供create, delete,open, close, read, 和 write 操作。另外,GFS還新增了兩個接口snapshot and record append,snapshot。有關snapshot的解釋:

  “Moreover, GFS has snapshot and record append operations. Snapshot creates a copy of a file or a directory tree at low cost.

Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client’s append.”

2. MapReduce

  MapReduce是針對分布式並行計算的一套編程模型。

  講到並行計算,就不能不談到微軟的Herb Sutter在2005年發表的文章” The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software”[6],主要意思是通過提高cpu主頻的方式來提高程序的性能很快就要過去了,cpu的設計方向也主要是多核,超線程等並發上。但是以前的程序並不能自動的得到多核的好處,只有編寫並發程序,才能真正獲得多核的好處。分布式計算也是一樣。

  分布式系統漫談一 <wbr>—— Google三駕馬車: <wbr>GFS,mapreduce,Bigtable

  圖3 MapReduce Execution overview

  1)MapReduce是由Map和reduce組成,來自於Lisp,Map是影射,把指令分發到多個worker上去,Reduce是規約,把Map的worker計算出來的結果合並。(參見圖3)

  2)Google的MapReduce實現使用GFS存儲數據。

  3)MapReduce可用於Distributed Grep,Count of URL Access Frequency,ReverseWeb-Link Graph,Distributed Sort,Inverted Index

3. Bigtable

  就像文件系統需要數據庫來存儲結構化數據一樣,GFS也需要Bigtable來存儲結構化數據。

  1)BigTable 是建立在 GFS ,Scheduler ,Lock Service 和 MapReduce 之上的。

  2)每個Table都是一個多維的稀疏圖

  3)為了管理巨大的Table,把Table根據行分割,這些分割后的數據統稱為:Tablets。每個Tablets大概有 100-200 MB,每個機器存儲100個左右的 Tablets。底層的架構是:GFS。由於GFS是一種分布式的文件系統,采用Tablets的機制后,可以獲得很好的負載均衡。比如:可以把經常響應的表移動到其他空閑機器上,然后快速重建。

參考文獻

  [1]The Google File System; http://labs.google.com/papers/gfs-sosp2003.pdf

  [2]MapReduce: Simplifed Data Processing on Large Clusters; http://labs.google.com/papers/mapreduce-osdi04.pdf

  [3]Bigtable: A Distributed Storage System for Structured Data;http://labs.google.com/papers/bigtable-osdi06.pdf

  [4]Hadoop ; http://lucene.apache.org/hadoop/

  [5]Hbase: Bigtable-like structured storage for Hadoop HDFS;http://wiki.apache.org/lucene-hadoop/Hbase

  [6]The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software;http://www.gotw.ca/publications/concurrency-ddj.htm


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM