Hadoop Basics Summary


1. What is Hadoop?

  Hadoop is an open-source platform for distributed storage and distributed computing.


2. Hadoop has two core components:

  1) HDFS: a distributed file system for storing massive amounts of data

    a. Basic concepts

      - Block

        Files in HDFS are split into blocks for storage; the default block size is 64 MB (128 MB from Hadoop 2.x onwards).

         A block is the logical unit of file storage and processing.

      - NameNode

         The management node; it holds the file system metadata, including:

          (1) the mapping from files to data blocks

          (2) the mapping from data blocks to DataNodes


      - DataNode

         The worker node of HDFS; it stores the actual data blocks.

 

 

    b. Data management strategies

      1) Block replication

         Each data block has three replicas, placed on three nodes across two racks, to guard against data loss when a node or disk fails (see the shell sketch after this list).

      2) Heartbeat detection

        Each DataNode periodically sends heartbeat messages to the NameNode.

      3) Secondary NameNode

         The Secondary NameNode periodically merges the metadata image file (fsimage) with the edit log; if the NameNode fails, this checkpoint can be used to restore the metadata (it is a checkpointing helper, not a hot standby).

      4) HDFS file read flow

      5) HDFS file write flow

      6) Characteristics of HDFS

         Data redundancy provides tolerance of hardware failures.

         Streaming data access: write once, read many times; once written, a file cannot be modified in place; to change it, delete it and write it again.

         Designed for large files; large numbers of small files put heavy pressure on the NameNode.

      7) Suitability and limitations

         Suited to batch reads and writes, with high throughput.

         Not suited to interactive applications; low latency is hard to guarantee.

         Suited to write-once-read-many, sequential access.

         Does not support concurrent writes to the same file by multiple users.
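
  The replication and block placement described above can be inspected from the shell; a minimal sketch (the path /test/mk.txt is simply the sample file used later in this note):

# Show how a file is split into blocks and where each replica lives
hdfs fsck /test/mk.txt -files -blocks -locations

# Change the replication factor of an existing file (the default is 3)
hdfs dfs -setrep -w 2 /test/mk.txt

# Print the configured default block size, in bytes
hdfs getconf -confKey dfs.blocksize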


  2) MapReduce: a parallel processing framework that handles task decomposition and scheduling

    a. The idea behind MapReduce

      Divide and conquer: a large task is split into many small sub-tasks (map), which run in parallel on multiple nodes; their results are then merged (reduce).

    b. How a MapReduce job runs

      1) Basic concepts

        - Job & Task

         A job is broken down into tasks (map tasks and reduce tasks).

        - JobTracker

          Schedules jobs

          Assigns tasks and monitors task progress

          Monitors the status of the TaskTrackers

        - TaskTracker

          Executes tasks

          Reports task status

      2) Job execution process

      3) MapReduce fault-tolerance mechanisms (see the sketch after this list)

         Re-execution: a failed map or reduce task is simply run again.

         Speculative execution: a suspiciously slow task is launched a second time elsewhere, and whichever attempt finishes first is used.
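
  Both mechanisms are controlled by per-job properties; a hedged sketch using the Hadoop 2.x property names (the retry count defaults to 4; whether speculative execution is on by default depends on the cluster configuration):

# Pass fault-tolerance settings to a streaming job with -D
# (generic -D options must come before the streaming-specific options)
hadoop jar hadoop-streaming-2.9.1.jar \
  -D mapreduce.map.maxattempts=4 \
  -D mapreduce.reduce.maxattempts=4 \
  -D mapreduce.map.speculative=true \
  -D mapreduce.reduce.speculative=true \
  -input ... -output ... -mapper ... -reducer ...   # as in the word-count example below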

 

3. What can Hadoop be used for?

  Building large data warehouses: storing, processing, analyzing and reporting on petabyte-scale data.

  Examples: search engines, business intelligence, log analysis, data mining.


4. Advantages of Hadoop

  1) High scalability

    Performance and capacity can be increased simply by adding more hardware.

  2) Low cost

    Runs on ordinary commodity PCs stacked into a cluster; reliability is guaranteed by fault tolerance in software.

  3) A mature ecosystem

    e.g. Hive, HBase


5. HDFS operations

  1) Shell commands

    Common HDFS shell commands:

      Linux-like commands: ls, cat, mkdir, rm, chmod, chown

     HDFS file transfer: copyFromLocal, copyToLocal, get (download), put (upload)
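
    A few usage sketches of these commands (the paths are only examples):

hdfs dfs -mkdir -p /test                     # create a directory in HDFS
hdfs dfs -put mk.txt /test/                  # upload a local file (equivalent to copyFromLocal)
hdfs dfs -ls /test                           # list the directory
hdfs dfs -cat /test/mk.txt                   # print the file contents
hdfs dfs -get /test/mk.txt ./mk_copy.txt     # download a file (equivalent to copyToLocal)
hdfs dfs -chmod 644 /test/mk.txt             # change permissions
hdfs dfs -rm /test/mk.txt                    # delete a file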

 

6. The Hadoop ecosystem

 

7. MapReduce in practice

  In this example we read a document and count how many times each word occurs.

  First create hdfs_map.py, which reads the input data:

# hdfs_map.py
# Streaming mapper: read lines from stdin and emit "word<TAB>1" for every word.
import sys

def read_input(file):
    for line in file:
        # Split each line into words on whitespace
        yield line.split()


def main():
    data = read_input(sys.stdin)

    for words in data:
        for word in words:
            # One key/value pair per word; the tab separates key from value
            print('{}\t1'.format(word))


if __name__ == '__main__':
    main()

  Then create hdfs_reduce.py, which sums the counts for each word:

# hdfs_reduce.py
# Streaming reducer: the framework sorts mapper output by key, so identical
# words arrive on consecutive lines and can be grouped with itertools.groupby.

import sys
from operator import itemgetter
from itertools import groupby


def read_mapper_output(file, separator='\t'):
    for line in file:
        # Split "word<TAB>count" into at most two fields
        yield line.rstrip().split(separator, 1)


def main():
    data = read_mapper_output(sys.stdin)

    for current_word, group in groupby(data, itemgetter(0)):
        # Sum the 1s emitted by the mapper for this word
        total_count = sum(int(count) for current_word, count in group)

        print('{} {}'.format(current_word, total_count))


if __name__ == '__main__':
    main()
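
  Before submitting the job, the two scripts can be sanity-checked locally with a pipe; this is only a sketch, in which sort stands in for the shuffle phase that groups identical keys together:

cat mk.txt | python3 hdfs_map.py | sort -k1,1 | python3 hdfs_reduce.py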

  Create a text file mk.txt in advance, add some content to it, and then put it into HDFS.

  

  Run the MapReduce job from the command line:

hadoop jar /opt/hadoop-2.9.1/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar -files '/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py,/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py' -input /test/mk.txt -output /output/wordcount -mapper 'python3 hdfs_map.py' -reducer 'python3 hdfs_reduce.py'
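
  Note that the job will fail if the output directory already exists; when re-running it, remove the directory first (the path matches the -output option above):

hdfs dfs -rm -r /output/wordcount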

  The run produces the following output:

  1 ➜  Documents hadoop jar /opt/hadoop-2.9.1/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar -files '/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py,/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py' -input /test/mk.txt -output /output/wordcount -mapper 'python3 hdfs_map.py' -reducer 'python3 hdfs_reduce.py' 
  2 # result
  3 18/06/26 16:22:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
  4 18/06/26 16:22:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
  5 18/06/26 16:22:45 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
  6 18/06/26 16:22:46 INFO mapred.FileInputFormat: Total input files to process : 1
  7 18/06/26 16:22:46 INFO mapreduce.JobSubmitter: number of splits:1
  8 18/06/26 16:22:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local49685846_0001
  9 18/06/26 16:22:46 INFO mapred.LocalDistributedCacheManager: Creating symlink: /home/zzf/hadoop_tmp/mapred/local/1530001366609/hdfs_map.py <- /home/zzf/Documents/hdfs_map.py
 10 18/06/26 16:22:46 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py as file:/home/zzf/hadoop_tmp/mapred/local/1530001366609/hdfs_map.py
 11 18/06/26 16:22:47 INFO mapred.LocalDistributedCacheManager: Creating symlink: /home/zzf/hadoop_tmp/mapred/local/1530001366610/hdfs_reduce.py <- /home/zzf/Documents/hdfs_reduce.py
 12 18/06/26 16:22:47 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py as file:/home/zzf/hadoop_tmp/mapred/local/1530001366610/hdfs_reduce.py
 13 18/06/26 16:22:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
 14 18/06/26 16:22:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null
 15 18/06/26 16:22:47 INFO mapreduce.Job: Running job: job_local49685846_0001
 16 18/06/26 16:22:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
 17 18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
 18 18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
 19 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Waiting for map tasks
 20 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Starting task: attempt_local49685846_0001_m_000000_0
 21 18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
 22 18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
 23 18/06/26 16:22:47 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
 24 18/06/26 16:22:47 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/test/mk.txt:0+2267
 25 18/06/26 16:22:47 INFO mapred.MapTask: numReduceTasks: 1
 26 18/06/26 16:22:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
 27 18/06/26 16:22:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
 28 18/06/26 16:22:47 INFO mapred.MapTask: soft limit at 83886080
 29 18/06/26 16:22:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
 30 18/06/26 16:22:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
 31 18/06/26 16:22:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
 32 18/06/26 16:22:47 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python3, hdfs_map.py]
 33 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
 34 18/06/26 16:22:47 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
 35 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
 36 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
 37 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
 38 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
 39 18/06/26 16:22:47 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
 40 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
 41 18/06/26 16:22:47 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
 42 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
 43 18/06/26 16:22:47 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
 44 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
 45 18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
 46 18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
 47 18/06/26 16:22:47 INFO streaming.PipeMapRed: Records R/W=34/1
 48 18/06/26 16:22:47 INFO streaming.PipeMapRed: MRErrorThread done
 49 18/06/26 16:22:47 INFO streaming.PipeMapRed: mapRedFinished
 50 18/06/26 16:22:47 INFO mapred.LocalJobRunner: 
 51 18/06/26 16:22:47 INFO mapred.MapTask: Starting flush of map output
 52 18/06/26 16:22:47 INFO mapred.MapTask: Spilling map output
 53 18/06/26 16:22:47 INFO mapred.MapTask: bufstart = 0; bufend = 3013; bufvoid = 104857600
 54 18/06/26 16:22:47 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212876(104851504); length = 1521/6553600
 55 18/06/26 16:22:47 INFO mapred.MapTask: Finished spill 0
 56 18/06/26 16:22:47 INFO mapred.Task: Task:attempt_local49685846_0001_m_000000_0 is done. And is in the process of committing
 57 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Records R/W=34/1
 58 18/06/26 16:22:47 INFO mapred.Task: Task 'attempt_local49685846_0001_m_000000_0' done.
 59 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local49685846_0001_m_000000_0
 60 18/06/26 16:22:47 INFO mapred.LocalJobRunner: map task executor complete.
 61 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Waiting for reduce tasks
 62 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Starting task: attempt_local49685846_0001_r_000000_0
 63 18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
 64 18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
 65 18/06/26 16:22:47 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
 66 18/06/26 16:22:47 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@257adccd
 67 18/06/26 16:22:47 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
 68 18/06/26 16:22:47 INFO reduce.EventFetcher: attempt_local49685846_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
 69 18/06/26 16:22:47 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local49685846_0001_m_000000_0 decomp: 3777 len: 3781 to MEMORY
 70 18/06/26 16:22:47 INFO reduce.InMemoryMapOutput: Read 3777 bytes from map-output for attempt_local49685846_0001_m_000000_0
 71 18/06/26 16:22:47 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 3777, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->3777
 72 18/06/26 16:22:47 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
 73 18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
 74 18/06/26 16:22:47 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
 75 18/06/26 16:22:47 INFO mapred.Merger: Merging 1 sorted segments
 76 18/06/26 16:22:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3769 bytes
 77 18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merged 1 segments, 3777 bytes to disk to satisfy reduce memory limit
 78 18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merging 1 files, 3781 bytes from disk
 79 18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
 80 18/06/26 16:22:47 INFO mapred.Merger: Merging 1 sorted segments
 81 18/06/26 16:22:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3769 bytes
 82 18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
 83 18/06/26 16:22:47 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python3, hdfs_reduce.py]
 84 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
 85 18/06/26 16:22:47 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
 86 18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
 87 18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
 88 18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
 89 18/06/26 16:22:47 INFO streaming.PipeMapRed: Records R/W=381/1
 90 18/06/26 16:22:47 INFO streaming.PipeMapRed: MRErrorThread done
 91 18/06/26 16:22:47 INFO streaming.PipeMapRed: mapRedFinished
 92 18/06/26 16:22:47 INFO mapred.Task: Task:attempt_local49685846_0001_r_000000_0 is done. And is in the process of committing
 93 18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
 94 18/06/26 16:22:47 INFO mapred.Task: Task attempt_local49685846_0001_r_000000_0 is allowed to commit now
 95 18/06/26 16:22:47 INFO output.FileOutputCommitter: Saved output of task 'attempt_local49685846_0001_r_000000_0' to hdfs://localhost:9000/output/wordcount/_temporary/0/task_local49685846_0001_r_000000
 96 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Records R/W=381/1 > reduce
 97 18/06/26 16:22:47 INFO mapred.Task: Task 'attempt_local49685846_0001_r_000000_0' done.
 98 18/06/26 16:22:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local49685846_0001_r_000000_0
 99 18/06/26 16:22:47 INFO mapred.LocalJobRunner: reduce task executor complete.
100 18/06/26 16:22:48 INFO mapreduce.Job: Job job_local49685846_0001 running in uber mode : false
101 18/06/26 16:22:48 INFO mapreduce.Job:  map 100% reduce 100%
102 18/06/26 16:22:48 INFO mapreduce.Job: Job job_local49685846_0001 completed successfully
103 18/06/26 16:22:48 INFO mapreduce.Job: Counters: 35
104     File System Counters
105         FILE: Number of bytes read=279474
106         FILE: Number of bytes written=1220325
107         FILE: Number of read operations=0
108         FILE: Number of large read operations=0
109         FILE: Number of write operations=0
110         HDFS: Number of bytes read=4534
111         HDFS: Number of bytes written=2287
112         HDFS: Number of read operations=13
113         HDFS: Number of large read operations=0
114         HDFS: Number of write operations=4
115     Map-Reduce Framework
116         Map input records=34
117         Map output records=381
118         Map output bytes=3013
119         Map output materialized bytes=3781
120         Input split bytes=85
121         Combine input records=0
122         Combine output records=0
123         Reduce input groups=236
124         Reduce shuffle bytes=3781
125         Reduce input records=381
126         Reduce output records=236
127         Spilled Records=762
128         Shuffled Maps =1
129         Failed Shuffles=0
130         Merged Map outputs=1
131         GC time elapsed (ms)=0
132         Total committed heap usage (bytes)=536870912
133     Shuffle Errors
134         BAD_ID=0
135         CONNECTION=0
136         IO_ERROR=0
137         WRONG_LENGTH=0
138         WRONG_MAP=0
139         WRONG_REDUCE=0
140     File Input Format Counters 
141         Bytes Read=2267
142     File Output Format Counters 
143         Bytes Written=2287
144 18/06/26 16:22:48 INFO streaming.StreamJob: Output directory: /output/wordcount

  View the result:

  1 ➜  Documents hdfs dfs -cat /output/wordcount/part-00000
  2 # result
  3 "Even 1    
  4 "My 1    
  5 "We 1    
  6 (16ft) 1    
  7 11 1    
  8 16, 1    
  9 17-member 1    
 10 25-year-old 1    
 11 5m 1    
 12 AFP. 1    
 13 BBC's 1    
 14 Bangkok 1    
 15 But 1    
 16 Chiang 1    
 17 Constant 1    
 18 Deputy 1    
 19 Desperate 1    
 20 Head, 1    
 21 How 1    
 22 I'm 1    
 23 Jonathan 1    
 24 June 1    
 25 Luang 2    
 26 Minister 1    
 27 Myanmar, 1    
 28 Nang 2    
 29 Navy 2    
 30 Non 2    
 31 October. 1    
 32 PM 1    
 33 Post, 1    
 34 Prawit 1    
 35 Prime 1    
 36 Rai 1    
 37 Rescue 2    
 38 Royal 1    
 39 Saturday 2    
 40 Saturday. 1    
 41 Thai 1    
 42 Thailand's 2    
 43 Tham 2    
 44 The 6    
 45 They 2    
 46 Tuesday 1    
 47 Tuesday. 2    
 48 Wongsuwon 1    
 49 a 8    
 50 able 1    
 51 according 2    
 52 after 2    
 53 afternoon. 1    
 54 aged 1    
 55 alive, 1    
 56 alive," 1    
 57 all 1    
 58 along 1    
 59 and 6    
 60 anything 1    
 61 are 5    
 62 areas 1    
 63 as 1    
 64 at 2    
 65 attraction 1    
 66 authorities 1    
 67 be 1    
 68 been 2    
 69 began 1    
 70 believed 1    
 71 between 1    
 72 bicycles 1    
 73 border 1    
 74 boys 1    
 75 boys, 1    
 76 briefly 1    
 77 bring 1    
 78 but 1    
 79 by 1    
 80 camping 1    
 81 can 1    
 82 case 1    
 83 cave 9    
 84 cave, 3    
 85 cave. 1    
 86 cave.According 1    
 87 ceremony 1    
 88 chamber 1    
 89 child, 1    
 90 coach 3    
 91 completely 1    
 92 complex, 1    
 93 correspondent. 1    
 94 cross 1    
 95 crying 1    
 96 day. 1    
 97 deputy 1    
 98 dive 1    
 99 divers 2    
100 down. 1    
101 drink."The 1    
102 drones, 1    
103 during 1    
104 early 1    
105 eat, 1    
106 efforts 1    
107 efforts, 2    
108 enter 1    
109 entered 2    
110 enters 1    
111 equipment 1    
112 extensive 1    
113 flood 1    
114 floods. 1    
115 footballers 1    
116 footprints 1    
117 for 4    
118 found 1    
119 fresh 1    
120 from 2    
121 gear, 1    
122 get 1    
123 group 1    
124 group's 1    
125 had 2    
126 halted 2    
127 hampered 1    
128 hampering 1    
129 has 1    
130 have 6    
131 he 1    
132 here 1    
133 holding 1    
134 hopes 1    
135 if 1    
136 in 3    
137 inaccessible 1    
138 include 1    
139 inside 3    
140 into 1    
141 is 4    
142 it 1    
143 kilometres 1    
144 levels 1    
145 lies 1    
146 local 1    
147 making 1    
148 many 1    
149 may 1    
150 missing. 1    
151 must 1    
152 navy 1    
153 near 1    
154 network. 1    
155 night 1    
156 not 1    
157 now," 1    
158 of 4    
159 officials. 1    
160 on 5    
161 one 1    
162 optimistic 2    
163 our 1    
164 out 2    
165 outside 2    
166 parent 1    
167 pools 1    
168 poor 1    
169 prayer 1    
170 preparing 1    
171 province 1    
172 pumping 1    
173 rainfall 1    
174 rainy 1    
175 raising 1    
176 re-enter 1    
177 relatives 1    
178 reported 1    
179 reportedly 1    
180 rescue 1    
181 resumed 1    
182 return. 1    
183 rising 2    
184 runs 2    
185 safe 1    
186 safety. 1    
187 said 3    
188 said, 1    
189 says 1    
190 scene, 1    
191 scuba 1    
192 search 3    
193 search. 1    
194 searching 1    
195 season, 1    
196 seen 1    
197 sent 1    
198 should 1    
199 small 1    
200 sports 1    
201 started 1    
202 still 2    
203 stream 2    
204 submerged, 1    
205 team 3    
206 teams 1    
207 the 23    
208 their 5    
209 them 1    
210 these 1    
211 they 5    
212 third 1    
213 though 1    
214 thought 1    
215 through 1    
216 to 17    
217 tourist 1    
218 train 1    
219 trapped 1    
220 trapped? 1    
221 try 1    
222 underground. 1    
223 underwater 1    
224 unit 1    
225 up 1    
226 use 1    
227 visibility 1    
228 visitors 1    
229 was 2    
230 water 2    
231 waters 2    
232 were 4    
233 which 5    
234 who 1    
235 with 2    
236 workers 1    
237 you 1    
238 young 1    

 

8. Question 1: How can small files be stored in Hadoop?

          1) The application manages them itself (e.g. merges small files before writing)

          2) Hadoop archives (har); see the shell sketch after this list

          3) SequenceFile / MapFile

          4) CombineFileInputFormat***

          5) Merge small files, similar to compaction in HBase
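
  For option 2, archives can be built from the shell; a hedged sketch (the /user/zzf paths are made up for illustration):

# Pack everything under /user/zzf/small_files into a single .har archive
hadoop archive -archiveName files.har -p /user/zzf/small_files /user/zzf/archived

# The archived files can then be listed and read through the har:// scheme
hdfs dfs -ls har:///user/zzf/archived/files.har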

  Question 2: When a node fails, how does the Hadoop cluster keep providing service, and how do reads and writes proceed?

  Question 3: What factors affect MapReduce performance?

 

