Configuring Hadoop on Linux


  A note before we start: you need your own Linux environment, familiarity with common Linux commands, and a working Java setup. What counts as "working"? The command echo ${JAVA_HOME} must print the JDK path. If only java -version shows the Java version, run source /etc/profile to make the variable take effect; without that it will not work.
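A quick way to check, as a minimal sketch (the JDK path in the comment is only an example; yours will differ):

  echo ${JAVA_HOME}      # should print the JDK install path, e.g. /usr/local/java/jdk1.8.0_191 (example path)
  java -version          # works even when JAVA_HOME is not exported, so this alone proves nothing
  source /etc/profile    # reload the profile if JAVA_HOME is not visible in the current shell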

I. Download and Extract

First download the Hadoop distribution. You can download it directly on your Windows machine from the official site; here is a mirror you can use: http://mirror.bit.edu.cn/apache/hadoop/common/

The version I downloaded is 2.7.7.

After the download finishes, transfer the archive to the Linux machine. I used WinSCP for the transfer; how to use it is easy to find with a quick search.

Now the real Linux work begins. cd into the directory holding the Hadoop archive and extract it with tar -zxvf hadoop-2.7.7.tar.gz.
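For example (assuming the archive was uploaded to /opt/software; use whatever directory you actually put it in):

  cd /opt/software                  # example upload directory
  tar -zxvf hadoop-2.7.7.tar.gz     # extracts into ./hadoop-2.7.7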

II. File Configuration

Next comes the configuration itself. cd into etc/hadoop under the Hadoop installation directory.

1. First up is the Java path. Edit the file with vim hadoop-env.sh.

  Write the Java path out explicitly here; do not leave it as ${JAVA_HOME}, otherwise in a cluster environment the daemons will fail to find Java at startup!

  Save and quit with :wq, then run source hadoop-env.sh to apply it (I do not remember whether this step is strictly required).
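The line to change looks roughly like this (the JDK path is only an example; use your own):

  # etc/hadoop/hadoop-env.sh
  export JAVA_HOME=/usr/local/java/jdk1.8.0_191   # hard-coded JDK path (example), not ${JAVA_HOME}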

  Then open the system profile with vim /etc/profile, add the HADOOP environment variables, and source it to make them take effect.
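A minimal sketch of the profile entries, assuming Hadoop was extracted to /opt/software/hadoop-2.7.7 (substitute your own path):

  # appended to /etc/profile
  export HADOOP_HOME=/opt/software/hadoop-2.7.7   # example install path
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  source /etc/profile   # apply the changes in the current shell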

 

2. core-site.xml: the configuration section is empty when you open it; add the following.

<configuration>
  <property>
   <!-- The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.-->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>  <!-- master is the Linux hostname -->
  </property>
  <property>
    <!--Size of read/write buffer used in SequenceFiles. byte -->
    <name>io.file.buffer.size</name>
    <value>131072</value>  <!-- buffer size in bytes -->
  </property>
  <property>
    <!-- A base for other temporary directories. -->
    <name>hadoop.tmp.dir</name>
    <value>/study/hadoopWork/hadoop</value>  <!-- local base path for temporary files -->
  </property>

</configuration>
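Because fs.defaultFS refers to the hostname master, that name must resolve to this machine. Assuming a single-node setup (the IP below is only a placeholder), /etc/hosts would contain a line like:

  # /etc/hosts
  192.168.1.100   master   # replace with this machine's real IP address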

3. hdfs-site.xml

<configuration>
  <property>
    <!-- HDFS blocksize of 256MB for large file-systems.  default 128MB-->
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
   <property>
    <!-- More NameNode server threads to handle RPCs from large number of DataNodes. default 10-->
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
</configuration>
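For reference, 268435456 bytes is exactly 256 MB (256 × 1024 × 1024). Once the environment variables are set, you can check that a key was picked up with hdfs getconf, for example:

  hdfs getconf -confKey dfs.blocksize               # should print 268435456
  hdfs getconf -confKey dfs.namenode.handler.count  # should print 100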

4. mapred-site.xml: this file does not exist by default; rename mapred-site.xml.template to create it.
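For example, from inside etc/hadoop:

  mv mapred-site.xml.template mapred-site.xml   # or cp, if you want to keep the template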

<configuration>
    <!-- Configurations for MapReduce Applications -->
    <property>
        <!-- Execution framework set to Hadoop YARN. -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
 
    <property>
        <!-- The amount of memory to request from the scheduler for each map task. -->
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
 
    <property>
        <!-- Larger heap-size for child jvms of maps. -->
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    
    <property>
        <!-- Larger resource limit for reduces. -->
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
    </property>
    
    <property>
        <!-- Larger heap-size for child jvms of reduces. -->
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
    </property>
    
    <property>
        <!-- The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.-->
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>
    
    <property>
        <!-- The number of streams to merge at once while sorting files. This determines the number of open file handles.-->
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
    
    <property>
        <!--The default number of parallel transfers run by reduce during the copy(shuffle) phase.-->
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>
 
    <!--Configurations for MapReduce JobHistory Server-->
    <property>
        <!--MapReduce JobHistory Server IPC host:port-->
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    
    <property>
        <!--MapReduce JobHistory Server Web UI host:port-->
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
    
    <property>
        <!--Directory where history files are written by MapReduce jobs.-->
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/study/hadoopWork/hadoop</value>
    </property>
    
    <property>
        <!--Directory where history files are managed by the MR JobHistory Server.-->
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/study/hadoopWork/hadoop</value>
    </property>
</configuration>
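Note that the JobHistory Server configured above is not started by start-all.sh; if you want it running, it can be launched separately from the sbin directory, for example:

  sbin/mr-jobhistory-daemon.sh start historyserver   # its Web UI then listens on master:19888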

5. yarn-site.xml

<configuration>
    <!-- Configurations for ResourceManager and NodeManager -->
    <property>
        <!-- Enable ACLs? Defaults to false. -->
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>
 
    <property>
        <!-- ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of * which means anyone. The special value of just a space means no one has access. -->
        <name>yarn.admin.acl</name>
        <value>*</value>
    </property>
 
    <property>
        <!-- Configuration to enable or disable log aggregation -->
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
    </property>
 
    <!-- Configurations for ResourceManager -->
 
    <property>
        <!-- host Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components. -->
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
 
    <property>
        <!-- CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler -->
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
 
    <property>
        <!--The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this will throw a InvalidResourceRequestException.-->
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
 
    <property>
        <!--The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this will throw a InvalidResourceRequestException.-->
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
    </property>
 
    <!--Configurations for NodeManager-->
    
    <property>
        <!-- Defines total available resources on the NodeManager to be made available to running containers -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <!--Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.-->
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    
    <property>
        <!-- Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.-->
        <name>yarn.nodemanager.log-dirs</name>
        <value>/study/hadoopWork/data/hadoop/log</value>
    </property>
    
    <property>
        <!--HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled.  -->
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/study/hadoopWork/data/hadoop/log</value>
    </property>
</configuration>

 At this point the configuration is basically done. The paths that appear in the config files must be created under the corresponding directories yourself; you can also change them to paths of your own.
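As a sketch, creating the local directories used in my configs (substitute your own paths; HDFS-side paths such as the JobHistory directories are created inside HDFS once it is running):

  mkdir -p /study/hadoopWork/hadoop             # hadoop.tmp.dir
  mkdir -p /study/hadoopWork/data/hadoop/log    # yarn.nodemanager.log-dirs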

6. Initialize Hadoop

  Run hdfs namenode -format. If it fails with an error about not finding the Java path, check whether your Java environment variables are configured correctly and whether the Java path in hadoop-env.sh is right.

  If you have tried every fix you can find online and it still fails, uninstall the installed JDK and download and install it again, and make sure the uninstall is complete! The uninstall procedure is easy to look up. It took me three or four attempts before the configuration finally worked.
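The format command itself, for reference; a successful run normally ends with a message saying the storage directory has been successfully formatted:

  hdfs namenode -format   # if this complains about Java, re-check JAVA_HOME and hadoop-env.sh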

7. Start the cluster

  In the sbin directory, run start-all.sh. If Java errors appear, handle them the same way as in step 6. Success!

  Run jps to check the running processes.
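On a single-node setup, jps typically lists the five Hadoop daemons plus Jps itself, roughly like this (the process IDs are of course arbitrary):

  jps
  # 4130 NameNode
  # 4268 DataNode
  # 4457 SecondaryNameNode
  # 4610 ResourceManager
  # 4712 NodeManager
  # 5021 Jps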

 

 

I did not get this configured right on the first try either; during the process I consulted a lot of material written by others. If parts of this post look similar to theirs, please bear with me; I simply did not keep track of their blog addresses.

        Pretend there is a reference list here.

