Big Data Cluster Setup


1、VirtualBox

  1、When shutting down a VM, choose the first option (save the machine state), which preserves the state of all running processes. Powering the VM off directly kills those processes and can leave the environment broken.

2、CentOS 7

  1、Configure the network

  Set the VM's network adapter to bridged mode, then configure the host and the VM so they can ping each other.

  vim /etc/sysconfig/network-scripts/ifcfg-enp0s3

  BOOTPROTO=static

  IPADDR=192.168.0.106 (same subnet as your host machine)

  GATEWAY=192.168.0.1

  NETMASK=255.255.255.0

  ONBOOT=yes
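
  To apply and verify, a quick sketch (192.168.0.1 is the gateway configured above):

  systemctl restart network

  ping -c 3 192.168.0.1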

  2、Set the hostname

  hostnamectl set-hostname spark1 (use spark2 / spark3 on the other nodes)

  systemctl stop firewalld

  systemctl disable firewalld

  vi /etc/selinux/config

  SELINUX=disabled

  3、Edit the hosts file

  vi /etc/hosts

  192.168.0.106 spark1

  192.168.0.107 spark2

  192.168.0.108 spark3

  4、Configure passwordless SSH login

  ssh-keygen -t rsa

  touch /root/.ssh/authorized_keys

  cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

  ssh-copy-id -i ~/.ssh/id_rsa.pub spark3
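
  To cover all three nodes in one pass, a minimal sketch (run on each machine after generating its key):

  for h in spark1 spark2 spark3; do ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h; done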

3、JDK1.7

4、MobaXterm

5、Hadoop 2.4.1

 tar -zxvf hadoop-2.4.1.tar.gz

 mv hadoop-2.4.1 hadoop

  vim ~/.bashrc

  export HADOOP_HOME=/usr/local/hadoop
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  source ~/.bashrc

  Edit the configuration files under hadoop's etc/hadoop directory.

  Edit core-site.xml:

  

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://spark1:9000</value>
</property>
</configuration>

Edit hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/data/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/data/datanode</value>
</property>
<property>
  <name>dfs.tmp.dir</name>
  <value>/usr/local/data/tmp</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
</configuration>
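
  The three directories above must exist on every node before HDFS is formatted; a minimal sketch:

  mkdir -p /usr/local/data/namenode /usr/local/data/datanode /usr/local/data/tmp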

Edit mapred-site.xml:
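
  In Hadoop 2.4.x this file usually ships only as a template, so create it first:

  cp mapred-site.xml.template mapred-site.xml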

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
</configuration>

Edit yarn-site.xml:

<?xml version="1.0"?>
<configuration>

<!-- Site specific YARN configuration properties -->
<property>
 <name>yarn.resourcemanager.hostname</name>
 <value>spark1</value>
</property>
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
</configuration>

Edit slaves:

spark1
spark2
spark3
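
  The same Hadoop directory and environment variables are needed on spark2 and spark3; the original notes omit this step, so here is a sketch assuming identical paths on all nodes:

  scp -r /usr/local/hadoop root@spark2:/usr/local/
  scp -r /usr/local/hadoop root@spark3:/usr/local/
  scp ~/.bashrc root@spark2:~/
  scp ~/.bashrc root@spark3:~/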

Start the Hadoop cluster

  Format the NameNode (run this on spark1):

  hdfs namenode -format

  start-dfs.sh

  Install the JDK development package (it provides the jps tool):

  yum install java-1.8.0-openjdk-devel.x86_64

  After startup, confirm the daemons with jps:

  spark1: NameNode, DataNode, SecondaryNameNode

  spark2: DataNode

  spark3: DataNode

  http://spark1:50070/dfshealth.html#tab-overview should be reachable in a browser

  Start the YARN cluster:

  start-yarn.sh

  spark1: ResourceManager, NodeManager

  spark2: NodeManager

  spark3: NodeManager

  http://spark1:8088/cluster should be reachable
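
  As an optional smoke test of YARN plus MapReduce, a sketch (the jar path assumes a stock 2.4.1 layout):

  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10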

6、Hive 0.13

  1、Set up Hive

  tar -zxvf apache-hive-0.13-bin.tar.gz

  mv apache-hive-0.13-bin hive

  vim ~/.bashrc

  export HIVE_HOME=/usr/local/hive

  Add $HIVE_HOME/bin to PATH as with the other components, then source ~/.bashrc.

  2、Install mysql-server

  $ wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

  $ sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm

  yum install -y mysql-server

  service mysqld start

  chkconfig mysqld on

  yum install -y mysql-connector-java

  cp /usr/share/java/mysql-connector-java.jar /usr/local/hive/lib/

  3、Log in to MySQL and create the Hive metastore database
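
  A minimal sketch (the database name hive_metadata and the hive/hive user are assumptions, not from the original notes):

  mysql -u root

  create database if not exists hive_metadata;
  grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
  flush privileges;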


   4、Configure Hive

    In hive-site.xml, remove createDatabaseIfNotExist from the JDBC connection URL and append serverTimezone=Asia/Shanghai (see the sketch below).
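
  A sketch of the relevant hive-site.xml properties, reusing the assumed hive_metadata database and hive/hive credentials from step 3:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://spark1:3306/hive_metadata?serverTimezone=Asia/Shanghai</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>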


   To verify Hive, run hive and check that it drops you into its command line:

  create table t(id int);

  select * from t;

  drop table t;

7、ZooKeeper 3.4.5

  tar -zxvf zookeeper-3.4.5.tar.gz

  mv zookeeper-3.4.5 zk

  Configure environment variables

  vim ~/.bashrc

  export JAVA_HOME=/usr/lib/jvm/jre
  export HADOOP_HOME=/usr/local/hadoop
  export HIVE_HOME=/usr/local/hive
  export ZOOKEEPER_HOME=/usr/local/zk
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin

  

  Edit the config file under zookeeper's conf directory:

   mv zoo_sample.cfg zoo.cfg

  vim zoo.cfg

  dataDir=/usr/local/zk/data

  server.0=spark1:2888:3888

  server.1=spark2:2888:3888

  server.2=spark3:2888:3888

  Create the data directory and the myid file

  mkdir /usr/local/zk/data

  echo 0 > /usr/local/zk/data/myid

  Copy the zk directory and ~/.bashrc to spark2 and spark3, re-source the environment on each, and change myid to 1 and 2 respectively (see the sketch after the scp commands):

  scp -r zk/ root@spark3:/usr/local/

       scp ~/.bashrc root@spark3:~/
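
  The full distribution as a sketch (run on spark1; paths assumed identical on every node):

  for i in 2 3; do
    scp -r /usr/local/zk root@spark$i:/usr/local/
    scp ~/.bashrc root@spark$i:~/
    ssh root@spark$i "echo $((i-1)) > /usr/local/zk/data/myid"
  done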

 

  Start zk (run on each of the three nodes):

  zkServer.sh start
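
  Then check the role on each node; across the three you should see one leader and two followers:

  zkServer.sh status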

8、kafka_2.9.2-0.8.1

  Extract scala-2.11.4.tgz and rename the resulting directory to scala.

  Configure environment variables

  export JAVA_HOME=/usr/lib/jvm/jre
  export HADOOP_HOME=/usr/local/hadoop
  export HIVE_HOME=/usr/local/hive
  export ZOOKEEPER_HOME=/usr/local/zk
  export SCALA_HOME=/usr/local/scala
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin

  Extract kafka_2.9.2-0.8.1 and rename the directory to kafka.

  Edit the config file kafka/config/server.properties:

  zookeeper.connect=spark1:2181,spark2:2181,spark3:2181

  broker.id=0 (must be unique per broker; see the sketch below)
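
  Kafka must be installed on all three brokers for the test below; a sketch, assuming kafka was unpacked to /usr/local/kafka (that path is an assumption):

  for i in 2 3; do
    scp -r /usr/local/kafka root@spark$i:/usr/local/
    ssh root@spark$i "sed -i 's/^broker.id=0/broker.id=$((i-1))/' /usr/local/kafka/config/server.properties"
  done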

  Unzip slf4j-1.7.6.zip and copy slf4j-nop-1.7.6.jar into kafka/libs.

  Start Kafka (on each node):

  nohup bin/kafka-server-start.sh config/server.properties &

  Test the cluster

  bin/kafka-topics.sh --zookeeper 192.168.0.106:2181,192.168.0.107:2181,192.168.0.108:2181 --topic Test --replication-factor 1 --partitions 1 --create

   bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic Test

  bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic Test --from-beginning

9、Spark 1.3.0

  Upload spark-1.3.0-bin-hadoop2.4.tgz, extract it, and rename the directory to spark.

  Configure environment variables

  

export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

  cd /usr/local/spark/conf

  cp spark-env.sh.template spark-env.sh

  vim spark-env.sh

export JAVA_HOME=/usr/lib/jvm/jre
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.106
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

  cp slaves.template slaves

  vim slaves

  spark1

  spark2

  spark3

  Copy the spark directory and ~/.bashrc to spark2 and spark3, then source ~/.bashrc on each node (sketch below).
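
  A minimal sketch (run on spark1):

  scp -r /usr/local/spark root@spark2:/usr/local/
  scp -r /usr/local/spark root@spark3:/usr/local/
  scp ~/.bashrc root@spark2:~/
  scp ~/.bashrc root@spark3:~/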

Start Spark

  cd /usr/local/spark/sbin

  ./start-all.sh

Verify with jps: spark1 should show Master and Worker; spark2 and spark3 should each show a Worker.

http://spark1:8080/

Run spark-shell and confirm that it drops you into the interactive Scala prompt.
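
  A quick smoke test inside the shell (not from the original notes):

  sc.parallelize(1 to 100).sum()   // should return 5050.0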
