1. VirtualBox
1. When shutting down a VM, choose the first option (save the machine state), which records the state of all running processes. Powering the VM off directly kills every process and can leave the cluster environment broken.
2. CentOS 7
1. Configure the network
Bridge the VM's network adapter, then configure static addresses so the host and the VM can ping each other.
vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
BOOTPROTO=static
IPADDR=192.168.0.106    # same subnet as the host machine
GATEWAY=192.168.0.1
NETMASK=255.255.255.0
ONBOOT=yes
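A quick sanity check after editing (assuming the interface is enp0s3 as above):
systemctl restart network
ip addr show enp0s3        # confirm the static address took effect
ping -c 3 192.168.0.1      # confirm the gateway is reachable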
2. Set the hostname; disable the firewall and SELinux
hostnamectl set-hostname spark1    # spark2 / spark3 on the other nodes
vi /etc/selinux/config
SELINUX=disabled
systemctl stop firewalld
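To keep the firewall off across reboots and apply SELinux immediately (setenforce only lasts until the next boot; the config change above makes it permanent):
systemctl disable firewalld
setenforce 0
getenforce                 # should print Permissive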
3. Edit the hosts file
vi /etc/hosts
192.168.0.106 spark1
192.168.0.107 spark2
192.168.0.108 spark3
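With the same three entries in place on every node, name resolution can be checked from any of them:
ping -c 1 spark1
ping -c 1 spark2
ping -c 1 spark3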
4. Configure passwordless SSH login
ssh-keygen -t rsa
touch /root/.ssh/authorized_keys
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
ssh-copy-id -i spark3    # repeat for spark2; run on every node so each can reach the others
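Key-based login can then be verified; these should return the remote hostname without prompting for a password:
ssh spark2 hostname
ssh spark3 hostname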
3. JDK 1.7
4. MobaXterm
5. Hadoop 2.4.1
tar -zxvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop
vim ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc
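After sourcing, the installation can be verified with:
hadoop version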
Then edit the configuration files under hadoop's etc/hadoop directory.
Edit core-site.xml:
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>fs.default.name</name> <value>hdfs://spark1:9000</value> </property> </configuration>
Edit hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>dfs.name.dir</name> <value>/usr/local/data/namenode</value> </property> <property> <name>dfs.data.dir</name> <value>/usr/local/data/datanode</value> </property> <property> <name>dfs.tmp.dir</name> <value>/usr/local/data/tmp</value> </property> <property> <name>dfs.replication</name> <value>3</value> </property> </configuration>
Edit mapred-site.xml (copy it from mapred-site.xml.template first if it does not exist):
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> </configuration>
Edit yarn-site.xml:
<?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <!-- Site specific YARN configuration properties --> <property> <name>yarn.resourcemanager.hostname</name> <value>spark1</value> </property> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> </configuration>
Edit the slaves file:
spark1
spark2
spark3
Start the Hadoop cluster
Format the NameNode (run this on spark1):
hdfs namenode -format
start-dfs.sh
Install a JDK development package (it provides jps for the checks below):
yum install java-1.8.0-openjdk-devel.x86_64
After startup, confirm the daemons with jps:
spark1 runs NameNode, DataNode, and SecondaryNameNode
spark2 runs DataNode
spark3 runs DataNode
http://spark1:50070/dfshealth.html#tab-overview should be reachable.
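A minimal HDFS smoke test (the paths here are arbitrary examples):
hdfs dfs -mkdir /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -cat /test/hosts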
Start the YARN cluster
start-yarn.sh
spark1: ResourceManager, NodeManager
spark2: NodeManager
spark3: NodeManager
http://spark1:8088/cluster should be reachable.
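YARN can be smoke-tested with the bundled example job (the jar version is assumed to match the Hadoop 2.4.1 release above):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10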
6. Hive 0.13
1. Set up Hive
tar -zxvf apache-hive-0.13-bin.tar.gz
mv apache-hive-0.13-bin hive
Configure the environment variables:
vim ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bashrc
2. Install mysql-server
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install -y mysql
service mysqld start
chkconfig mysqld on
yum install -y mysql-connector-java
cp /usr/share/java/mysql-connector-java.jar /usr/local/hive/lib/
3. Log in to MySQL and create the Hive metastore database, for example:
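A minimal sketch; the hive_metadata database name and the hive account are arbitrary examples:
mysql -u root
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
flush privileges;
exit;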
4. Configure Hive
In hive-site.xml, remove createDatabaseIfNotExist from the metastore JDBC URL and add serverTimezone=Asia/Shanghai.
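The resulting metastore connection properties in hive-site.xml would look roughly like this (a sketch assuming the hive_metadata database and hive account created above):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://spark1:3306/hive_metadata?serverTimezone=Asia/Shanghai</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>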
To verify Hive, type hive, confirm it drops into the CLI, then run:
create table t(id int);
select * from t;
drop table t;
7. ZooKeeper 3.4.5
tar -zxvf zookeeper-3.4.5.tar.gz
mv zookeeper-3.4.5 zk
Configure the environment variables:
vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin
Edit the configuration file under zookeeper's conf directory:
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/usr/local/zk/data
server.0=spark1:2888:3888
server.1=spark2:2888:3888
server.2=spark3:2888:3888
Create the data directory and the myid file:
mkdir /usr/local/zk/data
echo 0 > /usr/local/zk/data/myid    # matches server.0 above
Copy the zk directory and ~/.bashrc to spark2 and spark3, source ~/.bashrc on each, and change myid to 1 and 2 respectively.
scp -r zk/ root@spark3:/usr/local/
scp ~/.bashrc root@spark3:~/
Start ZooKeeper (on every node):
zkServer.sh start
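Once started on all three nodes, each node's role can be confirmed with:
zkServer.sh status         # one node reports leader, the others follower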
8. Kafka (kafka_2.9.2-0.8.1)
Extract scala-2.11.4.tgz and rename the directory to scala.
Configure the environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin
Extract kafka_2.9.2-0.8.1.
Edit the configuration file kafka/config/server.properties:
zookeeper.connect=spark1:2181,spark2:2181,spark3:2181
broker.id=0    # must be unique per node: 1 on spark2, 2 on spark3
Extract slf4j-1.7.6.zip and copy slf4j-nop-1.7.6.jar into kafka/libs.
Start Kafka (on every node):
nohup bin/kafka-server-start.sh config/server.properties &
Test the cluster:
bin/kafka-topics.sh --zookeeper 192.168.0.106:2181,192.168.0.107:2181,192.168.0.108:2181 --topic Test --replication-factor 1 --partitions 1 --create
bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic Test
bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic Test --from-beginning
9. Spark 1.3.0
Upload spark-1.3.0-bin-hadoop2.4.tgz, extract it, and rename the directory to spark.
Configure the environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/jre
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.106
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
cp slaves.template slaves
vim slaves
spark1
spark2
spark3
Copy the spark directory and ~/.bashrc to spark2 and spark3, then source ~/.bashrc on each.
Start Spark:
cd /usr/local/spark/sbin
./start-all.sh
Verify with jps: spark1 should show Master and Worker; spark2 and spark3 should each show Worker.
Run spark-shell; it should drop into the interactive Scala shell.
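A minimal smoke test once the shell is up (typed at the scala> prompt):
val rdd = sc.parallelize(1 to 100)
rdd.sum()                  // should return 5050.0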