1. VirtualBox
1. When shutting down a VM, choose the first option (save the machine state), which records the state of all running processes. Powering the VM off directly kills every process and can leave the cluster environment broken.
2. CentOS 7
1. Configure the network
Bridge the VM's network adapter, then configure static addresses so the host and the VM can ping each other.
vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
BOOTPROTO=static
IPADDR=192.168.0.106    # same subnet as the host machine
GATEWAY=192.168.0.1
NETMASK=255.255.255.0
ONBOOT=yes
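A quick sanity check after editing (assuming the interface is enp0s3 as above):
systemctl restart network
ip addr show enp0s3        # confirm the static address took effect
ping -c 3 192.168.0.1      # confirm the gateway is reachable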
2. Set the hostname; disable the firewall and SELinux
hostnamectl set-hostname spark1    # spark2 / spark3 on the other nodes
vi /etc/selinux/config
SELINUX=disabled
systemctl stop firewalld
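To keep the firewall off across reboots and apply SELinux immediately (setenforce only lasts until the next boot; the config change above makes it permanent):
systemctl disable firewalld
setenforce 0
getenforce                 # should print Permissive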
3. Edit the hosts file
vi /etc/hosts
192.168.0.106 spark1
192.168.0.107 spark2
192.168.0.108 spark3
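With the same three entries in place on every node, name resolution can be checked from any of them:
ping -c 1 spark1
ping -c 1 spark2
ping -c 1 spark3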
4. Configure passwordless SSH login
ssh-keygen -t rsa
touch /root/.ssh/authorized_keys
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
ssh-copy-id -i spark3    # repeat for spark2; run on every node so each can reach the others
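Key-based login can then be verified; these should return the remote hostname without prompting for a password:
ssh spark2 hostname
ssh spark3 hostname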
3. JDK 1.7
4. MobaXterm
5. Hadoop 2.4.1
tar -zxvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop
vim ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc
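After sourcing, the installation can be verified with:
hadoop version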
Then edit the configuration files under hadoop's etc/hadoop directory.
Edit core-site.xml:
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>fs.default.name</name> <value>hdfs://spark1:9000</value> </property> </configuration>
Edit hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>dfs.name.dir</name> <value>/usr/local/data/namenode</value> </property> <property> <name>dfs.data.dir</name> <value>/usr/local/data/datanode</value> </property> <property> <name>dfs.tmp.dir</name> <value>/usr/local/data/tmp</value> </property> <property> <name>dfs.replication</name> <value>3</value> </property> </configuration>
Edit mapred-site.xml (copy it from mapred-site.xml.template first if it does not exist):
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> </configuration>
Edit yarn-site.xml:
<?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <!-- Site specific YARN configuration properties --> <property> <name>yarn.resourcemanager.hostname</name> <value>spark1</value> </property> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> </configuration>
Edit the slaves file:
spark1
spark2
spark3
Start the Hadoop cluster
Format the NameNode (run this on spark1):
hdfs namenode -format
start-dfs.sh
Install a JDK development package (it provides jps for the checks below):
yum install java-1.8.0-openjdk-devel.x86_64
After startup, confirm the daemons with jps:
spark1 runs NameNode, DataNode, and SecondaryNameNode
spark2 runs DataNode
spark3 runs DataNode
http://spark1:50070/dfshealth.html#tab-overview should be reachable.
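A minimal HDFS smoke test (the paths here are arbitrary examples):
hdfs dfs -mkdir /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -cat /test/hosts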
Start the YARN cluster
start-yarn.sh
spark1: ResourceManager, NodeManager
spark2: NodeManager
spark3: NodeManager
http://spark1:8088/cluster should be reachable.
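YARN can be smoke-tested with the bundled example job (the jar version is assumed to match the Hadoop 2.4.1 release above):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10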
6. Hive 0.13
1. Set up Hive
tar -zxvf apache-hive-0.13-bin.tar.gz
mv apache-hive-0.13-bin hive
Configure the environment variables:
vim ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bashrc
2. Install mysql-server
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install -y mysql
service mysqld start
chkconfig mysqld on
yum install -y mysql-connector-java
cp /usr/share/java/mysql-connector-java.jar /usr/local/hive/lib/
3. Log in to MySQL and create the Hive metastore database, for example:
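A minimal sketch; the hive_metadata database name and the hive account are arbitrary examples:
mysql -u root
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
flush privileges;
exit;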
4. Configure Hive
In hive-site.xml, remove createDatabaseIfNotExist from the metastore JDBC URL and add serverTimezone=Asia/Shanghai.
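The resulting metastore connection properties in hive-site.xml would look roughly like this (a sketch assuming the hive_metadata database and hive account created above):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://spark1:3306/hive_metadata?serverTimezone=Asia/Shanghai</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>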
To verify Hive, type hive, confirm it drops into the CLI, then run:
create table t(id int);
select * from t;
drop table t;
7. ZooKeeper 3.4.5
tar -zxvf zookeeper-3.4.5.tar.gz
mv zookeeper-3.4.5 zk
Configure the environment variables:
vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin
Edit the configuration file under zookeeper's conf directory:
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/usr/local/zk/data
server.0=spark1:2888:3888
server.1=spark2:2888:3888
server.2=spark3:2888:3888
Create the data directory and the myid file:
mkdir /usr/local/zk/data
echo 0 > /usr/local/zk/data/myid    # matches server.0 above
Copy the zk directory and ~/.bashrc to spark2 and spark3, source ~/.bashrc on each, and change myid to 1 and 2 respectively.
scp -r zk/ root@spark3:/usr/local/
scp ~/.bashrc root@spark3:~/
Start ZooKeeper (on every node):
zkServer.sh start
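Once started on all three nodes, each node's role can be confirmed with:
zkServer.sh status         # one node reports leader, the others follower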
8. Kafka (kafka_2.9.2-0.8.1)
Extract scala-2.11.4.tgz and rename the directory to scala.
Configure the environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin
Extract kafka_2.9.2-0.8.1.
Edit the configuration file kafka/config/server.properties:
zookeeper.connect=spark1:2181,spark2:2181,spark3:2181
broker.id=0    # must be unique per node: 1 on spark2, 2 on spark3
Extract slf4j-1.7.6.zip and copy slf4j-nop-1.7.6.jar into kafka/libs.
Start Kafka (on every node):
nohup bin/kafka-server-start.sh config/server.properties &
Test the cluster:
bin/kafka-topics.sh --zookeeper 192.168.0.106:2181,192.168.0.107:2181,192.168.0.108:2181 --topic Test --replication-factor 1 --partitions 1 --create
bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic Test
bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic Test --from-beginning
9. Spark 1.3.0
Upload spark-1.3.0-bin-hadoop2.4.tgz, extract it, and rename the directory to spark.
Configure the environment variables:
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zk
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/jre
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=192.168.0.106
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
cp slaves.template slaves
vim slaves
spark1
spark2
spark3
Copy the spark directory and ~/.bashrc to spark2 and spark3, then source ~/.bashrc on each.
Start Spark:
cd /usr/local/spark/sbin
./start-all.sh
Verify with jps: spark1 should show Master and Worker; spark2 and spark3 should each show Worker.
Run spark-shell; it should drop into the interactive Scala shell.
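A minimal smoke test once the shell is up (typed at the scala> prompt):
val rdd = sc.parallelize(1 to 100)
rdd.sum()                  // should return 5050.0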