Big Data
Linux operations
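1. Create a user
- Switch to root: su
- Enter the root password
- Create the new user: useradd zhang
- Set the new user's password: passwd zhang
- Force the new user to change the password at next login: chage -d 0 zhang
- Reboot: reboot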
2. Create a group
- Switch to root: su
- Enter the root password
- Create the group: groupadd san
- Check that the group was created: tail -5 /etc/group
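3. Add the user to the new group
- Add the user to the group: usermod -aG san zhang
  -a appends the group; -G on its own replaces the user's existing supplementary groups with the list given
- Check that the user was added to the group: tail -5 /etc/group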
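Reading the last lines of /etc/group by eye works, but the same check can be scripted. The sketch below assumes the standard /etc/group layout (name:password:GID:comma-separated members). The helper name in_group and the sample file are hypothetical, chosen so the example runs without touching real accounts. Note that a user's primary group is recorded in /etc/passwd, so it does not show up in the member list.

```shell
#!/bin/bash
# in_group GROUP USER [FILE]: succeed if USER is listed as a member of GROUP.
# FILE defaults to /etc/group; in_group is a hypothetical helper name.
in_group() {
    local group=$1 user=$2 file=${3:-/etc/group}
    awk -F: -v g="$group" -v u="$user" '
        $1 == g {
            # Field 4 holds the comma-separated supplementary members.
            n = split($4, members, ",")
            for (i = 1; i <= n; i++) if (members[i] == u) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Sample data in /etc/group format, so no real accounts are needed.
sample=$(mktemp)
printf 'san:x:1001:zhang,li\nother:x:1002:\n' > "$sample"

if in_group san zhang "$sample"; then
    result="zhang is in san"
else
    result="zhang is not in san"
fi
echo "$result"
```

On a real system, calling in_group san zhang with no third argument reads /etc/group directly.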
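4. Create a directory named "share" under "/home" and change its group to san, so that members of san can create files and directories in "share" and delete or modify the ones they created themselves, but can only read the ones created by others
- Create the directory: mkdir /home/share
- Change the directory's group: chgrp san /home/share
  chgrp lets an ordinary user change the group of a file they own, but only to a group they belong to; root can assign any group
- Set the permissions: chmod 1775 /home/share
  The leading 1 is the sticky bit, which lets a user delete or rename only the entries they own; 775 gives the owner and the group san write access to the directory. Use 1777 only if users outside san should also be able to write there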
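To see what the sticky bit looks like once it is set, the permission change can be tried on a scratch directory first. This is a minimal sketch that assumes nothing beyond mktemp, chmod, ls and cut; the scratch path stands in for /home/share and needs no root access.

```shell
#!/bin/bash
# Reproduce the permission setup on a scratch directory (no root needed).
demo=$(mktemp -d)
mkdir "$demo/share"

# 1 = sticky bit, 7 = owner rwx, 7 = group rwx, 5 = others r-x.
chmod 1775 "$demo/share"

# The first 10 characters of ls -ld are the file type and permission bits.
symbolic=$(ls -ld "$demo/share" | cut -c1-10)
echo "$symbolic"
```

The trailing t in the permission string marks the sticky bit; a capital T would mean the sticky bit is set while others lack execute permission.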
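5. As zhang, create a script file named "mytime.sh" in "/home/zhang/" whose job is to "get the current system time, display it on screen, and save it to mytime.txt in the current directory". Make the script executable. Add "mytime.sh" to the "PATH" environment variable and test running "mytime" from any path.
- Create mytime.sh: vi mytime.sh
- Copy the following into mytime.sh:
    #!/bin/bash
    # Capture the current system time once.
    DATE=$(date)
    # Report whether mytime.txt already exists; create it if not.
    if [ -e mytime.txt ]; then
        echo "File already exists!"
    else
        touch mytime.txt
        echo "File created successfully!"
    fi
    # Show the time on screen and save it to mytime.txt.
    echo "$DATE"
    echo "$DATE" > mytime.txt
- Make the script executable: chmod +x mytime.sh
- Open .bashrc: vi .bashrc
- Add this line to it: export PATH=$PATH:/home/zhang/
- Reload the configuration with source ~/.bashrc, then run mytime.sh from any directory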
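The whole step (write the script, make it executable, extend PATH, run it by name from elsewhere) can be rehearsed in scratch directories before touching /home/zhang or .bashrc. In this sketch the directories home_dir and work_dir are hypothetical stand-ins, the script is a reduced version that only prints and saves the time, and the PATH change lasts only for the current shell.

```shell
#!/bin/bash
# Scratch directory standing in for /home/zhang (hypothetical, for a safe dry run).
home_dir=$(mktemp -d)

# Write a reduced mytime.sh: print the time and save it to mytime.txt.
cat > "$home_dir/mytime.sh" <<'EOF'
#!/bin/bash
now=$(date)
echo "$now"
echo "$now" > mytime.txt
EOF

# Make it executable and add its directory to PATH for this shell only.
chmod +x "$home_dir/mytime.sh"
export PATH="$PATH:$home_dir"

# Run it by name from a different directory; mytime.txt lands in that directory.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
shown=$(mytime.sh)
saved=$(cat mytime.txt)
echo "shown: $shown"
echo "saved: $saved"
```

Because the script writes to a relative path, mytime.txt is created in whichever directory the command is run from, not next to the script.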
6. (1) Create the directory "khdir" in "/home/zhang/01/". Copy "mytime.sh" and "mytime.txt" into "khdir". Pack and compress the "khdir" directory into a file named "mytimes.tar.gz" and place it in "/home/zhang".
- Create the khdir directory: mkdir -p /home/zhang/01/khdir
- Copy mytime.sh to the target directory: cp mytime.sh /home/zhang/01/khdir
- Copy mytime.txt to the target directory: cp mytime.txt /home/zhang/01/khdir
- Pack and compress the "khdir" directory: cd /home/zhang/01 && tar -czvf mytimes.tar.gz khdir
- Move "mytimes.tar.gz" into "/home/zhang": mv /home/zhang/01/mytimes.tar.gz /home/zhang/
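The pack-and-move pair can also be done in one tar call by writing the archive straight to its final location and using -C to pick the directory to pack from. The sketch below rehearses this on a scratch tree; the variable base is a hypothetical stand-in for /home/zhang and the file contents are placeholders.

```shell
#!/bin/bash
# Scratch tree standing in for /home/zhang (hypothetical, so nothing real is touched).
base=$(mktemp -d)
mkdir -p "$base/01/khdir"
echo 'echo demo' > "$base/01/khdir/mytime.sh"
date > "$base/01/khdir/mytime.txt"

# -C switches into the parent directory first, so the archive stores
# the relative path khdir/... instead of the full absolute path.
tar -czf "$base/mytimes.tar.gz" -C "$base/01" khdir

# List the archive contents to verify what was packed.
listing=$(tar -tzf "$base/mytimes.tar.gz" | sort)
echo "$listing"
```

Storing relative paths means the archive unpacks as a plain khdir directory wherever it is extracted.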
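7. Grant zhang root privileges
- Check the permissions of the sudo configuration file: ll /etc/sudoers
- Make /etc/sudoers writable by its owner: chmod u+w /etc/sudoers
- Open /etc/sudoers for editing: vi /etc/sudoers
  Find the line that grants root its privileges (it starts with root ALL…), add a copy of it on the next line, and in the copy replace root with your user name (here, zhang)
- Test that the privileges work: sudo useradd usertest1
- Check that the test succeeded: tail -5 /etc/passwd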
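8. Use SSH to upload "eclipse-jee-2021-09-R-linux-gtk-x86_64.tar.gz" to the system, install it under /usr/local, and run eclipse once
- Connect with sftp, the file transfer tool that runs over SSH: sftp zhang@192.168.160.11
- Upload the file: put E:/桌面/学习/eclipse-jee-2021-09-R-linux-gtk-x86_64.tar.gz /home/zhang
- Leave the sftp session: exit
- Extract the software into the target directory: sudo tar -zxvf eclipse-jee-2021-09-R-linux-gtk-x86_64.tar.gz -C /usr/local
- Go to the software directory: cd /usr/local/eclipse
- Start the software: ./eclipse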
Big data analysis
The 20 most frequent tags
select tag, count(*) num from bigdata_tags group by tag order by num desc limit 20;
In Netflix queue 131
atmospheric 36
superhero 24
thought-provoking 24
funny 23
Disney 23
surreal 23
religion 22
dark comedy 21
sci-fi 21
quirky 21
psychology 21
suspense 20
crime 19
twist ending 19
visually appealing 19
politics 18
mental illness 16
music 16
time travel 16
Number of user ratings at each star level
select rating, count(*) num from bigdata_ratings group by rating order by num desc;
4 35369
3 33183
5 13211
2 13101
1 4602
0 1370
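A count like this can be cross-checked outside the database when the ratings are also available as a text export. The sketch below is only an illustration: the CSV layout (userId,movieId,rating,rat_time) and the six sample rows are assumptions, not data from the tables above. It mirrors the query's GROUP BY with an awk counter and its ORDER BY num DESC with sort.

```shell
#!/bin/bash
# Hypothetical sample in an assumed CSV layout: userId,movieId,rating,rat_time.
ratings=$(mktemp)
cat > "$ratings" <<'EOF'
userId,movieId,rating,rat_time
1,10,4,2018-02-15
1,11,5,2017-12-26
2,10,4,2016-10-15
3,12,3,2015-09-10
3,10,4,2015-08-27
2,12,5,2008-11-09
EOF

# Skip the header, count rows per rating (GROUP BY), then sort by
# count descending with the rating as tie-breaker (ORDER BY num DESC).
counts=$(awk -F, 'NR > 1 { n[$3]++ } END { for (r in n) print r, n[r] }' "$ratings" \
    | sort -k2,2nr -k1,1n)
echo "$counts"
```

Each output line is a rating followed by its row count, in the same column order as the query result.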
Number of five-star ratings per year for movies in the Adventure genre
select year(r.rat_time) , count(*) num from bigdata_movies m join bigdata_ratings r on m.movieId=r.movieId where r.rating=5 and m.genres like concat('%','Adventure','%') group by year(r.rat_time) order by year(r.rat_time) desc;
2018 179
2017 257
2016 194
2015 158
2014 22
2013 44
2012 63
2011 34
2010 39
2009 54
2008 71
2007 70
2006 63
2005 74
2004 24
2003 70
2002 150
2001 131
2000 265
1999 107
1998 16
1997 109
1996 226
Movies whose IMDb id is greater than 50000, rated above 4 stars, tagged "In Netflix queue", and with 1996 in the title, grouped by title and sorted by count
select m.title,count(*) num from bigdata_links l join bigdata_movies m on l.movieId=m.movieId join bigdata_ratings r on m.movieId=r.movieId join bigdata_tags t on m.movieId=t.movieId where l.imdbId > 50000 and r.rating>4 and t.tag like concat('%','In Netflix queue','%') and m.title like concat('%','1996','%') group by m.title order by num desc;
Lone Star (1996) 8
Secrets & Lies (1996) 6
When We Were Kings (1996) 3
Kolya (Kolja) (1996) 2
Paradise Lost: The Child Murders at Robin Hood Hills (1996) 1
Adventure movies tagged "In Netflix queue" where the movie id matches across all three tables, grouped by title and rating and sorted by count
select m.title, r.rating, count(*) num from bigdata_movies m join bigdata_ratings r on m.movieId = r.movieId join bigdata_tags t on r.userId=t.userId where t.tag like concat('%','In Netflix queue','%') and m.movieId=r.movieId and m.movieId = t.movieId and m.genres like concat('%','Adventure','%') group by m.title, r.rating order by num desc;
Tokyo Godfathers (2003) 4 1
Howl's Moving Castle (Hauru no ugoku shiro) (2004) 4 1
Porco Rosso (Crimson Pig) (Kurenai no buta) (1992) 3 1
Duma (2005) 3 1
Top 20 movies where the same user both tagged the movie and rated it 5 stars, grouped by title and sorted by count
select m.title,count(*) num from bigdata_movies m join bigdata_ratings r on m.movieId=r.movieId join bigdata_tags t on r.userId=t.userId where r.userId = t.userId and r.movieId = t.movieId and r.rating = 5 group by m.title order by num desc limit 20 ;
Pulp Fiction (1994) 176
Fight Club (1999) 49
2001: A Space Odyssey (1968) 39
Léon: The Professional (a.k.a. The Professional) (Léon) (1994) 32
"Big Lebowski 31
Eternal Sunshine of the Spotless Mind (2004) 24
Eraserhead (1977) 16
Mary and Max (2009) 13
Inception (2010) 13
"Talented Mr. Ripley 12
Django Unchained (2012) 11
Battle Royale (Batoru rowaiaru) (2000) 10
Star Wars: Episode V - The Empire Strikes Back (1980) 10
"Lord of the Rings: The Return of the King 10
Margin Call (2011) 9
The Hateful Eight (2015) 9
"Sixth Sense 9
There Will Be Blood (2007) 8
"South Park: Bigger 8
In Bruges (2008) 8
Dates on which users rated the movie Hercules (1997), newest first
select r.rat_time from bigdata_movies m join bigdata_ratings r on m.movieId=r.movieId where m.title='Hercules (1997)' order by r.rat_time desc;
2018-02-15
2017-12-26
2017-11-12
2017-05-02
2017-02-25
2016-10-15
2016-04-05
2015-09-10
2015-08-27
2015-07-04
2015-06-29
2015-05-19
2008-11-09
2008-11-01
2008-07-13
2007-11-25
2005-05-30
2005-04-22
2003-10-21
2003-05-27
2003-04-26
2002-09-28
2001-10-30
2001-01-03
2000-08-19
2000-08-08
2000-07-04
2000-02-17
1999-12-12
1999-02-28
1997-07-01