Sqoop Study Notes 3 (Garbled Characters on Data Import)


How to fix garbled Chinese text after using Sqoop to import data from a MySQL database into HDFS or Hive

[root@spark1 ~]# vi /etc/my.cnf    # edit the configuration file
Add the following line under both the [mysqld] and [client] sections of the file:
[mysqld]
default-character-set=utf8
[client]
default-character-set=utf8
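
A note for newer servers: the session output later in this post shows MySQL 5.1.73, where the setting above is accepted. To my knowledge, MySQL 5.5 and later no longer accept default-character-set under [mysqld], and character-set-server is the equivalent option there. The following is only a sketch under that assumption. It drafts the fragment in a scratch file (the variable name cnf_draft is made up for illustration) so it can be reviewed before being merged into /etc/my.cnf by hand:

```shell
# Sketch only: draft the fragment in a scratch file, review it, then merge it
# into /etc/my.cnf manually. Assumes MySQL 5.5 or later.
cnf_draft="$(mktemp)"
cat > "$cnf_draft" <<'EOF'
[mysqld]
character-set-server=utf8
[client]
default-character-set=utf8
EOF

# Show the drafted fragment for review.
cat "$cnf_draft"
```

The [client] section keeps default-character-set, since that option is still read by the mysql command-line client.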

Then specify utf8 as the character set when creating both the database and the table
mysql> create database wujiadong1 character set utf8;
mysql> create table stud_info(
    -> stud_code varchar(50) not null,
    -> stud_name varchar(50) not null,
    -> stud_gend varchar(50) not null default 'M',
    -> birthday date null,
    -> log_date date null,
    -> orig_addr varchar(50) null,
    -> lev_date date null,
    -> college_code varchar(50) null,
    -> college_name varchar(50) null,
    -> state varchar(50) null,
    -> primary key(stud_code)
    -> )character set utf8;

mysql> load data local infile '/root/hive_test/stud_info.csv' into table stud_info
    -> fields terminated by ','
    -> lines terminated by '\n'
    -> ignore 1 lines;

mysql> select * from stud_info; # check whether the Chinese characters display correctly

Then import the data into HDFS
[root@spark1 ~]# sqoop import --connect jdbc:mysql://192.168.220.144:3306/wujiadong1 --username root --table stud_info --target-dir 'hdfs://spark1:9000/user/sqoop_test1' -m 1
[root@spark1 ~]# hadoop fs -lsr /user/sqoop_test1
[root@spark1 ~]# hadoop fs -cat /user/sqoop_test1/part-m-00000
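
If the output is still garbled after the server-side changes, one commonly suggested additional step is to state the encoding on the JDBC connection itself. useUnicode and characterEncoding are MySQL Connector/J connection properties; the original import above did not use them, and whether they are needed depends on the driver and server versions. The helper jdbc_url_utf8 below is hypothetical and only builds and prints the command rather than running Sqoop:

```shell
# Hypothetical helper: append Connector/J encoding properties to a JDBC URL.
# Uses '&' if the URL already has a query string, '?' otherwise.
jdbc_url_utf8() {
  case "$1" in
    *\?*) printf '%s&useUnicode=true&characterEncoding=UTF-8\n' "$1" ;;
    *)    printf '%s?useUnicode=true&characterEncoding=UTF-8\n' "$1" ;;
  esac
}

url="$(jdbc_url_utf8 'jdbc:mysql://192.168.220.144:3306/wujiadong1')"

# Print the import command for inspection instead of executing it.
printf '%s\n' "sqoop import --connect '$url' --username root --table stud_info --target-dir 'hdfs://spark1:9000/user/sqoop_test1' -m 1"
```

The URL is kept in single quotes in the printed command because an unquoted & would otherwise be interpreted by the shell.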


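Summary: fixing garbled Chinese text when importing MySQL data into HDFS

  • Edit MySQL's my.cnf file
  • Create the database with its character set set to utf8
  • Create the table in the new database, specifying utf8 as the character set in the create table statement
  • Insert records containing Chinese characters
  • Confirm with select that the Chinese text displays correctly
  • Once all of these steps are done, run the Sqoop import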

After fixing the garbled text for the HDFS import, importing into Hive showed no garbled text either, so the same fix appears to apply to both.
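
Besides reading the output by eye, a rough automated check is possible. The sketch below assumes iconv is installed; the function name is_valid_utf8 and the sample row are made up for illustration. It only detects bytes that are not valid UTF-8. Text that was double-encoded is still valid UTF-8 and would pass, so this complements the visual check rather than replacing it.

```shell
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Made-up sample row standing in for a line of the imported file.
sample="$(mktemp)"
printf '1001,張三,M\n' > "$sample"

if is_valid_utf8 "$sample"; then
  echo "valid UTF-8"
else
  echo "contains bytes that are not valid UTF-8"
fi
```

For the real import, the file would first be copied locally, for example with hadoop fs -cat /user/sqoop_test1/part-m-00000 redirected to a local file, and is_valid_utf8 run on that copy.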

Viewing and changing the character encoding in MySQL

View the current encoding
mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+

mysql> show variables like 'character_set_%'; # view MySQL's default character sets
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
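
With the client, connection, and results character sets all at latin1 as shown above, UTF-8 bytes can end up being read one byte per character. The sketch below illustrates that general mechanism with iconv and is not a reproduction of what the server does internally. Note that MySQL's latin1 is closer to Windows-1252, so the exact garbled characters can differ; ISO-8859-1 is used here only because it makes the round trip lossless.

```shell
# Two Chinese characters, stored as 6 bytes of UTF-8.
original='中文'

# Misread each UTF-8 byte as a separate ISO-8859-1 character (the garbling step).
garbled="$(printf '%s' "$original" | iconv -f ISO-8859-1 -t UTF-8)"

# Reverse the misreading to recover the original bytes.
restored="$(printf '%s' "$garbled" | iconv -f UTF-8 -t ISO-8859-1)"

if [ "$restored" = "$original" ]; then
  echo "round trip recovered the original text"
fi
```

Because the garbling here is reversible, the data itself is often intact and only the declared encoding is wrong, which is why fixing the character set settings is enough.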

The encoding is changed by editing /etc/my.cnf
[root@spark1 ~]# vi /etc/my.cnf
[root@spark1 ~]# service mysqld restart    # restart MySQL
Check whether the character sets are now utf8
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (x86_64) using readline 5.1

Connection id:		6
Current database:	
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.1.73 Source distribution
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/var/lib/mysql/mysql.sock
Uptime:			22 min 3 sec

Threads: 1  Questions: 59  Slow queries: 0  Opens: 20  Flush tables: 1  Open tables: 9  Queries per second avg: 0.44
--------------

mysql> show variables like "char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> show variables like "colla%";
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

