The shell join command explained

This article is a repost. 2017-09-28 17:26, Linux study

First, here is the output of join --help:

Usage: join [OPTION]... FILE1 FILE2
For each pair of input lines with identical join fields, write a line to
standard output.  The default join field is the first, delimited
by whitespace.  When FILE1 or FILE2 (not both) is -, read standard input.

  -a FILENUM        print unpairable lines coming from file FILENUM, where
                      FILENUM is 1 or 2, corresponding to FILE1 or FILE2
  -e EMPTY          replace missing input fields with EMPTY
  -i, --ignore-case ignore differences in case when comparing fields
  -j FIELD          equivalent to `-1 FIELD -2 FIELD'
  -o FORMAT         obey FORMAT while constructing output line
  -t CHAR           use CHAR as input and output field separator
  -v FILENUM        like -a FILENUM, but suppress joined output lines
  -1 FIELD          join on this FIELD of file 1
  -2 FIELD          join on this FIELD of file 2
      --help     display this help and exit
      --version  output version information and exit

Unless -t CHAR is given, leading blanks separate fields and are ignored,
else fields are separated by CHAR.  Any FIELD is a field number counted
from 1.  FORMAT is one or more comma or blank separated specifications,
each being `FILENUM.FIELD' or `0'.  Default FORMAT outputs the join field,
the remaining fields from FILE1, the remaining fields from FILE2, all
separated by CHAR.

Important: FILE1 and FILE2 must be sorted on the join fields.

Report bugs to <bug-coreutils@gnu.org>.

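Now let's work through it.

join [OPTIONS] FILE1 FILE2

// Any number of options may be given, but there must be exactly two files.

Starting with the most important point: join intersects two files on one column (the join field) and prints the lines that match.

Let's start with a basic example: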

$ cat A.txt

1 abc 20
2 ccc 22
3 sed 11
4 xxx 23

$ cat B.txt

1 h 0
2 x 2
3 b 3
5 s 3

$ join A.txt B.txt

1 abc 20 h 0
2 ccc 22 x 2
3 sed 11 b 3

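Why this result? By default join uses whitespace as the field separator (a different separator can be set with -t) and uses the first column as the join column (the column on which the intersection is computed).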
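As a small illustration of -t, the sketch below joins two comma-separated files. The file names A.csv and B.csv and the temporary directory are made up for this example; the data mirror the first two lines of A.txt and B.txt.

```shell
# Hypothetical comma-separated copies of the sample data.
tmp=$(mktemp -d)
printf '1,abc,20\n2,ccc,22\n' > "$tmp/A.csv"
printf '1,h,0\n2,x,2\n' > "$tmp/B.csv"

# -t, makes the comma both the input and the output field separator.
csv_out=$(join -t, "$tmp/A.csv" "$tmp/B.csv")
printf '%s\n' "$csv_out"
# prints:
# 1,abc,20,h,0
# 2,ccc,22,x,2
```

Without -t, join would treat each whole line as a single field here, because the lines contain no whitespace.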

What if we want all lines printed, whether or not they are paired? Use the -a option.

$ join -a1 A.txt B.txt
1 abc 20 h 0
2 ccc 22 x 2
3 sed 11 b 3
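4 xxx 23

// Note that the unpaired line from A.txt is printed at the end of the output (its key, 4, sorts after all the paired keys).

In the same way, the unpaired lines of both A.txt and B.txt can be printed.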

$ join -a1 -a2 A.txt B.txt
1 abc 20 h 0
2 ccc 22 x 2
3 sed 11 b 3
4 xxx 23
5 s 3

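But now the layout is not what we wanted: from the last two lines alone there is no way to tell which came from A.txt and which from B.txt.

This is where the -o and -e options come in.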

$ join -a1 -a2 -e"_" -o'1.1 1.2 1.3 2.1 2.2 2.3' A.txt B.txt
1 abc 20 1 h 0
2 ccc 22 2 x 2
3 sed 11 3 b 3
4 xxx 23 _ _ _
_ _ _ 5 s 3

Here -e specifies what to fill in when a field is missing, and -o specifies the output format (1.1 means the first column of file 1).
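The help text above also allows 0 in FORMAT, meaning the join field itself. A possible variation, sketched here with the sample data recreated in a temporary directory (an assumption of this example), prints the key once instead of repeating it per file:

```shell
# Recreate the sample files from the article in a scratch directory.
tmp=$(mktemp -d)
printf '1 abc 20\n2 ccc 22\n3 sed 11\n4 xxx 23\n' > "$tmp/A.txt"
printf '1 h 0\n2 x 2\n3 b 3\n5 s 3\n' > "$tmp/B.txt"

# 0 stands for the join field; -e fills the fields an unpaired line lacks.
full_out=$(join -a1 -a2 -e _ -o 0,1.2,1.3,2.2,2.3 "$tmp/A.txt" "$tmp/B.txt")
printf '%s\n' "$full_out"
# prints:
# 1 abc 20 h 0
# 2 ccc 22 x 2
# 3 sed 11 b 3
# 4 xxx 23 _ _
# 5 _ _ s 3
```

With 0 the key column stays populated even for unpaired lines, so the position of the _ placeholders alone shows which file a line came from.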

How do we get the lines that are in A.txt but not in B.txt?

For that we need -v.

$ join -v1 A.txt B.txt
4 xxx 23

This prints the lines that are in A but not in B.
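The same option works in the other direction, and both directions can be combined. The sketch below recreates the sample files in a temporary directory (an assumption of this example) and shows -v2 alone and -v1 together with -v2:

```shell
# Recreate the sample files from the article in a scratch directory.
tmp=$(mktemp -d)
printf '1 abc 20\n2 ccc 22\n3 sed 11\n4 xxx 23\n' > "$tmp/A.txt"
printf '1 h 0\n2 x 2\n3 b 3\n5 s 3\n' > "$tmp/B.txt"

# Lines of B.txt whose key does not appear in A.txt.
only_b=$(join -v2 "$tmp/A.txt" "$tmp/B.txt")
printf '%s\n' "$only_b"
# prints:
# 5 s 3

# Lines that are unpaired in either file.
unpaired=$(join -v1 -v2 "$tmp/A.txt" "$tmp/B.txt")
printf '%s\n' "$unpaired"
# prints:
# 4 xxx 23
# 5 s 3
```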

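In addition, -i ignores case when comparing join fields.

-j x is equivalent to writing both -1 x and -2 x,

that is, it makes column x of both files the join column.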
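When the join column sits in a different position in each file, -1 and -2 can be given separately. A minimal sketch, using two made-up files (users.txt and depts.txt are hypothetical and not part of the article's sample data):

```shell
# Hypothetical data: users.txt is "name dept_id", depts.txt is "dept_id dept_name".
# Each file is already sorted on its own join column.
tmp=$(mktemp -d)
printf 'alice 10\nbob 20\n' > "$tmp/users.txt"
printf '10 sales\n20 eng\n' > "$tmp/depts.txt"

# Join column 2 of the first file with column 1 of the second file.
mixed_out=$(join -1 2 -2 1 "$tmp/users.txt" "$tmp/depts.txt")
printf '%s\n' "$mixed_out"
# prints:
# 10 alice sales
# 20 bob eng
```

As the help text describes for the default FORMAT, the join field is printed first, followed by the remaining fields of file 1 and then those of file 2.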

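How is join implemented internally? Look at its key requirement: the join column of each file must already be sorted!!!

That immediately suggests the implementation: it is the intersection of two sorted arrays. It also gives a better sense of join's complexity: ignoring the width of the columns, a single pass of O(n + m) suffices, where n is the number of lines in file 1 and m is the number of lines in file 2. (When a key is repeated in both files, join prints every pairing, so the output itself can be larger.)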
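The two-pointer idea can be sketched in plain shell. This is only an illustration of the merge technique, not join's actual source: merge_join is a hypothetical helper, it assumes keys are unique within each file, that every line ends with a newline, and it compares keys as integers for simplicity, whereas join compares them as strings.

```shell
# Recreate the sample files from the article in a scratch directory.
tmp=$(mktemp -d)
printf '1 abc 20\n2 ccc 22\n3 sed 11\n4 xxx 23\n' > "$tmp/A.txt"
printf '1 h 0\n2 x 2\n3 b 3\n5 s 3\n' > "$tmp/B.txt"

# merge_join FILE1 FILE2: hypothetical helper, intersects on the first field.
# Assumes both files are sorted and hold unique integer keys.
merge_join() {
    exec 3<"$1" 4<"$2"
    # Read the first line of each file; an empty key marks end of file.
    read -r ka ra <&3 || ka=
    read -r kb rb <&4 || kb=
    while [ -n "$ka" ] && [ -n "$kb" ]; do
        if [ "$ka" -lt "$kb" ]; then
            # Key only in FILE1 so far: advance FILE1.
            read -r ka ra <&3 || ka=
        elif [ "$ka" -gt "$kb" ]; then
            # Key only in FILE2 so far: advance FILE2.
            read -r kb rb <&4 || kb=
        else
            # Keys match: print key, rest of FILE1 line, rest of FILE2 line.
            printf '%s %s %s\n' "$ka" "$ra" "$rb"
            read -r ka ra <&3 || ka=
            read -r kb rb <&4 || kb=
        fi
    done
    exec 3<&- 4<&-
}

merged=$(merge_join "$tmp/A.txt" "$tmp/B.txt")
printf '%s\n' "$merged"
# prints:
# 1 abc 20 h 0
# 2 ccc 22 x 2
# 3 sed 11 b 3
```

Every loop iteration consumes at least one line from one of the files, which is where the O(n + m) bound comes from. For inputs that are not yet sorted, a common approach is to run each file through sort -k 1,1 first, under the same locale as join, and join the sorted copies.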
