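1. Randomly select a number of lines from the source file
The shuf command does this directly; the following writes 100 randomly chosen lines of source.txt to target.txt: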
$ shuf -n 100 source.txt > target.txt
The help text for shuf:
$ shuf --help
Usage: shuf [OPTION]... [FILE]
  or:  shuf -e [OPTION]... [ARG]...
  or:  shuf -i LO-HI [OPTION]...
Write a random permutation of the input lines to standard output.
With no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
  -e, --echo                treat each ARG as an input line
  -i, --input-range=LO-HI   treat each number LO through HI as an input line
  -n, --head-count=COUNT    output at most COUNT lines
  -o, --output=FILE         write result to FILE instead of standard output
      --random-source=FILE  get random bytes from FILE
  -r, --repeat              output lines can be repeated
  -z, --zero-terminated     line delimiter is NUL, not newline
      --help                display this help and exit
      --version             output version information and exit
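A few of the options from the help text, tried on throwaway inputs. This is only a minimal sketch, assuming GNU coreutils shuf; the output file names and seed.bin are hypothetical, and seed.bin serves purely as a fixed byte source so that two runs produce the same order.

```shell
# Pick 5 distinct numbers from the range 1-100 (no input file needed).
shuf -i 1-100 -n 5 > picks.txt

# Treat each argument as an input line and pick one of them.
shuf -n 1 -e red green blue > color.txt

# With -r, lines may repeat, so more lines can be drawn than the input has.
shuf -r -n 10 -e heads tails > flips.txt

# A fixed byte source makes the permutation repeatable.
# seed.bin is a hypothetical helper file, not something shuf requires.
head -c 4096 /dev/zero > seed.bin
shuf -i 1-20 --random-source=seed.bin > run1.txt
shuf -i 1-20 --random-source=seed.bin > run2.txt
cmp run1.txt run2.txt && echo "same order in both runs"
```

Without --random-source, each run of shuf gives a different order, which is usually what you want for sampling; the fixed source is mainly useful when a result has to be reproduced.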
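2. Randomly split a file into several parts
My approach here is to shuffle the whole file first and then cut it sequentially.
(1) Shuffle the whole file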
$ shuf source.txt > source_shuffle.txt
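(2) Split in order
There are many ways to do the split: split, head/tail, awk, and sed all work, so pick whichever fits your needs.
(See also: "[Linux] Printing specific lines of a file" and "Splitting and merging large files on Linux")
For example, here the first 100 lines of the shuffled file and the remaining lines are taken as the two parts of the random split: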
$ head -n 100 source_shuffle.txt > target1.txt
$ tail -n +101 source_shuffle.txt > target2.txt
# or
$ awk 'NR>=101' source_shuffle.txt > target2.txt
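The same shuffle-then-cut idea can also be driven by a ratio instead of a fixed line count, or handed to split for equal-sized parts. A minimal sketch, assuming GNU coreutils; source.txt is generated with seq here only so the block runs on its own, and the 80/20 ratio and the names train.txt, test.txt, and part_ are illustrative choices, not part of the original method.

```shell
# Build a 1000-line sample input so the example is self-contained.
seq 1 1000 > source.txt

# Shuffle once, then cut the shuffled file in order.
shuf source.txt > source_shuffle.txt

# Ratio-based cut: the first 80% of the lines, then the remainder.
total=$(wc -l < source_shuffle.txt)
n_train=$(( total * 80 / 100 ))
head -n "$n_train" source_shuffle.txt > train.txt
tail -n +"$(( n_train + 1 ))" source_shuffle.txt > test.txt

# Equal-sized cut: 250 lines per piece, named part_aa, part_ab, ...
split -l 250 source_shuffle.txt part_
```

Because every piece is cut from the same shuffled file, the parts do not overlap and together contain every line of the input exactly once.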
If you know a more efficient or convenient way, suggestions are welcome.