Last edited by asnahu on 2011-7-26 12:09

Replace Pioneer is based on Perl, so wouldn't it be better to use Perl directly? Alternatively, awk can handle this as well:

    gawk "!a[$0]++" FILE

    awk '!a[$0]++'

This one-liner is very idiomatic. It records each line it sees in the associative array “a” (arrays are always associative in Awk) and at the same time tests whether it has seen the line before. If it has, then a[line] > 0 and !a[line] == 0. A pattern that evaluates to false is a no-op, and a pattern that evaluates to true with no action attached is equivalent to “{ print }”.
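
The last point can be checked in isolation. This is a minimal sketch, assuming a POSIX shell with some awk on the PATH; the two-line sample input and the variable names are made up for illustration. A constant pattern with no action either prints every line or prints nothing:

```shell
# Sample input invented for this illustration.
input='foo
bar'

# A true pattern with no action behaves like { print }.
always=$(printf '%s\n' "$input" | awk '1')

# A false pattern with no action prints nothing.
never=$(printf '%s\n' "$input" | awk '0')

printf 'pattern 1 -> [%s]\n' "$always"
printf 'pattern 0 -> [%s]\n' "$never"
```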

For example, suppose the input is:
    foo
    bar
    foo
    baz

When Awk sees the first “foo”, it evaluates the expression “!a["foo"]++”. “a["foo"]” does not exist yet, so it counts as false and “!a["foo"]” is true. As part of the same evaluation, the post-increment operator “++” raises “a["foo"]” to one. Because the pattern was true, Awk prints “foo”. Array “a” now contains one value, “a["foo"] == 1”.

Next Awk sees “bar”. It does exactly what it did for “foo” and prints “bar”. Array “a” now contains two values, “a["foo"] == 1” and “a["bar"] == 1”.

Now Awk sees the second “foo”. This time “a["foo"]” is true, “!a["foo"]” is false, and Awk does not print anything. Array “a” still contains two values, “a["foo"] == 2” and “a["bar"] == 1”.

Finally Awk sees “baz” and prints it because “!a["baz"]” is true. Array “a” now contains three values: “a["foo"] == 2”, “a["bar"] == 1” and “a["baz"] == 1”.

The output:

    foo
    bar
    baz
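
The walkthrough above can be reproduced directly. This is a sketch assuming a POSIX shell (hence the single quotes around the awk program); the variable names are illustrative only. The second command re-reads the same input and reports the count that ends up stored for “foo”:

```shell
# The four input lines from the example above.
input='foo
bar
foo
baz'

# Print each line the first time it is seen; a[$0] counts occurrences.
unique=$(printf '%s\n' "$input" | awk '!a[$0]++')
printf '%s\n' "$unique"

# Same counting, but report the final value stored for "foo".
foo_count=$(printf '%s\n' "$input" | awk '{ a[$0]++ } END { print a["foo"] }')
printf 'a["foo"] == %s\n' "$foo_count"
```
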
Here is another one-liner that does the same thing. Eric, in his one-liners, says it is the most efficient way to do it.

    awk '!($0 in a) { a[$0]; print }'

It is basically the same as the previous one, except that it uses the “in” operator. Given an array “a”, the expression “foo in a” tests whether the value of “foo” is an index of “a”.

Note that the bare expression “a[$0]”, used as a statement on its own, creates an element in the array.
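
Both remarks can be seen in a short sketch, again assuming a POSIX shell and any awk; the key "x" and the variable names are hypothetical. The “in” test on its own does not add anything to the array, while a bare reference does:

```shell
input='foo
bar
foo
baz'

# Variant using the "in" operator; the bare reference a[$0] creates the key.
unique_in=$(printf '%s\n' "$input" | awk '!($0 in a) { a[$0]; print }')
printf '%s\n' "$unique_in"

# "in" only tests membership and does not create the element,
# whereas merely referencing a["x"] does.
membership=$(awk 'BEGIN {
    before = ("x" in a)   # 0: no element yet, and the test adds none
    a["x"]                # a bare reference creates the element
    after = ("x" in a)    # 1: the element now exists
    print before, after
}')
printf 'before/after: %s\n' "$membership"
```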

You can download it here: http://unxutils.sourceforge.net/

Your system needs to have gawk installed. The explanation is right there below the code; which part is unclear?

    gawk "!a[$0]++" < a.txt > b.txt

See whether this works.
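
For readers on a Unix-like system, the same redirection might look like the sketch below. It assumes a POSIX shell, where single quotes are needed so that the shell does not expand $0 itself. The scratch directory and its contents are made up for the demo; only the file names a.txt and b.txt come from the command above.

```shell
# Hypothetical scratch directory and sample data, used only for this demo.
workdir=$(mktemp -d)
printf 'foo\nbar\nfoo\nbaz\n' > "$workdir/a.txt"

# Single quotes keep the shell from expanding $0 before awk sees it.
awk '!a[$0]++' < "$workdir/a.txt" > "$workdir/b.txt"

# Compare line counts before and after removing duplicates.
lines_in=$(wc -l < "$workdir/a.txt" | tr -d ' ')
lines_out=$(wc -l < "$workdir/b.txt" | tr -d ' ')
printf 'input lines: %s, output lines: %s\n' "$lines_in" "$lines_out"

rm -r "$workdir"
```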

awk is very fast at line-by-line processing.

Reply to #12 cm535

Doesn't my answer in post #2 already give a more efficient method? Please read it carefully.

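Last edited by asnahu on 2011-7-27 20:26

I have no way here to test how long it actually takes to process a file this large. Keep in mind that the machine's hardware matters a great deal.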
