批处理之家 - Powered by Discuz! Board

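Title: [Text Processing] How can a batch script extract the domain addresses from a text file in bulk?

Author: Rasm  Time: 2012-10-24 13:09  Title: How can a batch script extract the domain addresses from a text file in bulk?

Last edited by pcl_test on 2016-7-7 22:51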

http://www.autohome.com.cn/102/
http://newcar.xcar.com.cn/731/
http://db.auto.sohu.com/model_1445/
http://car.bitauto.com/ym/
http://data.auto.sina.com.cn/657
http://data.auto.qq.com/car_serial/280/index.shtml
http://www.yemaauto.cn/
http://www.chexun.com/mustang/
http://db.auto.sohu.com/model_1335/

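The content above is what my software produces after tidying. My goal is to keep only the address (scheme and host name) without the page path, and ideally to remove duplicate addresses from the text as well.

The desired result looks like this: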

http://www.autohome.com.cn
http://newcar.xcar.com.cn
http://db.auto.sohu.com
http://car.bitauto.com
http://data.auto.sina.com.cn
http://data.auto.qq.com
http://www.yemaauto.cn
http://www.chexun.com
http://db.auto.sohu.com

If the text contains identical domains, it would be best to deduplicate them too, and save the result as 2.txt.

Author: andyrave  Time: 2012-10-24 15:05

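@echo off
rem Start with an empty 2.txt; the input file is assumed to be a.txt
cd.>2.txt
rem Splitting each line on "/" leaves the host name in token 2
for /f "tokens=2 delims=/" %%a in (a.txt) do (
    rem Append the address only when 2.txt does not already contain it
    find /i "http://%%a" 2.txt >nul || >>2.txt echo http://%%a
)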

Author: 韩立  Time: 2012-10-24 21:21

Last edited by 韩立 on 2012-10-24 21:38

This assumes the text to process is 1.txt. It does not remove duplicates.

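@echo off
rem Splitting on "/" gives "http:" as token 1 and the host name as token 2
for /f "tokens=1-2 delims=/" %%i in (1.txt) do echo %%i//%%j >>2.txt
rem Replace the input file with the result
del 1.txt & ren 2.txt 1.txt

I then revised it along the lines of post #2: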

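@echo off
rem Create an empty 2.txt to collect the results
cd.>2.txt
for /f "tokens=1-2 delims=/" %%i in (1.txt) do (
    rem Write the address only when it is not in 2.txt yet
    find /i "%%i//%%j" 2.txt >nul || echo %%i//%%j >>2.txt
)
del 1.txt & ren 2.txt 1.txt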

Author: wc726842270  Time: 2012-10-25 04:54

To remove duplicates:
if not defined %%i (echo %%i&&set %%i=#)
It has to go in the right place in the script, though.
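
One possible placement of that check, as a minimal sketch rather than a definitive script. It assumes the input file is a.txt (as in post #2) and the output is 2.txt. The seen_ prefix on the marker variable is a hypothetical name added here so that the variable name is built from the host name only; it is not part of the original one-liner.

```batch
@echo off
setlocal
rem Assumption: a.txt is the input and 2.txt is the output.
rem The whole loop is redirected once, so no echo line carries a trailing space.
>2.txt (
    for /f "tokens=2 delims=/" %%a in (a.txt) do (
        rem Token 2 is the host name. A defined seen_ marker means it was already written.
        if not defined seen_%%a (
            echo http://%%a
            set "seen_%%a=1"
        )
    )
)
endlocal
```

Because if defined is evaluated when the line runs, this works inside the loop without delayed expansion. Variable names are case-insensitive, so the comparison ignores case in the same way find /i does, and setlocal keeps the marker variables from leaking into the calling environment.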

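Author: forfiles  Time: 2012-10-25 10:32

Reply to 3# 韩立

echo %%i//%%j >>2.txt

The problem with writing it this way is that every line ends with an extra space, because the space before >> is echoed too. It can be changed to: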

>>2.txt echo %%i//%%j

There are also these two commands:

del 1.txt & ren 2.txt 1.txt

They can be merged into:

move /y 2.txt 1.txt
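
Putting both suggestions into the revised script from post #3 might look like the sketch below. It keeps the find-based check and the file names 1.txt and 2.txt from that post. The >nul redirections are an addition here to keep the console quiet. Note that find matches substrings, so an address that is a prefix of one already stored would be treated as a duplicate.

```batch
@echo off
rem Create an empty 2.txt to collect the results
cd.>2.txt
for /f "tokens=1-2 delims=/" %%i in (1.txt) do (
    rem Redirection placed first, so no trailing space is written
    find /i "%%i//%%j" 2.txt >nul || >>2.txt echo %%i//%%j
)
rem Overwrite the input file with the result in one step
move /y 2.txt 1.txt >nul
```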

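Author: forfiles  Time: 2012-10-25 10:33

Reply to 2# andyrave

find is too slow for this, since it rescans 2.txt for every input line. I suggest looking at the if defined approach in post #4.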

Welcome to 批处理之家 (http://www.bathome.net/)