批处理之家 - Powered by Discuz! Board

Title: [File Operations] How can a batch file extract the needed image URLs from messy web page source code and save them?

Author: onilvo  Time: 2012-1-15 15:29

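How can I extract the image URLs I need from messy web page source code and save them?
I have the source of some website saved as files 1.txt through 100.txt. Below is part of 1.txt: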

href="http://www.topit.me/item/2277171">12964667477319</a></div><div class="info"></div></div><a href="http://www.topit.me/item/2277171"><img id="item_d_2277171" class="img" alt="12964667477319" width="245px" height="200px" src="http://img.topit.me/m/201101/31/12964667477319.jpg" /></a></div><div class="e m"><div class="hover bar"><div class="heartbtn"><a id="item_heart_771744" class="heart nologin" href="http://www.topit.me/login?ref=%2Ftag%2FComic%3Fp%3D1"><span>heart</span></a></div><div class="title"><a href="http://www.topit.me/item/771744">12416993</a></div><div class="info"></div></div><a href="http://www.topit.me/item/771744"><img id="item_d_771744" class="img" alt="12416993" width="245px" height="200px" src="http://img.topit.me/m/201008/14/12817636911240.jpg" /></a></div><div class="e m"><div class="hover bar"><div class="heartbtn"><a id="item_heart_533918" class="heart nologin" href="http://www.topit.me/login?ref=%2Ftag%2FComic%3Fp%3D1"><span>heart</span></a></div><div class="title"><a href="http://www.topit.me/item/533918">Ⅶ</a></div><div class="info"></div></div><a href="http://www.topit.me/item/533918"><img id="item_d_533918" class="img" alt="Ⅶ" width="245px" height="200px" src="" /></a></div><div class="e m"><div class="hover bar"><div class="heartbtn"><a id="item_heart_2324230" 

I need to extract every image URL in the files that has a form like:

http://img.topit.me/m/201008/14/12817636911240.jpg

(some URLs may not sit on a single line) and save them all to List.txt, so that the result looks like this:

http://img.topit.me/m/201101/31/12964667477319.jpg
http://img.topit.me/m/201008/14/12817636911240.jpg
http://img.topit.me/m/201007/14/12791168666293.jpg
...

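I also found code on this forum by honorary moderator namejm that does something similar, but it does not solve my problem. I hope someone can adapt it: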

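@echo off
rem Start with an empty list.txt
cd.>list.txt
rem Note: this scans *.htm files, not files named 1.txt, 2.txt and so on
for %%i in (*.htm) do (
    (echo.&echo Images in %%i&echo.)>>list.txt
    rem Keep only lines where src is followed by an http URL ending in .jpg
    for /f "delims=" %%j in ('findstr /i "src=.*http://.*\.jpg" %%i 2^>nul') do (
        set "str=%%j"
        setlocal enabledelayedexpansion
        rem Strip the double quotes, then everything up to the first src
        set str=!str:"=!
        set str=!str:*src=!
        rem Only the first token is echoed, so at most one URL per line is saved
        for /f "delims==> " %%k in ("!str!") do echo %%k>>list.txt
        endlocal
    )
)
start list.txt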

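Author: CrLf  Time: 2012-1-15 17:49

The equals sign is one of the default delimiters of a plain for loop, so you can take advantage of that and use for %%b to split the page source into pieces: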

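@echo off
rem Read 1.txt line by line
for /f "delims=" %%a in (1.txt) do (
   rem A plain for splits the line at spaces and equals signs
   for %%b in (%%a) do (
      rem Take the text inside the double quotes and print it when its extension is .jpg
      for /f delims^=^" %%c in ("%%b") do if /i %%~xc==.jpg echo %%c
   )
)
pause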
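For readers who are not on cmd, the same split-then-filter idea can be sketched in POSIX shell. This is only an illustration under assumptions: the function name split_and_filter is made up here, and tr plus grep stand in for the delimiter splitting that the plain for loop does above.

```shell
#!/bin/sh
# Sketch of the split-then-filter idea, not CrLf's batch code itself.
# split_and_filter is a hypothetical helper name.
split_and_filter() {
    # 1. Turn every equals sign, space and tab into a newline,
    #    so each attribute value ends up on its own line.
    # 2. Drop double quotes and carriage returns.
    # 3. Keep only the pieces that are http URLs ending in .jpg.
    tr '= \t' '\n\n\n' < "$1" | tr -d '"\r' | grep -iE '^http://.*\.jpg$'
}
```

Usage would be something like `split_and_filter 1.txt >> List.txt`. Unlike the batch version, nothing here is treated as a file name wildcard, so pieces containing ? or * are not silently dropped.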

Author: asnahu  Time: 2012-1-15 18:58

grep -Po "http[^\x22]*[0-9]+\.jpg" url
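To cover the whole request (files 1.txt through 100.txt, output in List.txt, URLs that may be broken across lines), the one-liner could be wrapped roughly as below. Treat it as a sketch under assumptions: extract_urls is a made-up helper name, the pattern is narrowed to the img.topit.me host from the sample, and a URL split across lines is assumed to have no extra whitespace at the break. It uses grep -oE instead of -P so that it does not depend on PCRE support.

```shell
#!/bin/sh
# Sketch: collect image URLs from 1.txt .. 100.txt into List.txt.
# extract_urls is a hypothetical helper name, not from the thread.
extract_urls() {
    # Delete CR/LF first so a URL broken across lines is rejoined,
    # then print only the matching parts, one per line.
    tr -d '\r\n' < "$1" | grep -oE 'http://img\.topit\.me/[^"]*\.jpg'
}

: > List.txt                      # start with an empty list
i=1
while [ "$i" -le 100 ]; do
    # Skip numbers that have no file; append matches otherwise.
    [ -f "$i.txt" ] && extract_urls "$i.txt" >> List.txt
    i=$((i + 1))
done
```

Because the match cannot cross a double quote, an empty src="" like the one in the sample simply produces no output.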

Author: applba  Time: 2012-1-15 21:40

Use regular expressions; use findstr.

Author: onilvo  Time: 2012-1-16 19:34

:):):):)

Welcome to 批处理之家 (http://www.bathome.net/)