批处理之家 (BatHome) - Powered by Discuz! Board

Title: [Help Request] How do I use a regex in VBS to extract this snippet from a web page?

Author: batsealine  Posted: 2013-5-27 15:37

set regex = New RegExp
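' FSO is created here but not used yet
set fso = CreateObject("scripting.filesystemobject")
Set http = CreateObject("Msxml2.XMLHTTP")
' xiami.com album search for "浮躁" (key is the UTF-8 URL encoding of the title)
url = "http://www.xiami.com/search/album?key=%E6%B5%AE%E8%BA%81"

' Synchronous GET (third argument False), then read the page as text
http.open "GET",url,False
http.send
html = http.responseText

regex.ignoreCase = true
regex.Global = true
' Matches only the literal text "浮躁" including its quotes, not the surrounding block
regex.Pattern = """浮躁"""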
Set matches = regex.Execute(html)
For Each match In matches
	msgbox match
Next

My idea is to first use title="浮躁" and title="王菲" (Faye Wong) to pull out this block:

		<div class="album_item100_block">
			<p class="cover"><a class="CDcover100" href="/album/11943" title="浮躁">
			<img src="http://img.xiami.com/images/album/img77/2177/119431362392699_1.jpg" width="100" height="100" alt="" /></a>		
						</p>
			<p class="name"><a href="/album/11943" title="浮躁"><b class="key_red">浮躁</b></a>
			<a class="singer" href="/artist/2177" title="王菲">王菲</a>
			</p>
			<p class="album_rank clearfix"><span style="width:48.5px;">总体评分</span><em>9.7</em></p>
			<p class="year">1996-08</p>
		</div>

and then extract "http://img.xiami.com/images/album/img77/2177/119431362392699_1.jpg" from it. I want the match to be precise, because another artist may also have an album called "浮躁".
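The two-step plan described above (isolate the album block first, then read its image URL) can be sketched in VBScript roughly as follows. This is a minimal sketch, not code from the thread: it assumes each album sits in its own <div class="album_item100_block"> with no nested <div>, as in the snippet, and it runs on a hypothetical sample string (with a made-up artist "Other") instead of a live download so it can run offline. The titles are built with ChrW so the .vbs file can be saved as plain ASCII.

```vbscript
Option Explicit

Dim album, singer, html, reBlock, reImg, block, imgs, result
album  = ChrW(&H6D6E) & ChrW(&H8E81)   ' "浮躁"
singer = ChrW(&H738B) & ChrW(&H83F2)   ' "王菲"

' Hypothetical stand-in for http.responseText: two albums with the same
' title, the first one by a made-up artist "Other".
html = "<div class=""album_item100_block"">" & vbCrLf & _
       "<a title=""" & album & """><img src=""http://img.example.com/other.jpg"" /></a>" & vbCrLf & _
       "<a class=""singer"" title=""Other"">Other</a>" & vbCrLf & _
       "</div>" & vbCrLf & _
       "<div class=""album_item100_block"">" & vbCrLf & _
       "<a title=""" & album & """><img src=""http://img.xiami.com/images/album/img77/2177/119431362392699_1.jpg"" /></a>" & vbCrLf & _
       "<a class=""singer"" title=""" & singer & """>" & singer & "</a>" & vbCrLf & _
       "</div>"

' Step 1: cut the page into album blocks. The lazy *? stops each match at
' the first </div> (assumes album blocks contain no nested <div>).
Set reBlock = New RegExp
reBlock.Global = True
reBlock.IgnoreCase = True
reBlock.Pattern = "<div class=""album_item100_block"">[\s\S]*?</div>"

' Step 2: inside one block, capture the cover image URL.
Set reImg = New RegExp
reImg.IgnoreCase = True
reImg.Pattern = "<img src=""([^""]+)"""

result = ""
For Each block In reBlock.Execute(html)
    ' Keep only the block that carries both the album title and the singer title.
    If InStr(block.Value, "title=""" & album & """") > 0 And _
       InStr(block.Value, "title=""" & singer & """") > 0 Then
        Set imgs = reImg.Execute(block.Value)
        If imgs.Count > 0 Then result = imgs(0).SubMatches(0)
        Exit For
    End If
Next

WScript.Echo result
```

Checking both titles inside the same block is what provides the precision: the "Other" album is skipped because its block has no title="王菲".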

Author: apang  Posted: 2013-5-27 22:21

Set http = CreateObject("Msxml2.XMLHTTP")
url = "http://www.xiami.com/search/album?key=%E6%B5%AE%E8%BA%81"

http.open "GET",url,False
http.send()
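' With a synchronous request ReadyState is already 4 here; the loop is only a safeguard
Do Until http.ReadyState = 4 :Wscript.Sleep 100 :Loop
html = http.responseText
Set http = Nothing

With New RegExp
    .Global = true
    .ignoreCase = true
    ' title="浮躁", then four CRLF line breaks, then a line containing title="王菲"
    .Pattern = "title=""浮躁""(.*\r\n){4}.*title=""王菲"""
    For Each match In .Execute(html)
        ' Element 1 of the split is the <img> line; its first quoted value is the src URL
        MsgBox Split(Split(match,vbCrLf)(1),Chr(34))(1)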
    Next
End With
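The reply above depends on the page layout in two ways: title="王菲" must sit exactly four line breaks after the first title="浮躁", and every line break must be CRLF, because both the pattern (\r\n) and the Split (vbCrLf) expect it. If the server sends bare LF line endings, nothing matches. A minimal sketch of the same idea that accepts either ending, still assuming the five-line layout from the question and run against a hypothetical LF-only sample:

```vbscript
Option Explicit

Dim album, singer, html, re, m, lines, result
album  = ChrW(&H6D6E) & ChrW(&H8E81)   ' "浮躁"
singer = ChrW(&H738B) & ChrW(&H83F2)   ' "王菲"

' Hypothetical stand-in for http.responseText, using bare LF line endings
' and the same five-line layout as the snippet in the question.
html = "<p class=""cover""><a href=""/album/11943"" title=""" & album & """>" & vbLf & _
       "<img src=""http://img.xiami.com/images/album/img77/2177/119431362392699_1.jpg"" /></a>" & vbLf & _
       "</p>" & vbLf & _
       "<p class=""name""><a href=""/album/11943"" title=""" & album & """>" & album & "</a>" & vbLf & _
       "<a class=""singer"" href=""/artist/2177"" title=""" & singer & """>" & singer & "</a>"

Set re = New RegExp
re.Global = True
re.IgnoreCase = True
' \r?\n accepts both CRLF and LF; "." never matches \n, so each (.*\r?\n)
' consumes exactly one line.
re.Pattern = "title=""" & album & """(.*\r?\n){4}.*title=""" & singer & """"

result = ""
For Each m In re.Execute(html)
    ' Normalise to LF, take the second line (the <img> line), and read the
    ' text between its first pair of double quotes, as in the reply above.
    lines = Split(Replace(m.Value, vbCrLf, vbLf), vbLf)
    result = Split(lines(1), Chr(34))(1)
    WScript.Echo result
Next
```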

Author: apang  Posted: 2013-5-31 19:09

It seems this also works:

Set http = CreateObject("Msxml2.XMLHTTP")
url = "http://www.xiami.com/search/album?key=%E6%B5%AE%E8%BA%81"

http.open "GET",url,False
http.send()
Do Until http.ReadyState = 4 :Wscript.Sleep 100 :Loop
html = http.responseText
Set http = Nothing

Set re = New RegExp
re.Global = true
re.ignoreCase = true
' Lazy [\s\S]*? spans line breaks; group 1 captures the first .jpg URL after title="浮躁"
re.Pattern = "title=""浮躁""[\s\S]*?(http://.*?\.jpg)[\s\S]*?title=""王菲"""
For Each match In re.Execute(html)
    MsgBox match.SubMatches(0)
Next
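The lazy pattern in the reply above does not fully give the precision asked for in the question. If another artist's album titled 浮躁 appears earlier on the page, the match starts at that album's title, captures its image, and then [\s\S]*? keeps extending into the next block until it reaches title="王菲", so the wrong URL is returned. A sketch that keeps each gap inside one album block with a negative lookahead (a "tempered" gap; VBScript 5.5 RegExp supports (?:...) and (?!...)) is shown below. The sample page and the artist "Other" are hypothetical, and the sketch assumes album blocks end with </div> as in the question's snippet.

```vbscript
Option Explicit

Dim album, singer, html, gap, re, m, loose, result
album  = ChrW(&H6D6E) & ChrW(&H8E81)   ' "浮躁"
singer = ChrW(&H738B) & ChrW(&H83F2)   ' "王菲"

' Hypothetical page: an album with the same title by a made-up artist
' "Other" appears before Faye Wong's album.
html = "<div class=""album_item100_block"">" & vbCrLf & _
       "<a title=""" & album & """><img src=""http://img.example.com/other.jpg"" /></a>" & vbCrLf & _
       "<a class=""singer"" title=""Other"">Other</a>" & vbCrLf & _
       "</div>" & vbCrLf & _
       "<div class=""album_item100_block"">" & vbCrLf & _
       "<a title=""" & album & """><img src=""http://img.xiami.com/images/album/img77/2177/119431362392699_1.jpg"" /></a>" & vbCrLf & _
       "<a class=""singer"" title=""" & singer & """>" & singer & "</a>" & vbCrLf & _
       "</div>"

Set re = New RegExp
re.Global = True
re.IgnoreCase = True

' Pattern from the reply: the lazy gaps may run past </div>, so the first
' album's image is captured.
re.Pattern = "title=""" & album & """[\s\S]*?(http://.*?\.jpg)[\s\S]*?title=""" & singer & """"
loose = ""
For Each m In re.Execute(html)
    loose = m.SubMatches(0)
    WScript.Echo "untempered: " & loose
Next

' Tempered gap: a character is consumed only if "</div>" does not start
' there, so a match can never leave the album block it started in.
gap = "(?:(?!</div>)[\s\S])*?"
re.Pattern = "title=""" & album & """" & gap & "(http://[^""]*?\.jpg)" & gap & "title=""" & singer & """"
result = ""
For Each m In re.Execute(html)
    result = m.SubMatches(0)
    WScript.Echo "tempered:   " & result
Next
```

On this sample the untempered pattern reports the "Other" image, while the tempered one reports the 王菲 cover URL.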

Welcome to 批处理之家 (BatHome) (http://www.bathome.net/)