Title: [Text processing] How can a batch script extract specific content from a single very long line of text?
Author: cjiabing Time: 2016-1-14 14:55
Last edited by pcl_test on 2016-1-14 16:18
I extracted one very long string from a web page's source (it is a single line; if it is not long enough for testing, append more entries yourself), for example:
<td align="left" valign="top"><!--begin 1581042-0-9--> <a href=content/2015-12/17/content_6404737.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容1 <span class="f12 black">2015-12-17</span></a> <br> <a href=content/2015-12/17/content_6404733.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容2 <span class="f12 black">2015-12-17</span></a> <br> <a href=content/2015-12/17/content_6404731.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容3 <span class="f12 black">2015-12-17</span></a> <br> <a href=content/2015-12/17/content_6404726.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容4 <span class="f12 black">2015-12-17</span></a> <br> <a href=content/2015-11/25/content_6371151.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容5 <span class="f12 black">2015-11-25</span></a> <br> <a href=content/2015-10/14/content_6304117.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容6 <span class="f12 black">2015-10-14</span></a> <br> <a href=content/2015-10/14/content_6304094.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容7 <span class="f12 black">2015-10-14</span></a> <br> <a href=content/2015-10/14/content_6304085.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容8 <span class="f12 black">2015-10-14</span></a> <br> <a href=content/2015-10/14/content_6304078.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容9 <span class="f12 black">2015-10-14</span></a> <br> <a href=content/2015-09/11/content_6264794.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容10 <span class="f12 black">2015-09-11</span></a> <br> <hr size="1"></hr> <a href=content/2015-08/17/content_6197492.htm?node=43149 class="f14 blue001" target=_blank><span class="f14 blue001">·</span>标题内容11 <span class="f12 black">2015-08-17</span></a> <br> <hr size="1"></hr> <div id="displaypagenum"><p><center> <span>1</span> <a href=node_43149_2.htm>2</a> <a href=node_43149_2.htm>下一页</a> <a href=node_43149_2.htm></a></center></p></div><script language="javascript">function turnpage(page){ document.all("div_currpage").innerHTML = document.all("div_page_roll"+page).innerHTML;}</script><!--end 1581042-0-9--></td>
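From this string I want to extract each URL together with its title text, for example:
content/2015-12/17/content_6404737.htm?node=43149 标题内容1
Pure batch may run into its limit on very long lines, so the string may need to be broken into shorter lines first.
A solution that uses a third-party tool such as sed is also fine. Thanks!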
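The line-breaking idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not part of the original question: it assumes every entry ends with a `<br>` tag, as in the sample, and `splitAtBr` and `longLine` are hypothetical names introduced here.

```javascript
// Hypothetical helper: cut one long HTML line into one piece per entry
// at each <br> tag, so that no single line stays overly long.
function splitAtBr(html) {
  return html
    .split(/<br\s*\/?>/i)                              // cut at <br>, <br/> or <br />
    .map(function (part) { return part.trim(); })      // drop the spaces around each piece
    .filter(function (part) { return part.length > 0; });
}

// Shortened stand-in for the long line in the question (assumption: same shape).
const longLine =
  '<a href=content/2015-12/17/content_6404737.htm?node=43149>标题内容1</a> <br> ' +
  '<a href=content/2015-12/17/content_6404733.htm?node=43149>标题内容2</a> <br> ' +
  '<hr size="1"></hr>';

// One entry per output line.
console.log(splitAtBr(longLine).join('\n'));
```

Each resulting piece holds at most one link, which is the shape a line-oriented tool would need.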
Author: pcl_test Time: 2016-1-14 16:05
Last edited by pcl_test on 2016-1-14 16:11
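//&cls&cscript -nologo -e:jscript "%~fs0"<"文本.txt"&pause&exit
// Hybrid batch/JScript file: cmd runs the first line, which re-runs this file under cscript with 文本.txt on stdin.
// Collect every match of patt in txt as "url<TAB>title" lines.
function getStr(patt, txt){
    var str, s='';
    while ((str = patt.exec(txt)) != null){
        s += str[1]+'\t'+str[2]+'\r\n';
    }
    return s;
}
// Group 1: the href value (quotes optional); group 2: the text after </span>, without trailing spaces.
var reg = /<a\shref="?([^\s"]+)"?[^>]*><span.+?<\/span>([^<]+?)( )*</g;
var htmltxt = WScript.StdIn.ReadAll();
WSH.echo(getStr(reg, htmltxt));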
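For readers who want to try the pattern outside Windows Script Host, here is a minimal sketch in plain JavaScript. It is an illustration rather than pcl_test's script: `extractLinks` and `sample` are hypothetical names, the sample is a shortened copy of the question's input, and the trailing `( )*` group is swapped for `\s*`, which is an assumption about the whitespace that follows each title.

```javascript
// Hypothetical helper: the same matching idea as the WSH script,
// returning objects instead of a tab-separated string.
function extractLinks(html) {
  // Group 1: href value (quotes optional).
  // Group 2: text between the first </span> and the next tag, trailing whitespace excluded.
  const pattern = /<a\shref="?([^\s"]+)"?[^>]*><span.+?<\/span>([^<]+?)\s*</g;
  const rows = [];
  let m;
  while ((m = pattern.exec(html)) !== null) {
    rows.push({ url: m[1], title: m[2] });
  }
  return rows;
}

// Two entries plus one pager link, in the shape of the question's input.
const sample =
  '<a href=content/2015-12/17/content_6404737.htm?node=43149 class="f14 blue001" target=_blank>' +
  '<span class="f14 blue001">·</span>标题内容1 <span class="f12 black">2015-12-17</span></a> <br> ' +
  '<a href=content/2015-12/17/content_6404733.htm?node=43149 class="f14 blue001" target=_blank>' +
  '<span class="f14 blue001">·</span>标题内容2 <span class="f12 black">2015-12-17</span></a> <br> ' +
  '<a href=node_43149_2.htm>2</a>';

// Print "url<TAB>title" for each entry.
for (const row of extractLinks(sample)) {
  console.log(row.url + '\t' + row.title);
}
```

The pager link is skipped because the pattern requires a `<span` immediately after the opening `<a ...>` tag, which only the article entries have.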