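Reply to 15# lxh623
A few files were not downloaded. This may be a network problem, or the code may have a bug; I'm not sure which.
If you only need to convert the HTML to txt, you can do it like this: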
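PS: HtoX32c limits the length of file names and exits immediately if a name is too long. The conversion kept failing at first, and it took me a long time to track this down.

@set @n=0;/* & echo off
rem Batch part: pipe every page name to the JScript part below, then convert.
pushd HTML\
dir /b *.html *.aspx|cscript -nologo -e:jscript "%~0"
"%~dp0HtoX32c" /IP /O0 *.$
del *.$
pause & exit/b & rem */

var fso = new ActiveXObject("Scripting.FileSystemObject");
// "dir /b" sends one file name per line on stdin.
while(!WScript.StdIn.AtEndOfStream){SaveFile(WScript.StdIn.ReadLine())}

function SaveFile(file){
    var i = 0;
    with(new ActiveXObject("ADODB.Stream")){
        Mode = 3;           // adModeReadWrite
        Type = 2;           // adTypeText
        CharSet = "UTF-8";
        Open();
        LoadFromFile(file);
        // Keep only the text after the "正文内容" (body) marker line and before the "标签:" (tags) block.
        var txt = ReadText().replace(/[\s\S]+>正文内容<.*\n/, "").split(">标签:<")[0];
        // Drop named HTML entities such as &nbsp;
        txt = txt.replace(/&[a-z]+;/g, "");
        // The page title becomes the file name; skip pages without an <h1> instead of crashing.
        var m = txt.match(/<h1>(.+?)<\/h1>/i);
        if(!m){
            WScript.StdErr.WriteLine("No <h1> title found, skipped: " + file);
            Close();
            return;
        }
        // Remove characters that are illegal in file names and cut the name to 128 characters.
        var name = m[1].replace(/[\/\|\\:"<>\?\*]/g, "").replace(/(.{128}).*/, "$1");
        var newname = name;
        // Append [1], [2], ... if a file with this name already exists.
        while(fso.FileExists(newname + ".$")){
            i++;
            newname = name + "[" + i + "]";
        }
        // Rewind, switch the encoding to GBK and overwrite the stream with the trimmed text.
        Position = 0;
        CharSet = "GBK";
        WriteText(txt);
        SetEOS();           // truncate whatever is left of the original content
        SaveToFile(newname + ".$");
        Close();
    }
}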
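Since the file-name limit was what tripped up the conversion, here is the name-cleaning step of the script pulled out on its own as a minimal sketch. `makeSafeName`, `MAX_LEN` and the `exists` callback are hypothetical names used only for illustration. The value 128 is simply the cut-off the script uses, not a documented HtoX32c limit.

```javascript
// Illustrative sketch only: mirrors the name handling in SaveFile().
// MAX_LEN is an assumption taken from the script, not from HtoX32c itself.
var MAX_LEN = 128;

// title  : raw page title
// exists : function(fileName) -> true if that file name is already taken
function makeSafeName(title, exists) {
    // Strip characters Windows does not allow in file names, then truncate.
    var base = title.replace(/[\/\|\\:"<>\?\*]/g, "").substring(0, MAX_LEN);
    var name = base;
    var i = 0;
    // Append [1], [2], ... until the name is free.
    while (exists(name + ".$")) {
        i++;
        name = base + "[" + i + "]";
    }
    return name;
}
```

In the script itself, `exists` would be `function(n){ return fso.FileExists(n); }`. Note that the `[n]` suffix is added after truncation, so a de-duplicated name can be a few characters longer than the cut-off. If HtoX32c still exits on such files, lowering the cut-off is one thing to try.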