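Title: [Help request] How can VBS convert text between different encodings?
Author: xp3000 Time: 2015-6-26 09:58
After exporting an EPUB e-book I get files in HTML or HTM format (attachment below).
For example, if I rename one to .txt and copy text out of it, the line breaks survive when pasted into the forum, but Notepad shows no line breaks: each break appears as a black box.
The file is in ANSI encoding, and I need to be able to convert between encodings.
- <h2>《贫女》</h2>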
- <p>作者:秦韬玉</p>
- <p>蓬门未识绮罗香,拟托良媒益自伤。</p>
- <p>谁爱风流高格调,共怜时世俭梳妆。</p>
- <p>敢将十指夸针巧,不把双眉斗画长。</p>
- <p>苦恨年年压金线,为他人作嫁衣裳。</p>
- </body>
- </html>
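This is a VBS script I found online. It cannot handle multiple file formats, and after conversion the files look the same as before: the black boxes are still there. How can I make it use a folder-browse dialog?
Also, can the script let me pick the target encoding by entering a number in the prompt? For example:
1: convert the target to ANSI
2: convert the target to Unicode
3: convert the target to UTF-8
- ' Usage: put all files whose encoding should change into one folder, drag the folder onto this .vbs, then enter the target character encoding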
- Dim fso, fd, fl, f, fdpath, charset
- If WScript.Arguments.Length >= 1 Then
-     fdpath = WScript.Arguments(0)
- Else
-     fdpath = InputBox("Enter the folder path:" & vbCrLf & "Alternatively, drag the folder onto the VBS file", "VBS encoding conversion", "D:\TEMP")
-     If fdpath = "" Then WScript.Quit
- End If
- If WScript.Arguments.Length >= 2 Then
-     charset = WScript.Arguments(1)
- Else
-     charset = InputBox("Enter the target character encoding:" & vbCrLf & "Supported: ANSI, Unicode, UTF-8", "Target encoding (default UTF-8, editable)", "UTF-8")
- End If
- If charset = "" Then WScript.Quit
- ' ADODB.Stream has no "ANSI" charset; GB2312 stands in for it here (Simplified Chinese system assumed)
- If UCase(charset) = "ANSI" Then charset = "GB2312"
- Set fso = CreateObject("Scripting.FileSystemObject")
- If Not fso.FolderExists(fdpath) Then
-     MsgBox "Folder not found: " & fdpath, , "Notice"
-     WScript.Quit
- End If
- Set fd = fso.GetFolder(fdpath)
- Set fl = fd.Files
- For Each f In fl
-     convertct f.Path, charset
- Next
- MsgBox "Encoding conversion finished", , "Notice"
-
- ' Re-save the file's content in the requested encoding
- Function convertct(filepath, charset)
-     Dim FileContents, savefile
-     FileContents = LoadFile(filepath)
-     Set savefile = CreateObject("ADODB.Stream")
-     savefile.Type = 2 ' 1 = binary, 2 = text
-     savefile.Mode = 3 ' read/write
-     savefile.Open
-     savefile.Charset = charset
-     savefile.WriteText FileContents ' Write is for binary data, WriteText for text
-     savefile.SaveToFile filepath, 2 ' 2 = overwrite the existing file
-     savefile.Close
-     Set savefile = Nothing
- End Function
- ' Read the file as text, using the encoding that CheckCode detects
- Function LoadFile(Path)
-     Dim Stm2
-     Set Stm2 = CreateObject("ADODB.Stream")
-     Stm2.Type = 2
-     Stm2.Mode = 3
-     Stm2.Open
-     Stm2.Charset = CheckCode(Path) ' or force one: "UTF-8", "Unicode", "GB2312"
-     Stm2.LoadFromFile Path
-     LoadFile = Stm2.ReadText
-     Stm2.Close
-     Set Stm2 = Nothing
- End Function
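- ' Detect and return the file's encoding
- Function CheckCode(file)
-     Dim slz, Bin, Codes
-     Set slz = CreateObject("ADODB.Stream")
-     slz.Type = 1 ' binary
-     slz.Mode = 3
-     slz.Open
-     slz.LoadFromFile file
-     Codes = ""
-     If slz.Size >= 2 Then
-         Bin = slz.Read(2)
-         ' FF FE is the UTF-16 LE byte order mark; check it before the slower UTF-8 scan
-         If AscB(MidB(Bin, 1, 1)) = &HFF And AscB(MidB(Bin, 2, 1)) = &HFE Then Codes = "Unicode"
-     End If
-     slz.Close
-     Set slz = Nothing
-     If Codes = "" Then
-         If is_valid_utf8(read(file)) Then
-             Codes = "UTF-8"
-         Else
-             Codes = "GB2312" ' fallback: treat the file as ANSI
-         End If
-     End If
-     CheckCode = Codes
- End Function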
- ' Read the file's bytes into a string, one character per byte
- Function read(path)
-     Dim ado, a(), i, n
-     Set ado = CreateObject("ADODB.Stream")
-     ado.Type = 1 : ado.Open
-     ado.LoadFromFile path
-     n = ado.Size - 1
-     ReDim a(n)
-     For i = 0 To n
-         a(i) = ChrW(AscB(ado.Read(1)))
-     Next
-     ado.Close
-     read = Join(a, "")
- End Function
- ' Check whether a byte string is well-formed UTF-8 (also works for UTF-8 files without a BOM)
- ' The pattern lists malformed sequences; the input is valid when none of them matches
- Function is_valid_utf8(ByRef input) ' ByRef avoids copying the string
-     Dim s, re
-     Set re = New RegExp
-     ' Lead byte followed by too few continuation bytes
-     s = "[\xC0-\xDF]([^\x80-\xBF]|$)"
-     s = s & "|[\xE0-\xEF].{0,1}([^\x80-\xBF]|$)"
-     s = s & "|[\xF0-\xF7].{0,2}([^\x80-\xBF]|$)"
-     s = s & "|[\xF8-\xFB].{0,3}([^\x80-\xBF]|$)"
-     s = s & "|[\xFC-\xFD].{0,4}([^\x80-\xBF]|$)"
-     s = s & "|[\xFE-\xFE].{0,5}([^\x80-\xBF]|$)"
-     ' Complete sequence followed by an extra continuation byte
-     s = s & "|[\x00-\x7F][\x80-\xBF]"
-     s = s & "|[\xC0-\xDF].[\x80-\xBF]"
-     s = s & "|[\xE0-\xEF]..[\x80-\xBF]"
-     s = s & "|[\xF0-\xF7]...[\x80-\xBF]"
-     s = s & "|[\xF8-\xFB]....[\x80-\xBF]"
-     s = s & "|[\xFC-\xFD].....[\x80-\xBF]"
-     s = s & "|[\xFE-\xFE]......[\x80-\xBF]"
-     ' Continuation byte at the very start
-     s = s & "|^[\x80-\xBF]"
-     re.Pattern = s
-     is_valid_utf8 = (Not re.Test(input))
- End Function
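The two requests in the question (a folder-browse dialog, and choosing the target encoding by number) could be sketched roughly as follows. This is only an illustration and not part of the script found online: `PickFolder` and `CharsetFromChoice` are hypothetical helper names, the dialog comes from the `BrowseForFolder` method of the `Shell.Application` COM object, and mapping ANSI to `GB2312` assumes a Simplified Chinese Windows system, as the original script does.

```vbscript
' Hypothetical helper: show the standard folder picker and return the
' chosen path, or "" if the user cancels.
Function PickFolder(title)
    Dim shellApp, folder
    PickFolder = ""
    Set shellApp = CreateObject("Shell.Application")
    ' Arguments: parent window handle, dialog title, option flags (0 = defaults)
    Set folder = shellApp.BrowseForFolder(0, title, 0)
    If Not folder Is Nothing Then PickFolder = folder.Self.Path
End Function

' Hypothetical helper: map the menu number typed by the user to a charset
' name accepted by ADODB.Stream; returns "" for any other input.
Function CharsetFromChoice(choice)
    Select Case Trim(CStr(choice))
        Case "1"
            CharsetFromChoice = "GB2312"   ' ANSI (Simplified Chinese system assumed)
        Case "2"
            CharsetFromChoice = "Unicode"  ' UTF-16 LE
        Case "3"
            CharsetFromChoice = "UTF-8"
        Case Else
            CharsetFromChoice = ""
    End Select
End Function
```

In the script, the two `InputBox` prompts could then be replaced by `fdpath = PickFolder("Select the folder to convert")` and `charset = CharsetFromChoice(InputBox("1: ANSI  2: Unicode  3: UTF-8", "Target encoding", "3"))`, quitting when either result is an empty string.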
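Author: CrLf Time: 2015-6-26 15:20
The black boxes come from line-break characters appearing on their own in the file (a bare LF or CR rather than a CR+LF pair); it is not an encoding problem.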
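Following this diagnosis, one possible fix is to normalize every line break to CR+LF before the text is saved. A minimal sketch, assuming the file content has already been read into a string (for example the return value of `LoadFile`); `NormalizeNewlines` is a hypothetical name:

```vbscript
' Hypothetical helper: turn bare LF, bare CR and CR+LF into CR+LF,
' which is the line ending Notepad expects.
Function NormalizeNewlines(text)
    Dim s
    s = Replace(text, vbCrLf, vbLf)   ' collapse existing CR+LF pairs first
    s = Replace(s, vbCr, vbLf)        ' any remaining bare CR becomes LF
    NormalizeNewlines = Replace(s, vbLf, vbCrLf)   ' every break is now CR+LF
End Function
```

In `convertct`, this would amount to writing `NormalizeNewlines(FileContents)` to the stream instead of `FileContents`.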
Author: xp3000 Time: 2015-6-26 17:34
Oh. Could it be made to support converting multiple file types? I don't know how to do it myself.
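If "multiple file types" means converting only files with certain extensions in the folder (an assumption; the post does not say), the loop over `fd.Files` could be filtered with something like the sketch below. `ShouldConvert` and the extension list are hypothetical; `GetExtensionName` is a method of `Scripting.FileSystemObject`.

```vbscript
' Hypothetical helper: True when the file's extension appears in the
' semicolon-separated list (compared case-insensitively).
Function ShouldConvert(filePath, extList)
    Dim fso, ext
    Set fso = CreateObject("Scripting.FileSystemObject")
    ext = LCase(fso.GetExtensionName(filePath))
    ShouldConvert = False
    If ext <> "" Then
        ' Wrap both sides in ";" so that "ht" does not match "htm"
        ShouldConvert = (InStr(";" & LCase(extList) & ";", ";" & ext & ";") > 0)
    End If
End Function
```

Inside the `For Each` loop this could be used as `If ShouldConvert(f.Path, "txt;htm;html") Then convertct f.Path, charset`, so that other files in the folder are left untouched.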
Author: yu2n Time: 2015-6-27 08:04
String processing in VBS is slow, and detecting a file's encoding is not easy to solve either.
Author: xp3000 Time: 2015-6-28 18:25
Actually, I had another motive as well: comparing several VBS scripts and learning something from them.
Welcome to 批处理之家 (http://www.bathome.net/)