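Title: [Text Processing] [Solved] Help wanted: batch script to extract keywords from a pile of text files and write them to a CSV
Author: zhengwei007 Time: 2024-2-25 18:38 Title: [Solved] Help wanted: batch script to extract keywords from a pile of text files and write them to a CSV
Last edited by zhengwei007 on 2024-2-27 09:07
I have several hundred XML documents like the one below. Each record starts with <item> and ends with </item>, and I would like a batch script to collect their contents into a table. The XML looks like this:
- <item id="3" type="Weapon" name="宽剑">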
- <set name="icon" val="icon.weapon_broad_sword_i00" />
- <set name="default_action" val="equip" />
- <set name="weapon_type" val="sword" />
- <set name="bodypart" val="rhand" />
- <set name="random_damage" val="10" />
- <set name="attack_range" val="40" />
- <set name="damage_range" val="0;0;40;120" />
- <set name="immediate_effect" val="1" />
- <set name="material" val="steel" />
- <set name="weight" val="1590" />
- <set name="price" val="9600" />
- <set name="soulshots" val="1" />
- <set name="spiritshots" val="1" />
- <for>
- <set order="0x08" stat="pAtk" val="11" />
- <set order="0x08" stat="mAtk" val="9" />
- <set order="0x08" stat="rCrit" val="8" />
- <set order="0x08" stat="pAtkSpd" val="379" />
- </for>
- </item>
- <item id="4" type="Weapon" name="木棒">
- <set name="icon" val="icon.weapon_club_i00" />
- <set name="default_action" val="equip" />
- <set name="weapon_type" val="blunt" />
- <set name="bodypart" val="rhand" />
- <set name="random_damage" val="20" />
- <set name="attack_range" val="40" />
- <set name="damage_range" val="0;0;40;120" />
- <set name="immediate_effect" val="1" />
- <set name="material" val="wood" />
- <set name="weight" val="1870" />
- <set name="price" val="590" />
- <set name="soulshots" val="1" />
- <set name="spiritshots" val="1" />
- <for>
- <set order="0x08" stat="pAtk" val="8" />
- <set order="0x08" stat="mAtk" val="6" />
- <set order="0x08" stat="rCrit" val="4" />
- <add order="0x10" stat="accCombat" val="4.75" />
- <set order="0x08" stat="pAtkSpd" val="379" />
- </for>
- </item>
- <item id="1201" type="EtcItem" name="信仰凭证">
- <set name="icon" val="icon.accessary_earing_of_wisdom_i00" />
- <set name="immediate_effect" val="1" />
- <set name="material" val="steel" />
- <set name="is_tradable" val="false" />
- <set name="is_dropable" val="false" />
- <set name="is_sellable" val="false" />
- <set name="is_depositable" val="false" />
- <set name="is_stackable" val="true" />
- <set name="is_questitem" val="true" />
- </item>
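I have uploaded the files to a cloud drive. Roughly, this is what I need:
1. Pull out every name together with its value and list them in the output document.
2. Skip anything of the form 0x0N (the order attribute) entirely.
3. When a name turns up that has not been seen before, append a new column at the end of the table.
4. There may end up being a lot of header columns, but that is fine; just keep appending them.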
Based on the 3 examples above, the output should look like this (one row per item, with an empty cell wherever an item lacks a field):
- id,name,icon,default_action,weapon_type,bodypart,random_damage,attack_range,damage_range,immediate_effect,material,weight,price,soulshots,spiritshots,pAtk,mAtk,rCrit,pAtkSpd,accCombat,type,is_tradable,is_dropable,is_sellable,is_depositable,is_stackable,is_questitem
- 3,宽剑,icon.weapon_broad_sword_i00,equip,sword,rhand,10,40,0;0;40;120,1,steel,1590,9600,1,1,11,9,8,379,,Weapon,,,,,,
- 4,木棒,icon.weapon_club_i00,equip,blunt,rhand,20,40,0;0;40;120,1,wood,1870,590,1,1,8,6,4,379,4.75,Weapon,,,,,,
- 1201,信仰凭证,icon.accessary_earing_of_wisdom_i00,,,,,,,1,steel,,,,,,,,,,EtcItem,false,false,false,false,true,true
I have packed the files into an archive; could someone please take a look?
Link: https://pan.baidu.com/s/1eIJ1l6VcI8OmFoyFJwvO-Q
Extraction code: x4jk
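As a reference for checking any script against the four requirements above, here is a minimal Python sketch (a swapped-in language, not one used in this thread). It assumes the <item> elements sit under a single root element; the `<list>` wrapper in the sample and the names `items_to_table` and `to_csv` are hypothetical. Keeping the first value when a field repeats inside one item is also an assumption, not something the post specifies.

```python
import csv
import io
import xml.etree.ElementTree as ET

def items_to_table(xml_text):
    """Return (header, rows) for every <item> found under the root element."""
    header = ["id", "type", "name"]          # fixed leading columns
    rows = []
    for item in ET.fromstring(xml_text).iter("item"):
        row = {k: item.get(k, "") for k in ("id", "type", "name")}
        for elem in item.iter():
            if elem is item or "val" not in elem.attrib:
                continue
            # The column comes from name= or stat=; order= is never read.
            key = elem.get("name") or elem.get("stat")
            if key is None:
                continue
            if key not in header:
                header.append(key)           # unseen field -> new last column
            row.setdefault(key, elem.get("val"))  # assumption: first value wins
        rows.append(row)
    return header, rows

def to_csv(header, rows):
    """Render the table as CSV text, leaving missing fields empty."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical <list> root wrapped around shortened copies of the sample items.
SAMPLE = """<list>
<item id="4" type="Weapon" name="木棒">
  <set name="material" val="wood" />
  <for>
    <set order="0x08" stat="pAtk" val="8" />
    <add order="0x10" stat="accCombat" val="4.75" />
  </for>
</item>
<item id="1201" type="EtcItem" name="信仰凭证">
  <set name="material" val="steel" />
  <set name="is_stackable" val="true" />
</item>
</list>"""

header, rows = items_to_table(SAMPLE)
print(to_csv(header, rows))
```

Because the header is only final after the last item has been read, the rows are held as dictionaries and padded when they are written, which is what lets new columns appear at any point.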
Author: czjt1234 Time: 2024-2-26 08:59
Last edited by czjt1234 on 2024-2-26 09:18
- rem Save this file as a .bat in ANSI encoding
- ' & cls & %windir%\SysWOW64\CScript.exe /nologo /e:vbscript "%~f0" & pause & exit
-
- Option Explicit
- Dim oWshShell, oFSO, oTextStream, oDOMDocument, oXMLDOMElement, oConnection, oRecordset, s, t
-
- Const PATH = "C:\Users\Administrator\Desktop\items"
- Const OUT = "C:\Users\Administrator\Desktop\items\输出.txt"
-
- wsh.Echo Now()
- Set oWshShell = CreateObject("WScript.Shell")
- Set oFSO = CreateObject("Scripting.FileSystemObject")
- Set oTextStream = oFSO.OpenTextFile(OUT, 2, True)
- Set oDOMDocument = CreateObject("Msxml2.DOMDocument")
- Set oConnection = CreateObject("ADODB.Connection")
- Set oRecordset = CreateObject("ADODB.Recordset")
-
- s = oFSO.BuildPath(PATH, "temp.mdb")
- If oFSO.FileExists(s) Then oFSO.DeleteFile s, True
- s = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & s
- CreateObject("ADOX.Catalog").Create s
- oConnection.Open s
- oConnection.Execute "CREATE TABLE list(id INT PRIMARY KEY, name VARCHAR)"
-
- t = "|"
- For Each s In oFSO.GetFolder(PATH).Files
- If LCase(oFSO.GetExtensionName(s)) = "xml" Then Call k(s.Path)
- Next
- oRecordset.CursorLocation = 3 'adUseClient
- oRecordset.Open "SELECT * FROM list ORDER BY id ASC", oConnection
- s = ""
- For t = 0 To oRecordset.Fields.Count - 1
- s = s & oRecordset(t).Name & vbTab
- Next
- s = Left(s, Len(s) - 1) & vbCrLf
- Do Until oRecordset.EOF = True
- For t = 0 To oRecordset.Fields.Count - 1
- s = s & oRecordset(t).Value & vbTab
- Next
- s = Left(s, Len(s) - 1) & vbCrLf
- If Len(s) >= 2048 Then
- oTextStream.Write s
- s = ""
- End If
- oRecordset.MoveNext()
- Loop
- If s <> "" Then oTextStream.Write s
- oRecordset.Close()
- oConnection.Close()
- oTextStream.Close()
-
- s = oFSO.BuildPath(PATH, "temp.mdb")
- If oFSO.FileExists(s) Then oFSO.DeleteFile s, True
- wsh.Echo Now()
- wsh.Echo "ok"
-
- Sub k(ByVal s)
- Dim i, j, m
- oDOMDocument.Load s
- Set oXMLDOMElement = oDOMDocument.documentElement
- For Each i In oXMLDOMElement.SelectNodes("item[@id and @name]")
- oConnection.Execute "INSERT INTO list(id, name) VALUES(" & i.getAttribute("id") & ", '" & Replace(i.getAttribute("name"), "'", "''") & "')" 'double any apostrophe in the name so the SQL literal stays valid
- m = "UPDATE list SET "
- For Each j In i.SelectNodes("set[@name and @val]")
- If InStr(1, t, "|" & j.getAttribute("name") & "|", vbTextCompare) = 0 Then
- oConnection.Execute "ALTER TABLE list ADD COLUMN [" & j.getAttribute("name") & "] VARCHAR"
- t = t & j.getAttribute("name") & "|"
- End If
- If InStr(1, m, "[" & j.getAttribute("name") & "]", vbTextCompare) = 0 Then 'handle the malformed item id = 20994 (skip a name that is already queued)
- m = m & "[" & j.getAttribute("name") & "] = """ & j.getAttribute("val") & """, "
- End If
- Next
- If Right(m, 2) = ", " Then oConnection.Execute Left(m, Len(m) - 2) & " WHERE id = " & i.getAttribute("id") 'skip the UPDATE when the item has no fields
- Next
- End Sub
Author: zhengwei007 Time: 2024-2-26 10:19
Last edited by zhengwei007 on 2024-2-26 10:20
czjt1234 posted on 2024-2-26 08:59
<for>
<set order="0x08" stat="pAtk" val="8" />
<set order="0x08" stat="mAtk" val="6" />
<set order="0x08" stat="rCrit" val="4" />
<add order="0x10" stat="accCombat" val="4.75" />
<set order="0x08" stat="pAtkSpd" val="379" />
</for>
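These entries were not picked up. The 0x0* order values are not needed, but the stat= attributes that follow them are: pAtk, mAtk and so on are field names, so just treat stat= the same way as name=.
Author: czjt1234 Time: 2024-2-26 10:26
Does the <add line count?
Author: likeyou32 Time: 2024-2-26 10:39
Reply to 2# czjt1234
Is this really a batch script? It looks like VBA, and oFSO.GetFolder(PATH).Files does not even look like VBA.
Author: ShowCode Time: 2024-2-26 10:55
Reply to 5# likeyou32
It is VBS invoked from a BAT file.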
http://bbs.bathome.net/thread-4610-1-1.html
Author: zhengwei007 Time: 2024-2-26 10:59
Reply to 4# czjt1234
They all count; just drop the leading order attribute. Read the line as <add stat="accCombat" val="4.75" />: the field is accCombat and the value is 4.75.
Author: 77七 Time: 2024-2-26 12:17
- <set order="0x08" stat="pAtk" val="207" />
- <set order="0x08" stat="mAtk" val="157" />
- <set order="0x08" stat="rCrit" val="4" />
- <add order="0x10" stat="accCombat" val="4.75" />
- <set order="0x08" stat="pAtkSpd" val="325" />
- <enchant val="0" order="0x0C" stat="pAtk" />
- <enchant val="0" order="0x0C" stat="mAtk" />
Do the last two lines count as well?
Author: zhengwei007 Time: 2024-2-26 12:45
They all count: the value of stat is the column header and val is the value.
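The rule agreed on here (the column is taken from name= or stat=, the value from val=, and order= is ignored) can be sketched in Python as a single helper. This is only an illustration in a swapped-in language; `field_of` is a hypothetical name. Since an XML parser exposes attributes by name, the <enchant> lines, where val comes first, need no special case.

```python
import xml.etree.ElementTree as ET

def field_of(elem):
    """Return (column, value) for a <set>/<add>/<enchant> element, else None."""
    column = elem.get("name") or elem.get("stat")   # order= is never consulted
    value = elem.get("val")
    if column is None or value is None:
        return None
    return column, value

# Lines taken from the posts above, plus an element that carries no field.
lines = [
    '<set name="material" val="wood" />',
    '<add order="0x10" stat="accCombat" val="4.75" />',
    '<enchant val="0" order="0x0C" stat="pAtk" />',
    '<for />',
]
fields = [field_of(ET.fromstring(line)) for line in lines]
print(fields)
```

Whether an <enchant> value should replace an earlier <set> value for the same stat is not settled in the thread, so the helper leaves that decision to the caller.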
Author: czjt1234 Time: 2024-2-26 13:26
- rem Save this file as a .bat in ANSI encoding
- ' & cls & %windir%\SysWOW64\CScript.exe /nologo /e:vbscript "%~f0" & pause & exit
-
- Option Explicit
- Dim oWshShell, oFSO, oTextStream, oDOMDocument, oXMLDOMElement, oConnection, oRecordset, s, t
-
- Const PATH = "C:\Users\Administrator\Desktop\items"
- Const OUT = "C:\Users\Administrator\Desktop\items\输出.txt"
-
- wsh.Echo Now()
- Set oWshShell = CreateObject("WScript.Shell")
- Set oFSO = CreateObject("Scripting.FileSystemObject")
- Set oTextStream = oFSO.OpenTextFile(OUT, 2, True)
- Set oDOMDocument = CreateObject("Msxml2.DOMDocument")
- Set oConnection = CreateObject("ADODB.Connection")
- Set oRecordset = CreateObject("ADODB.Recordset")
-
- s = oFSO.BuildPath(PATH, "temp.mdb")
- If oFSO.FileExists(s) Then oFSO.DeleteFile s, True
- s = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & s
- CreateObject("ADOX.Catalog").Create s
- oConnection.Open s
- oConnection.Execute "CREATE TABLE list(id INT PRIMARY KEY, name VARCHAR)"
-
- t = "|"
- For Each s In oFSO.GetFolder(PATH).Files
- If LCase(oFSO.GetExtensionName(s)) = "xml" Then Call k(s.Path)
- Next
-
- oRecordset.CursorLocation = 3 'adUseClient
- oRecordset.Open "SELECT * FROM list ORDER BY id ASC", oConnection
- s = ""
- For t = 0 To oRecordset.Fields.Count - 1
- s = s & oRecordset(t).Name & vbTab
- Next
- s = Left(s, Len(s) - 1) & vbCrLf
- Do Until oRecordset.EOF = True
- For t = 0 To oRecordset.Fields.Count - 1
- s = s & oRecordset(t).Value & vbTab
- Next
- s = Left(s, Len(s) - 1) & vbCrLf
- If Len(s) >= 2048 Then
- oTextStream.Write s
- s = ""
- End If
- oRecordset.MoveNext()
- Loop
- If s <> "" Then oTextStream.Write s
- oRecordset.Close()
- oConnection.Close()
- oTextStream.Close()
-
- s = oFSO.BuildPath(PATH, "temp.mdb")
- If oFSO.FileExists(s) Then oFSO.DeleteFile s, True
- wsh.Echo Now()
- wsh.Echo "ok"
-
- Sub k(ByVal s)
- Dim i, j, m
- oDOMDocument.Load s
- Set oXMLDOMElement = oDOMDocument.documentElement
- For Each i In oXMLDOMElement.SelectNodes("item[@id and @name]")
- oConnection.Execute "INSERT INTO list(id, name) VALUES(" & i.getAttribute("id") & ", '" & Replace(i.getAttribute("name"), "'", "''") & "')" 'double any apostrophe in the name so the SQL literal stays valid
- m = "UPDATE list SET "
- For Each j In i.SelectNodes(".//*[@val]")
- If Not IsNull(j.getAttribute("name")) Then
- If InStr(1, t, "|" & j.getAttribute("name") & "|", vbTextCompare) = 0 Then
- oConnection.Execute "ALTER TABLE list ADD COLUMN [" & j.getAttribute("name") & "] VARCHAR"
- t = t & j.getAttribute("name") & "|"
- End If
- If InStr(1, m, "[" & j.getAttribute("name") & "]", vbTextCompare) = 0 Then 'handle the malformed item id = 20994 (skip a name that is already queued)
- m = m & "[" & j.getAttribute("name") & "] = """ & j.getAttribute("val") & """, "
- End If
- End If
- If Not IsNull(j.getAttribute("stat")) Then
- If InStr(1, t, "|" & j.getAttribute("stat") & "|", vbTextCompare) = 0 Then
- oConnection.Execute "ALTER TABLE list ADD COLUMN [" & j.getAttribute("stat") & "] VARCHAR"
- t = t & j.getAttribute("stat") & "|"
- End If
- If InStr(1, m, "[" & j.getAttribute("stat") & "]", vbTextCompare) = 0 Then
- m = m & "[" & j.getAttribute("stat") & "] = """ & j.getAttribute("val") & """, "
- End If
- End If
- Next
- If Right(m, 2) = ", " Then oConnection.Execute Left(m, Len(m) - 2) & " WHERE id = " & i.getAttribute("id") 'skip the UPDATE when the item has no fields
- Next
- End Sub
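The script above grows a database table one column at a time with ALTER TABLE ... ADD COLUMN and then reads everything back in id order. The same idea can be sketched in Python with the standard sqlite3 module; this is an analogy under stated assumptions, not a port of the VBScript: SQLite stands in for the Jet .mdb file, and `upsert_item` and `known` are hypothetical names. Values are passed as parameters, so quotes inside a value need no escaping.

```python
import sqlite3

def upsert_item(conn, known, item_id, name, fields):
    """Insert one item, adding a TEXT column for each field not seen before."""
    conn.execute("INSERT INTO list(id, name) VALUES (?, ?)", (item_id, name))
    for column, value in fields.items():
        quoted = '"' + column.replace('"', '""') + '"'   # quote the identifier
        if column.lower() not in known:                  # column names ignore case
            conn.execute(f"ALTER TABLE list ADD COLUMN {quoted} TEXT")
            known.add(column.lower())
        conn.execute(f"UPDATE list SET {quoted} = ? WHERE id = ?", (value, item_id))

conn = sqlite3.connect(":memory:")       # assumption: in-memory DB instead of temp.mdb
conn.execute("CREATE TABLE list(id INTEGER PRIMARY KEY, name TEXT)")
known = {"id", "name"}
upsert_item(conn, known, 3, "宽剑", {"material": "steel", "pAtk": "11"})
upsert_item(conn, known, 4, "木棒", {"material": "wood", "accCombat": "4.75"})

cur = conn.execute("SELECT * FROM list ORDER BY id")
header = [d[0] for d in cur.description]  # column names in creation order
rows = cur.fetchall()
print(header)
print(rows)
```

Letting the database hold the table means rows written before a column existed are padded with NULL automatically, at the cost of one ALTER TABLE per new field.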
Author: 77七 Time: 2024-2-26 13:58
- @echo off
- rem Save this batch file in UTF-8 encoding
- chcp 65001 >nul
- cd /d "%~dp0"
- for %%i in (*.xml) do (
- for /f usebacktokens^=1-6delims^=^" %%a in ("%%i") do (
- if "%%a" equ " <set name=" (
- set #"%%b"=,
- ) else if "%%c" equ " stat=" (
- set #"%%d"=,
- ) else if "%%e" equ " stat=" (
- set #"%%f"=,
- )
- )
- )
- (
- set /p=id,type,name<nul
- for /f "tokens=1 delims=#=" %%a in ('set #') do set /p=,%%~a<nul
- echo=
- )>out.csv
- for %%i in (*.xml) do (
- set /a n+=1
- call echo Processing file [%%n%%]
- (for /f usebacktokens^=1-6delims^=^" %%a in ("%%i") do (
- if "%%a" equ " <item id=" (
- setlocal enabledelayedexpansion
- set str=%%b,%%d,%%f
- ) else if "%%a" equ " <set name=" (
- set #"%%b"=%%d
- ) else if "%%c" equ " stat=" (
- set #"%%d"=%%f
- ) else if "%%e" equ " stat=" (
- set #"%%f"=%%b
- ) else if "%%a" equ " </item>" (
- for /f "tokens=2 delims=#=" %%x in ('set #') do (
- if "%%x" equ "," (
- set str=!str!,
- ) else (
- set str=!str!,"%%x"
- )
- )
- echo=!str!
- endlocal
- )
- ))>>out.csv
- )
- pause
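Author: aloha20200628 Time: 2024-2-26 14:10
Last edited by aloha20200628 on 2024-2-26 14:13
Following the algorithm from my earlier thread (http://www.bathome.net/thread-68463-1-1.html), here is a pure-batch version that handles a single source file (the path and file name of that file are passed as the command-line argument), so the OP can spot-check it against that pile of different source files...
The OP's source files are in utf-8+ encoding, so this script writes its result as a UTF-8 encoded *.new file.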
- @echo off &setlocal enabledelayedexpansion
- if "%~1"=="" exit/b
- chcp 65001>nul
- set "tLine=id type name"
- (for /f tokens^=1-6^ delims^=^"^= %%1 in (' findstr "=" "%~1" ') do for /f "tokens=1-2 delims= < " %%a in ("%%~1") do (
- if /i "%%~b"=="id" (
- if defined vLine (echo,!vLine:~1!&set "vLine=")
- set "vLine=!vLine! %%2 %%4 %%6"
- ) else if /i "%%~b"=="name" (
- if not defined _%%2 (set "tLine=!tLine! %%2" &set "_%%2=1")
- set "vLine=!vLine! %%4"
- ) else if /i "%%~b"=="order" (
- if not defined _%%4 (set "tLine=!tLine! %%4" &set "_%%4=1")
- set "vLine=!vLine! %%6"
- ) else if /i "%%~b"=="val" (
- if not defined _%%6 (set "tLine=!tLine! %%6" &set "_%%6=1")
- set "vLine=!vLine! %%2"
- )
- ))>"%~1.v"
- if defined vLine (echo,!vLine:~1!)>>"%~1.v"
- echo,!tLine!>"%~1.new" & type "%~1.v">>"%~1.new" & del "%~1.v"
- endlocal&exit/b
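Author: aloha20200628 Time: 2024-2-26 14:43
Reply to 1# zhengwei007
If the script in post #12 (saved as, say, test.bat) passes your spot checks, the following one-liner, pasted straight into the command prompt, processes every *.xml source file in the target directory...
- for %a in (*.xml) do @test.bat "%~a"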
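Author: zhengwei007 Time: 2024-2-27 09:06
Thanks to everyone above; the problem is completely solved.
Author: czjt1234 Time: 2024-2-27 09:15
Reply to 14# zhengwei007
Shall we discuss it a bit?
The pros and cons of each solution in actual use.
Discussion is what keeps us motivated.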
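Author: zhengwei007 Time: 2024-2-27 23:06
Reply to zhengwei007
Shall we discuss it a bit?
The pros and cons of each solution in actual use.
Discussion is what keeps us motivated.
czjt1234 posted on 2024-2-27 09:15
Judging by the results, all of your scripts do exactly what I wanted, and they all run quickly, producing output in about 5 seconds, so the advantages are obvious. Speaking as a layman, here is my summary:
Post #10 code:
1) I have to save the bat in ANSI encoding explicitly, because my Windows 11 defaults to UTF-8.
2) Two paths have to be set by hand inside the script; I changed both of them.
Post #11 code:
Nothing much to say: I saved the code in the current directory and it ran straight away.
So, from the above, the post #11 code is the most foolproof.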
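For the encoding step in point 1), the manual "save as ANSI" can also be scripted. The Python sketch below (a swapped-in language) re-encodes a UTF-8 file in place. It assumes that "ANSI" on a Simplified Chinese Windows means code page 936, which Python calls gbk; `reencode` and the file name demo.bat are hypothetical.

```python
from pathlib import Path
import tempfile

def reencode(path, src="utf-8-sig", dst="gbk"):
    """Rewrite a text file in place from the src encoding to the dst encoding."""
    p = Path(path)
    text = p.read_bytes().decode(src)    # utf-8-sig also drops a BOM if present
    p.write_bytes(text.encode(dst))      # raises UnicodeEncodeError if unrepresentable

# Demonstration on a throwaway file; the content mimics the Chinese output name.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo.bat"
    script.write_bytes("rem 输出.txt\r\n".encode("utf-8"))
    reencode(script)
    converted = script.read_bytes()
print(converted.decode("gbk"))
```

Working on bytes rather than text mode keeps the CRLF line endings that a .bat file needs.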
Welcome to 批处理之家 (http://www.bathome.net/)