I had some spare time, so I wrote a PS script and ran it: 50 seconds.

Reply to #31 xczxczxcz
Hello!
What is PS?
Could you please explain in detail? Thanks!

Solved in under a minute. Truly expert work!

I wrote one in pure batch, without deduplication, and it took 22 minutes on the test directory, so I was too embarrassed to post it.

Using GNU tools: sed and sort
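@echo off
setlocal ENABLEDELAYEDEXPANSION
rem Put the GNU tools first on PATH (assumes the pathgnu variable points at their folder)
set path=%pathgnu%;%path%
echo %path%
set sour=新建文件夹
set dest=bbb1
set exclude=QQ
rem Make sure the output folder exists, then empty it
if not exist "%dest%" md "%dest%"
del /q "%dest%\*.*"
echo %~nx0 %time% 1 >>time.txt
rem Pass 1: keep lines with at least three fields, drop lines containing the exclude word,
rem map SZ to 1 and SH to 0, turn whitespace runs into a pipe, and merge files by name
for /f "usebackq tokens=* delims=" %%a in (`dir /s /b "%sour%\*.txt"`) do (
    echo "%%~fa"
    type "%%~fa" | sed -r -e "/\S+\s+\S+\s+\S+/^!d" -e "/%exclude%/d" -e "s/SZ/1/" -e "s/SH/0/" -e "s/\s+/^|/g"  >> "%dest%\%%~na.tmp"
)
echo,
echo %~nx0 %time% 2 >>time.txt
echo,
rem Pass 2: sort each merged file and drop duplicate lines
for /f "usebackq tokens=* delims=" %%a in (`dir /s /b "%dest%\*.tmp"`) do (
    echo "%%~fa"
    type "%%~fa" | sort.exe -u >>"%%~dpna.txt"
)
echo %~nx0 %time% 3 >>time.txt
echo,
del "%dest%\*.tmp"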
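
For anyone without the GNU tools at hand, the per-line logic of the sed command above can be sketched in Python. This is only an illustration, not something that was timed in this thread. The function name `transform` and the sample lines are made up, since the real data format is not shown here.

```python
import re

def transform(line, exclude="QQ"):
    """Apply the same steps as the sed command to one line (no trailing newline).

    Returns the rewritten line, or None if sed would have deleted it.
    """
    # /\S+\s+\S+\s+\S+/!d : drop lines without three whitespace-separated fields
    if not re.search(r"\S+\s+\S+\s+\S+", line):
        return None
    # /QQ/d : drop lines that contain the exclude pattern
    if re.search(exclude, line):
        return None
    # s/SZ/1/ and s/SH/0/ have no g flag, so only the first match is replaced
    line = re.sub("SZ", "1", line, count=1)
    line = re.sub("SH", "0", line, count=1)
    # s/\s+/|/g : every run of whitespace becomes a single pipe
    return re.sub(r"\s+", "|", line)

# Hypothetical sample lines, invented for this sketch
samples = ["SZ000001  20211005   -123", "SH600000 20211005 45",
           "QQ group 123456", "too short"]
for s in samples:
    print(repr(s), "->", transform(s))
```

One difference to keep in mind when comparing with the PowerShell versions in this thread: sed without the g flag replaces only the first SZ or SH on a line, while PowerShell's -replace operator replaces every occurrence and ignores case by default.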

Using GNU tools: sed and awk
@echo off
setlocal ENABLEDELAYEDEXPANSION
rem Put the GNU tools first on PATH (assumes the pathgnu variable points at their folder)
set path=%pathgnu%;%path%
echo %path%
set sour=新建文件夹
set dest=bbb2
set exclude=QQ
rem Make sure the output folder exists, then empty it
if not exist "%dest%" md "%dest%"
del /q "%dest%\*.*"
echo %~nx0 %time% 1 >>time.txt
rem Pass 1: same sed filter and rewrite as in the sed and sort version
for /f "usebackq tokens=* delims=" %%a in (`dir /s /b "%sour%\*.txt"`) do (
    echo "%%~fa"
    type "%%~fa" | sed -r -e "/\S+\s+\S+\s+\S+/^!d" -e "/%exclude%/d" -e "s/SZ/1/" -e "s/SH/0/" -e "s/\s+/^|/g"  >> "%dest%\%%~na.tmp"
)
echo,
echo %~nx0 %time% 2 >>time.txt
echo,
rem Pass 2: awk prints a line only the first time it appears, keeping the original order
for /f "usebackq tokens=* delims=" %%a in (`dir /s /b "%dest%\*.tmp"`) do (
    echo "%%~fa"
    type "%%~fa" | awk " { arr[$0]++ ; if ( arr[$0] == 1 ) { print $0 } } "  >>"%%~dpna.txt"
)
echo %~nx0 %time% 3 >>time.txt
echo,
del "%dest%\*.tmp"
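
The sort and awk scripts differ only in how duplicates are removed: sort -u returns the unique lines in sorted order, while the awk one-liner keeps the first occurrence of each line in its original position. A small Python sketch of that difference follows. The helper names and the sample data are made up for illustration, and Python sorts by code point, which may not match the collation sort uses under a given locale.

```python
def dedup_sorted(lines):
    """Roughly what sort -u does: unique lines, in sorted order."""
    return sorted(set(lines))

def dedup_first_seen(lines):
    """What the awk one-liner does: emit a line only the first time it is seen."""
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out

# Hypothetical merged .tmp content
lines = ["1000002|b", "1000001|a", "1000002|b", "0600000|c", "1000001|a"]
print(dedup_sorted(lines))
print(dedup_first_seen(lines))
```

Both variants produce the same set of lines. Only the order differs, so the choice matters only if the output order is significant.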

Timing comparison
sed + sort
1.bat 16:01:09.17 1
1.bat 16:02:06.45 2
1.bat 16:02:21.43 3
sed + awk
2.bat 16:03:13.41 1
2.bat 16:04:11.68 2
2.bat 16:04:26.42 3

The two approaches are equally fast (about 72 s and 73 s in total), but the PS version is much slower.
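
To read the log above: each line is the script name, a timestamp and a stage marker, so the time spent in a pass is the difference between consecutive timestamps. A short Python sketch of that arithmetic, using the timestamps exactly as logged (the function name `elapsed` is just for this example):

```python
from datetime import datetime

def elapsed(start, end):
    """Seconds between two HH:MM:SS.ff stamps taken on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# Timestamps copied from time.txt above
runs = {
    "sed + sort": ["16:01:09.17", "16:02:06.45", "16:02:21.43"],
    "sed + awk": ["16:03:13.41", "16:04:11.68", "16:04:26.42"],
}
for name, (t1, t2, t3) in runs.items():
    print(f"{name}: filter {elapsed(t1, t2):.2f} s, "
          f"dedup {elapsed(t2, t3):.2f} s, total {elapsed(t1, t3):.2f} s")
```

This gives 57.28 s + 14.98 s = 72.26 s for sed + sort and 58.27 s + 14.74 s = 73.01 s for sed + awk, so the sed pass dominates in both cases.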

That is fast enough.

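Reply to #21 newswan

You could try the .NET static classes. Here I only changed the file reading and writing, and it already got much faster. Changing the replacement part as well should speed it up further.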
# $sour, $dest and $exclude are assumed to be defined already
Get-ChildItem -path $sour *.txt -Recurse | foreach-object {
    $_.fullname
    # Read the whole file through .NET, then filter and rewrite the lines
    $b=( [IO.File]::ReadAllLines($_.fullname) ) -match "\w+\s+\w+\s+[-]?\w+" -notmatch $exclude -replace "SZ","1" -replace "SH","0" -replace "\s+","|"
    $a=$dest,$_.name -join '\'
    [IO.File]::WriteAllLines($a,$b)
}

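Last edited by newswan on 2021-10-5 21:07

Reply to #39 idwma

Thanks for the pointers.
I have never learned .NET...
The merge step is now just as fast, but deduplication is still very slow.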
$sour = "D:\share\tech\New folder\新建文件夹"
$dest = "D:\share\tech\New folder\ccc2"
$exclude = "QQ"
Remove-Item $dest\*.*
( $MyInvocation.MyCommand.Name + "  1  " + (get-date -Format "HH:mm:ss.ff").tostring() ) | out-file -Encoding ascii -append time.txt
Get-ChildItem -path $sour *.txt -Recurse | foreach-object {
    $_.fullname
    $filename = Join-Path -path $dest -ChildPath ($_.basename + ".tmp")
    $a=( [IO.File]::ReadAllLines($_.fullname) )
    $a = $a -match "\S+\s+\S+\s+\S+" -notmatch $exclude -replace "SZ","1" -replace "SH","0" -replace "\s+","|"
    [IO.File]::AppendAllLines([string]$filename , [string[]]$a)
}
( $MyInvocation.MyCommand.Name + "  2  " + (get-date -Format "HH:mm:ss.ff").tostring() ) | out-file -Encoding ascii -append time.txt
# Pass 2: deduplicate each merged .tmp file, keeping the first occurrence of every line
Get-ChildItem -path $dest *.tmp | foreach-object {
    $_.fullname
    $filename = Join-Path -path $dest -ChildPath ($_.basename + ".txt")
    # Note: keys of a @{} hashtable are compared case-insensitively
    $ht = @{}
    $a = ( [IO.File]::ReadAllLines($_.fullname) )
    $a = $a | foreach-object {
        if ( -not ( $ht.ContainsKey($_) ) )
        {
            $_
            $ht.add($_,"1")
        }
    }
    $a | out-file -Encoding utf8 $filename
}
( $MyInvocation.MyCommand.Name + "  3  " + (get-date -Format "HH:mm:ss.ff").tostring() ) | out-file -Encoding ascii -append time.txt

Reply to #31 xczxczxcz
Hello! Could you post that PS script? Thanks!

Last edited by idwma on 2021-10-6 16:17

Reply to #40 newswan

For the deduplication part I borrowed from an earlier post. Try it and see whether it is fast: http://www.bathome.net/thread-25194-2-1.html
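Get-ChildItem -path $dest *.tmp | foreach-object {
    $_.fullname
    $filename = Join-Path -path $dest -ChildPath ($_.basename + ".txt")
    $reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $_.fullname
    # HashSet.Add ignores duplicates, so the set ends up holding each distinct line once
    $aa = New-Object System.Collections.Generic.HashSet[string]
    # Compare against $null so that an empty line does not end the loop early
    while ( $null -ne ($read = $reader.ReadLine()) ) {
        [void]$aa.Add($read)
    }
    # Release the file handle before writing the result
    $reader.Close()
    [IO.File]::WriteAllLines($filename, $aa)
}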

Reply to #42 idwma

Thanks. This takes about 6 times as long as the unix tools, which is much better than before.

I have not used this for a while, and a lot of it is no longer familiar.

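You could try multithreading. More threads generally means faster, up to what your CPU can handle.
The data-processing function can be tuned as you see fit.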
#&cls&cd /d "%~dp0" & @powershell -c "Get-Content '%~0' | Select-Object -Skip 1 | Out-String | Invoke-Expression" & pause&exit
cls
$t1 = Get-Date
$src_dir = '新建文件夹'
$dst_dir = 'out'
[void][System.IO.Directory]::CreateDirectory($dst_dir)
# Worker script block: processes one group of same-named files
$HandleGroupJob = {
    # Arguments passed to the worker
    param($dst_dir,$groupInfo)
    Write-Host $groupInfo.Name
    # Merge, filter and deduplicate
    $set = New-Object 'System.Collections.Generic.HashSet[string]'
    & { $groupInfo.Group | foreach { [IO.File]::ReadAllLines($_.FullName)} } | foreach {
        if($_ -match '\S+\s+\d+\s+-?\d{2,}'){
            [void]$set.Add(($_ -replace 'SH','1|' -replace 'SZ','0|' -replace '\s+','|'))
        }
    }
    # Write the result
    Out-File -InputObject $set -FilePath ('{0}\{1}' -f $dst_dir,$groupInfo.Name)
    $set = $null
    return ($groupInfo.Name + ' done')
}
# Thread pool setup
$pool = [runspacefactory]::CreateRunspacePool(1,10) # at most 10 concurrent threads
$pool.Open()
$threads = New-Object 'System.Collections.ArrayList'
$results = New-Object 'System.Collections.ArrayList'
'Creating threads...'
Get-ChildItem -Recurse -Path $src_dir -Filter '*.txt' | Group-Object {$_.Name} | foreach {
    $_.Name
    $thread = [powershell]::Create()
    $thread.RunspacePool = $pool
    [void]$thread.AddScript($HandleGroupJob)
    [void]$thread.AddArgument($dst_dir)
    [void]$thread.AddArgument($_)
    [void]$threads.Add($thread)
    [void]$results.Add($thread.BeginInvoke())
}
'-------------------------------'
'Waiting for threads to finish'
while($true){
    $all_done = $true
    for($i = 0; $i -lt $results.Count; $i++){
        if($results[$i] -ne $null){
            if($results[$i].IsCompleted){
                $threads[$i].EndInvoke($results[$i])
                $threads[$i].Dispose()
                $threads[$i] = $null
                $results[$i] = $null
                [System.GC]::Collect()
            } else {
                $all_done = $false
            }
        }
    }
    if($all_done){ break }
    Start-Sleep -Milliseconds 500
}
# Close the thread pool
$pool.Close()
'-------------------'
'{0}  -> {1}' -f $t1,(Get-Date)
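
The overall shape of the script above (group files by name, hand each group to a pool of workers, collect the results) can also be sketched in Python with the standard library. This is an illustration of the pattern only, not a port that was benchmarked in this thread. The function names, the UTF-8 encoding and the `max_workers=10` default (mirroring the pool size above) are assumptions, and in Python a process pool may suit CPU-bound regex work better than threads.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Same filter as the PowerShell worker: a field, a number, then a 2+ digit number
LINE_RE = re.compile(r"\S+\s+\d+\s+-?\d{2,}")

def handle_group(dst_dir, name, paths):
    """Merge all files sharing one name; filter, rewrite and deduplicate their lines."""
    seen = set()
    out = []
    for path in paths:
        # Assumption: input files are UTF-8; adjust the encoding for real data
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if LINE_RE.search(line):
                line = line.replace("SH", "1|").replace("SZ", "0|")
                line = re.sub(r"\s+", "|", line)
                if line not in seen:
                    seen.add(line)
                    out.append(line)
    (Path(dst_dir) / name).write_text("".join(l + "\n" for l in out), encoding="utf-8")
    return name + " done"

def run(src_dir, dst_dir, max_workers=10):
    """Group *.txt files under src_dir by file name and process each group in the pool."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    groups = defaultdict(list)
    for path in Path(src_dir).rglob("*.txt"):
        groups[path.name].append(path)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handle_group, dst_dir, n, p) for n, p in groups.items()]
        return [f.result() for f in futures]
```

Calling `run('新建文件夹', 'out')` would then play the role of the script body above. Because each group writes its own output file, the workers share no state and need no locking.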
