Title: Solved - (50 yuan) Remove the punctuation within the first/last 8 Chinese characters of each line
Author: 黄大人  Time: 2021-2-11 22:36
Last edited by 黄大人 on 2021-2-12 10:32
Reward: 50 yuan
Payment: WeChat / Alipay
Contact: solved
Deadline: solved
System: Win7 64-bit
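Remove the punctuation within the first 8 Chinese characters of each line, and/or remove the punctuation within the last 8 Chinese characters of each line.
There are two features. They can be written as separate scripts or as a single one; if it is a single script, please show me what to change to turn the front or back feature on or off.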
Feature 1: (remove the punctuation within the first 8 Chinese characters of each line, both Chinese and English punctuation)
Before:
当别人,说你,不好不适合我时、我只说了,一句话我喜欢。
如果、我的时。空没有了你那么。我将石沉,海底。
After:
当别人说你不好不适合我时、我只说了,一句话我喜欢。
如果我的时空没有了你那么。我将石沉,海底。
Feature 2: (remove the punctuation within the last 8 Chinese characters of each line, both Chinese and English punctuation)
Before:
当别人,说你,不好不适合我时、我只说了,一句话,我喜欢。
如果、我的时。空没有了你那么。我将。石沉,海底。
After:
当别人,说你,不好不适合我时、我只说了一句话我喜欢
如果、我的时。空没有了你那么我将石沉海底
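For illustration, both rules fit in one small PowerShell function that can be checked against the samples above. This is a sketch, not the script posted later in this thread, though it follows the same idea (find the boundary character with a regex, then strip punctuation on one side of it). The name Remove-EdgePunctuation and its -Front/-Back/-Count parameters are hypothetical. It assumes a "Chinese character" is a code point in the .NET named block IsCJKUnifiedIdeographs (U+4E00 to U+9FFF, so rarer extension characters are not counted) and that "punctuation" is the Unicode category \p{P}, which covers full-width marks such as 、 and 。 as well as ASCII marks such as , and . but not symbols like + or $.

    # Hypothetical helper: removes \p{P} punctuation that lies within the first and/or
    # last $Count CJK ideographs (U+4E00-U+9FFF) of a single line.
    function Remove-EdgePunctuation {
        param(
            [string]$Line,
            [int]$Count = 8,
            [switch]$Front,
            [switch]$Back
        )
        # One "Chinese character" = one code point in the IsCJKUnifiedIdeographs block
        $reCJK = [regex]'\p{IsCJKUnifiedIdeographs}'
        $str = $Line
        if ($Front) {
            $m = $reCJK.Matches($str)
            if ($m.Count -gt 0) {
                # Position of the $Count-th ideograph, or of the last one if the line has fewer
                $end = $m[[math]::Min($Count, $m.Count) - 1].Index
                # Strip punctuation up to and including that ideograph; keep the rest as-is
                $str = ($str.Substring(0, $end + 1) -replace '\p{P}+', '') + $str.Substring($end + 1)
            }
        }
        if ($Back) {
            # Re-scan, because the front pass may have shifted positions
            $m = $reCJK.Matches($str)
            if ($m.Count -gt 0) {
                # Position of the $Count-th ideograph counted from the end of the line
                $start = $m[[math]::Max(0, $m.Count - $Count)].Index
                $str = $str.Substring(0, $start) + ($str.Substring($start) -replace '\p{P}+', '')
            }
        }
        return $str
    }

With both switches the front pass runs first and the back pass works on its result, so a line with 15 or fewer Chinese characters ends up with no punctuation at all.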
Note: there must be a backup feature: the processed files should be saved automatically to a separate folder.
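The backup requirement amounts to never writing over the input: read each file, transform it, and write the result under the same name into another folder. A minimal sketch, assuming every input file is UTF-8 without BOM (the script posted in the reply detects each file's encoding instead). Save-ProcessedCopy and its parameters are hypothetical names, and the per-line rule is passed in as a script block.

    # Hypothetical helper: writes a processed copy of every *.txt in $SourceDir to $OutDir.
    # The input files are only read, never overwritten, so they remain as the backup.
    function Save-ProcessedCopy {
        param(
            [string]$SourceDir,
            [string]$OutDir,
            [scriptblock]$Transform   # called once per line: { param($line) ... }
        )
        # Assumption: input is UTF-8 without BOM; output uses the same encoding
        $enc = New-Object System.Text.UTF8Encoding -ArgumentList $false
        if (-not (Test-Path -LiteralPath $OutDir -PathType Container)) {
            New-Item -Path $OutDir -ItemType Directory | Out-Null
        }
        # Resolve to a full path: .NET file APIs resolve relative paths against the
        # process working directory, which can differ from PowerShell's current location
        $outFull = (Resolve-Path -LiteralPath $OutDir).ProviderPath
        Get-ChildItem -LiteralPath $SourceDir -Filter *.txt | Where-Object { -not $_.PSIsContainer } | ForEach-Object {
            $file = $_
            $lines = [System.IO.File]::ReadAllLines($file.FullName, $enc)
            $result = [string[]]@($lines | ForEach-Object { & $Transform $_ })
            $dst = Join-Path -Path $outFull -ChildPath $file.Name
            [System.IO.File]::WriteAllLines($dst, $result, $enc)
            $dst   # emit the path of each written copy
        }
    }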
Author: flashercs  Time: 2021-2-11 23:43
Last edited by flashercs on 2021-2-12 01:22
Save as a .bat file:
<#*,:&cls
@echo off
pushd "%~dp0"
powershell -NoProfile -ExecutionPolicy RemoteSigned -Command ". ([ScriptBlock]::Create((Get-Content -LiteralPath \"%~0\" -ReadCount 0 | Out-String ))) "
popd
pause
exit /b
#>
# 1 = strip punctuation in that part of each line; 0 = leave it unchanged
$替换前8 = 1
$替换后8 = 1
# Output folder: processed copies are saved here, the original files stay untouched
$dirOut = ".\newDir"
# Number of Chinese characters to check at the start/end of each line (default 8)
$CJKCount = 8

if (-not (Test-Path -Path $dirOut)) {
    New-Item -Path $dirOut -ItemType Directory
} elseif (-not (Test-Path -Path $dirOut -PathType Container)) {
    # A file with that name is in the way: replace it with a folder
    Remove-Item -Path $dirOut -Force
    New-Item -Path $dirOut -ItemType Directory
}
function Get-Encoding {
    # output: [System.Text.Encoding], $null
    [CmdletBinding(DefaultParameterSetName = "PathSet")]
    param (
        [Parameter(ParameterSetName = "StreamSet", Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [System.IO.Stream]$Stream,
        [Parameter(ParameterSetName = "PathSet", Mandatory = $true, Position = 0)]
        [ValidateNotNullOrEmpty()]
        [System.String]$Path,
        [Parameter(Mandatory = $false, Position = 1)]
        [System.UInt32]$ReadCount = 1024
    )
    $utf8BOMThrow = New-Object System.Text.UTF8Encoding -ArgumentList @($true, $true)
    $utf8NoBOMThrow = New-Object System.Text.UTF8Encoding -ArgumentList @($false, $true)
    $utf16LEBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($false, $true, $true)
    $utf16LENoBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($false, $false, $true)
    $utf16BEBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($true, $true, $true)
    $utf16BENoBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($true, $false, $true)
    # type encoding,bool bom,bool throw,Text.Encoding encoding,byte[] preamble,string strPreamble
    $arrUTF8Bom = $utf8BOMThrow.GetPreamble()
    $arrUTF16LEBom = $utf16LEBOMThrow.GetPreamble()
    $arrUTF16BEBom = $utf16BEBOMThrow.GetPreamble()

    if ($PSCmdlet.ParameterSetName -eq "PathSet") {
        try {
            $Stream = New-Object System.IO.FileStream -ArgumentList @($Path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::Read)
        } catch {
            return $null
        }
    }
    # Read the first 3 bytes (a possible BOM). The count is discarded rather than stored in
    # $readCount: variable names are case-insensitive, so that would overwrite $ReadCount.
    $byteBuff = New-Object byte[] -ArgumentList 3
    [void]$Stream.Read($byteBuff, 0, 3)
    if ($byteBuff[0] -eq $arrUTF8Bom[0] -and $byteBuff[1] -eq $arrUTF8Bom[1] -and $byteBuff[2] -eq $arrUTF8Bom[2]) {
        # utf8bom
        $return = $utf8BOMThrow
    } elseif ($byteBuff[0] -eq $arrUTF16LEBom[0] -and $byteBuff[1] -eq $arrUTF16LEBom[1]) {
        # utf16lebom
        $return = $utf16LEBOMThrow
    } elseif ($byteBuff[0] -eq $arrUTF16BEBom[0] -and $byteBuff[1] -eq $arrUTF16BEBom[1]) {
        # utf16bebom
        $return = $utf16BEBOMThrow
    } else {
        # No BOM: try a strict (throw-on-invalid) UTF-8 decode of the first $ReadCount characters
        if ($ReadCount -gt 0) {
            $charBuff = New-Object char[] -ArgumentList $ReadCount
        }
        # BOM-less UTF-16 is not tried; such files are treated as ANSI
        foreach ($encoding in @($utf8NoBOMThrow<# , $utf16LENoBOMThrow, $utf16BENoBOMThrow #>)) {
            try {
                $Stream.Position = 0
                $sr = New-Object System.IO.StreamReader -ArgumentList @($Stream, $encoding, $false)
                if ($ReadCount -gt 0) {
                    [void]$sr.Read($charBuff, 0, $ReadCount)
                } else {
                    [void]$sr.ReadToEnd()
                }
                $return = $encoding
                break
            } catch {
                # Invalid bytes for this encoding: try the next one
            } finally {
                if ($sr) {
                    $sr.Dispose()
                }
            }
        }
    }
    if ($PSCmdlet.ParameterSetName -eq "PathSet") {
        $Stream.Dispose()
    }
    if (!$return) {
        # Nothing matched: use the system ANSI code page (GBK on Simplified Chinese Windows)
        $return = [System.Text.Encoding]::Default
    }
    return $return
}
# A "Chinese character" is any code point in the CJK Unified Ideographs block (U+4E00-U+9FFF)
$reCJK = New-Object System.Text.RegularExpressions.Regex -ArgumentList @('\p{IsCJKUnifiedIdeographs}', 'Compiled, Ignorecase')
# Alternative: count every word character (\w) instead of only Chinese characters
# $reCJK = New-Object System.Text.RegularExpressions.Regex -ArgumentList @('\w', 'Compiled, Ignorecase')
Get-ChildItem -Path .\*.txt -Filter *.txt -Include *.txt | ForEach-Object {
    if (-not $_.PSIsContainer) {
        $encoding = Get-Encoding -Path $_.FullName
        if ($null -eq $encoding) {
            $encoding = [System.Text.Encoding]::GetEncoding(0)
        }
        try {
            # Each output file keeps the name and the detected encoding of its source file
            [System.IO.File]::WriteAllLines((Join-Path -Path $dirOut -ChildPath $_.Name), [string[]]@([System.IO.File]::ReadAllLines($_.FullName, $encoding) | ForEach-Object {
                $str = $_
                if ($替换前8) {
                    $cjkMatches = $reCJK.Matches($str)
                    if ($cjkMatches.Count -gt 0) {
                        # Position of the $CJKCount-th Chinese character (or the last one if the line
                        # has fewer); strip all punctuation up to and including it
                        $index = $cjkMatches[[math]::Min($CJKCount - 1, $cjkMatches.Count - 1)].Index
                        $str = ($str.Substring(0, $index + 1) -replace '[\p{P}]+', '') + $str.Substring($index + 1)
                    }
                }
                if ($替换后8) {
                    $cjkMatches = $reCJK.Matches($str)
                    if ($cjkMatches.Count -gt 0) {
                        # Position of the $CJKCount-th Chinese character from the end; strip punctuation from there on
                        $index = $cjkMatches[[math]::Max(0, $cjkMatches.Count - $CJKCount)].Index
                        $str = $str.Substring(0, $index) + ($str.Substring($index) -replace '[\p{P}]+', '')
                    }
                }
                $str
            }), $encoding)
        } catch {
            $_ | Write-Host -ForegroundColor Red
        }
    }
}
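The Get-Encoding function in the script above decides how each file is read: it checks for a UTF-8, UTF-16 LE or UTF-16 BE byte-order mark; otherwise it tries a strict UTF-8 decoder that throws on invalid bytes; and if that fails it falls back to [System.Text.Encoding]::Default. The same idea on an in-memory byte array, as a hedged sketch: Get-TextEncodingGuess is a hypothetical name, it decodes the whole array instead of a limited prefix, and the fallback only means the ANSI code page (GBK on Simplified Chinese Windows) under Windows PowerShell; under PowerShell 7 (.NET Core), Encoding.Default is UTF-8.

    # Hypothetical helper: guesses the text encoding of a byte array
    # (BOM first, then strict UTF-8, then Encoding.Default as the fallback).
    function Get-TextEncodingGuess {
        param([byte[]]$Bytes)
        # 1. Byte-order marks
        if ($Bytes.Length -ge 3 -and $Bytes[0] -eq 0xEF -and $Bytes[1] -eq 0xBB -and $Bytes[2] -eq 0xBF) {
            return (New-Object System.Text.UTF8Encoding -ArgumentList $true)              # UTF-8 with BOM
        }
        if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFF -and $Bytes[1] -eq 0xFE) {
            return (New-Object System.Text.UnicodeEncoding -ArgumentList $false, $true)   # UTF-16 LE with BOM
        }
        if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFE -and $Bytes[1] -eq 0xFF) {
            return (New-Object System.Text.UnicodeEncoding -ArgumentList $true, $true)    # UTF-16 BE with BOM
        }
        # 2. No BOM: a UTF8Encoding built with throwOnInvalidBytes = $true throws on invalid UTF-8
        $strictUtf8 = New-Object System.Text.UTF8Encoding -ArgumentList $false, $true
        try {
            [void]$strictUtf8.GetString($Bytes)
            return $strictUtf8
        } catch {
            # 3. Not valid UTF-8: fall back to Encoding.Default
            #    (ANSI code page under Windows PowerShell, UTF-8 under PowerShell 7)
            return [System.Text.Encoding]::Default
        }
    }

For example, the GBK bytes of 中 (0xD6 0xD0) are not valid UTF-8, so that input takes the fallback branch.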
Author: 黄大人  Time: 2021-2-12 10:30
Reply to #2 flashercs
The code checks out; I've sent the payment via Alipay, please check.
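Author: 黄大人  Time: 2021-2-12 15:44
Reply to #2 flashercs
The code works better than I expected. The only shortcoming is that it shows no progress while running: the window just stays black and nothing changes, so when there are many files I can't tell whether it has crashed or is stuck.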
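One option for this is PowerShell's built-in Write-Progress cmdlet, which draws a progress bar in the console window. A sketch under that assumption: Invoke-WithProgress is a hypothetical helper, and the per-file work is passed in as a script block. It also prints one plain line per file, so a record remains after the bar disappears.

    # Hypothetical helper: runs $Action once per path and reports progress as it goes.
    function Invoke-WithProgress {
        param(
            [string[]]$Paths,
            [scriptblock]$Action   # called once per file: { param($path) ... }
        )
        $total = $Paths.Count
        for ($i = 0; $i -lt $total; $i++) {
            $path = $Paths[$i]
            # Progress bar: percent done, "n / total" and the current file name
            Write-Progress -Activity 'Removing punctuation' -Status ('{0} / {1}: {2}' -f ($i + 1), $total, $path) -PercentComplete ([int](100 * $i / $total))
            & $Action $path
            # Plain log line as well
            Write-Host ('[{0}/{1}] {2}' -f ($i + 1), $total, $path)
        }
        Write-Progress -Activity 'Removing punctuation' -Completed
    }

To use it with a script like the one in this thread, the file list would be collected first (for example with @(Get-ChildItem -Path .\*.txt)) so that the total is known before processing starts.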
Author: flashercs  Time: 2021-2-12 17:12
Reply to #4 黄大人
I've modified it to show which file is being processed:
<#*,:&cls
@echo off
pushd "%~dp0"
powershell -NoProfile -ExecutionPolicy RemoteSigned -Command ". ([ScriptBlock]::Create((Get-Content -LiteralPath \"%~0\" -ReadCount 0 | Out-String ))) "
popd
pause
exit /b
#>
# Purpose: remove the punctuation within the first 8 / last 8 Chinese (CJK) characters of each line.

# 1 = strip punctuation in that part of each line; 0 = leave it unchanged
$替换前8 = 1
$替换后8 = 1
# Output folder: processed copies are saved here, the original files stay untouched
$dirOut = ".\newDir"
# Number of Chinese characters to check at the start/end of each line (default 8)
$CJKCount = 8

if (-not (Test-Path -Path $dirOut)) {
    New-Item -Path $dirOut -ItemType Directory
} elseif (-not (Test-Path -Path $dirOut -PathType Container)) {
    # A file with that name is in the way: replace it with a folder
    Remove-Item -Path $dirOut -Force
    New-Item -Path $dirOut -ItemType Directory
}
function Get-Encoding {
    # output: [System.Text.Encoding], $null
    [CmdletBinding(DefaultParameterSetName = "PathSet")]
    param (
        [Parameter(ParameterSetName = "StreamSet", Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [System.IO.Stream]$Stream,
        [Parameter(ParameterSetName = "PathSet", Mandatory = $true, Position = 0)]
        [ValidateNotNullOrEmpty()]
        [System.String]$Path,
        [Parameter(Mandatory = $false, Position = 1)]
        [System.UInt32]$ReadCount = 1024
    )
    $utf8BOMThrow = New-Object System.Text.UTF8Encoding -ArgumentList @($true, $true)
    $utf8NoBOMThrow = New-Object System.Text.UTF8Encoding -ArgumentList @($false, $true)
    $utf16LEBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($false, $true, $true)
    $utf16LENoBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($false, $false, $true)
    $utf16BEBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($true, $true, $true)
    $utf16BENoBOMThrow = New-Object System.Text.UnicodeEncoding -ArgumentList @($true, $false, $true)
    # type encoding,bool bom,bool throw,Text.Encoding encoding,byte[] preamble,string strPreamble
    $arrUTF8Bom = $utf8BOMThrow.GetPreamble()
    $arrUTF16LEBom = $utf16LEBOMThrow.GetPreamble()
    $arrUTF16BEBom = $utf16BEBOMThrow.GetPreamble()

    if ($PSCmdlet.ParameterSetName -eq "PathSet") {
        try {
            $Stream = New-Object System.IO.FileStream -ArgumentList @($Path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::Read)
        } catch {
            return $null
        }
    }
    # Read the first 3 bytes (a possible BOM). The count is discarded rather than stored in
    # $readCount: variable names are case-insensitive, so that would overwrite $ReadCount.
    $byteBuff = New-Object byte[] -ArgumentList 3
    [void]$Stream.Read($byteBuff, 0, 3)
    if ($byteBuff[0] -eq $arrUTF8Bom[0] -and $byteBuff[1] -eq $arrUTF8Bom[1] -and $byteBuff[2] -eq $arrUTF8Bom[2]) {
        # utf8bom
        $return = $utf8BOMThrow
    } elseif ($byteBuff[0] -eq $arrUTF16LEBom[0] -and $byteBuff[1] -eq $arrUTF16LEBom[1]) {
        # utf16lebom
        $return = $utf16LEBOMThrow
    } elseif ($byteBuff[0] -eq $arrUTF16BEBom[0] -and $byteBuff[1] -eq $arrUTF16BEBom[1]) {
        # utf16bebom
        $return = $utf16BEBOMThrow
    } else {
        # No BOM: try a strict (throw-on-invalid) UTF-8 decode of the first $ReadCount characters
        if ($ReadCount -gt 0) {
            $charBuff = New-Object char[] -ArgumentList $ReadCount
        }
        # BOM-less UTF-16 is not tried; such files are treated as ANSI
        foreach ($encoding in @($utf8NoBOMThrow<# , $utf16LENoBOMThrow, $utf16BENoBOMThrow #>)) {
            try {
                $Stream.Position = 0
                $sr = New-Object System.IO.StreamReader -ArgumentList @($Stream, $encoding, $false)
                if ($ReadCount -gt 0) {
                    [void]$sr.Read($charBuff, 0, $ReadCount)
                } else {
                    [void]$sr.ReadToEnd()
                }
                $return = $encoding
                break
            } catch {
                # Invalid bytes for this encoding: try the next one
            } finally {
                if ($sr) {
                    $sr.Dispose()
                }
            }
        }
    }
    if ($PSCmdlet.ParameterSetName -eq "PathSet") {
        $Stream.Dispose()
    }
    if (!$return) {
        # Nothing matched: use the system ANSI code page (GBK on Simplified Chinese Windows)
        $return = [System.Text.Encoding]::Default
    }
    return $return
}

# A "Chinese character" is any code point in the CJK Unified Ideographs block (U+4E00-U+9FFF)
$reCJK = New-Object System.Text.RegularExpressions.Regex -ArgumentList @('\p{IsCJKUnifiedIdeographs}', 'Compiled, Ignorecase')
# Alternative: count every word character (\w) instead of only Chinese characters
# $reCJK = New-Object System.Text.RegularExpressions.Regex -ArgumentList @('\w', 'Compiled, Ignorecase')
# Print an empty line before the file list
Write-Host $null
Get-ChildItem -Path .\*.txt -Filter *.txt -Include *.txt | ForEach-Object {
    if (-not $_.PSIsContainer) {
        $encoding = Get-Encoding -Path $_.FullName
        if ($null -eq $encoding) {
            $encoding = [System.Text.Encoding]::GetEncoding(0)
        }
        try {
            $dstfile = (Join-Path -Path $dirOut -ChildPath $_.Name)
            # Show which file is being processed: source name -> output path
            $_.Name + " -> " + $dstfile | Write-Host
            # Each output file keeps the detected encoding of its source file
            [System.IO.File]::WriteAllLines($dstfile, [string[]]@([System.IO.File]::ReadAllLines($_.FullName, $encoding) | ForEach-Object {
                $str = $_
                if ($替换前8) {
                    $cjkMatches = $reCJK.Matches($str)
                    if ($cjkMatches.Count -gt 0) {
                        # Position of the $CJKCount-th Chinese character (or the last one if the line
                        # has fewer); strip all punctuation up to and including it
                        $index = $cjkMatches[[math]::Min($CJKCount - 1, $cjkMatches.Count - 1)].Index
                        $str = ($str.Substring(0, $index + 1) -replace '[\p{P}]+', '') + $str.Substring($index + 1)
                    }
                }
                if ($替换后8) {
                    $cjkMatches = $reCJK.Matches($str)
                    if ($cjkMatches.Count -gt 0) {
                        # Position of the $CJKCount-th Chinese character from the end; strip punctuation from there on
                        $index = $cjkMatches[[math]::Max(0, $cjkMatches.Count - $CJKCount)].Index
                        $str = $str.Substring(0, $index) + ($str.Substring($index) -replace '[\p{P}]+', '')
                    }
                }
                $str
            }), $encoding)
        } catch {
            $_ | Write-Host -ForegroundColor Red
        }
    }
}
Welcome to 批处理之家 (http://www.bathome.net/) |
Powered by Discuz! 7.2 |