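Title: [Text Processing] Batch script to remove duplicate lines from text while keeping blank lines
Author: ly88888 Time: 2018-1-5 16:47 Title: Batch script to remove duplicate lines from text while keeping blank lines
This post was last edited by pcl_test on 2018-1-6 01:31
There are plenty of de-duplication scripts online, but they all throw away blank lines. I need the blank lines kept!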
Author: yhcfsr Time: 2018-1-5 19:39
Keeping blank lines is not hard, but please spell out your requirements more precisely.
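Author: ly88888 Time: 2018-1-5 19:54
30 Famous Quotes
30 Famous Quotes
The only reliable standard of truth
is that it always agrees with itself
30 Song Lyrics
30 Song Lyrics
Land is valued for its fertility and its harvest
; talent is land too, except that what it yields is not grain,
but truth. If it can only breed reverie and fantasy,
60 Poems
60 Poems
Time is, of all riches,
the most precious
------------------------------
Result after processing
------------------------------
30 Famous Quotes
The only reliable standard of truth
is that it always agrees with itself
30 Song Lyrics
Land is valued for its fertility and its harvest
; talent is land too, except that what it yields is not grain,
but truth. If it can only breed reverie and fantasy,
60 Poems
Time is, of all riches,
the most precious
------------------------------
Rather than
------------------------------
30 Famous Quotes
The only reliable standard of truth
is that it always agrees with itself
30 Song Lyrics
Land is valued for its fertility and its harvest
; talent is land too, except that what it yields is not grain,
but truth. If it can only breed reverie and fantasy,
60 Poems
Time is, of all riches,
the most precious
------------------------------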
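Author: yhcfsr Time: 2018-1-5 20:22
Reply to 3# ly88888
So you want to drop one of two identical adjacent lines? If identical lines appear that are not adjacent, should they be removed as well?
Author: ly88888 Time: 2018-1-5 20:33
Reply to 4# yhcfsr
There are no duplicates among non-adjacent lines. If there are any, of course, remove them too.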
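The requirement settled on here (keep the first occurrence of each line anywhere in the file, and pass every blank line through untouched) can be sketched in a few lines. This is only a minimal Python illustration of the rule, not the batch solution being asked for; the function name `dedupe_keep_blank` is made up for the example.

```python
def dedupe_keep_blank(lines):
    """Return lines with later duplicates dropped; blank lines are always kept."""
    seen = set()
    result = []
    for line in lines:
        if line.strip() == "":
            # Blank lines act as separators, so they are never deduplicated.
            result.append(line)
        elif line not in seen:
            seen.add(line)
            result.append(line)
    return result


if __name__ == "__main__":
    sample = ["A", "A", "", "B", "B", "", "A", "C"]
    print(dedupe_keep_blank(sample))
```

A set gives constant-time "seen before?" checks, and appending to a list preserves the original order, which sorting-based de-duplication would lose.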
Author: yhcfsr Time: 2018-1-5 21:35
This post was last edited by yhcfsr on 2018-1-5 22:17
Reply to 5# ly88888
@echo off & setlocal enableDelayedExpansion
rem Source directory
set "ScDir=d:\temp"
rem Output file
set "OtFil=d:\temp.txt"

rem findstr /n numbers every line, so for /f no longer skips blank ones
(for /f "delims=" %%a in ('findstr /n ".*" "%ScDir%\test.txt"') do (
    set "str=%%a"
    rem Strip the line number up to the first colon only, keeping later colons
    set "str=!str:*:=!"
    if defined str (
        rem A variable named after the line text marks it as already seen
        if not defined [!str!] (
            set "[!str!]=1"
            echo(!str!
        )
    ) else (
        echo(
    )
))>"%OtFil%"

pause
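The weakness of the code above is that it cannot reliably handle special ASCII characters such as ! & # < >.
If you need to handle special characters, use a PowerShell script or VBS instead.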
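The batch script leans on `findstr /n`, which prefixes every line with its number and a colon so that `for /f` does not skip empty lines. The prefix then has to be removed up to the first colon only, otherwise text that itself contains a colon gets cut short. Below is a small Python sketch of that prefix-and-strip step. The helper names are hypothetical, and the code only mimics the `N:text` format; it does not call findstr.

```python
def number_lines(lines):
    """Mimic the 'N:text' format that findstr /n produces."""
    return [f"{i}:{text}" for i, text in enumerate(lines, start=1)]


def strip_number(numbered):
    """Drop everything up to and including the first colon only."""
    _, _, text = numbered.partition(":")
    return text


if __name__ == "__main__":
    original = ["time: 10:30", "", "a&b <c> !d"]
    numbered = number_lines(original)
    print(numbered)
    print([strip_number(n) for n in numbered])
```

In Python the characters `! & # < >` are ordinary string content, which is one reason a scripting language with real string types is the easier route for text containing them.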
Author: ly88888 Time: 2018-1-5 21:55
Reply to 6# yhcfsr
Does that mean files whose names contain these characters, or files whose contents contain them?
Author: pcl_test Time: 2018-1-6 22:51
For Windows 7 and later, save the following as a .bat file and run it:
<# :
@echo off
rem written by pcl_test
rem works as both bat and ps1
rem back up the source files or folders before testing
type "%~f0"|powershell -noprofile -sta "-"
>nul ping /n 6 0&exit
#>

$global:files=@();
$ListFile={
    $global:files=@();
    $button2.Enabled = $false;
    $button2.Text = 'Getting files';
    if($radioButton1.Checked){
        $global:files=FileDialog;
    }else{
        $global:files=FolderDialog;
    }
    $textBox2.Text=$files.Count;
    if($files.Count){
        $textBox3.Text=$files -join "`r`n";
        $textBox3.Enabled = $true;
    }else{
        $textBox3.Text='Back up the source files or folders before testing';
        $textBox3.Enabled = $false;
    }
    $button2.Text = 'Select files or folder';
    $button2.Enabled = $true;
}

$EditFile={
    # Map the combo box selection to a .NET encoding; anything else means ANSI.
    switch($comboBox1.Text)
    {
        'UTF-8'{$outenc=[Text.Encoding]::UTF8;}
        'Unicode(LittleEndian)'{$outenc=[Text.Encoding]::Unicode;}
        'Unicode(BigEndian)'{$outenc=[Text.Encoding]::BigEndianUnicode;}
        Default{$outenc=[Text.Encoding]::Default;}
    }
    if($files.Count){
        $button1.Enabled = $false;
        $button2.Enabled = $false;
        $button1.Text = 'Processing';
        for($i=0;$i -lt $files.Count;$i++)
        {
            $tmp=ClearExtra $files[$i];
            OutFile $files[$i] $tmp $outenc;
            if([int]$textBox2.Text -gt 0){$textBox2.Text=[int]$textBox2.Text-1;}
            [System.Windows.Forms.Application]::DoEvents();
            write-host $files[$i];
        }
        $button1.Text = 'Run';
        $button1.Enabled = $true;
        $button2.Enabled = $true;
        [void][System.Windows.Forms.MessageBox]::Show('Done','Notice');
    }
}

function CheckExt
{
    # Accept only lists such as *.txt;*.log and return them as an array.
    $ext=@();
    if(($textBox1.Text.Trim() -eq '') -or ($textBox1.Text.Trim() -notmatch '^\*\.[a-z\d]+(;\*\.[a-z\d]+)*$')){
        [void][System.Windows.Forms.MessageBox]::Show('The extension list must not be empty or malformed');
    }else{
        $ext=($textBox1.Text -replace '\s*','').split(';',[StringSplitOptions]::RemoveEmptyEntries);
    }
    return ,$ext;
}

function FolderDialog
{
    $filelist=@();
    $ext=CheckExt;
    if($ext.Count){
        $fbd = New-Object System.Windows.Forms.FolderBrowserDialog;
        $fbd.RootFolder = 'MyComputer';
        $fbd.ShowNewFolderButton = $false;
        $Show = $fbd.ShowDialog();
        If ($Show -eq 'OK'){
            $fd=$fbd.SelectedPath;
            dir $fd -r|?{($_ -is [System.IO.FileInfo]) -and ($ext -contains ('*'+$_.Extension))}|%{
                $filelist+=(,$_.FullName);
            }
        }
    }
    return ,$filelist;
}

function FileDialog
{
    $filelist=@();
    $ext=CheckExt;
    if($ext.Count){
        $ofd = New-Object Windows.Forms.OpenFileDialog;
        $ofd.InitialDirectory = pwd;
        $ofd.Filter = 'Text files('+($ext -join ',')+')|'+$textBox1.Text;
        $ofd.ShowHelp = $true;
        $ofd.Multiselect = $true;
        $Show=$ofd.ShowDialog();
        if ($Show -eq 'OK'){
            if ($ofd.Multiselect){$filelist=$ofd.FileNames;}else{$filelist=$ofd.FileName;}
        }
    }
    return ,$filelist;
}

function ClearExtra($f)
{
    # Maps each normalized (trimmed, optionally lowercased) line to its count.
    $dict=New-Object 'System.Collections.Generic.Dictionary[[string],[int]]';
    $texttmp='';
    # @() keeps $text an array even when the file has a single line.
    $text=@(gc -LiteralPath $f);
    if($radioButton3.Checked)
    {
        # Mode 1: keep the first occurrence of every line.
        for($j=0;$j -lt $text.Count;$j++)
        {
            if($radioButton8.Checked)
            {
                $line=$text[$j].Trim().ToLower();
            }else{
                $line=$text[$j].Trim();
            }
            if($line -ne '')
            {
                if(-not $dict.ContainsKey($line)){
                    $texttmp+=$text[$j]+"`r`n";
                    $dict.Add($line, 1);
                }
            }else{
                if($radioButton5.Checked){$texttmp+=$text[$j]+"`r`n";}
            }
        }
    }else{
        # Mode 2, first pass: count how often each line occurs.
        for($j=0;$j -lt $text.Count;$j++)
        {
            if($radioButton8.Checked)
            {
                $line=$text[$j].Trim().ToLower();
            }else{
                $line=$text[$j].Trim();
            }
            if($line -ne '')
            {
                if(-not $dict.ContainsKey($line)){
                    $dict.Add($line, 1);
                }else{
                    $dict[$line]++;
                }
            }
        }
        # Mode 2, second pass: keep only lines that occur exactly once.
        for($j=0;$j -lt $text.Count;$j++)
        {
            if($radioButton8.Checked)
            {
                $line=$text[$j].Trim().ToLower();
            }else{
                $line=$text[$j].Trim();
            }
            if($line -ne '')
            {
                if($dict[$line] -eq 1){$texttmp+=$text[$j]+"`r`n";}
            }else{
                if($radioButton5.Checked){$texttmp+=$text[$j]+"`r`n";}
            }
        }
    }
    $dict=$null;
    return $texttmp;
}

function OutFile($f, $txt, $enc)
{
    if($radioButton9.Checked)
    {
        # Back up as $backup<milliseconds since 1970>_<name> in the same folder.
        $t=[int64]([DateTime]::UtcNow-(get-date '1970/1/1 0:0:0')).TotalMilliseconds;
        $fpath=Split-Path $f;
        $fname=Split-Path $f -Leaf;
        $newname='$backup'+$t.ToString()+'_'+$fname;
        # -LiteralPath avoids wildcard expansion of [ ] in file names.
        cp -LiteralPath $f -Destination ($fpath+'\'+$newname);
    }
    [IO.File]::WriteAllText($f, $txt, $enc);
}
Add-Type -AssemblyName 'System.Windows.Forms';
Add-Type -AssemblyName 'System.Drawing';
$Form = New-Object System.Windows.Forms.Form;
$Form.Text = 'Remove duplicate lines from text';
$Form.TopMost = $true;
$Form.FormBorderStyle = 'FixedSingle';
$Form.MaximizeBox = $false;
$Form.Size = New-Object System.Drawing.Size(658,320);

$radioButton1 = New-Object System.Windows.Forms.RadioButton;
$radioButton1.AutoSize = $true;
$radioButton1.Checked = $true;
$radioButton1.Location = New-Object System.Drawing.Point(10, 22);
$radioButton1.Name = 'radioButton1';
$radioButton1.Size = New-Object System.Drawing.Size(95, 16);
$radioButton1.Text = 'Selected files';
$radioButton1.Add_Click({$textBox3.Text='Back up the source files or folders before testing'});

$radioButton2 = New-Object System.Windows.Forms.RadioButton;
$radioButton2.AutoSize = $true;
$radioButton2.Location = New-Object System.Drawing.Point(120, 22);
$radioButton2.Name = 'radioButton2';
$radioButton2.Size = New-Object System.Drawing.Size(155, 16);
$radioButton2.Text = 'All text files in a folder';
$radioButton2.Add_Click({$textBox3.Text='Note: text files in subfolders of the chosen folder are included'});

$groupBox1 = New-Object System.Windows.Forms.GroupBox;
$groupBox1.Location = New-Object System.Drawing.Point(12, 8);
$groupBox1.Name = 'groupBox1';
$groupBox1.Size = New-Object System.Drawing.Size(281, 50);
$groupBox1.TabStop = $false;
$groupBox1.Text = 'Target';
$groupBox1.Controls.Add($radioButton1);
$groupBox1.Controls.Add($radioButton2);
$Form.Controls.Add($groupBox1);

$label1 = New-Object System.Windows.Forms.Label;
$label1.AutoSize = $true;
$label1.Location = New-Object System.Drawing.Point(8, 25);
$label1.Name = 'label1';
$label1.Size = New-Object System.Drawing.Size(35, 12);
$label1.Text = 'Ext:';

$textBox1 = New-Object System.Windows.Forms.TextBox;
$textBox1.Location = New-Object System.Drawing.Point(43, 21);
$textBox1.Name = 'textBox1';
$textBox1.Size = New-Object System.Drawing.Size(230, 21);
$textBox1.Text = '*.txt;*.log';

$groupBox2 = New-Object System.Windows.Forms.GroupBox;
$groupBox2.Location = New-Object System.Drawing.Point(12, 68);
$groupBox2.Name = 'groupBox2';
$groupBox2.Size = New-Object System.Drawing.Size(281, 50);
$groupBox2.Text = 'File extensions, separated by semicolons (;)';
$groupBox2.Controls.Add($label1);
$groupBox2.Controls.Add($textBox1);
$Form.Controls.Add($groupBox2);

$radioButton3 = New-Object System.Windows.Forms.RadioButton;
$radioButton3.AutoSize = $true;
$radioButton3.Checked = $true;
$radioButton3.Location = New-Object System.Drawing.Point(12, 20);
$radioButton3.Name = 'radioButton3';
$radioButton3.Size = New-Object System.Drawing.Size(119, 16);
$radioButton3.Text = 'Keep one copy';

$radioButton4 = New-Object System.Windows.Forms.RadioButton;
$radioButton4.AutoSize = $true;
$radioButton4.Location = New-Object System.Drawing.Point(151, 20);
$radioButton4.Name = 'radioButton4';
$radioButton4.Size = New-Object System.Drawing.Size(119, 16);
$radioButton4.Text = 'Remove all copies';

$groupBox3 = New-Object System.Windows.Forms.GroupBox;
$groupBox3.Location = New-Object System.Drawing.Point(12, 129);
$groupBox3.Name = 'groupBox3';
$groupBox3.Size = New-Object System.Drawing.Size(281, 50);
$groupBox3.Text = 'How to treat duplicate lines';
$groupBox3.Controls.Add($radioButton3);
$groupBox3.Controls.Add($radioButton4);
$Form.Controls.Add($groupBox3);

$radioButton5 = New-Object System.Windows.Forms.RadioButton;
$radioButton5.AutoSize = $true;
$radioButton5.Checked = $true;
$radioButton5.Location = New-Object System.Drawing.Point(6, 21);
$radioButton5.Name = 'radioButton5';
$radioButton5.Size = New-Object System.Drawing.Size(35, 16);
$radioButton5.TabStop = $true;
$radioButton5.Text = 'Yes';

$radioButton6 = New-Object System.Windows.Forms.RadioButton;
$radioButton6.AutoSize = $true;
$radioButton6.Location = New-Object System.Drawing.Point(48, 21);
$radioButton6.Name = 'radioButton6';
$radioButton6.Size = New-Object System.Drawing.Size(35, 16);
$radioButton6.Text = 'No';

$groupBox4 = New-Object System.Windows.Forms.GroupBox;
$groupBox4.Location = New-Object System.Drawing.Point(12, 192);
$groupBox4.Name = 'groupBox4';
$groupBox4.Size = New-Object System.Drawing.Size(89, 50);
$groupBox4.Text = 'Keep blanks';
$groupBox4.Controls.Add($radioButton5);
$groupBox4.Controls.Add($radioButton6);
$Form.Controls.Add($groupBox4);

$radioButton7 = New-Object System.Windows.Forms.RadioButton;
$radioButton7.AutoSize = $true;
$radioButton7.Location = New-Object System.Drawing.Point(6, 21);
$radioButton7.Name = "radioButton7";
$radioButton7.Size = New-Object System.Drawing.Size(35, 16);
$radioButton7.Text = "Yes";

$radioButton8 = New-Object System.Windows.Forms.RadioButton;
$radioButton8.AutoSize = $true;
$radioButton8.Checked = $true;
$radioButton8.Location = New-Object System.Drawing.Point(51, 21);
$radioButton8.Name = "radioButton8";
$radioButton8.Size = New-Object System.Drawing.Size(35, 16);
$radioButton8.Text = "No";

$groupBox5 = New-Object System.Windows.Forms.GroupBox;
$groupBox5.Location = New-Object System.Drawing.Point(110, 192);
$groupBox5.Name = "groupBox5";
$groupBox5.Size = New-Object System.Drawing.Size(89, 50);
$groupBox5.Text = "Case-sensitive";
$groupBox5.Controls.Add($radioButton7);
$groupBox5.Controls.Add($radioButton8);
$Form.Controls.Add($groupBox5);

$radioButton9 = New-Object System.Windows.Forms.RadioButton;
$radioButton9.AutoSize = $true;
$radioButton9.Checked = $true;
$radioButton9.Location = New-Object System.Drawing.Point(5, 21);
$radioButton9.Name = 'radioButton9';
$radioButton9.Size = New-Object System.Drawing.Size(35, 16);
$radioButton9.Text = 'Yes';

$radioButton10 = New-Object System.Windows.Forms.RadioButton;
$radioButton10.AutoSize = $true;
$radioButton10.Location = New-Object System.Drawing.Point(47, 21);
$radioButton10.Name = 'radioButton10';
$radioButton10.Size = New-Object System.Drawing.Size(35, 16);
$radioButton10.Text = 'No';

$groupBox6 = New-Object System.Windows.Forms.GroupBox;
$groupBox6.Location = New-Object System.Drawing.Point(208, 192);
$groupBox6.Name = 'groupBox6';
$groupBox6.Size = New-Object System.Drawing.Size(85, 50);
$groupBox6.Text = 'Backup';
$groupBox6.Controls.Add($radioButton9);
$groupBox6.Controls.Add($radioButton10);
$Form.Controls.Add($groupBox6);

$label2 = New-Object System.Windows.Forms.Label;
$label2.AutoSize = $true;
$label2.Location = New-Object System.Drawing.Point(10, 258);
$label2.Name = 'label2';
$label2.Size = New-Object System.Drawing.Size(61, 12);
$label2.Text = 'Encoding:';
$Form.Controls.Add($label2);

$comboBox1 = New-Object System.Windows.Forms.ComboBox;
$comboBox1.FormattingEnabled = $true;
$comboBox1.Items.AddRange(@('Default(ANSI)','UTF-8','Unicode(LittleEndian)','Unicode(BigEndian)'));
$comboBox1.Location = New-Object System.Drawing.Point(71, 255);
$comboBox1.Name = 'comboBox1';
$comboBox1.Size = New-Object System.Drawing.Size(150, 20);
$comboBox1.Text = 'Default(ANSI)';
$Form.Controls.Add($comboBox1);

$button1 = New-Object System.Windows.Forms.Button;
$button1.Location = New-Object System.Drawing.Point(226, 252);
$button1.Name = 'button1';
$button1.Size = New-Object System.Drawing.Size(67, 25);
$button1.Text = 'Run';
$button1.Add_Click($EditFile);
$Form.Controls.Add($button1);

$button2 = New-Object System.Windows.Forms.Button;
$button2.Location = New-Object System.Drawing.Point(300, 8);
$button2.Name = 'button2';
$button2.Size = New-Object System.Drawing.Size(182, 28);
$button2.Text = 'Select files or folder';
$button2.Add_Click($ListFile);
$Form.Controls.Add($button2);

$label3 = New-Object System.Windows.Forms.Label;
$label3.AutoSize = $true;
$label3.Location = New-Object System.Drawing.Point(532, 17);
$label3.Name = 'label3';
$label3.Size = New-Object System.Drawing.Size(59, 12);
$label3.Text = 'File count:';
$Form.Controls.Add($label3);

$textBox2 = New-Object System.Windows.Forms.TextBox;
$textBox2.Location = New-Object System.Drawing.Point(594, 13);
$textBox2.Name = 'textBox2';
$textBox2.Size = New-Object System.Drawing.Size(46, 21);
$textBox2.Text = '0';
$Form.Controls.Add($textBox2);

$textBox3 = New-Object System.Windows.Forms.TextBox;
$textBox3.Text='Back up the source files or folders before testing';
$textBox3.Enabled = $false;
$textBox3.Location = New-Object System.Drawing.Point(300, 40);
$textBox3.Multiline = $true;
$textBox3.Name = 'textBox3';
$textBox3.ScrollBars = 'Both';
$textBox3.Size = New-Object System.Drawing.Size(340, 234);
$textBox3.WordWrap = $false;
$Form.Controls.Add($textBox3);

[void]$Form.ShowDialog();
$Form.Dispose();
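
The core of the script is `ClearExtra`, which compares lines after trimming (and optionally lowercasing) them and offers two modes: keep one copy of each repeated line, or drop every line that occurs more than once. A rough Python sketch of that logic follows. The function and parameter names are invented for illustration, and it works on a list of strings rather than on files.

```python
from collections import Counter


def clear_extra(lines, keep_first=True, keep_blank=True, ignore_case=True):
    """Deduplicate lines, comparing them after trimming whitespace."""

    def key(line):
        k = line.strip()
        return k.lower() if ignore_case else k

    # Occurrence count of every non-blank normalized line.
    counts = Counter(key(l) for l in lines if key(l) != "")
    seen = set()
    result = []
    for line in lines:
        k = key(line)
        if k == "":
            if keep_blank:
                result.append(line)
        elif keep_first:
            # Mode 1: the first occurrence survives, later ones are dropped.
            if k not in seen:
                seen.add(k)
                result.append(line)
        elif counts[k] == 1:
            # Mode 2: only lines that occur exactly once survive.
            result.append(line)
    return result


if __name__ == "__main__":
    demo = ["Foo", "foo ", "", "bar", "Baz", "baz"]
    print(clear_extra(demo))
    print(clear_extra(demo, keep_first=False))
```

The original line text is what gets written out; the trimmed, lowercased form is used only as the comparison key.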
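`OutFile` in the script copies the source to a `$backup<timestamp>_<name>` file in the same folder before overwriting it. Here is a hedged Python sketch of the same backup-then-write pattern. `write_with_backup` and its parameters are names invented for this example, and the encoding is simply passed through instead of being mapped from the script's combo box.

```python
import shutil
import tempfile
import time
from pathlib import Path


def write_with_backup(path, text, encoding="utf-8", backup=True):
    """Overwrite path with text, optionally copying the original aside first."""
    path = Path(path)
    backup_path = None
    if backup:
        stamp = int(time.time() * 1000)  # milliseconds since the Unix epoch
        backup_path = path.with_name(f"$backup{stamp}_{path.name}")
        shutil.copy2(path, backup_path)
    path.write_text(text, encoding=encoding)
    return backup_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "test.txt"
        target.write_text("a\na\n", encoding="utf-8")
        saved = write_with_backup(target, "a\n")
        print(saved.name, repr(target.read_text(encoding="utf-8")))
```

Taking the copy before the write means a failed or unwanted run can be undone by restoring the backup file.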
Author: Speed乄Jay Time: 2018-1-9 10:36
Reply to 8# pcl_test
Awesome....
Welcome to 批处理之家 (http://www.bathome.net/)