Title: [Text Processing] Asking the experts: how do I extract the matching data from TXT files into an Excel spreadsheet?
Author: qiaodong  Time: 2021-5-31 14:26  Title: Asking the experts: how do I extract the matching data from TXT files into an Excel spreadsheet?
Last edited by qiaodong on 2021-5-31 14:34
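I have a large batch of TXT files, each named "name + mobile number". I have an Excel file with the columns Name (姓名), Gender (性别), ID number (身份证号码), Address (住址) and Mobile number (手机号码), and I want to fill each column with the corresponding data extracted from the TXT files. (A few files are named with only the name and no mobile number; no need to worry about the number for those.) If the sources were all Excel files I could manage, but with TXT files I'm at a loss. Any guidance would be much appreciated, thanks!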
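Note that the mobile number lives in the file name, not in the file body, so each file name has to be split into name and number. A minimal sketch, assuming the number is a run of digits at the end of the name (optionally preceded by a space, `+`, `_` or `-`); `split_filename` is a hypothetical helper, not part of any library:

```python
import os
import re


# Hypothetical helper: split a "name + mobile number" file name into its parts.
# Assumes the number is the trailing run of digits in the file stem and may be
# preceded by an optional separator; returns (name, None) when there is no number.
def split_filename(filename):
    stem, _ext = os.path.splitext(os.path.basename(filename))
    match = re.fullmatch(r'(.*?)\s*[+_-]?\s*(\d+)', stem)
    if match:
        return match.group(1), match.group(2)
    return stem, None


print(split_filename('张三13800138000.txt'))  # ('张三', '13800138000')
print(split_filename('李四.txt'))             # ('李四', None)
```

The lazy `(.*?)` combined with `fullmatch` makes the name part stop right before the trailing digits, so the whole number ends up in the second group.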
Author: wxyz0001  Time: 2021-6-1 08:34
Last edited by wxyz0001 on 2021-6-4 15:23
Written in Python. It writes the data to a CSV file, which Excel opens directly. Improvements from the experts are welcome.
```python
import os
import re
import csv

# Column headers. Each TXT file is expected to hold the first six as
# "key:value" lines in exactly this order; 手机号码 (mobile number)
# is taken from the file name.
FIELDNAMES = ('姓名', '性别', '民族', '出生日期', '住址', '公民身份号码', '手机号码')


# Get all .txt files from a directory listing
def get_file_list(file_list):
    return [name for name in file_list if name.lower().endswith('.txt')]


# Read every TXT file and collect its values, in order, into one flat list.
# Use encoding='utf-8' if the TXT files are saved as UTF-8.
def get_data(file_path, txt_list, encoding='gbk'):
    data = []
    for file in txt_list:
        # Mobile number = digits in the file name; 'None' if the name has none
        match = re.search(r'\d+', file)
        tel = match.group(0) if match else 'None'
        with open(os.path.join(file_path, file), 'r', encoding=encoding) as f:
            for line in f:
                # Split on the first colon only (ASCII or full-width)
                parts = re.split(r'[::]', line, maxsplit=1)
                if len(parts) != 2:
                    continue
                key, val = parts[0].strip(), parts[1].strip()
                data.append(val)
                # The mobile number follows the 公民身份号码 (ID number) value
                if '公民身份号码' in key:
                    data.append(tel)
    return data


# Regroup the flat list into one 7-field tuple per TXT file
def get_csv_list(txt_csv):
    n = len(FIELDNAMES)
    return [tuple(txt_csv[i:i + n]) for i in range(0, len(txt_csv), n)]


if __name__ == '__main__':
    file_path = 'E:/news/Python/txt_cvs'  # directory holding the TXT files
    filename = os.path.join(file_path, 'data.csv')  # output CSV path
    txt_list = get_file_list(os.listdir(file_path))
    lines = get_csv_list(get_data(file_path, txt_list))

    # Rows already in data.csv, so re-running the script adds no duplicates
    seen = set()
    if os.path.exists(filename):
        with open(filename, 'r', encoding='gbk', newline='') as ff:
            seen = set(tuple(row) for row in csv.reader(ff))

    # GBK output opens without garbled characters in Excel on Chinese Windows
    with open(filename, 'a', encoding='gbk', newline='') as f:
        k = csv.writer(f, dialect='excel')
        if FIELDNAMES not in seen:
            k.writerow(FIELDNAMES)
        for line in lines:
            if line not in seen:
                k.writerow(line)
                seen.add(line)
```
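The script above groups values purely by position, so a TXT file with one missing or extra `key:value` line shifts every later column. A sketch of a more defensive variant matches fields by name with the standard library's `csv.DictWriter`. Assumptions: the TXT bodies are `key:value` lines as in the script above, the key names match its column headers, and `parse_txt` / `write_csv` are hypothetical names. It also swaps GBK output for `utf-8-sig` (UTF-8 with a BOM), which Excel recognizes as UTF-8:

```python
import csv
import os
import re

# Output columns; assumed to match the keys used inside the TXT files
COLUMNS = ['姓名', '性别', '民族', '出生日期', '住址', '公民身份号码', '手机号码']


def parse_txt(path, encoding='gbk'):
    """Hypothetical helper: read one TXT file into a dict keyed by field name."""
    row = {}
    with open(path, 'r', encoding=encoding) as f:
        for line in f:
            # Split on the first ASCII or full-width colon only
            parts = re.split(r'[::]', line, maxsplit=1)
            if len(parts) == 2:
                row[parts[0].strip()] = parts[1].strip()
    # Mobile number = digits in the file name, blank if there are none
    match = re.search(r'\d+', os.path.basename(path))
    row['手机号码'] = match.group(0) if match else ''
    return row


def write_csv(txt_dir, out_path, encoding='gbk'):
    """Hypothetical helper: write one CSV row per TXT file, columns matched by name."""
    names = sorted(n for n in os.listdir(txt_dir) if n.lower().endswith('.txt'))
    with open(out_path, 'w', encoding='utf-8-sig', newline='') as f:
        # restval='' fills missing fields; extrasaction='ignore' drops unknown keys
        writer = csv.DictWriter(f, fieldnames=COLUMNS, restval='',
                                extrasaction='ignore')
        writer.writeheader()
        for name in names:
            writer.writerow(parse_txt(os.path.join(txt_dir, name), encoding))
```

Usage, with the directory from the script above: `write_csv('E:/news/Python/txt_cvs', 'E:/news/Python/data.csv')`. Because it rewrites the output file (`'w'`) instead of appending, re-running it needs no deduplication step.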
Welcome to 批处理之家 (http://www.bathome.net/) | Powered by Discuz! 7.2