Multiple lists stored in one file: how to read them back and merge them into a single list

This post reads several lists from one file and merges them into a single list by extracting the URLs with a regular expression.

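1. I ran into this problem while scraping data today. The data I needed was stored in one file as several lists, one after another. I wanted to read it back and merge everything into a single list that would be easy to iterate over later.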

First, here is the format of the data I need to read:

The file contains N lists. I started by reading it directly with read(), then printed the content and its data type:

 

# Read the whole file into one string
with open("./data/category_info/category_info.txt","r",encoding="utf-8") as f:
    list1 = f.read()
print(list1)
print(type(list1))  # f.read() always returns a str, not a list

The printed output:

..........
[
  "https://shop111676896.taobao.com",
  "https://shop208960730.taobao.com",
  "https://shop58053041.taobao.com",
  "https://shop325469838.taobao.com",
  "https://shop100874380.taobao.com",
  "https://shop114260282.taobao.com",
  "https://shop237827344.taobao.com",
  "https://shop356054501.taobao.com",
  "https://shop103565720.taobao.com",
  "https://shop544563404.taobao.com",
  "https://shop33279777.taobao.com",
  "https://shop61924147.taobao.com",
  "https://shop114426830.taobao.com",
  "https://shop73216230.taobao.com",
  "https://shop73117760.taobao.com",
  "https://shop103944295.taobao.com",
  "https://shop106643107.taobao.com",
  "https://shop159725090.taobao.com",
  "https://shop540286265.taobao.com",
  "https://shop548965527.taobao.com"
][
  "https://shop329927844.taobao.com",
  "https://shop130237536.taobao.com",
  "https://shop109321639.taobao.com",
  "https://shop112297494.taobao.com",
  "https://shop60995523.taobao.com",
  "https://shop100979410.taobao.com",
  "https://shop155515714.taobao.com",
  "https://shop111458770.taobao.com",
  "https://shop62806179.taobao.com",
  "https://shop126622874.taobao.com",
  "https://shop149039686.taobao.com",
  "https://shop35637701.taobao.com",
  "https://shop163580339.taobao.com",
  "https://shop35981166.taobao.com",
  "https://shop124412834.taobao.com",
  "https://shop114735097.taobao.com",
  "https://shop108812765.taobao.com",
  "https://shop130240325.taobao.com",
  "https://shop73264454.taobao.com",
  "https://shop67453584.taobao.com"
]
<class 'str'>

The type is str, which looked easy to handle: just use replace to turn "][" into "," and that should be it. First attempt:

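with open("./data/shop_info/shop_url.txt","r",encoding="utf-8") as f:
    list1 = f.read()

# Try to turn every "][" between adjacent lists into ","
list1.replace("][",",")  # str.replace returns a new string; this result is discarded
print(list1)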

>>> Output:
.........
  "https://shop67453584.taobao.com"
][
  "https://shop111676896.taobao.com",
  "https://shop208960730.taobao.com",
  "https://shop58053041.taobao.com",
  "https://shop325469838.taobao.com",
  "https://shop100874380.taobao.com",
  "https://shop114260282.taobao.com",
  "https://shop237827344.taobao.com",
  "https://shop356054501.taobao.com",
  "https://shop103565720.taobao.com",
  "https://shop544563404.taobao.com",
  "https://shop33279777.taobao.com",
  "https://shop61924147.taobao.com",
  "https://shop114426830.taobao.com",
  "https://shop73216230.taobao.com",
  "https://shop73117760.taobao.com",
  "https://shop103944295.taobao.com",
  "https://shop106643107.taobao.com",
  "https://shop159725090.taobao.com",
  "https://shop540286265.taobao.com",
  "https://shop548965527.taobao.com"
][
  "https://shop329927844.taobao.com",
  "https://shop130237536.taobao.com",
  "https://shop109321639.taobao.com",
  "https://shop112297494.taobao.com",
  "https://shop60995523.taobao.com",
  "https://shop100979410.taobao.com",
  "https://shop155515714.taobao.com",
  "https://shop111458770.taobao.com",
  "https://shop62806179.taobao.com",
  "https://shop126622874.taobao.com",
  "https://shop149039686.taobao.com",
  "https://shop35637701.taobao.com",
  "https://shop163580339.taobao.com",
  "https://shop35981166.taobao.com",
  "https://shop124412834.taobao.com",
  "https://shop114735097.taobao.com",
  "https://shop108812765.taobao.com",
  "https://shop130240325.taobao.com",
  "https://shop73264454.taobao.com",
  "https://shop67453584.taobao.com"
]

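The brackets were still there. This is because str.replace does not modify list1 in place. It returns a new string, and the code above never assigns that result back. Next I tried split and got the same result. Finally I changed approach. All I need is the URL inside each pair of double quotes, so I can match those directly with a regular expression. The code using re is as follows: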

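import re

with open("./data/shop_info/shop_url.txt","r",encoding="utf-8") as f:
    list1 = f.read()

# Non-greedy match: capture the shortest text between each pair of double quotes,
# i.e. every URL, no matter which of the lists it belongs to
shop_list = re.findall(r'"(.*?)"',list1)
print(shop_list)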


>>>>> Output:
..........
'https://shop115626267.taobao.com', 'https://shop101808441.taobao.com', 'https://shop100674644.taobao.com', 'https://shop71949547.taobao.com', 'https://shop63540798.taobao.com', 'https://shop211037737.taobao.com', 'https://shop66171793.taobao.com', 'https://shop35999051.taobao.com', 'https://shop156395003.taobao.com', 'https://shop324472272.taobao.com', 'https://shop583778897.taobao.com', 'https://shop317880714.taobao.com', 'https://shop109103413.taobao.com', 'https://shop113207130.taobao.com', 'https://shop114201820.taobao.com', 'https://shop162134567.taobao.com', 'https://shop109819808.taobao.com', 'https://shop110423139.taobao.com', 'https://shop109672786.taobao.com', 'https://shop111373248.taobao.com', 'https://shop115626267.taobao.com', 'https://shop101808441.taobao.com', 'https://shop100674644.taobao.com', 'https://shop71949547.taobao.com', 'https://shop63540798.taobao.com', 'https://shop211037737.taobao.com', 'https://shop66171793.taobao.com', 
'https://shop35999051.taobao.com', 'https://shop156395003.taobao.com', 'https://shop324472272.taobao.com', 'https://shop329927844.taobao.com', 'https://shop130237536.taobao.com', 'https://shop109321639.taobao.com', 'https://shop112297494.taobao.com', 'https://shop60995523.taobao.com', 'https://shop100979410.taobao.com', 'https://shop155515714.taobao.com', 'https://shop111458770.taobao.com', 'https://shop62806179.taobao.com', 'https://shop126622874.taobao.com', 'https://shop149039686.taobao.com', 'https://shop35637701.taobao.com', 'https://shop163580339.taobao.com', 'https://shop35981166.taobao.com', 'https://shop124412834.taobao.com', 'https://shop114735097.taobao.com', 'https://shop108812765.taobao.com', 'https://shop130240325.taobao.com', 'https://shop73264454.taobao.com', 'https://shop67453584.taobao.com', 'https://shop111676896.taobao.com', 'https://shop208960730.taobao.com', 'https://shop58053041.taobao.com', 'https://shop325469838.taobao.com', 
'https://shop100874380.taobao.com', 'https://shop114260282.taobao.com', 'https://shop237827344.taobao.com', 'https://shop356054501.taobao.com', 'https://shop103565720.taobao.com', 'https://shop544563404.taobao.com', 'https://shop33279777.taobao.com', 'https://shop61924147.taobao.com', 'https://shop114426830.taobao.com', 'https://shop73216230.taobao.com', 'https://shop73117760.taobao.com', 'https://shop103944295.taobao.com', 'https://shop106643107.taobao.com', 'https://shop159725090.taobao.com', 'https://shop540286265.taobao.com', 'https://shop548965527.taobao.com', 'https://shop329927844.taobao.com', 'https://shop130237536.taobao.com', 'https://shop109321639.taobao.com', 'https://shop112297494.taobao.com', 'https://shop60995523.taobao.com', 'https://shop100979410.taobao.com', 'https://shop155515714.taobao.com', 'https://shop111458770.taobao.com', 'https://shop62806179.taobao.com', 'https://shop126622874.taobao.com', 'https://shop149039686.taobao.com', 
'https://shop35637701.taobao.com', 'https://shop163580339.taobao.com', 'https://shop35981166.taobao.com', 'https://shop124412834.taobao.com', 'https://shop114735097.taobao.com', 'https://shop108812765.taobao.com', 'https://shop130240325.taobao.com', 'https://shop73264454.taobao.com', 'https://shop67453584.taobao.com']
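The replace idea from the first attempt also works once its return value is assigned back. Replacing "][" with "," turns the back-to-back lists into one JSON array, which json.loads can then parse. This is a minimal sketch. It assumes each list in the file is a JSON array of double-quoted strings, as the printed output suggests, and that adjacent lists meet exactly as "][" with nothing in between. The function name merge_lists and the sample string are illustrative stand-ins, not part of the original code.

```python
import json

def merge_lists(text: str) -> list:
    # str.replace returns a NEW string, so the result must be kept
    merged_text = text.replace("][", ",")
    # After the replacement the whole text is a single JSON array
    return json.loads(merged_text)

# Hypothetical stand-in for f.read(): two lists written back to back
sample = (
    '[\n'
    '  "https://shop111676896.taobao.com",\n'
    '  "https://shop208960730.taobao.com"\n'
    '][\n'
    '  "https://shop329927844.taobao.com"\n'
    ']'
)
shop_list = merge_lists(sample)
print(shop_list)
```

This depends on the exact two characters "][". If the file could contain a newline or spaces between "]" and "[", the replace no longer matches. An empty list ([]) followed by another list would also become "[," and fail to parse.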

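With my limited experience, the regex approach is the simplest method I could think of. If you know a better one, feel free to leave a comment!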
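Another option is json.JSONDecoder.raw_decode, sketched here under the assumption that every list in the file is valid JSON. raw_decode parses one JSON value starting at a given index and returns it together with the index where parsing stopped. That lets the lists be decoded one after another, even when whitespace or newlines separate them. Unlike the regex, this keeps only genuine list elements rather than anything that happens to sit between double quotes. The regex output above also contains repeated URLs (for example, shop115626267 appears more than once), so the sketch ends with an optional order-preserving de-duplication. The names read_concatenated_lists, sample and unique_shops are illustrative.

```python
import json

def read_concatenated_lists(text: str) -> list:
    """Decode JSON arrays written back to back in text and merge them into one list."""
    decoder = json.JSONDecoder()
    merged = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Parse one JSON value starting at pos; pos becomes the index just after it
        value, pos = decoder.raw_decode(text, pos)
        if not isinstance(value, list):
            raise ValueError(f"expected a JSON list, got {type(value).__name__}")
        merged.extend(value)
    return merged

# Hypothetical stand-in for f.read(): lists separated by newlines, one URL repeated
sample = (
    '[\n  "https://shop115626267.taobao.com"\n]\n'
    '[\n  "https://shop101808441.taobao.com",\n'
    '  "https://shop115626267.taobao.com"\n]\n'
)
shop_list = read_concatenated_lists(sample)
# Optional: drop repeated URLs while keeping first-seen order
unique_shops = list(dict.fromkeys(shop_list))
print(unique_shops)
```

Because each list is parsed as real JSON, a malformed file raises json.JSONDecodeError instead of silently producing partial matches.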
