How to extract certain words from a file by specific key letters in Python

Posted on 2025-01-25 16:57:04

Sorry, I'm fairly new to Python and haven't had much training.

I want to ask how to extract words containing certain key letters from a file, './models/asm/Draft_km.modelspec', in Python. For example, these lines can be found inside the .modelspec file:

m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_TRPTRS_kcat : 10
m_EX_remnant1_e_kcat : 10
m_SCYSSL_kcat : 10
m_RNMK_kcat : 10
m_TAGtex_kcat : 10
m_URIDK2r_kcat : 10
m_TRPt2rpp_kcat : 10
m_GLUSy_kcat : 10
m_VPAMTr_copy2_kcat : 10
m_EX_galctn__L_e_km : 0.001
m_EX_galt_e_km : 0.001
m_EX_dgmp_e_km : 0.001
m_EX_galur_e_km : 0.001
m_EX_gam_e_km : 0.001
m_EX_gam6p_e_km : 0.001
m_EX_gbbtn_e_km : 0.001

I want to extract these from a large '.modelspec' file by filtering on "_kcat : 10", and obtain them as m_BSORx_kcat : 10, m_ENTERH_kcat : 10, m_TRPTRS_kcat : 10, m_EX_remnant1_e_kcat : 10, m_SCYSSL_kcat : 10, m_RNMK_kcat : 10, m_TAGtex_kcat : 10, m_URIDK2r_kcat : 10, m_TRPt2rpp_kcat : 10, m_GLUSy_kcat : 10, m_VPAMTr_copy2_kcat : 10

My end goal is to be able to randomly reassign 10% of the value (-1, 1) as part of a genetic algorithm.

Any help is much appreciated.


Oo萌小芽oO 2025-02-01 16:57:04


Since you seem to be planning to modify the data, it might be useful to first split the lines into a list and then process each line individually.

with open("./models/asm/Draft_km.modelspec") as f:
    # read lines, skipping empty lines and removing trailing whitespace
    lines = [line.rstrip() for line in f if line.strip()]

If all you need to do is check for a substring, you can check each line like so:

for line in lines:
    if "_kcat : 10" in line:
        print(line) # or do whatever you want

If you need to match more complex patterns, regular expressions as in Tim Biegeleisen's answer are the way to go.
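
Since the end goal is to modify the values, the substring check above could be extended to split each matching line into a name and a number. The following is a minimal sketch rather than part of the original answer: the helper name `parse_kcat_lines` and the short sample `lines` list are hypothetical, and it assumes every entry has the form `name : value` with a numeric value (a non-numeric value would raise `ValueError`).

```python
def parse_kcat_lines(lines, suffix="_kcat"):
    """Return {name: value} for 'name : value' lines whose name ends with suffix."""
    params = {}
    for line in lines:
        # partition splits on the first ':'; sep is '' when no colon is present
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.endswith(suffix):
            # float() tolerates the surrounding whitespace in ' 10'
            params[name] = float(value)
    return params


# Hypothetical sample standing in for the lines read from the .modelspec file
lines = ["m_BSORx_kcat : 10", "m_ENTERH_kcat : 10", "m_EX_galt_e_km : 0.001"]
kcat_params = parse_kcat_lines(lines)
print(kcat_params)
```

Keeping the result in a dictionary makes it straightforward to change individual values later and look them up by parameter name.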


落墨 2025-02-01 16:57:04


Using re.findall we can try:

import re

# use this to read the whole file into a single string
with open('./models/asm/Draft_km.modelspec', 'r') as file:
    inp = file.read()

# otherwise we can hard code the data you showed in your question here
inp = """m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_TRPTRS_kcat : 10
m_EX_remnant1_e_kcat : 10
m_SCYSSL_kcat : 10
m_RNMK_kcat : 10
m_TAGtex_kcat : 10
m_URIDK2r_kcat : 10
m_TRPt2rpp_kcat : 10
m_GLUSy_kcat : 10
m_VPAMTr_copy2_kcat : 10
m_EX_galctn__L_e_km : 0.001
m_EX_galt_e_km : 0.001
m_EX_dgmp_e_km : 0.001
m_EX_galur_e_km : 0.001
m_EX_gam_e_km : 0.001
m_EX_gam6p_e_km : 0.001
m_EX_gbbtn_e_km : 0.001"""

matches = re.findall(r'\b\w+_kcat : \d+(?:\.\d+)?', inp)
output = ', '.join(matches)
print(output)

This prints:

m_BSORx_kcat : 10, m_ENTERH_kcat : 10, m_TRPTRS_kcat : 10, m_EX_remnant1_e_kcat : 10, m_SCYSSL_kcat : 10, m_RNMK_kcat : 10, m_TAGtex_kcat : 10, m_URIDK2r_kcat : 10, m_TRPt2rpp_kcat : 10, m_GLUSy_kcat : 10, m_VPAMTr_copy2_kcat : 10
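
With the matches in hand, one possible reading of the question's end goal is to scale each value by a random factor within ±10%. The sketch below illustrates that reading only and is not the answerer's method: it adds capture groups to the regex so names and values come back separately, and the function name `perturb`, the `fraction` parameter, the shortened sample input, and the fixed seed are all hypothetical.

```python
import random
import re

# Shortened sample input; in practice this would be the file contents
inp = """m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_EX_galt_e_km : 0.001"""

# Capture the parameter name and its numeric value as separate groups
pairs = re.findall(r'\b(\w+_kcat) : (\d+(?:\.\d+)?)', inp)
params = {name: float(value) for name, value in pairs}


def perturb(params, fraction=0.1, rng=None):
    """Scale each value by 1 + fraction * u, with u drawn uniformly from [-1, 1].

    Assumption: "reassign 10% of the value (-1,1)" means a relative change
    of at most 10% in either direction.
    """
    rng = rng or random.Random()
    return {name: value * (1 + fraction * rng.uniform(-1, 1))
            for name, value in params.items()}


mutated = perturb(params, rng=random.Random(0))  # fixed seed for repeatability
for name, value in mutated.items():
    print(f"{name} : {value:.4f}")
```

Passing a seeded `random.Random` instance keeps a run reproducible, which can help when debugging a genetic algorithm; omitting it gives a fresh perturbation each call.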
