How to extract certain words containing specific key letters from a file in Python

Posted on 2025-01-25 16:57:04

Sorry, I'm fairly new to Python and have never had much training.

I want to ask how to extract words containing certain key letters from a file, './models/asm/Draft_km.modelspec', in Python. For example, these lines can be found inside the .modelspec file:

m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_TRPTRS_kcat : 10
m_EX_remnant1_e_kcat : 10
m_SCYSSL_kcat : 10
m_RNMK_kcat : 10
m_TAGtex_kcat : 10
m_URIDK2r_kcat : 10
m_TRPt2rpp_kcat : 10
m_GLUSy_kcat : 10
m_VPAMTr_copy2_kcat : 10
m_EX_galctn__L_e_km : 0.001
m_EX_galt_e_km : 0.001
m_EX_dgmp_e_km : 0.001
m_EX_galur_e_km : 0.001
m_EX_gam_e_km : 0.001
m_EX_gam6p_e_km : 0.001
m_EX_gbbtn_e_km : 0.001

I want to extract these from a large '.modelspec' file by filtering on "_kcat : 10"
and obtain them as m_BSORx_kcat : 10, m_ENTERH_kcat : 10, m_TRPTRS_kcat : 10, m_EX_remnant1_e_kcat : 10, m_SCYSSL_kcat : 10, m_RNMK_kcat : 10, m_TAGtex_kcat : 10, m_URIDK2r_kcat : 10, m_TRPt2rpp_kcat : 10, m_GLUSy_kcat : 10, m_VPAMTr_copy2_kcat : 10

My end goal is to randomly reassign each value by 10% of the value, scaled by a random number in (-1, 1), so that I can run a genetic algorithm.

Any help is much appreciated.

Comments (2)

Oo萌小芽oO 2025-02-01 16:57:04

Since you seem to be planning to modify the data, it might be useful to first split the lines into a list and then process each line individually.

with open("./models/asm/Draft_km.modelspec") as f:
    # read lines, skipping empty lines and remove trailing whitespace
    lines = [line.rstrip() for line in f if line.strip()]

If all you need to do is check for a substring, you can check each line like so:

for line in lines:
    if "_kcat : 10" in line:
        print(line) # or do whatever you want

If you need to match more complex patterns, regular expressions as in Tim Biegeleisen's answer are the way to go.
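Building on the substring check above, here is a minimal sketch of how the matching lines could be turned into name/value pairs that are easier to modify later. The function name `parse_kcat_lines` and the `suffix` parameter are made up for illustration, and the sketch assumes every matching line has the form `name : number`, as in the question.

```python
def parse_kcat_lines(lines, suffix="_kcat"):
    """Return {name: value} for lines whose name ends with the given suffix.

    Hypothetical helper: assumes each line looks like 'name : number'.
    """
    params = {}
    for line in lines:
        # Split on the first colon; sep is empty if the line has no colon
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.endswith(suffix):
            params[name] = float(value)  # float() tolerates surrounding spaces
    return params


# A few lines copied from the question, standing in for the real file contents
sample = [
    "m_BSORx_kcat : 10",
    "m_ENTERH_kcat : 10",
    "m_EX_galt_e_km : 0.001",
]
kcat_params = parse_kcat_lines(sample)
print(kcat_params)
```

Matching on the name suffix rather than on the literal text "_kcat : 10" also avoids accidentally picking up a value such as 100, which contains the same substring.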

落墨 2025-02-01 16:57:04

Using re.findall we can try:

import re

# use this to read all lines into a string
with open('./models/asm/Draft_km.modelspec', 'r') as file:
    inp = file.read()

# otherwise we can hard code the data you showed in your question here
inp = """m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_TRPTRS_kcat : 10
m_EX_remnant1_e_kcat : 10
m_SCYSSL_kcat : 10
m_RNMK_kcat : 10
m_TAGtex_kcat : 10
m_URIDK2r_kcat : 10
m_TRPt2rpp_kcat : 10
m_GLUSy_kcat : 10
m_VPAMTr_copy2_kcat : 10
m_EX_galctn__L_e_km : 0.001
m_EX_galt_e_km : 0.001
m_EX_dgmp_e_km : 0.001
m_EX_galur_e_km : 0.001
m_EX_gam_e_km : 0.001
m_EX_gam6p_e_km : 0.001
m_EX_gbbtn_e_km : 0.001"""

matches = re.findall(r'\b\w+_kcat : \d+(?:\.\d+)?', inp)
output = ', '.join(matches)
print(output)

This prints:

m_BSORx_kcat : 10, m_ENTERH_kcat : 10, m_TRPTRS_kcat : 10, m_EX_remnant1_e_kcat : 10, m_SCYSSL_kcat : 10, m_RNMK_kcat : 10, m_TAGtex_kcat : 10, m_URIDK2r_kcat : 10, m_TRPt2rpp_kcat : 10, m_GLUSy_kcat : 10, m_VPAMTr_copy2_kcat : 10
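For the stated end goal, the same pattern could be extended with capture groups and `re.sub` to rewrite the values in place. This is only a sketch under one reading of the question: it assumes "10% of the value (-1,1)" means each new value is `value * (1 + 0.1 * u)` with `u` drawn uniformly from [-1, 1]. The names `perturb_kcats`, `fraction`, and `rng` are hypothetical.

```python
import random
import re

# Capture the parameter name and its numeric value separately
KCAT_PATTERN = re.compile(r'\b(\w+_kcat) : (\d+(?:\.\d+)?)')


def perturb_kcats(text, fraction=0.1, rng=random):
    """Rewrite each '<name>_kcat : <value>' entry with a randomly shifted value.

    Assumed interpretation: new = value * (1 + fraction * u), u uniform in [-1, 1].
    """
    def replace(match):
        name, value = match.group(1), float(match.group(2))
        new_value = value * (1 + fraction * rng.uniform(-1, 1))
        return f"{name} : {new_value:.6g}"

    return KCAT_PATTERN.sub(replace, text)


# A few lines copied from the question, standing in for the real file contents
sample = """m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_EX_galt_e_km : 0.001"""

# A seeded generator makes the run repeatable; pass nothing to use the global one
mutated = perturb_kcats(sample, rng=random.Random(0))
print(mutated)
```

Lines that do not match the pattern, such as the `_km` entries, pass through unchanged, so the result could be written back to a new .modelspec file as-is.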
