Populate a new extra column in a DataFrame based on specific matches

Posted 2025-02-08 15:53:34 | Words: 2090 | Views: 1 | Comments: 0

I am trying to read an HTML table from an Outlook email using BeautifulSoup. The table contains two main columns: Ticker and Price. Now I am trying to add a third column named Pkey to the existing DataFrame.

I am able to add it, and it works fine as long as the email has the full list of tickers (7 in total). Sometimes, though, we don't receive the full list; say we receive prices for only 3 of the 7 tickers. In that case I need the third column to hold the Pkeys for just those 3 tickers.

How can I do that?

We have the following code:

import pandas as pd
import win32com.client
from sqlalchemy.engine import create_engine
import re
from datetime import datetime, timedelta
import requests
import sys
from bs4 import BeautifulSoup
from pprint import pprint


EMAIL_ACCOUNT = 'robinhood.gmail.com'
EMAIL_SUBJ_SEARCH_STRING = 'Morgan Stanley Systematic Strategies Daily Levels'


out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")


root_folder = out_namespace.GetDefaultFolder(6)  # 6 = olFolderInbox

out_iter_folder = root_folder.Folders['Email_Snapper']

item_count = out_iter_folder.Items.Count

Flag = False
cnt = 1
if item_count > 0:
    # Walk the folder from the last item to the first.
    for i in range(item_count, 0, -1):
        message = out_iter_folder.Items[i]
        # Process only the first message whose subject matches.
        if EMAIL_SUBJ_SEARCH_STRING in message.Subject and cnt <= 1:
            cnt = cnt + 1
            Body_content = message.HTMLBody
            Body_content = BeautifulSoup(Body_content, "lxml")
            # First table in the email body.
            html_tables = Body_content.find_all('table')[0]
            #Body_content = Body_content[:Body_content.find("Disclaimer")].strip()
            df = pd.read_html(str(html_tables), header=0)[0]
            Pkey = [71763307, 76366654, 137292386, 151971418, 151971419, 152547427, 152547246]
            # Raises ValueError when the table does not have exactly 7 rows.
            df['Pkey'] = Pkey

            print(df)

Output: the output looks fine as long as we get the full list of tickers from the bank.


But sometimes we get prices for only a handful of tickers rather than the full list. In that case the code raises an error.


The error message I get is:

ValueError: Length of values does not match length of index
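The failure can be reproduced without Outlook. The following is a minimal sketch, assuming a hypothetical three-row table in place of the parsed email (the ticker symbols and prices are made up for illustration); it assigns the same 7-element list to a 3-row DataFrame and captures the resulting error.

```python
import pandas as pd

# Hypothetical stand-in for the parsed email table: only 3 of the 7 tickers arrived.
df = pd.DataFrame({"Ticker": ["AAA", "BBB", "CCC"], "Price": [101.5, 99.2, 87.0]})

# The fixed list of 7 Pkeys from the code above.
pkeys = [71763307, 76366654, 137292386, 151971418, 151971419, 152547427, 152547246]

try:
    # A plain list must have exactly one value per row, so 7 values for 3 rows fails.
    df["Pkey"] = pkeys
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ between pandas versions, but the cause is the same: the list length does not match the number of rows.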


Comments (1)

记忆之渊 2025-02-15 15:53:34

Try using pd.Series([755454, 556554, 2545454, 54644, 878798])
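One caveat with this suggestion: assigning a pd.Series aligns on the index, so with the default index the rows simply receive the first values by position, whichever tickers actually arrived. If each Pkey belongs to a specific ticker, a lookup keyed by ticker may be safer. Below is a minimal sketch, assuming the column is named Ticker as described in the question; the ticker symbols and their pairing with the Pkeys are hypothetical placeholders, since the question does not list them.

```python
import pandas as pd

# Hypothetical ticker -> Pkey lookup. The symbols are placeholders;
# the Pkey values are the ones listed in the question.
PKEY_BY_TICKER = {
    "AAA": 71763307,
    "BBB": 76366654,
    "CCC": 137292386,
    "DDD": 151971418,
    "EEE": 151971419,
    "FFF": 152547427,
    "GGG": 152547246,
}

# Stand-in for the DataFrame returned by pd.read_html when only 3 tickers arrive.
df = pd.DataFrame({"Ticker": ["BBB", "EEE", "GGG"], "Price": [99.2, 45.1, 12.3]})

# Series.map looks up each row's ticker in the dict;
# a ticker that is missing from the dict would get NaN.
df["Pkey"] = df["Ticker"].map(PKEY_BY_TICKER)

print(df)
```

Because the lookup is driven by the values in the Ticker column, it works for any number of rows and does not depend on the order in which the tickers appear in the email.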
